Reducing MongoDB database file size

Question

I've got a MongoDB database that was once large (3GB). Since then, documents have been deleted and I was expecting the size of the database files to decrease accordingly. But since MongoDB keeps allocated space, the files are still large. I read here and there that the admin command mongod --repair is used to free the unused space, but I don't have enough space on the disk to run this command. Do you know a way I can free up the unused space?

User · Answer

When I had the same problem, I stopped my mongo server and started it again with the command mongod --repair. Before running the repair operation, check that you have enough free space on your disk: the minimum is the size of your database.

User · Answer

Database files cannot be reduced in size. While "repairing" a database, the mongo server can only delete some of its files. If a large amount of data has been deleted, the mongo server will "release" (delete) some of its existing files during repair.

User · Answer

MongoDB 3.0 and higher has a new storage engine, WiredTiger. In my case, switching engines reduced disk usage from 100 GB to 25 GB.

User · Answer

It looks like Mongo v1.9+ has support for compact in place:

    > db.runCommand( { compact : 'mycollectionname' } )

See the docs here: http://docs.mongodb.org/manual/reference/command/compact/

Unlike repairDatabase, the compact command does not require double the disk space to do its work. It does require a small amount of additional space while working. Additionally, compact is faster.

User · Answer

Starting with version 2.8 of Mongo, you can use compression. The WiredTiger engine offers 3 levels of compression (mmap, which is the default in 2.6, does not provide compression):

- None
- snappy (the default)
- zlib

Here is an example of how much space you can save for 16 GB of data (the data is taken from this article).
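As a rough illustration of choosing one of those compressors per collection, here is a minimal sketch. It assumes the options document accepted by db.createCollection (storageEngine.wiredTiger.configString with a block_compressor setting). The helper name wiredTigerCollectionOptions and the collection name "events" are made up for this example.

```javascript
// Compressors named in the answer above.
const COMPRESSORS = ["none", "snappy", "zlib"];

// Build the options object to pass to db.createCollection(name, options).
// Hypothetical helper: it only assembles the document, it does not talk to MongoDB.
function wiredTigerCollectionOptions(compressor) {
  if (!COMPRESSORS.includes(compressor)) {
    throw new Error("unknown compressor: " + compressor);
  }
  return {
    storageEngine: {
      wiredTiger: { configString: "block_compressor=" + compressor },
    },
  };
}

// In the mongo shell (assumed collection name "events"):
//   db.createCollection("events", wiredTigerCollectionOptions("zlib"));
```

Only newly created collections pick up the setting, so existing data would have to be rewritten into such a collection to benefit.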

User · Answer

mongod --repair is not recommended for a sharded cluster.

If you are using a replica set or a sharded cluster, use the compact command. It rewrites and defragments all data and indexes of a collection. Syntax:

    db.runCommand( { compact : <collection name> } )

When used with force: true, compact runs on the primary of a replica set, e.g.

    db.runCommand( { compact : <collection name>, force : true } )

Other points to consider:
- It blocks operations, so it is best executed in a maintenance window.
- If the replica set members run on different servers, it needs to be executed on each member separately.
- In a sharded cluster, compact needs to be executed on each shard member separately. It cannot be executed against a mongos instance.
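A small sketch of how the two command forms differ. The helper buildCompactCommand is hypothetical and simply builds the command document; the document would then be passed to db.runCommand on each member in turn.

```javascript
// Build the compact command document for one collection.
// Pass onPrimary = true to add force: true, as described above.
function buildCompactCommand(collectionName, onPrimary) {
  const cmd = { compact: collectionName }; // the command name must be the first key
  if (onPrimary) {
    cmd.force = true;
  }
  return cmd;
}

// In the mongo shell, connected directly to a member (not to mongos):
//   db.runCommand(buildCompactCommand("mycollectionname", false)); // secondary
//   db.runCommand(buildCompactCommand("mycollectionname", true));  // primary
```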

User · Answer

If you need to run a full repair, use the repairpath option. Point it to a disk with more available space.

For example, on my Mac I've used:

    mongod --config /usr/local/etc/mongod.conf --repair --repairpath /Volumes/X/mongo_repair

Update: Per MongoDB Core Server Ticket 4266, you may need to add --nojournal to avoid an error:

    mongod --config /usr/local/etc/mongod.conf --repair --repairpath /Volumes/X/mongo_repair --nojournal

User · Answer

In general, compact is preferable to repairDatabase. But one advantage of repair over compact is that you can issue repair to the whole cluster. With compact you have to log into each shard, which is kind of annoying.

User · Answer

Compact all collections in the current database:

    db.getCollectionNames().forEach(function (collectionName) {
        print('Compacting: ' + collectionName);
        db.runCommand({ compact: collectionName });
    });
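A variant of the same loop that also records whether each compact succeeded might look like the sketch below. It takes the database handle as a parameter so it can be tried against a stub. Skipping names that start with "system." is an assumption of this sketch, not something the answer above does.

```javascript
// Run compact on every non-system collection and report success per collection.
// `db` is expected to expose getCollectionNames() and runCommand(), as the mongo shell does.
function compactAll(db) {
  const results = {};
  db.getCollectionNames().forEach(function (name) {
    if (name.startsWith("system.")) {
      return; // assumption: leave internal collections alone
    }
    const res = db.runCommand({ compact: name });
    results[name] = res.ok === 1; // command replies carry ok: 1 on success
  });
  return results;
}

// In the mongo shell:
//   printjson(compactAll(db));
```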

User · Answer

UPDATE: with the compact command and WiredTiger, it looks like the extra disk space will actually be released to the OS.

UPDATE: as of v1.9+ there is a compact command. This command will perform a compaction "in-line". It will still need some extra space, but not as much.

MongoDB compresses the files by:

- copying the files to a new location
- looping through the documents and re-ordering / re-solving them
- replacing the original files with the new files

You can do this "compression" by running mongod --repair or by connecting directly and running db.repairDatabase().

In either case you need the space somewhere to copy the files. Now, I don't know why you don't have enough space to perform a compress. However, you do have some options if you have another computer with more space.

1. Export the database to another computer with Mongo installed (using mongoexport) and then import that same database (using mongoimport). This will result in a new database that is more compressed. Now you can stop the original mongod, replace its files with the new database files, and you're good to go.
2. Stop the current mongod, copy the database files to a bigger computer, and run the repair on that computer. You can then move the new database files back to the original computer.

There is not currently a good way to "compact in place" using Mongo. And Mongo can definitely suck up a lot of space.

The best strategy right now for compaction is to run a Master-Slave setup. You can then compact the Slave, let it catch up, and switch them over. I know, still a little hairy. Maybe the Mongo team will come up with better in-place compaction, but I don't think it's high on their list. Drive space is currently assumed to be cheap (and it usually is).

User · Answer

For standalone mode you could use compact or repair.

For a sharded cluster or replica set, in my experience, after running compact on the primary and then on the secondary, the size of the primary database was reduced, but not the secondary. You might want to resync the member to reduce the size of the secondary database. By doing this you might find that the secondary database ends up even smaller than the primary, so I guess the compact command does not really compact the collection. I ended up switching the primary and secondary of the replica set and resyncing the member again.

My conclusion is that the best way to reduce the size of a sharded replica set is to resync a member, switch primary and secondary, and resync again.

User · Answer

If a large chunk of data is deleted from a collection and the collection never reuses the deleted space for new documents, this space needs to be returned to the operating system so that it can be used by other databases or collections. You will need to run a compact or repair operation to defragment the disk space and regain the usable free space.

The behavior of the compaction process depends on the MongoDB storage engine, as follows:

    db.runCommand({compact: collection-name})

MMAPv1

- The compaction operation defragments data files and indexes. However, it does not release space to the operating system. The operation is still useful to defragment and create more contiguous space for reuse by MongoDB, but it is of no use when free disk space is very low.
- Additional disk space (up to 2GB) is required during the compaction operation.
- A database-level lock is held during the compaction operation.

WiredTiger

- The WiredTiger engine provides compression by default, so it consumes less disk space than MMAPv1.
- The compact process releases the free space to the operating system.
- Minimal disk space is required to run the compact operation.
- WiredTiger also blocks all operations on the database, as it needs a database-level lock.

For the MMAPv1 engine, compact does not return the space to the operating system. You need to run a repair operation to release the unused space:

    db.runCommand({repairDatabase: 1})
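Before running compact on WiredTiger, it can help to see how much space a collection could give back. The sketch below reads the "file bytes available for reuse" figure that WiredTiger reports under the block-manager section of the collection statistics. The helper name reusableBytes is made up here, and it returns null when the statistics do not come from WiredTiger.

```javascript
// Extract the reusable byte count from a collection stats document.
// `stats` is what db.getCollection(name).stats() returns in the mongo shell.
function reusableBytes(stats) {
  const wt = stats.wiredTiger;
  if (!wt || !wt["block-manager"]) {
    return null; // not a WiredTiger collection, or the section is missing
  }
  return wt["block-manager"]["file bytes available for reuse"];
}

// In the mongo shell (assumed collection name):
//   print(reusableBytes(db.getCollection("mycollectionname").stats()));
```

A small number suggests compact would free little, so the blocking operation may not be worth it.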

User · Answer

Just one way that I was able to do it. There is no guarantee for the safety of your existing data. Try it at your own risk.

Delete the data files directly and restart mongod.

For example, on Ubuntu (default data path: /var/lib/mongodb), I had a couple of files with names like collection.#. I kept collection.0 and deleted all the others.

This seems an easier way if you don't have serious data in the database.

User · Answer

There are two ways to solve this, depending on the storage engine.

1. MMAP engine

Command: db.repairDatabase()

NOTE: repairDatabase requires free disk space equal to the size of your current data set plus 2 gigabytes. If the volume that holds dbpath lacks sufficient space, you can mount a separate volume and use that for the repair. When mounting a separate volume for repairDatabase, you must run repairDatabase from the command line and use the --repairpath switch to specify the folder in which to store temporary repair files.

E.g., if the DB size is 120 GB, then (120*2)+2 = 242 GB of hard disk space is required in total.

Alternatively, you can do it per collection with the command: db.runCommand({compact: 'collectionName'})

2. WiredTiger

It resolves this automatically by itself.
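The arithmetic in the note above can be written out as a small sketch. The function name and the field names are made up for illustration. The only rule used is the one quoted in the note: free space equal to the data set size plus 2 GB.

```javascript
// Estimate disk requirements for repairDatabase on MMAPv1, in gigabytes.
function repairSpaceEstimateGB(dataSetGB) {
  const freeNeededGB = dataSetGB + 2;                 // extra free space the repair needs
  const totalFootprintGB = dataSetGB + freeNeededGB;  // existing data plus the repair copy
  return { freeNeededGB: freeNeededGB, totalFootprintGB: totalFootprintGB };
}

console.log(repairSpaceEstimateGB(120));
// prints { freeNeededGB: 122, totalFootprintGB: 242 }
```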

User · Answer

There has been some considerable confusion over space reclamation in MongoDB, and some recommended practices are downright dangerous in certain deployment types. More details below.

TL;DR: repairDatabase attempts to salvage data from a standalone MongoDB deployment that is trying to recover from disk corruption. If it recovers space, it is purely a side effect. Recovering space should never be the primary consideration when running repairDatabase.

Recover space in a standalone node

WiredTiger: For a standalone node with WiredTiger, running compact will release space to the OS, with one caveat: the compact command on WiredTiger on MongoDB 3.0.x was affected by this bug: SERVER-21833, which was fixed in MongoDB 3.2.3. Prior to this version, compact on WiredTiger could silently fail.

MMAPv1: Due to the way MMAPv1 works, there is no safe and supported method to recover space using the MMAPv1 storage engine. compact in MMAPv1 will defragment the data files, potentially making more space available for new documents, but it will not release space back to the OS.

You may be able to run repairDatabase if you fully understand the consequences of this potentially dangerous command (see below), since repairDatabase essentially rewrites the whole database by discarding corrupt documents. As a side effect, this will create new MMAPv1 data files without any fragmentation and release space back to the OS.

For a less adventurous method, running mongodump and mongorestore may be possible as well in an MMAPv1 deployment, subject to the size of your deployment.

Recover space in a replica set

For replica set configurations, the best and safest method to recover space is to perform an initial sync, for both WiredTiger and MMAPv1.

If you need to recover space from all nodes in the set, you can perform a rolling initial sync. That is, perform an initial sync on each of the secondaries, before finally stepping down the primary and performing an initial sync on it. The rolling initial sync method is the safest method to perform replica set maintenance, and it also involves no downtime as a bonus.

Please note that the feasibility of a rolling initial sync also depends on the size of your deployment. For extremely large deployments, it may not be feasible to do an initial sync, and thus your options are somewhat more limited. If WiredTiger is used, you may be able to take one secondary out of the set, start it as a standalone, run compact on it, and rejoin it to the set.

Regarding repairDatabase

Please don't run repairDatabase on replica set nodes. This is very dangerous, as mentioned in the repairDatabase page and described in more detail below.

The name repairDatabase is a bit misleading, since the command doesn't attempt to repair anything. The command was intended to be used when there is disk corruption on a standalone node, which could lead to corrupt documents.

The repairDatabase command could be more accurately described as "salvage database". That is, it recreates the databases by discarding corrupt documents, in an attempt to get the database into a state where you can start it and salvage intact documents from it.

In MMAPv1 deployments, this rebuilding of the database files releases space to the OS as a side effect. Releasing space to the OS was never the purpose.

Consequences of repairDatabase on a replica set

In a replica set, MongoDB expects all nodes in the set to contain identical data. If you run repairDatabase on a replica set node, there is a chance that the node contains undetected corruption, and repairDatabase will dutifully remove the corrupt documents for you.

Predictably, this makes that node contain a different dataset from the rest of the set. If an update happens to hit that single document, the whole set could crash.

To make matters worse, it is entirely possible that this situation could stay dormant for a long time, only to strike suddenly with no apparent reason.
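The recommendations in this answer can be summarised as a small lookup, sketched below. The function name spaceRecoveryPlan and the topology labels are made up for this example. The engine names follow the values reported by db.serverStatus().storageEngine.name ("wiredTiger" or "mmapv1").

```javascript
// Map a deployment type and storage engine to the approach suggested in the answer above.
function spaceRecoveryPlan(topology, engine) {
  if (topology === "replicaSet") {
    // Same advice for both engines.
    return "rolling initial sync";
  }
  if (topology === "standalone" && engine === "wiredTiger") {
    return "compact";
  }
  if (topology === "standalone" && engine === "mmapv1") {
    // repairDatabase would also release space, but only as a side effect.
    return "mongodump and mongorestore";
  }
  throw new Error("unknown combination: " + topology + " / " + engine);
}

console.log(spaceRecoveryPlan("standalone", "wiredTiger")); // prints compact
```

The lookup deliberately never returns repairDatabase, in line with the warning that it is a salvage tool rather than a space reclamation tool.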

User · Answer

I had the same problem, and solved it by simply doing this at the command line:

    mongodump -d databasename
    echo 'db.dropDatabase()' | mongo databasename
    mongorestore dump/databasename
