{"componentChunkName":"component---src-pages-author-author-yaml-id-js","path":"/author/chinmaya-pati/","result":{"data":{"allMarkdownRemark":{"edges":[{"node":{"id":"8d1c6c7b-3313-5137-89fa-d096a463287e","html":"<p>This post covers the different ways to migrate data in MongoDB, so that you can write scripts that change your database by adding new documents or modifying existing ones.</p>\n<p>If you're coming here for the first time, please take a look at the prequel <a href=\"https://www.loginradius.com/blog/engineering/self-hosted-mongo/\">Self-Hosted MongoDB</a>.</p>\n<p>Alright then, picking up from where we left off, let's get started with the data migration in MongoDB.</p>\n<p>Now, the basic steps to migrate data from one MongoDB to another would be:</p>\n<ol>\n<li>Create a zipped backup of the existing data</li>\n<li>Restore the backup into a new DB</li>\n</ol>\n<p>This is very straightforward when the source database is not online because we know that there won't be any new documents created/updated during the migration process.\nLet's look at the simple migration first before diving into the live scenario.</p>\n<hr />\n<h1 id=\"migrating-from-an-offline-database-in-mongodb\" style=\"position:relative;\"><a href=\"#migrating-from-an-offline-database-in-mongodb\" aria-label=\"migrating from an offline database in mongodb permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Migrating from an offline database in MongoDB</h1>\n<h2 id=\"creating-a-backup\" 
style=\"position:relative;\"><a href=\"#creating-a-backup\" aria-label=\"creating a backup permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Creating a backup</h2>\n<p>We're going to use an existing utility program <a href=\"https://docs.mongodb.com/database-tools/mongodump/\">mongodump</a> for creating the database backup.</p>\n<p>Run this command in the source database server</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">mongodump --host=&quot;hostname:port&quot; \\</span>\n<span class=\"grvsc-line\">  --username=&quot;username&quot; --password=&quot;password&quot; \\</span>\n<span class=\"grvsc-line\">  --authenticationDatabase &quot;admin&quot; \\</span>\n<span class=\"grvsc-line\">  --db=&quot;db name&quot; --collection=&quot;collection name&quot; --query=&#39;json&#39; \\</span>\n<span class=\"grvsc-line\">  --forceTableScan -v --gzip --out ./dump</span></code></pre>\n<p><strong><code>--host</code></strong>: The source MongoDB hostname along with the port. It defaults to <code>localhost:27017</code>. 
If you have a connection string instead, use the <code>--uri=\"mongodb://username:password@host1[:port1]...\"</code> option.</p>\n<p><strong><code>--username</code></strong>: Specifies a username to authenticate to a MongoDB database that uses authentication.</p>\n<p><strong><code>--password</code></strong>: Specifies a password to authenticate to a MongoDB database that uses authentication.</p>\n<p><strong><code>--authenticationDatabase</code></strong>: Specifies the authentication database where the specified <code>--username</code> has been created.</p>\n<blockquote>\n<p>If you do not specify an authentication database or a database to export, mongodump assumes the admin database holds the user's credentials.</p>\n</blockquote>\n<p><strong><code>--db</code></strong>: Specifies the database to take a backup from. If you do not specify a database, mongodump collects from all databases in this instance.</p>\n<blockquote>\n<p>Alternatively, you can also specify the database directly in the <a href=\"https://docs.mongodb.com/database-tools/mongodump/#cmdoption-mongodump-uri\">URI connection string</a> i.e. <code>mongodb://username:password@uri/dbname</code>. <br /> Providing a connection string while also using <code>--db</code> and specifying conflicting information <strong>will result in an error</strong>.</p>\n</blockquote>\n<p><strong><code>--collection</code></strong>: Specifies a collection to back up. If you do not specify a collection, this option copies all collections in the specified database or instance to the dump files.</p>\n<p><strong><code>--query</code></strong>: Provides a <a href=\"https://docs.mongodb.com/manual/reference/glossary/#term-json-document\">JSON document</a> as a query that optionally limits the documents included in the output of mongodump. <br />\nYou must enclose the query document in single quotes <code>('{ ... 
}')</code> to ensure that it does not interact with your shell environment.<br />\nThe query must be in <a href=\"https://docs.mongodb.com/manual/reference/mongodb-extended-json\">Extended JSON v2 format (either relaxed or canonical/strict mode)</a>, including enclosing the field names and operators in quotes e.g. <code>'{ \"created_at\": { \"$gte\": { \"$date\": \"...\" } } }'</code>.</p>\n<blockquote>\n<p>To use the <code>--query</code> option, you must also specify the <a href=\"https://docs.mongodb.com/database-tools/mongodump/#cmdoption-mongodump-collection\"><code>--collection</code></a> option.</p>\n</blockquote>\n<p><strong><code>--forceTableScan</code></strong>: Forces mongodump to scan the data store directly. Typically, mongodump saves entries as they appear in the index of the <code>_id</code> field. <br /></p>\n<blockquote>\n<p>If you specify a query <code>--query</code>, mongodump will use the most appropriate index to support that query. <br /><strong>Hence, you cannot use <a href=\"https://docs.mongodb.com/database-tools/mongodump/#cmdoption-mongodump-forcetablescan\"><code>--forceTableScan</code></a> with the <a href=\"https://docs.mongodb.com/database-tools/mongodump/#cmdoption-mongodump-query\"><code>--query</code></a> option</strong>. The sample command above lists both only to show every option, so drop one of the two before running it.</p>\n</blockquote>\n<p><strong><code>--gzip</code></strong>: Compresses the output. If mongodump outputs to the dump directory, this option compresses the individual files. The files have the suffix <code>.gz</code>.</p>\n<p><strong><code>--out</code></strong>: Specifies the directory where mongodump will write <a href=\"https://docs.mongodb.com/manual/reference/glossary/#term-bson\"><code>BSON</code></a> files for the dumped databases. 
By default, mongodump saves output files in a directory named <code>dump</code> in the current working directory.</p>\n<h2 id=\"restoring-the-backup\" style=\"position:relative;\"><a href=\"#restoring-the-backup\" aria-label=\"restoring the backup permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Restoring the backup</h2>\n<p>We will use a utility program called <a href=\"https://docs.mongodb.com/database-tools/mongorestore/\"><code>mongorestore</code></a> for restoring the database backup.</p>\n<p>Copy the <code>dump</code> backup directory to the new database instance and run the following command:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">mongorestore --uri=&quot;mongodb://user:password@host:port/?authSource=admin&quot; \\</span>\n<span class=\"grvsc-line\">  --drop --noIndexRestore --gzip -v ./dump</span></code></pre>\n<p>Replace the credentials with the new database credentials. Unlike the previous step, the authentication database is specified in the URI string via <code>authSource</code> rather than with the <code>--authenticationDatabase</code> option.</p>\n<p>Also, pass <code>--gzip</code> only if you used it while creating the backup.</p>\n<p><strong><code>--drop</code></strong>: Before restoring the collections from the dumped backup, drops the collections from the target database. 
It does not drop collections that are not in the backup.</p>\n<p><strong><code>--noIndexRestore</code></strong>: Prevents mongorestore from restoring and building indexes as specified in the corresponding mongodump output.</p>\n<blockquote>\n<p>If you want to change the name of the database while restoring, you can do so using the <br /><code>--nsFrom=\"old_name.*\" --nsTo=\"new_name.*\"</code> options.<br /><br />However, this won’t work if you migrate with <code>oplogs</code>, which is a requirement when migrating from an online instance.</p>\n</blockquote>\n<hr />\n<h1 id=\"migrating-from-an-online-database-in-mongodb\" style=\"position:relative;\"><a href=\"#migrating-from-an-online-database-in-mongodb\" aria-label=\"migrating from an online database in mongodb permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Migrating from an online database in MongoDB</h1>\n<p>The main challenge with migrating from an online database is that you cannot pause updates during the migration. Here is an overview of the steps:</p>\n<ol>\n<li>Run an initial bulk migration with <code>oplogs</code> capture</li>\n<li>Run a sync job to mitigate the database connection switch latency</li>\n</ol>\n<blockquote>\n<p>Now, to capture <code>oplogs</code>, a replica set must be initialized in the source and destination databases. This is because the <code>oplogs</code> are captured from the <strong><code>local.oplog.rs</code></strong> namespace, which is created after initializing a replica set. 
<br /><br />You can follow <a href=\"https://medium.com/swlh/self-hosted-mongodb-deployment-7f1b6fb4973f#1cdf\">this guide</a> to configure a replica set.</p>\n</blockquote>\n<h2 id=\"initial-migration-with-oplog-capture\" style=\"position:relative;\"><a href=\"#initial-migration-with-oplog-capture\" aria-label=\"initial migration with oplog capture permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Initial Migration with Oplog Capture</h2>\n<p>Oplogs, in simple words, are the operation logs created per operation in the database. They represent a partial document state or, in other words, the database state. So we are going to capture any updates in our old database during the migration process using these <code>oplogs</code>.</p>\n<p>Run the mongodump program with the following options,</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"2\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">mongodump --uri=&quot;.../?authSource=admin&quot; \\</span>\n<span class=\"grvsc-line\">  --forceTableScan --oplog \\</span>\n<span class=\"grvsc-line\">  --gzip -v --out ./dump</span></code></pre>\n<p><strong><code>--oplog</code></strong>: Creates a file named <code>oplog.bson</code> as part of the <code>mongodump</code> output. The <code>oplog.bson</code> file, located in the top level of the output directory, contains <code>oplog</code> entries that occur during the mongodump operation. 
This file provides an effective point-in-time snapshot of the state of our database instance.</p>\n<h2 id=\"restore-the-data-with-oplog-replay\" style=\"position:relative;\"><a href=\"#restore-the-data-with-oplog-replay\" aria-label=\"restore the data with oplog replay permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Restore the data with oplog replay</h2>\n<p>In order to replay the oplogs, a special role is required. Let's create and assign the role to the database user being used for migration.</p>\n<h3 id=\"create-the-role\" style=\"position:relative;\"><a href=\"#create-the-role\" aria-label=\"create the role permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Create the role</h3>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"3\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">db.createRole({</span>\n<span class=\"grvsc-line\">  role: &quot;interalUseOnlyOplogRestore&quot;,</span>\n<span class=\"grvsc-line\">  privileges: [</span>\n<span class=\"grvsc-line\">    
{</span>\n<span class=\"grvsc-line\">      resource: { anyResource: true },</span>\n<span class=\"grvsc-line\">      actions: [ &quot;anyAction&quot; ] </span>\n<span class=\"grvsc-line\">    }</span>\n<span class=\"grvsc-line\">  ],</span>\n<span class=\"grvsc-line\">  roles: []</span>\n<span class=\"grvsc-line\">})</span></code></pre>\n<h3 id=\"assign-the-role\" style=\"position:relative;\"><a href=\"#assign-the-role\" aria-label=\"assign the role permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Assign the role</h3>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"4\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">db.grantRolesToUser(</span>\n<span class=\"grvsc-line\">  &quot;admin&quot;,</span>\n<span class=\"grvsc-line\">  [{ role:&quot;interalUseOnlyOplogRestore&quot;, db:&quot;admin&quot; }]</span>\n<span class=\"grvsc-line\">);</span></code></pre>\n<p>Now you can restore using the mongorestore program with the following options:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"5\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">mongorestore --uri=&quot;mongodb://admin:.../?authSource=admin&quot; \\</span>\n<span class=\"grvsc-line\">  --oplogReplay \\</span>\n<span class=\"grvsc-line\">  --gzip -v ./dump</span></code></pre>\n<p>The above command uses the same user, <strong><code>admin</code></strong>, to whom the role was granted.</p>\n<p><strong><code>--oplogReplay</code></strong>: After 
restoring the database dump, replays the oplog entries from a BSON file and restores the database to the point-in-time backup captured with the mongodump <code>--oplog</code> command.</p>\n<h2 id=\"mitigating-database-connection-switch-latency\" style=\"position:relative;\"><a href=\"#mitigating-database-connection-switch-latency\" aria-label=\"mitigating database connection switch latency permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Mitigating database connection switch latency</h2>\n<p>Alright, so far we are done with most of the heavy lifting. The only thing that remains is maintaining consistency between the databases during the connection switch in our application servers.</p>\n<blockquote>\n<p>If you're running MongoDB version 3.6+, it's better to go for the Change Stream approach, which is an event-based mechanism introduced to capture changes in your database in an optimized way. 
Here is an article that covers it: <a href=\"https://www.mongodb.com/blog/post/an-introduction-to-change-streams\">An Introduction to Change Streams</a></p>\n</blockquote>\n<p>Check out the <a href=\"https://gist.github.com/cnp96/7be1756f7eb76ea78c9b832966e84dbf#file-delta-sync-sh\">generic sync script</a>, which you can run as a cron job every minute.</p>\n<p>Update the variables in this script and run it as follows:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"6\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">$ ./delta-sync.sh from_epoch_in_milliseconds</span>\n<span class=\"grvsc-line\"></span>\n<span class=\"grvsc-line\"># from_epoch_in_milliseconds is automatically picked with every iteration if not supplied</span></code></pre>\n<p>Or you can set up a cron job to run this every minute.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"7\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">* * * * * ~/delta-sync.sh</span></code></pre>\n<p>The output can be monitored with the following command (I'm running RHEL 8; refer to your OS guide for cron output):</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"8\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">$ tail -f /var/log/cron | grep CRON</span></code></pre>\n<p>This is a sample sync log.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"9\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">CMD (~/cron/dsync.sh)</span>\n<span class=\"grvsc-line\">CMDOUT (INFO: Updated log registry to use new timestamp on next run.)</span>\n<span class=\"grvsc-line\">CMDOUT (INFO: Created sync directory: /home/ec2-user/cron/dump/2020-11-03T19:01:01Z)</span>\n<span class=\"grvsc-line\">CMDOUT (Fetching oplog in range [2020-11-03T19:00:01Z - 2020-11-03T19:01:01Z])</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:02.319+0000#011dumping up to 1 collections in 
parallel)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:02.334+0000#011writing local.oplog.rs to /home/ec2-user/cron/dump/2020-11-03T19:01:01Z/local/oplog.rs.bson.gz)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:04.943+0000#011local.oplog.rs  0)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:04.964+0000#011local.oplog.rs  0)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:04.964+0000#011done dumping local.oplog.rs (0 documents))</span>\n<span class=\"grvsc-line\">CMDOUT (INFO: Dump success!)</span>\n<span class=\"grvsc-line\">CMDOUT (INFO: Replaying oplogs...)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.030+0000#011using write concern: &{majority false 0})</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.054+0000#011will listen for SIGTERM, SIGINT, and SIGKILL)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.055+0000#011connected to node type: standalone)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.055+0000#011mongorestore target is a directory, not a file)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.055+0000#011preparing collections to restore from)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.055+0000#011found collection local.oplog.rs bson to restore to local.oplog.rs)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.055+0000#011found collection metadata from local.oplog.rs to restore to local.oplog.rs)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.055+0000#011restoring up to 4 collections in parallel)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.055+0000#011replaying oplog)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.055+0000#011applied 0 oplog entries)</span>\n<span class=\"grvsc-line\">CMDOUT (2020-11-03T19:01:05.055+0000#0110 document(s) restored successfully. 
0 document(s) failed to restore.)</span>\n<span class=\"grvsc-line\">CMDOUT (INFO: Restore success!)</span></code></pre>\n<p>You can stop this script after verifying that no more <code>oplogs</code> are being created, i.e., once the source DB has gone offline.</p>\n<p>This concludes the complete self-hosted MongoDB data migration guide. If you want to learn more about MongoDB, here is a useful resource on <a href=\"https://www.loginradius.com/blog/engineering/mongodb-as-datasource-in-golang/\">how to use MongoDB as a datasource in Golang</a>.</p>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n</style>","frontmatter":{"title":"How to Migrate Data In MongoDB","author":{"id":"Chinmaya Pati","github":"cnp96","avatar":null},"date":"December 14, 
2020","updated_date":null,"tags":["MongoDB"],"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.4184397163120568,"src":"/static/90f92104129cbc152fa1d9a64c58d7be/ee604/index.png","srcSet":"/static/90f92104129cbc152fa1d9a64c58d7be/69585/index.png 200w,\n/static/90f92104129cbc152fa1d9a64c58d7be/497c6/index.png 400w,\n/static/90f92104129cbc152fa1d9a64c58d7be/ee604/index.png 800w,\n/static/90f92104129cbc152fa1d9a64c58d7be/31987/index.png 1000w","sizes":"(max-width: 800px) 100vw, 800px"}}}},"fields":{"authorId":"Chinmaya Pati","slug":"/engineering/live-data-migration-mongodb/"}}},{"node":{"id":"abd9eda6-e048-505e-acf5-f578d763cf94","html":"<p>You’re probably hosting your MongoDB on a reliable cloud service provider say <a href=\"https://cloud.mongodb.com\">Atlas</a> for instance because you really want to focus on your idea and delegate all the subtle key management areas such as networking, storage, access, etc.</p>\n<p>It all looks good initially until your small idea starts turning into a business and the cost starts skyrocketing. Even if that is not the case, this post will still give you a general overview of the technical complexities involved (and bucks saved!) if you were to migrate to a self-hosted solution.</p>\n<p>BTW, how much savings are we talking about? 
Let’s do a quick comparison between an <strong>Atlas</strong> instance and a self-hosted MongoDB on <strong>AWS</strong>.</p>\n<p><strong>Atlas (~$166/month)</strong></p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 768px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 42%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAICAIAAAB2/0i6AAAACXBIWXMAAAsTAAALEwEAmpwYAAABDUlEQVQY031Qy24DIQzM//9YL1VvVXto02y0kdhHAAMGzLPejdJDGxWNMBiPPcPhWR9f3TjQ9cWcLqQwExAuSlJJvtIMUqjVF4LoxkVINAKu78ORcVbzQRjpQlDGfAwnqSGnjNZNej0r4RxSpFrqjpJT6r23WjkZfdDeHiYmozeIYK3znp9jjF9GPIm3EEMkChQ3xICIrfdaKxcYDatR22QMQSzLMI7aGIeotebaXDpSibn2P6vtu01+l+09y5YA3AUcTtOMPqTaKNdc2n/km2zLfAAWz95SSq094DyebNmzx4sQq1IsGwDanc3hp1Hr7Td5tkpq/TkMN892W44D7P75opTm8w2c5w/rd/I3OuXMieG2KrwAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"&quot;Atlas Pricing&quot;\"\n        title=\"Atlas Pricing\"\n        src=\"/static/37ae695d63da0d8e11b315ac53da7162/e5715/atlas.png\"\n        srcset=\"/static/37ae695d63da0d8e11b315ac53da7162/a6d36/atlas.png 650w,\n/static/37ae695d63da0d8e11b315ac53da7162/e5715/atlas.png 768w,\n/static/37ae695d63da0d8e11b315ac53da7162/cc8d6/atlas.png 791w\"\n        sizes=\"(max-width: 768px) 100vw, 768px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span>\n$0.23/hour based on the above-selected requirements (~ <a href=\"https://cloud.mongodb.com\">cloud.mongodb.com</a>)</p>\n<p><br><br><strong>AWS (~$36/month)</strong>\n<span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 768px; \"\n    >\n  
    <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 27.846153846153847%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAGCAIAAABM9SnKAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAu0lEQVQY021QywqDMBDM//9Sv6HnXgo9VMGoZPMg72g6RhRpO2yGIQmzs8sez1c/cEGktNbGgKVSX4XLcZqPmmZBYuTqfmMpxlxKrdU51/U9SQldSlmW5WQhKISQck4p4bMgqZWqemZo50Pw3ltriQjt/QHYbew9HHcNwAVRlDE2RPbuOmtdaIgxhgtOl6v27Qkt4cgkyYIo/4DM9QdIvq4rBF7ZwDn2sW2rsWwLw4pwMEhuc16B9LtA/w8AoVktzOpq7wAAAABJRU5ErkJggg=='); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"&quot;AWS Pricing&quot;\"\n        title=\"AWS Pricing\"\n        src=\"/static/919463e3199f2098ff4d7388991b988f/e5715/aws.png\"\n        srcset=\"/static/919463e3199f2098ff4d7388991b988f/a6d36/aws.png 650w,\n/static/919463e3199f2098ff4d7388991b988f/e5715/aws.png 768w,\n/static/919463e3199f2098ff4d7388991b988f/2cefc/aws.png 1400w\"\n        sizes=\"(max-width: 768px) 100vw, 768px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span></p>\n<p>$0.0416/hour for the instance and additional pricing based on the EBS type and storage (~ <a href=\"https://calculator.aws/\">calculator.aws</a>)</p>\n<blockquote>\n<p>It is almost 4.5x savings just in terms of the infrastructure!</p>\n</blockquote>\n<p>Now that you know the major why(s) and are still reading this post, without further ado, let’s dive into the tech.</p>\n<hr />\n<h1 id=\"outline\" style=\"position:relative;\"><a href=\"#outline\" aria-label=\"outline permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 
4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Outline</h1>\n<ol>\n<li>Setting up the infrastructure</li>\n<li>Setting up MongoDB</li>\n<li>Bulk migration</li>\n<li>Delta-sync to bridge the connection switch latency (not applicable to stale clusters)</li>\n</ol>\n<p>Since the entire content can be a bit exhausting in one place, I’m going to divide this into 2 related posts.</p>\n<h1 id=\"1-setting-up-the-infrastructure\" style=\"position:relative;\"><a href=\"#1-setting-up-the-infrastructure\" aria-label=\"1 setting up the infrastructure permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>1. Setting up the Infrastructure</h1>\n<p>I’m going to mention below the guide for setting up an instance running <strong>RedHat Enterprise Linux 8</strong> on AWS. 
This is because MongoDB generally performs better with the XFS file system, which RHEL uses by default.</p>\n<h2 id=\"spin-up-an-ec2-instance\" style=\"position:relative;\"><a href=\"#spin-up-an-ec2-instance\" aria-label=\"spin up an ec2 instance permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Spin up an EC2 Instance</h2>\n<p>I’ve used a <code>t3.small</code> instance that comes with <strong>2 vCPUs</strong> and <strong>2GB of RAM</strong>, although you can select any instance type you like.</p>\n<blockquote>\n<p>It is recommended that your DB have access to at least <strong>1GB of RAM</strong> and <strong>2 real cores</strong>, as that directly affects the performance of the caching &#x26; concurrency mechanisms handled by the default engine <a href=\"https://docs.mongodb.com/manual/core/wiredtiger/\"><strong>WiredTiger</strong></a>. You can read more about the <a href=\"https://docs.mongodb.com/manual/administration/production-notes/#allocate-sufficient-ram-and-cpu\"><strong>production notes related to the RAM and CPU requirements here</strong></a>.</p>\n</blockquote>\n<p><em>Configuration Overview:</em></p>\n<ul>\n<li>OS: <strong>Red Hat Enterprise Linux 8 (x64 intel-based)</strong></li>\n<li>Instance Type: <strong>t3.small</strong></li>\n<li>Storage: <strong>10GB</strong>(os) + <strong>30GB</strong>(data) + <strong>3GB</strong>(logs) of <strong>EBS</strong> i.e. 
3 separate volumes</li>\n</ul>\n<p>I’m assuming that you’re familiar with creating a VM on AWS.</p>\n<p>Once the instance is in the running state, assign an <strong>Elastic IP</strong> to it and log in to the machine remotely.</p>\n<blockquote>\n<p>We'll need the <strong>Elastic IP</strong> to set up a public hostname for the instance.</p>\n</blockquote>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ ssh -i &lt;PEM_FILE&gt; ec2-user@&lt;ELASTIC_IP&gt;</span></span></code></pre>\n<h2 id=\"mount-additional-volumes\" style=\"position:relative;\"><a href=\"#mount-additional-volumes\" aria-label=\"mount additional volumes permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Mount Additional Volumes</h2>\n<p>Besides the root file system, we added 2 EBS volumes, one for data and one for logs, which are yet to be mounted (<em>Remember the 30GB and 3GB?</em>). 
You can list the volume blocks using,</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo lsblk</span></span></code></pre>\n<p>The additional volumes will be listed right after the root block (refer to the arrows)</p>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 353px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 25.21246458923513%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAFCAIAAADKYVtkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAA8klEQVQY0x2PyW6DMBCGeYKeUEmhbFJDCeBQFKAsYTFmSykW4lBVqtRz3/8J+sffYTRj+5vxSNM0DcMgy3Jd133fM8auAuQ4ybIMsaoqPMPVOI4oNU17FkhN00DWDb2l7crX4Exc1w3DsBEkSYIuaZoahmHbtmmalmXpgrvs+z6ldP34vA0zX/jri+McHUJIWZYYAnmeZ8iY9iQ4HA6Koqiqepcv8WXshmPlVT9s/LqdaeQGLgkI5xwyNHw4z/M4jqMoeheg9DwPjaSOdexKH7JH8pflv/TtO3Ojk3/y8LooCkTsiaQVLMuCptu27fuO7f4B7lI+vFGhNDMAAAAASUVORK5CYII='); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"&quot;sudo lsblk&quot;\"\n        title=\"List volume blocks\"\n        src=\"/static/e89cd4d9897ca07cc7ffb0edcda7119e/6c115/lsblk.png\"\n        srcset=\"/static/e89cd4d9897ca07cc7ffb0edcda7119e/6c115/lsblk.png 353w\"\n        sizes=\"(max-width: 353px) 100vw, 353px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span></p>\n<p>In the image above, you can see that the additional volumes are named</p>\n<ol>\n<li><strong>xvdb</strong> (30Gb space to store data)</li>\n<li><strong>xvdc</strong> (3Gb space to store logs)</li>\n</ol>\n<p>Now, let’s create the file-systems in those volumes.</p>\n<pre class=\"grvsc-container 
dark-default-dark\" data-language=\"bash\" data-index=\"2\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo mkfs.xfs -L mongodata /dev/xvdb</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo mkfs.xfs -L mongologs /dev/xvdc</span></span></code></pre>\n<blockquote>\n<p><code>-L</code> is an alias option for setting the <strong>volume label</strong></p>\n</blockquote>\n<p>And then mount the volumes.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"3\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo mount -t xfs /dev/xvdb /var/lib/mongo</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo mount -t xfs /dev/xvdc /var/log/mongodb</span></span></code></pre>\n<p>In order for these changes to reflect, the system must be rebooted. Hence, now we also need the partition persistence so that in case of an unintentional reboot we don’t lose the Database storage.</p>\n<p>We can achieve this by specifying the mount rules in the fstab file. 
<a href=\"https://geek-university.com/linux/etc-fstab-file/\">You can read more about it here</a>.</p>\n<p>Before that let's copy the UUID of the above partitions(<em>because they are unique and won't change over a system restart</em>)</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"4\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo blkid</span></span></code></pre>\n<p><span\n      class=\"gatsby-resp-image-wrapper\"\n      style=\"position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 768px; \"\n    >\n      <span\n    class=\"gatsby-resp-image-background-image\"\n    style=\"padding-bottom: 9.846153846153847%; position: relative; bottom: 0; left: 0; background-image: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAACCAIAAADXZGvcAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAaUlEQVQI1x2KvQrFIAyF+xiCk5ODg1i7FBJsoWkrGlO47/8uN/gNh/O3EFEp5ZkMHnAi/A7+uLUmIsysB/W991rrO1GTc3bOLbohoma66Rty8b0LrpuueZ2klFQBQJsQQozRe2+Msdb+AZEjHOgCyMRoAAAAAElFTkSuQmCC'); background-size: cover; display: block;\"\n  ></span>\n  <img\n        class=\"gatsby-resp-image-image\"\n        alt=\"&quot;sudo blkid&quot;\"\n        title=\"List attached block info\"\n        src=\"/static/a37188eaa209c947db818b61fd19dfbb/e5715/blkid.png\"\n        srcset=\"/static/a37188eaa209c947db818b61fd19dfbb/a6d36/blkid.png 650w,\n/static/a37188eaa209c947db818b61fd19dfbb/e5715/blkid.png 768w,\n/static/a37188eaa209c947db818b61fd19dfbb/20c85/blkid.png 999w\"\n        sizes=\"(max-width: 768px) 100vw, 768px\"\n        style=\"width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;\"\n        loading=\"lazy\"\n      />\n    </span></p>\n<p>Copy the UUIDs listed for <strong>/dev/xvdb</strong> and <strong>/dev/xvdc</strong>. 
Refer to the <strong>“LABEL”</strong> for block identification</p>\n<p>Now open the <code>/etc/fstab</code> file and paste the configuration in the following format.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"5\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">UUID=&lt;COPIED_UUID_FOR_DATA&gt; /var/lib/mongo xfs defaults,nofail 0 0</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">UUID=&lt;COPIED_UUID_FOR_LOGS&gt; /var/log/mongodb xfs defaults,nofail 0 0</span></span></code></pre>\n<h2 id=\"update-hostname\" style=\"position:relative;\"><a href=\"#update-hostname\" aria-label=\"update hostname permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Update Hostname</h2>\n<p>The hostname will be used to identify your database server on the network. You can either use the above assigned <strong>Elastic IP</strong> or Domain name (if available). Open the <code>/etc/hostname</code> file and append the entry. 
For e.g.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"6\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">ip-xx.us-east-2.compute.internal **&lt;ELASTIC_IP&gt; &lt;DOMAIN_1&gt; &lt;DOMAIN_2&gt;** ...</span></span></code></pre>\n<h2 id=\"update-the-process-limits-optional\" style=\"position:relative;\"><a href=\"#update-the-process-limits-optional\" aria-label=\"update the process limits optional permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Update the Process Limits (Optional)</h2>\n<p>This is optionally required in order to control the maximum number of acceptable connections while keeping the system stable.\nOpen the <code>/etc/security/limits.conf</code> file and add the following entries.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"7\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">* soft nofile 64000</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">* hard nofile 64000</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">* soft nproc 32000</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">* hard nproc 32000</span></span></code></pre>\n<p>Now that all of the infra related prerequisites are sorted, <strong>reboot</strong> the instance, and let’s proceed to MongoDB installation.</p>\n<hr />\n<h1 id=\"1-setting-up-mongodb\" style=\"position:relative;\"><a href=\"#1-setting-up-mongodb\" aria-label=\"1 setting 
up mongodb permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>1. Setting up MongoDB</h1>\n<h2 id=\"add-the-repo-source\" style=\"position:relative;\"><a href=\"#add-the-repo-source\" aria-label=\"add the repo source permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Add the Repo Source</h2>\n<p>Create a file <code>/etc/yum.repos.d/mongodb-org.4.2.repo</code> and add the following package repository details.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"8\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">[mongodb-org-4.2]</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">name=MongoDB Repository</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">baseurl=https://repo.mongodb.org/yum/redhat/</span><span class=\"mtk12\">$releasever</span><span class=\"mtk1\">/mongodb-org/4.2/x86_64/</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">gpgcheck=1</span></span>\n<span class=\"grvsc-line\"><span 
class=\"mtk1\">enabled=1</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">gpgkey=https://www.mongodb.org/static/pgp/server-4.2.asc</span></span></code></pre>\n<p>Now let’s install MongoDB.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"9\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo yum -y install mongodb-org</span></span></code></pre>\n<h2 id=\"create-directories-and-setup-permissions\" style=\"position:relative;\"><a href=\"#create-directories-and-setup-permissions\" aria-label=\"create directories and setup permissions permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Create directories and setup permissions</h2>\n<p>MongoDB by default uses the following paths to store the data and the internal logs:</p>\n<blockquote>\n<p><strong>/var/lib/mongo</strong> → Data<br><strong>/var/log/mongodb</strong> → Logs</p>\n</blockquote>\n<p><strong>Create the directories</strong></p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"10\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo mkdir /var/lib/mongo</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo mkdir /var/log/mongodb</span></span></code></pre>\n<p><strong>Change user &#x26; group permissions</strong></p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"11\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span 
class=\"mtk1\">$ sudo chown mongod:mongod /var/lib/mongo</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo chown mongod:mongod /var/log/mongod</span></span></code></pre>\n<h2 id=\"create-an-admin-user\" style=\"position:relative;\"><a href=\"#create-an-admin-user\" aria-label=\"create an admin user permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Create an Admin User</h2>\n<p>The <strong>mongod daemon/service</strong> must be first running before we proceed to create a user.\nLet’s use the default config(stored in <code>/etc/mongod.conf</code>) for now and start the daemon process.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"12\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo -u mongod mongod -f /etc/mongod.conf</span></span></code></pre>\n<p>The above command will start the mongod daemon in fork mode (default).</p>\n<p>Now, let’s login to the server and create our first admin user.</p>\n<p><strong>Open a mongo shell</strong></p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"13\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ mongo</span></span></code></pre>\n<p><strong>Use the \"admin\" database to create the root-admin</strong></p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"14\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">&gt; use 
admin</span></span></code></pre>\n<p><strong>Create the admin user</strong></p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"15\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">&gt; db.createUser({user: </span><span class=\"mtk8\">&quot;admin&quot;</span><span class=\"mtk1\">, pwd: </span><span class=\"mtk8\">&quot;password&quot;</span><span class=\"mtk1\">, roles: [{role: </span><span class=\"mtk8\">&quot;root&quot;</span><span class=\"mtk1\">, db: </span><span class=\"mtk8\">&quot;admin&quot;</span><span class=\"mtk1\">}]})</span></span></code></pre>\n<p><strong>Create a regular user</strong></p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"16\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">&gt; db.createUser({user: </span><span class=\"mtk8\">&quot;normal_user&quot;</span><span class=\"mtk1\">, pwd: </span><span class=\"mtk8\">&quot;password&quot;</span><span class=\"mtk1\">, roles: [{role: </span><span class=\"mtk8\">&quot;readWriteAnyDatabase&quot;</span><span class=\"mtk1\">, db: </span><span class=\"mtk8\">&quot;admin&quot;</span><span class=\"mtk1\">}]})</span></span></code></pre>\n<p><strong>Shut down the server for now. 
We'll restart it with the modified config</strong></p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"17\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">&gt; </span><span class=\"mtk11\">db.shutdownServer</span><span class=\"mtk1\">()</span></span></code></pre>\n<h2 id=\"setting-up-authentication\" style=\"position:relative;\"><a href=\"#setting-up-authentication\" aria-label=\"setting up authentication permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Setting up Authentication</h2>\n<p>Here, we’ll enable database authentication and change the bind address so that our server is reachable over the network.\nOpen <code>/etc/mongod.conf</code> and make the changes below.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"18\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk3\"># network interfaces</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">net:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  port: 27017</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  bindIp: 0.0.0.0 </span><span class=\"mtk3\"># accessible on the network address</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">security:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">  authorization: enabled </span><span class=\"mtk3\"># creds will be required for making db operations</span></span></code></pre>\n<p>Save the 
config and <strong>restart</strong> the server.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"19\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ sudo -u mongod mongod -f /etc/mongod.conf</span></span></code></pre>\n<h2 id=\"test-login\" style=\"position:relative;\"><a href=\"#test-login\" aria-label=\"test login permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Test Login</h2>\n<p>You can verify that the credentials work using:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"bash\" data-index=\"20\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">$ mongo -u admin -p password</span></span></code></pre>\n<p>That’s it for the initial setup! Please stay tuned for my next related post on the detailed migration process and tips on keeping the DB production-ready.</p>\n<p>P.S. 
Thanks to <a href=\"https://twitter.com/MrEnvoy17\">Piyush Kumar</a> for helping curate this post!</p>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n  .dark-default-dark .mtk1 { color: #D4D4D4; }\n  .dark-default-dark .mtk12 { color: #9CDCFE; }\n  .dark-default-dark .mtk8 { color: #CE9178; }\n  .dark-default-dark .mtk11 { color: #DCDCAA; }\n  .dark-default-dark .mtk3 { color: #6A9955; }\n</style>","frontmatter":{"title":"Self-Hosted MongoDB","author":{"id":"Chinmaya Pati","github":"cnp96","avatar":null},"date":"June 30, 2020","updated_date":null,"tags":["MongoDB","Mongo","AWS","Atlas"],"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":2.0408163265306123,"src":"/static/504642ca3a1f7d78dea8509436faa4c6/14b42/cover.jpg","srcSet":"/static/504642ca3a1f7d78dea8509436faa4c6/f836f/cover.jpg 200w,\n/static/504642ca3a1f7d78dea8509436faa4c6/2244e/cover.jpg 
400w,\n/static/504642ca3a1f7d78dea8509436faa4c6/14b42/cover.jpg 800w,\n/static/504642ca3a1f7d78dea8509436faa4c6/d8343/cover.jpg 925w","sizes":"(max-width: 800px) 100vw, 800px"}}}},"fields":{"authorId":"Chinmaya Pati","slug":"/engineering/self-hosted-mongo/"}}}]},"authorYaml":{"id":"Chinmaya Pati","bio":"I'm an avid FOSS enthusiast and contributor interested in system design, web-dev, UI/UX, data-driven technologies, and DevOps.","github":"cnp96","stackoverflow":"5790355","linkedin":"chinmayapati","medium":"@chinmaya-cp","twitter":"chiku__p","avatar":null}},"pageContext":{"id":"Chinmaya Pati","__params":{"id":"chinmaya-pati"}}},"staticQueryHashes":["1171199041","1384082988","2100481360","23180105","528864852"]}