Saturday, May 2, 2026

MongoDB Backup & Recovery Guide Part 1 — From RTO, RPO & Oplog to mongodump/mongorestore in Practice

MongoDB backup strategy starts with recovery objectives, not tool selection. You need to define RTO (Recovery Time Objective) and RPO (Recovery Point Objective) before you can choose backup frequency or method. Part 1 covers the core options of mongodump and mongorestore with hands-on examples, and explains how to combine Secondary-targeted backups, permission separation, checksum verification, and retention management into a real operations routine. mongodump is a portable logical backup tool, but it has limits with large databases and sharded clusters — limits that define the boundary into Part 2 (filesystem snapshots, PBM) and Part 3 (Atlas cloud backup, PITR).

Series outline

  • Part 1 — From RTO, RPO & Oplog to mongodump/mongorestore in Practice (this post)
  • Part 2 — Filesystem Snapshots (LVM·EBS), Percona Backup for MongoDB (PBM), Automation Pipelines (coming soon)
  • Part 3 — MongoDB Atlas Cloud Backup, Point-in-Time Recovery (PITR), Disaster Recovery Checklist (coming soon)

Table of Contents

  1. Introduction — Backup starts with recovery objectives, not tools
  2. Core Concepts: RTO, RPO & Oplog
  3. Backup Strategy Types at a Glance
  4. mongodump Deep Dive
  5. mongorestore Deep Dive
  6. Production Operations Routine
  7. Limitations and Caveats of mongodump
  8. Closing — Restore Testing and What's Next in Part 2

1. Introduction — Backup starts with recovery objectives, not tools

The most common mistake when talking about MongoDB backups is starting with "which tool should I use?"

The right starting point is two questions:

  • How quickly must the service be back online? (RTO)
  • How much data loss is acceptable? (RPO)

Without answering these first, you cannot properly design backup frequency, tool choice, or retention policy.

So why does backup matter? In production environments, data loss occurs through more pathways than most teams expect.

  • Accidental deletion: running db.collection.drop() against the wrong environment
  • Schema migration errors: data transformation scripts that corrupt documents at scale
  • Ransomware and infrastructure attacks: situations where data becomes inaccessible without an offline copy
  • Hardware failure: a single node going down with no replica set
  • Compliance requirements: GDPR, HIPAA, SOC 2, and similar regulations mandate backup retention

Without backups, a straightforward technical failure can escalate into a regulatory violation and financial penalty.


2. Core Concepts: RTO, RPO & Oplog

There are three concepts you must understand before designing a backup strategy.

RTO (Recovery Time Objective)

How quickly must the service resume after a failure?

For example, "RTO = 1 hour" means the system must be back to normal operation within one hour of an incident.

RPO (Recovery Point Objective)

How far back in time can you afford to lose data?

"RPO = 6 hours" means up to six hours of data loss is acceptable. The shorter the RPO, the higher the backup frequency required — and the higher the cost.
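The arithmetic behind that trade-off is simple enough to script. A minimal sketch (the variable names are illustrative):

```shell
# Illustrative: the minimum number of daily backups needed to honor a given RPO.
# With an RPO of 6 hours, a backup must complete at least every 6 hours.
RPO_HOURS=6
BACKUPS_PER_DAY=$(( 24 / RPO_HOURS ))
echo "RPO of ${RPO_HOURS}h requires at least ${BACKUPS_PER_DAY} backups per day"
```

Halving the RPO doubles the backup count, which is why tight RPOs usually push teams toward continuous methods such as PITR rather than more frequent dumps.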

Oplog (Operation Log)

A special capped collection that records all write operations on a replica set in order (local.oplog.rs).

Oplog is the foundation of MongoDB replication, but it also plays a critical role in backup. Using mongodump --oplog captures changes that occurred during the backup window so the restored dump is closer to one consistent point in time.

Important: --oplog only works with a full instance backup. If you specify --db or --collection, this option is not available.


3. Backup Strategy Types at a Glance

MongoDB backup methods fall into three broad categories.

| Method | Tools | Speed | Consistency | Recommended for |
| --- | --- | --- | --- | --- |
| Logical backup | mongodump / mongorestore | Slow | Moderate | Small-to-medium DBs, portability required |
| Filesystem snapshot | LVM, AWS EBS, ZFS | Fast | High | Large DBs, minimal downtime |
| Cloud managed backup | MongoDB Atlas | Automatic | Very high | Cloud environments, PITR required |

Part 1 focuses exclusively on logical backup using mongodump and mongorestore.


4. mongodump Deep Dive

mongodump is MongoDB's official logical backup tool. It exports data in BSON format and saves it to files on disk.

4.1 Basic Command Structure

mongodump [options]

4.2 Common Usage Examples

Full instance backup

mongodump \
  --uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
  --out=/backup/mongodb/$(date +%Y%m%d_%H%M%S)
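Because %Y%m%d_%H%M%S timestamps sort lexicographically in chronological order, the newest dump directory can be located with a plain sort. A small helper (the function name and paths are my own, not part of the tooling):

```shell
# Illustrative: locate the most recent timestamped backup directory.
# Relies on %Y%m%d_%H%M%S names sorting lexicographically == chronologically.
latest_backup() {
  local backup_root="$1"
  ls -1d "$backup_root"/*/ 2>/dev/null | sort | tail -n 1
}
```

This is handy in restore scripts that should default to the newest available dump.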

Backup a specific database

mongodump \
  --uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
  --db=myDatabase \
  --out=/backup/mongodb/

Backup a specific collection

mongodump \
  --uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
  --db=myDatabase \
  --collection=users \
  --out=/backup/mongodb/

gzip compression + single archive file

mongodump \
  --uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
  --gzip \
  --archive=/backup/mongodb/myDatabase_$(date +%Y%m%d).gz

Full backup with Oplog (to account for writes during the dump)

mongodump \
  --uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
  --oplog \
  --gzip \
  --out=/backup/mongodb/full_$(date +%Y%m%d_%H%M%S)

Important: --oplog only works with a full instance backup (no --db or --collection flags). Specifying a database or collection makes this option unavailable.
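One way to confirm the option actually took effect: a dump taken with --oplog writes an oplog.bson file at the dump root (oplog.bson.gz when --gzip is used), which mongorestore --oplogReplay later consumes. A small check helper (sketch; the function name is my own):

```shell
# Illustrative: verify that a dump directory contains the captured oplog.
# Without this file, mongorestore --oplogReplay has nothing to replay.
check_oplog_dump() {
  local dump_dir="$1"
  if [ -f "$dump_dir/oplog.bson" ] || [ -f "$dump_dir/oplog.bson.gz" ]; then
    echo "oplog file present in $dump_dir"
  else
    echo "WARNING: no oplog.bson(.gz) found; --oplogReplay will not be possible" >&2
    return 1
  fi
}
```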

4.3 Recommended Setup for Replica Set Environments

In production, always target a Secondary node for backups. Running mongodump against the Primary consumes CPU, memory, and disk I/O — directly impacting application performance.

mongodump \
  --uri="mongodb://secondary.example.com:27017/?readPreference=secondary&authSource=admin" \
  --oplog \
  --out=/backup/mongodb/

4.4 Required Permissions

mongodump requires find privileges on the target databases. The built-in backup role grants backup access across all databases.

db.createUser({
  user: "backupUser",
  pwd: "securePassword",
  roles: [{ role: "backup", db: "admin" }]
})

5. mongorestore Deep Dive

mongorestore reads a BSON dump created by mongodump and loads it into a MongoDB instance.

5.1 Basic Restore Commands

Restore a full backup

mongorestore \
  --uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
  --drop \
  /backup/mongodb/20260413_120000/

--drop deletes existing collections before restoring. Use it for a clean overwrite to prevent data duplication.

Restore a specific database

mongorestore \
  --uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
  --db=myDatabase \
  --drop \
  /backup/mongodb/20260413_120000/myDatabase/

Restore a specific collection

mongorestore \
  --uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
  --db=myDatabase \
  --collection=users \
  /backup/mongodb/20260413_120000/myDatabase/users.bson

Restore a gzip archive

mongorestore \
  --uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
  --gzip \
  --archive=/backup/mongodb/myDatabase_20260413.gz \
  --drop

Restore with Oplog replay

mongorestore \
  --uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
  --oplogReplay \
  --drop \
  /backup/mongodb/full_20260413_120000/

5.2 Required Permissions

mongorestore requires insert and createCollection privileges. The built-in restore role is the recommended choice.

db.createUser({
  user: "restoreUser",
  pwd: "securePassword",
  roles: [{ role: "restore", db: "admin" }]
})

6. Production Operations Routine

A backup that stops at command examples is only half implemented. A real production routine must bundle Secondary-targeted execution, checksum verification, compression, and retention management together.

6.1 Automated Backup Script

#!/bin/bash
# MongoDB automated backup script (production use)

set -euo pipefail

MONGO_URI="mongodb://backupUser:securePassword@secondary.example.com:27017/?readPreference=secondary&authSource=admin"
BACKUP_DIR="/backup/mongodb"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_PATH="$BACKUP_DIR/$DATE"
RETENTION_DAYS=7
LOG_FILE="/var/log/mongodb_backup.log"

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

log "===== MongoDB backup started: $DATE ====="

mkdir -p "$BACKUP_PATH"

log "Running mongodump..."
mongodump \
  --uri="$MONGO_URI" \
  --oplog \
  --gzip \
  --out="$BACKUP_PATH"

BACKUP_SIZE=$(du -sh "$BACKUP_PATH" | cut -f1)
if [ -d "$BACKUP_PATH" ] && [ "$(ls -A "$BACKUP_PATH")" ]; then
  log "Backup succeeded. Size: $BACKUP_SIZE"
else
  log "ERROR: backup directory is empty"
  exit 1
fi

log "Computing checksums..."
find "$BACKUP_PATH" -type f -exec md5sum {} \; > "$BACKUP_PATH/checksums.md5"

log "Creating archive..."
tar -czf "$BACKUP_DIR/$DATE.tar.gz" -C "$BACKUP_DIR" "$DATE"
rm -rf "$BACKUP_PATH"

# AWS S3 upload (optional)
# log "Uploading to S3..."
# aws s3 cp "$BACKUP_DIR/$DATE.tar.gz" s3://my-mongodb-backups/

log "Deleting backups older than ${RETENTION_DAYS} days..."
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +$RETENTION_DAYS -delete

log "===== Backup complete ====="

Register with cron — runs daily at 2 AM

0 2 * * * /opt/scripts/mongodb_backup.sh >> /var/log/mongodb_backup.log 2>&1

6.2 Restore Procedure (with checksum verification)

#!/bin/bash
set -euo pipefail
BACKUP_ARCHIVE="/backup/mongodb/20260413_020000.tar.gz"
RESTORE_DIR="/tmp/mongodb_restore"
MONGO_URI="mongodb://restoreUser:securePassword@localhost:27017/?authSource=admin"

mkdir -p "$RESTORE_DIR"
tar -xzf "$BACKUP_ARCHIVE" -C "$RESTORE_DIR"

# Verify checksums before restoring
cd "$RESTORE_DIR/20260413_020000"
md5sum -c checksums.md5

mongorestore \
  --uri="$MONGO_URI" \
  --oplogReplay \
  --drop \
  --gzip \
  "$RESTORE_DIR/20260413_020000"

echo "Restore complete"

7. Limitations and Caveats of mongodump

mongodump is powerful, but it is not the right tool for every situation. Understand these limitations before relying on it in production.

Performance Impact

mongodump communicates directly with a running MongoDB instance, so it competes for resources. It can cause infrequently accessed data to be loaded into memory, evicting hot data from the cache.

Running backups against a Secondary reduces this impact but does not eliminate it entirely.

Indexes Are Not Backed Up Directly

mongodump stores only index definitions (in each collection's metadata.json file), not the index data itself. mongorestore must rebuild every index after the documents are loaded, which can significantly extend recovery time for large databases.
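You can see which collections will have indexes rebuilt by looking for the metadata files in a dump directory. A sketch (the function name is my own; the .gz variant appears when the dump was taken with --gzip):

```shell
# Illustrative: list the per-collection metadata files in a dump directory.
# mongodump writes <collection>.metadata.json (or .metadata.json.gz) next to
# each <collection>.bson; mongorestore reads it to rebuild indexes on restore.
list_index_metadata() {
  local dump_dir="$1"
  find "$dump_dir" -name '*.metadata.json*' | sort
}
```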

Not Suitable for Large Databases

For databases in the hundreds of gigabytes or larger, mongodump is slow and resource-intensive. In these cases, consider filesystem snapshots (covered in Part 2) or Atlas cloud backup (covered in Part 3).

Sharded Cluster Caveats

When running mongodump against a sharded cluster, stop the balancer first (sh.stopBalancer() from mongosh connected to a mongos) and pause cross-shard transactions and DDL operations such as creating or modifying collections; otherwise the per-shard dumps can be mutually inconsistent. In sharded environments, PBM (Percona Backup for MongoDB) is a more reliable choice.


8. Closing — Restore Testing and What's Next in Part 2

Part 1 covered why MongoDB backup matters, the core concepts of RTO, RPO, and Oplog, and the hands-on usage of mongodump and mongorestore.

"A backup you've never tested restoring is just an assumption."

A successful backup log is not enough. Build regular restore rehearsals into your standard operating procedure, combining the restore script, --drop, --oplogReplay, and checksum verification. Running an actual restore in a staging environment at least once a month, and recording the result, is strongly recommended.
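A rehearsal is only meaningful if it is measured against the RTO. A sketch of a timing wrapper for the restore command (the helper name and output format are my own):

```shell
# Illustrative: run a restore command and compare elapsed time against the RTO.
# Usage: rehearse <rto_seconds> <command> [args...]
rehearse() {
  local rto_seconds="$1"; shift
  local start end elapsed
  start=$(date +%s)
  "$@"                                # the actual restore command under test
  end=$(date +%s)
  elapsed=$(( end - start ))
  if [ "$elapsed" -le "$rto_seconds" ]; then
    echo "PASS: restore took ${elapsed}s (RTO ${rto_seconds}s)"
  else
    echo "FAIL: restore took ${elapsed}s, exceeds RTO ${rto_seconds}s" >&2
    return 1
  fi
}
```

Recording the elapsed time from each monthly rehearsal also tells you when data growth is about to push restores past the RTO, before an incident does.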

Part 2 will cover filesystem snapshots (LVM / AWS EBS) for faster, lower-overhead backups, Percona Backup for MongoDB (PBM) for consistent backups in sharded clusters, and how to build a fully automated backup pipeline.
