MongoDB Backup & Recovery Guide Part 1 — From RTO, RPO & Oplog to mongodump/mongorestore in Practice
MongoDB backup strategy starts with recovery objectives, not tool selection. You need to define RTO (Recovery Time Objective) and RPO (Recovery Point Objective) before you can choose backup frequency or method. Part 1 covers the core options of mongodump and mongorestore with hands-on examples, and explains how to combine Secondary-targeted backups, permission separation, checksum verification, and retention management into a real operations routine. mongodump is a portable logical backup tool, but it has limits with large databases and sharded clusters — limits that define the boundary into Part 2 (filesystem snapshots, PBM) and Part 3 (Atlas cloud backup, PITR).
Series outline
- Part 1 — From RTO, RPO & Oplog to mongodump/mongorestore in Practice (this post)
- Part 2 — Filesystem Snapshots (LVM·EBS), Percona Backup for MongoDB (PBM), Automation Pipelines (coming soon)
- Part 3 — MongoDB Atlas Cloud Backup, Point-in-Time Recovery (PITR), Disaster Recovery Checklist (coming soon)
Table of Contents
- Introduction — Backup starts with recovery objectives, not tools
- Core Concepts: RTO, RPO & Oplog
- Backup Strategy Types at a Glance
- mongodump Deep Dive
- mongorestore Deep Dive
- Production Operations Routine
- Limitations and Caveats of mongodump
- Closing — Restore Testing and What's Next in Part 2
1. Introduction — Backup starts with recovery objectives, not tools
The most common mistake when talking about MongoDB backups is starting with "which tool should I use?"
The right starting point is two questions:
- How quickly must the service be back online? (RTO)
- How much data loss is acceptable? (RPO)
Without answering these first, you cannot properly design backup frequency, tool choice, or retention policy.
So why does backup matter? In production environments, data loss occurs through more pathways than most teams expect.
- Accidental deletion: running db.collection.drop() against the wrong environment
- Schema migration errors: data transformation scripts that corrupt documents at scale
- Ransomware and infrastructure attacks: situations where data becomes inaccessible without an offline copy
- Hardware failure: a single node going down with no replica set
- Compliance requirements: GDPR, HIPAA, SOC 2, and similar regulations mandate backup retention
Without backups, a straightforward technical failure can escalate into a regulatory violation and financial penalty.
2. Core Concepts: RTO, RPO & Oplog
There are three concepts you must understand before designing a backup strategy.
RTO (Recovery Time Objective)
How quickly must the service resume after a failure?
For example, "RTO = 1 hour" means the system must be back to normal operation within one hour of an incident.
RPO (Recovery Point Objective)
How far back in time can you afford to lose data?
"RPO = 6 hours" means up to six hours of data loss is acceptable. The shorter the RPO, the higher the backup frequency required — and the higher the cost.
Oplog (Operation Log)
A special capped collection that records all write operations on a replica set in order (local.oplog.rs).
Oplog is the foundation of MongoDB replication, but it also plays a critical role in backup. Using mongodump --oplog captures changes that occurred during the backup window so the restored dump is closer to one consistent point in time.
Important:
--oplog only works with a full instance backup. If you specify --db or --collection, this option is not available.
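To see what the oplog actually stores, you can query local.oplog.rs directly. A minimal sketch with mongosh, reusing the placeholder credentials from the examples below:
# Print the most recent oplog entry (timestamp, operation type, namespace)
mongosh "mongodb://admin:password@localhost:27017/?authSource=admin" --quiet --eval '
const last = db.getSiblingDB("local").getCollection("oplog.rs").find().sort({ $natural: -1 }).limit(1).next();
printjson({ ts: last.ts, op: last.op, ns: last.ns });
'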
3. Backup Strategy Types at a Glance
MongoDB backup methods fall into three broad categories.
| Method | Tools | Speed | Consistency | Recommended for |
|---|---|---|---|---|
| Logical backup | mongodump / mongorestore | Slow | Moderate | Small-to-medium DBs, portability required |
| Filesystem snapshot | LVM, AWS EBS, ZFS | Fast | High | Large DBs, minimal downtime |
| Cloud managed backup | MongoDB Atlas | Automatic | Very high | Cloud environments, PITR required |
Part 1 focuses exclusively on logical backup using mongodump and mongorestore.
4. mongodump Deep Dive
mongodump is MongoDB's official logical backup tool. It exports data in BSON format and saves it to files on disk.
4.1 Basic Command Structure
mongodump [options]
4.2 Common Usage Examples
Full instance backup
mongodump \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--out=/backup/mongodb/$(date +%Y%m%d_%H%M%S)
Backup a specific database
mongodump \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--db=myDatabase \
--out=/backup/mongodb/
Backup a specific collection
mongodump \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--db=myDatabase \
--collection=users \
--out=/backup/mongodb/
gzip compression + single archive file
mongodump \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--gzip \
--archive=/backup/mongodb/myDatabase_$(date +%Y%m%d).gz
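Before trusting a compressed archive, a cheap integrity test is worth the few seconds it takes (path from the example above):
gzip -t /backup/mongodb/myDatabase_$(date +%Y%m%d).gz && echo "archive OK"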
Full backup with Oplog (to account for writes during the dump)
mongodump \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--oplog \
--gzip \
--out=/backup/mongodb/full_$(date +%Y%m%d_%H%M%S)
Important:
--oplog only works with a full instance backup (no --db or --collection flags). Specifying a database or collection makes this option unavailable.
4.3 Recommended Setup for Replica Set Environments
In production, always target a Secondary node for backups. Running mongodump against the Primary consumes CPU, memory, and disk I/O — directly impacting application performance.
mongodump \
--uri="mongodb://secondary.example.com:27017/?readPreference=secondary&authSource=admin" \
--oplog \
--out=/backup/mongodb/
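Before the dump runs, it is worth confirming that the target node really is a Secondary. A small pre-flight sketch, assuming mongosh and the placeholder hostname above:
mongosh "mongodb://admin:password@secondary.example.com:27017/?authSource=admin" --quiet \
--eval 'const h = db.hello(); print(h.secondary ? "secondary: OK" : "WARNING: not a secondary")'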
4.4 Required Permissions
mongodump requires find privileges on the target databases. The built-in backup role grants backup access across all databases.
db.createUser({
user: "backupUser",
pwd: "securePassword",
roles: [{ role: "backup", db: "admin" }]
})
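A quick way to confirm the grant took effect, run as an admin (user name from the example above):
mongosh "mongodb://admin:password@localhost:27017/?authSource=admin" --quiet \
--eval 'printjson(db.getSiblingDB("admin").getUser("backupUser"))'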
5. mongorestore Deep Dive
mongorestore reads a BSON dump created by mongodump and loads it into a MongoDB instance.
5.1 Basic Restore Commands
Restore a full backup
mongorestore \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--drop \
/backup/mongodb/20260413_120000/
--drop deletes existing collections before restoring. Use it for a clean overwrite to prevent data duplication.
Restore a specific database
mongorestore \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--db=myDatabase \
--drop \
/backup/mongodb/20260413_120000/myDatabase/
Restore a specific collection
mongorestore \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--db=myDatabase \
--collection=users \
/backup/mongodb/20260413_120000/myDatabase/users.bson
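Newer mongorestore releases also accept namespace patterns via --nsInclude, which the tool's documentation recommends over --db/--collection. An equivalent sketch:
mongorestore \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--nsInclude="myDatabase.users" \
/backup/mongodb/20260413_120000/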
Restore a gzip archive
mongorestore \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--gzip \
--archive=/backup/mongodb/myDatabase_20260413.gz \
--drop
Restore with Oplog replay
The full_* dump above was created with --gzip, so pass --gzip here as well.
mongorestore \
--uri="mongodb://admin:password@localhost:27017/?authSource=admin" \
--oplogReplay \
--gzip \
--drop \
/backup/mongodb/full_20260413_120000/
5.2 Required Permissions
mongorestore requires insert and createCollection privileges. The built-in restore role is the recommended choice.
db.createUser({
user: "restoreUser",
pwd: "securePassword",
roles: [{ role: "restore", db: "admin" }]
})
6. Production Operations Routine
A backup that stops at command examples is only half implemented. A real production routine must bundle Secondary-targeted execution, checksum verification, compression, and retention management together.
6.1 Automated Backup Script
#!/bin/bash
# MongoDB automated backup script (production use)
set -euo pipefail
MONGO_URI="mongodb://backupUser:securePassword@secondary.example.com:27017/?readPreference=secondary&authSource=admin"
BACKUP_DIR="/backup/mongodb"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_PATH="$BACKUP_DIR/$DATE"
RETENTION_DAYS=7
LOG_FILE="/var/log/mongodb_backup.log"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
log "===== MongoDB backup started: $DATE ====="
mkdir -p "$BACKUP_PATH"
log "Running mongodump..."
mongodump \
--uri="$MONGO_URI" \
--oplog \
--gzip \
--out="$BACKUP_PATH"
BACKUP_SIZE=$(du -sh "$BACKUP_PATH" | cut -f1)
if [ -d "$BACKUP_PATH" ] && [ "$(ls -A "$BACKUP_PATH")" ]; then
log "Backup succeeded. Size: $BACKUP_SIZE"
else
log "ERROR: backup directory is empty"
exit 1
fi
log "Computing checksums..."
find "$BACKUP_PATH" -type f -exec md5sum {} \; > "$BACKUP_PATH/checksums.md5"
log "Creating archive..."
tar -czf "$BACKUP_DIR/$DATE.tar.gz" -C "$BACKUP_DIR" "$DATE"
rm -rf "$BACKUP_PATH"
# AWS S3 upload (optional)
# log "Uploading to S3..."
# aws s3 cp "$BACKUP_DIR/$DATE.tar.gz" s3://my-mongodb-backups/
log "Deleting backups older than ${RETENTION_DAYS} days..."
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +$RETENTION_DAYS -delete
log "===== Backup complete ====="
Register with cron — runs daily at 2 AM
0 2 * * * /opt/scripts/mongodb_backup.sh >> /var/log/mongodb_backup.log 2>&1
6.2 Restore Procedure (with checksum verification)
#!/bin/bash
set -euo pipefail
BACKUP_ARCHIVE="/backup/mongodb/20260413_020000.tar.gz"
RESTORE_DIR="/tmp/mongodb_restore"
MONGO_URI="mongodb://restoreUser:securePassword@localhost:27017/?authSource=admin"
mkdir -p "$RESTORE_DIR"
tar -xzf "$BACKUP_ARCHIVE" -C "$RESTORE_DIR"
# Verify checksums before restoring; set -e aborts the script on any mismatch
cd "$RESTORE_DIR/20260413_020000"
md5sum -c checksums.md5
mongorestore \
--uri="$MONGO_URI" \
--oplogReplay \
--drop \
--gzip \
"$RESTORE_DIR/20260413_020000"
echo "Restore complete"
7. Limitations and Caveats of mongodump
mongodump is powerful, but it is not the right tool for every situation. Understand these limitations before relying on it in production.
Performance Impact
mongodump communicates directly with a running MongoDB instance, so it competes for resources. It can cause infrequently accessed data to be loaded into memory, evicting hot data from the cache.
Running backups against a Secondary reduces this impact but does not eliminate it entirely.
Indexes Are Not Backed Up Directly
mongodump stores index definitions (in each collection's metadata.json file) but not the index data itself. All indexes must be rebuilt during restore, which can significantly extend recovery time for large databases.
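To see which indexes a restore will rebuild, inspect the dump's metadata files directly. A sketch assuming jq and an uncompressed dump at the placeholder path:
jq '.indexes[].name' /backup/mongodb/20260413_120000/myDatabase/users.metadata.json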
Not Suitable for Large Databases
For databases in the hundreds of gigabytes or larger, mongodump is slow and resource-intensive. In these cases, consider filesystem snapshots (covered in Part 2) or Atlas cloud backup (covered in Part 3).
Sharded Cluster Caveats
When running mongodump against a sharded cluster, stop the balancer and pause cross-shard transactions and DDL operations (such as creating or modifying collections) to reduce the risk of an inconsistent dump; a balancer sketch follows below. In sharded environments, PBM (Percona Backup for MongoDB) is a more reliable choice.
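Stopping and restarting the balancer is a one-liner against mongos. A sketch with placeholder credentials and hostname:
# Stop the balancer (waits for in-flight chunk migrations to finish)
mongosh "mongodb://admin:password@mongos.example.com:27017/?authSource=admin" --eval 'sh.stopBalancer()'
# ... run mongodump against the cluster ...
# Re-enable the balancer afterwards
mongosh "mongodb://admin:password@mongos.example.com:27017/?authSource=admin" --eval 'sh.startBalancer()'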
8. Closing — Restore Testing and What's Next in Part 2
Part 1 covered why MongoDB backup matters, the core concepts of RTO, RPO, and Oplog, and the hands-on usage of mongodump and mongorestore.
"A backup you've never tested restoring is just an assumption."
A successful backup log is not enough. Build regular restore rehearsals into your operations standard — combining the restore procedure, --drop, --oplogReplay, and checksum verification. Running an actual restore in a staging environment at least once a month and recording the result is strongly recommended.
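If you want the rehearsal itself on a schedule, a cron sketch (the rehearsal script name is hypothetical; it would wrap the restore procedure from section 6.2 against a staging instance):
0 3 1 * * /opt/scripts/mongodb_restore_rehearsal.sh >> /var/log/mongodb_restore_test.log 2>&1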
Part 2 will cover filesystem snapshots (LVM / AWS EBS) for faster, lower-overhead backups, Percona Backup for MongoDB (PBM) for consistent backups in sharded clusters, and how to build a fully automated backup pipeline.