MongoDB Backup & Recovery Guide Part 2 — Filesystem Snapshots (LVM·EBS), PBM & Automation Pipelines
Logical backup alone struggles to meet the backup time, restore time, and consistency requirements of large MongoDB deployments. Part 2 covers LVM and EBS filesystem snapshots as block-level physical backups — with the WiredTiger consistency conditions that make them safe — and Percona Backup for MongoDB (PBM) for sharded clusters and strict PITR needs. It closes with a decision guide for choosing the right method per situation and a production automation pipeline that extends beyond scheduled execution to include restore verification and alerting.
Series outline
- Part 1 — From RTO, RPO & Oplog to mongodump/mongorestore in Practice
- Part 2 — Filesystem Snapshots (LVM·EBS), PBM & Automation Pipelines (this post)
- Part 3 — MongoDB Atlas Cloud Backup, Point-in-Time Recovery (PITR), Disaster Recovery Checklist (coming soon)
Table of Contents
- Introduction — Picking up where Part 1's limits left off
- What Is a Filesystem Snapshot?
- WiredTiger and Snapshot Consistency
- MongoDB Backup with LVM Snapshots
- AWS EBS Snapshot Strategy
- Percona Backup for MongoDB (PBM) Deep Dive
- Installing and Configuring PBM
- PBM Backup and Restore Commands in Practice
- Backup Method Comparison — Choosing by Situation
- Automation Pipeline: From Backup Execution to Verification and Alerting
- Closing — What's Next in Part 3
1. Introduction — Picking up where Part 1's limits left off
Part 1 established that mongodump is portable and simple, making it a good fit for smaller databases. But logical backup alone falls short in these situations:
- Backup time is too long for databases in the hundreds of gigabytes or larger
- Index rebuilding adds significant time during restore
- Consistent backups across a sharded cluster are hard to guarantee
- RPO is tight enough that real Point-in-Time Recovery is necessary
In these cases, the next step goes in two directions: filesystem snapshots (LVM/EBS) for block-level physical backup, and Percona Backup for MongoDB (PBM) as a distributed backup tool.
2. What Is a Filesystem Snapshot?
A filesystem snapshot is a physical backup method that captures an instant copy of a disk image at the block level. Unlike mongodump, which communicates directly with MongoDB and reads data document by document, snapshots operate at the storage layer of the operating system or cloud infrastructure.
The core mechanism is Copy-on-Write (CoW). At snapshot creation time, pointers are established between the live data and the snapshot volume. Only when the live data actually changes are the modified blocks copied into the snapshot volume. This makes snapshot creation extremely fast and keeps initial storage overhead low.
Key advantages of snapshot backup:
- Even multi-hundred-GB databases can have a snapshot created in seconds
- Nearly zero CPU and memory overhead on the MongoDB instance
- Block-level copy means indexes are preserved as-is — no rebuild needed after restore
- Automatable at the infrastructure level in cloud environments (AWS, GCP, Azure)
3. WiredTiger and Snapshot Consistency
MongoDB's default storage engine, WiredTiger, pairs well with snapshot backups.
WiredTiger guarantees data durability through checkpoints. MongoDB configures WiredTiger to create checkpoints at 60-second intervals, writing in-memory data to disk in a consistent state at that point. When journaling is enabled, even if a snapshot is taken between two checkpoints, WiredTiger will automatically replay the journal on restart to bring the data to a consistent state.
There is one rule that must be followed strictly.
Critical: If the data directory (/data/db) and the journal directory live on separate volumes, or if journaling is disabled, lock writes with db.fsyncLock() and capture the related volumes within the same backup procedure. Taking the data volume and journal volume separately breaks the consistency guarantee.
db.fsyncLock() — When Is It Actually Needed?
When using LVM snapshots with WiredTiger, db.fsyncLock() is generally not required. With journaling enabled and the data and journal files on the same volume, journal replay on startup brings the snapshotted files to a consistent state.
However, if the journal and data files are on separate volumes, if journaling is disabled, or if the snapshot tool does not guarantee atomic consistency across volumes, lock writes with db.fsyncLock(), take the snapshot, then release immediately with db.fsyncUnlock().
// Run in mongosh
// 1. Flush all data to disk and lock writes
db.adminCommand({ fsync: 1, lock: true })
// → Take the snapshot here using an external tool
// 2. Release the lock immediately
db.adminCommand({ fsyncUnlock: 1 })
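The lock/unlock pair is easy to get wrong under failure: if the snapshot command errors out between the two calls, writes stay blocked indefinitely. A minimal sketch that always releases the lock — the mongosh invocation and the snapshot command are placeholders, adapt them to your connection settings:

```shell
# Sketch: run an arbitrary snapshot command between fsyncLock and
# fsyncUnlock, guaranteeing the unlock even when the snapshot fails.
# Assumption: mongosh is on PATH and connects without extra flags.
snapshot_with_lock() {
  mongosh --quiet --eval 'db.adminCommand({ fsync: 1, lock: true })'
  local rc=0
  "$@" || rc=$?   # the snapshot command, passed as arguments
  mongosh --quiet --eval 'db.adminCommand({ fsyncUnlock: 1 })'
  return $rc
}

# Usage (example):
# snapshot_with_lock sudo lvcreate --size 20G --snapshot \
#     --name mdb-snap /dev/vg0/mongodb
```

Capturing the snapshot command's exit code before unlocking lets the caller still see the failure while the database is already writable again.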
4. MongoDB Backup with LVM Snapshots
LVM (Logical Volume Manager) is the most widely used snapshot backup method for Linux on-premises MongoDB deployments.
4.1 Prerequisites
The MongoDB data directory must reside on an LVM logical volume.
# Verify volume group
sudo vgdisplay
# Verify logical volumes
sudo lvdisplay
# Confirm MongoDB data path (e.g. /dev/vg0/mongodb → /data/mongodb)
df -h /data/mongodb
4.2 Creating an LVM Snapshot (Backup)
#!/bin/bash
# LVM snapshot MongoDB backup script
set -euo pipefail
VG_NAME="vg0"
LV_NAME="mongodb"
SNAP_NAME="mdb-snap-$(date +%Y%m%d)"
SNAP_SIZE="20G"
BACKUP_DIR="/backup/mongodb-snapshots"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
echo "[$(date)] Starting LVM snapshot creation..."
# 1. Create snapshot (typically completes in under 1 second)
sudo lvcreate \
--size "$SNAP_SIZE" \
--snapshot \
--name "$SNAP_NAME" \
/dev/$VG_NAME/$LV_NAME
echo "[$(date)] Snapshot created: /dev/$VG_NAME/$SNAP_NAME"
# 2. Mount read-only
sudo mkdir -p /mnt/mongodb-snapshot
# (for XFS, use -o ro,nouuid — the snapshot shares the origin's UUID)
sudo mount -o ro /dev/$VG_NAME/$SNAP_NAME /mnt/mongodb-snapshot
# 3. Archive to compressed file
sudo tar -czf "$BACKUP_DIR/$DATE.tar.gz" \
-C /mnt/mongodb-snapshot .
echo "[$(date)] Archive created: $BACKUP_DIR/$DATE.tar.gz"
# 4. Unmount and remove snapshot
sudo umount /mnt/mongodb-snapshot
sudo lvremove -f /dev/$VG_NAME/$SNAP_NAME
echo "[$(date)] LVM snapshot backup complete"
4.3 Restoring from an LVM Snapshot
#!/bin/bash
BACKUP_ARCHIVE="/backup/mongodb-snapshots/20260413_020000.tar.gz"
RESTORE_LV="mdb-new"
VG_NAME="vg0"
RESTORE_SIZE="100G"
# 1. Create a new logical volume for the restore
sudo lvcreate --size $RESTORE_SIZE --name $RESTORE_LV $VG_NAME
# 2. Create a filesystem on the new volume and mount it
#    (the archive contains files, not a raw image — do not dd it onto the device)
sudo mkfs.xfs /dev/$VG_NAME/$RESTORE_LV
sudo mkdir -p /srv/mongodb
sudo mount /dev/$VG_NAME/$RESTORE_LV /srv/mongodb
# 3. Extract the backup archive onto the new volume
sudo tar -xzf "$BACKUP_ARCHIVE" -C /srv/mongodb
# 4. Fix ownership and remove the stale lock file (critical)
sudo chown -R mongodb:mongodb /srv/mongodb
sudo rm -f /srv/mongodb/mongod.lock
# 5. Start mongod on the restored data path — with journaling enabled,
#    WiredTiger replays the journal automatically; --repair is not needed
sudo -u mongodb mongod --dbpath /srv/mongodb
echo "Restore complete"
Important: A snapshot can contain a stale mongod.lock file. Remove it before starting mongod on the restored data, but do not delete other WiredTiger files blindly. Use the startup logs and MongoDB's official recovery procedure to decide the next step.
5. AWS EBS Snapshot Strategy
For MongoDB running in the cloud, AWS EBS snapshots are the most straightforward physical backup option. EBS snapshots are stored in S3 using an incremental method — after the first snapshot, only changed blocks are stored, making them highly cost-efficient.
Caveats for RAID Configurations
If multiple EBS volumes are striped with RAID, individual EBS snapshot tools alone cannot guarantee a consistent state across all disks. In that case, choose one of two approaches:
- LVM on top of RAID: use LVM snapshots to capture a consistent state in a single operation
- db.fsyncLock() before EBS snapshot: lock writes, then snapshot all volumes simultaneously
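The second approach can be sketched as a small wrapper: freeze writes once, request a snapshot of every volume in the RAID set, then unlock. The region, volume IDs, and mongosh invocation below are illustrative placeholders, not values from a real cluster:

```shell
# Sketch: crash-consistent EBS snapshots across a RAID set via db.fsyncLock().
# Assumptions: mongosh and the AWS CLI are on PATH with working credentials.
snapshot_raid_set() {
  local region=$1; shift
  mongosh --quiet --eval 'db.adminCommand({ fsync: 1, lock: true })'
  local rc=0 vol
  for vol in "$@"; do
    # request snapshots back to back while writes are frozen
    aws ec2 create-snapshot --region "$region" --volume-id "$vol" \
      --description "mongodb-raid-$(date +%Y%m%d-%H%M%S)" || rc=$?
  done
  mongosh --quiet --eval 'db.adminCommand({ fsyncUnlock: 1 })'
  return $rc
}

# snapshot_raid_set ap-northeast-2 vol-0123abcd vol-0456efgh
```

Only the snapshot *requests* need to happen under the lock; EBS finishes copying blocks in the background after fsyncUnlock, so the write freeze stays short.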
Automated EBS Snapshots with the AWS CLI
#!/bin/bash
# AWS EBS snapshot automation script (single-volume environment)
# Volume ID of the MongoDB data volume (look it up once via console or CLI)
VOLUME_ID="vol-0123456789abcdef0"
DESCRIPTION="mongodb-backup-$(date +%Y%m%d-%H%M%S)"
REGION="ap-northeast-2"
RETENTION_DAYS=7
echo "[$(date)] Starting EBS snapshot creation..."
SNAPSHOT_ID=$(aws ec2 create-snapshot \
--region "$REGION" \
--volume-id "$VOLUME_ID" \
--description "$DESCRIPTION" \
--tag-specifications "ResourceType=snapshot,Tags=[{Key=Name,Value=$DESCRIPTION},{Key=Retention,Value=$RETENTION_DAYS}]" \
--query 'SnapshotId' \
--output text)
echo "[$(date)] Snapshot request submitted: $SNAPSHOT_ID"
aws ec2 wait snapshot-completed \
--region "$REGION" \
--snapshot-ids "$SNAPSHOT_ID"
echo "[$(date)] Snapshot complete: $SNAPSHOT_ID"
# Delete snapshots beyond retention window
CUTOFF_DATE=$(date -u -d "$RETENTION_DAYS days ago" +%Y-%m-%dT%H:%M:%SZ)  # UTC, to match EBS StartTime
OLD_SNAPSHOTS=$(aws ec2 describe-snapshots \
--region "$REGION" \
--filters "Name=tag:Retention,Values=$RETENTION_DAYS" \
--query "Snapshots[?StartTime<'$CUTOFF_DATE'].SnapshotId" \
--output text)
for snap in $OLD_SNAPSHOTS; do
aws ec2 delete-snapshot --region "$REGION" --snapshot-id "$snap"
echo "[$(date)] Deleted old snapshot: $snap"
done
echo "[$(date)] EBS snapshot backup complete"
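The retention filter above works because ISO-8601 timestamps in UTC sort lexicographically, so a plain string comparison is also a chronological one. A quick sanity check of that logic (GNU date assumed):

```shell
# ISO-8601 UTC timestamps compare correctly as plain strings — this is
# what the JMESPath filter StartTime<'...' relies on. GNU date assumed.
RETENTION_DAYS=7
CUTOFF=$(date -u -d "$RETENTION_DAYS days ago" +%Y-%m-%dT%H:%M:%SZ)
is_expired() { [ "$1" \< "$CUTOFF" ]; }

if is_expired "2020-01-01T00:00:00Z"; then
  echo "2020-01-01 is past the retention window"
fi
```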
6. Percona Backup for MongoDB (PBM) Deep Dive
mongodump excels in small environments; LVM/EBS snapshots shine for large single-node or replica set deployments. But when you need a consistent backup across multiple shards in a sharded cluster, both approaches hit a wall.
That is where Percona Backup for MongoDB (PBM) comes in. PBM is an open-source distributed backup solution for MongoDB replica sets and sharded clusters, offering enterprise-grade features at no cost. It is a separate tool from MongoDB's official Atlas backup.
Cluster-Wide Consistency
In a sharded cluster, each shard's backup may be captured at slightly different points in time. PBM uses idempotent Oplog updates to "fast-forward" every shard to the same timestamp, achieving cluster-wide consistency.
Supported Backup Types
| Type | Description | Characteristics |
|---|---|---|
| Logical | BSON data dump | Portable, slow for large data |
| Physical (Hot) | Direct WiredTiger file copy | Very fast, no downtime |
| Selective | Specific DB/collection only | Storage-efficient, fast restore |
| Incremental Physical | Changed blocks only | Cost-effective, shorter recovery |
Storage backends include AWS S3, Azure Blob, Google Cloud Storage, and MinIO (on-premises S3-compatible). PBM also captures Oplog continuously for 24/7 PITR when combined with a base snapshot.
7. Installing and Configuring PBM
7.1 Installation (Ubuntu/Debian)
# Add Percona repository and install PBM
wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
sudo dpkg -i percona-release_latest.generic_all.deb
sudo percona-release enable pbm release
sudo apt-get update
sudo apt-get install -y percona-backup-mongodb
7.2 Create a Dedicated PBM User in MongoDB
// Run in mongosh
use admin
db.createUser({
user: "pbmUser",
pwd: "pbmSecurePassword",
roles: [
{ role: "readWrite", db: "admin" },
{ role: "backup", db: "admin" },
{ role: "restore", db: "admin" },
{ role: "clusterMonitor", db: "admin" },
{ role: "readAnyDatabase", db: "admin" }
]
})
7.3 PBM Configuration File (S3 Storage Example)
# /etc/pbm/pbm_config.yaml
storage:
type: s3
s3:
region: ap-northeast-2
bucket: my-mongodb-backups
prefix: production-cluster
credentials:
access-key-id: YOUR_ACCESS_KEY
secret-access-key: YOUR_SECRET_KEY
serverSideEncryption:
sseAlgorithm: aws:kms
kmsKeyID: arn:aws:kms:ap-northeast-2:123456789:key/abcd-1234
pitr:
enabled: true
oplogSpanMin: 10 # Save an Oplog slice every 10 minutes
restore:
batchSize: 500
numInsertionWorkers: 10
# Apply and verify configuration
pbm config --file /etc/pbm/pbm_config.yaml
pbm config --list
7.4 Registering the pbm-agent Service
A pbm-agent must be running on every node in the replica set.
# Set environment variable (/etc/default/pbm-agent)
PBM_MONGODB_URI="mongodb://pbmUser:pbmSecurePassword@localhost:27017/?authSource=admin"
# Enable, start, and verify
sudo systemctl enable pbm-agent
sudo systemctl start pbm-agent
pbm status
8. PBM Backup and Restore Commands in Practice
8.1 Running a Backup
# Logical backup
pbm backup
# Physical (hot) backup — recommended for large DBs
pbm backup --type physical
# Incremental physical backup — the first run must create the base with --base
pbm backup --type incremental --base
# Subsequent runs store only the blocks changed since the base
pbm backup --type incremental
# Selective backup for specific namespaces
pbm backup --ns="myDatabase.users,myDatabase.orders"
# List available backups
pbm list
8.2 Checking Backup Status
# Full cluster backup status
pbm status
# Detailed backup list
pbm list --full
# Example output:
# Snapshots:
# 2026-04-13T02:00:05Z [physical] <-- replset: rs0, rs1 | ok
# 2026-04-12T02:00:03Z [logical] <-- replset: rs0, rs1 | ok
#
# PITR <on>:
# 2026-04-12T02:00:03Z - 2026-04-13T09:45:22Z
8.3 Running a Restore
# Restore to a specific snapshot
pbm restore 2026-04-13T02:00:05Z
# PITR — restore to a specific point in time (second-level precision)
# Example: a collection was accidentally dropped at 09:30:00 — restore to 1 minute before
pbm restore --time="2026-04-13T09:29:00"
# Selective restore for a specific namespace
pbm restore 2026-04-13T02:00:05Z --ns="myDatabase.users"
# Monitor restore progress
pbm logs --event restore
8.4 Automated Scheduled Backups with cron
# /etc/cron.d/pbm-backup
# Full physical backup every day at 2 AM
0 2 * * * root pbm backup --type physical >> /var/log/pbm-backup.log 2>&1
# PITR is always active via pitr.enabled: true in pbm_config.yaml — no separate cron needed
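A physical backup of a large cluster can outlive its cron interval; guarding the job with flock keeps a slow run and the next scheduled run from overlapping. A sketch — the lock file path is an arbitrary convention, not a PBM requirement:

```shell
# flock(1) exits immediately with a non-zero status if the previous run
# still holds the lock, so two backups never run concurrently.
run_exclusive() {
  local lockfile=$1; shift
  flock --nonblock "$lockfile" "$@"
}

# Equivalent cron entry using flock directly:
# 0 2 * * * root flock -n /var/run/pbm-backup.lock \
#     pbm backup --type physical >> /var/log/pbm-backup.log 2>&1
```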
9. Backup Method Comparison — Choosing by Situation
The right question is not "which method is best" but "which situation am I in?"
| Criteria | mongodump | Filesystem Snapshot | PBM |
|---|---|---|---|
| Backup speed | Slow (hours for large DBs) | Fast (seconds to minutes) | Fast (physical backup) |
| Restore speed | Slow (index rebuild required) | Fast | Fast |
| Consistency | Moderate (--oplog improves it) | High (with journaling enabled) | Very high |
| Sharded cluster | Limited | Difficult | Full support |
| PITR support | Limited | Not standalone | Full support |
| Storage efficiency | Moderate (gzip available) | Incremental (EBS) | Incremental physical |
| Setup complexity | Easy | Moderate | Moderate to complex |
| Cost | Free | Infrastructure cost | Free (open-source) |
| Recommended for | Small scale, migration | Mid-large on-premises | Mid-large, sharded clusters |
10. Automation Pipeline: From Backup Execution to Verification and Alerting
The core of a solid backup strategy is automation. Manual procedures are vulnerable to human error and oversight, and a successful backup log is not proof that the data can actually be recovered.
10.1 Full PBM-Based Pipeline Flow
The pipeline has three stages: scheduled pbm backups driven by cron, periodic restore verification against an isolated test instance, and alerting whenever either stage fails.
10.2 Automated Backup Integrity Verification Script
This script restores the latest backup to an isolated test instance and confirms the document count.
#!/bin/bash
# MongoDB backup integrity verification script (recommended weekly)
set -euo pipefail
TEST_MONGO_URI="mongodb://admin:password@test-mongo:27017/?authSource=admin"
ALERT_EMAIL="ops-team@company.com"
LOG_FILE="/var/log/mongodb-backup-verify.log"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
log "===== Backup integrity verification started ====="
# 1. Get the most recent backup ID
LATEST_BACKUP=$(pbm list --json 2>/dev/null | \
python3 -c "import json,sys; data=json.load(sys.stdin); print(data['snapshots'][0]['name'])")
log "Target backup: $LATEST_BACKUP"
# 2. Restore to the test instance
log "Restoring to test instance..."
PBM_MONGODB_URI="$TEST_MONGO_URI" pbm restore "$LATEST_BACKUP" --wait
# 3. Verify document count in a key collection
log "Verifying restore result..."
DOC_COUNT=$(mongosh "$TEST_MONGO_URI" --quiet --eval \
"db.getSiblingDB('myDatabase').users.countDocuments({})")
if [ "$DOC_COUNT" -gt 0 ]; then
log "Verification passed: users collection document count = $DOC_COUNT"
else
log "Verification FAILED: document count = 0"
echo "MongoDB backup verification failed: $LATEST_BACKUP" | \
mail -s "[ALERT] MongoDB Backup Verify Failed" "$ALERT_EMAIL"
exit 1
fi
log "===== Backup integrity verification complete ====="
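The one-liner above takes snapshots[0] unconditionally, so a failed most-recent backup would still be "verified". A sketch that skips non-completed snapshots — the JSON shape is a simplified sample, so check your PBM version's actual output before relying on these field names:

```shell
# Pick the newest snapshot whose status is "done" instead of snapshots[0].
# SAMPLE is illustrative; the real input would be `pbm list --json`.
SAMPLE='{"snapshots":[
  {"name":"2026-04-13T02:00:05Z","status":"error"},
  {"name":"2026-04-12T02:00:03Z","status":"done"}]}'

LATEST_OK=$(printf '%s' "$SAMPLE" | python3 -c "
import json, sys
snaps = json.load(sys.stdin)['snapshots']
done = [s['name'] for s in snaps if s.get('status') == 'done']
print(done[0] if done else '')
")
echo "$LATEST_OK"   # → 2026-04-12T02:00:03Z
```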
10.3 Slack Notification Integration
#!/bin/bash
# Send PBM backup results to a Slack Webhook
SLACK_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
BACKUP_LOG=$(pbm logs --event backup --tail 20 2>&1)
if echo "$BACKUP_LOG" | grep -q "error"; then
STATUS="FAILED"
COLOR="danger"
else
STATUS="SUCCESS"
COLOR="good"
fi
curl -s -X POST "$SLACK_WEBHOOK" \
-H 'Content-type: application/json' \
-d "{
\"attachments\": [{
\"color\": \"$COLOR\",
\"title\": \"MongoDB Backup: $STATUS\",
\"text\": \"$(date '+%Y-%m-%d %H:%M:%S')\",
\"footer\": \"MongoDB Backup Monitor\"
}]
}"
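Interpolating shell variables straight into a JSON string, as above, breaks as soon as the text contains a quote or newline. Building the payload with json.dumps sidesteps that; the STATUS and TEXT values here are illustrative:

```shell
# Build the Slack payload with json.dumps so special characters in the
# message cannot produce invalid JSON. Values below are placeholders.
STATUS="SUCCESS"
TEXT="backup finished at $(date '+%Y-%m-%d %H:%M:%S')"
PAYLOAD=$(STATUS="$STATUS" TEXT="$TEXT" python3 -c "
import json, os
status = os.environ['STATUS']
print(json.dumps({'attachments': [{
    'color': 'good' if status == 'SUCCESS' else 'danger',
    'title': 'MongoDB Backup: ' + status,
    'text': os.environ['TEXT'],
    'footer': 'MongoDB Backup Monitor'}]}))
")

# curl -s -X POST "$SLACK_WEBHOOK" -H 'Content-type: application/json' -d "$PAYLOAD"
```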
11. Closing — What's Next in Part 3
Part 2 covered filesystem snapshots (LVM/EBS) for high-speed physical backup in large deployments, and Percona Backup for MongoDB (PBM) for solving consistency challenges in sharded clusters. It also walked through the decision criteria for choosing between methods and how to build an automation pipeline that extends to restore verification and alerting.
"An unautomated backup procedure will eventually fail."
Backup failures arrive without warning, and by then it is already too late. Scheduled execution alone is not enough — restore verification and alerting must be part of the pipeline before it qualifies as a real backup strategy.
Part 3 will cover MongoDB Atlas cloud backup for delegating all infrastructure management to the cloud, Point-in-Time Recovery (PITR) with second-level precision, and a practical disaster recovery (DR) checklist and Runbook based on real failure scenarios.