Sunday, May 3, 2026

MongoDB Backup & Recovery Guide Part 2 — Filesystem Snapshots (LVM·EBS), PBM & Automation Pipelines

Logical backup alone struggles to meet the backup time, restore time, and consistency requirements of large MongoDB deployments. Part 2 covers LVM and EBS filesystem snapshots as block-level physical backups — with the WiredTiger consistency conditions that make them safe — and Percona Backup for MongoDB (PBM) for sharded clusters and strict PITR needs. It closes with a decision guide for choosing the right method per situation and a production automation pipeline that extends beyond scheduled execution to include restore verification and alerting.

Table of Contents

  1. Introduction — Picking up where Part 1's limits left off
  2. What Is a Filesystem Snapshot?
  3. WiredTiger and Snapshot Consistency
  4. MongoDB Backup with LVM Snapshots
  5. AWS EBS Snapshot Strategy
  6. Percona Backup for MongoDB (PBM) Deep Dive
  7. Installing and Configuring PBM
  8. PBM Backup and Restore Commands in Practice
  9. Backup Method Comparison — Choosing by Situation
  10. Automation Pipeline: From Backup Execution to Verification and Alerting
  11. Closing — What's Next in Part 3

1. Introduction — Picking up where Part 1's limits left off

Part 1 established that mongodump is portable and simple, making it a good fit for smaller databases. But logical backup alone falls short in these situations:

  • Backup time is too long for databases in the hundreds of gigabytes or larger
  • Index rebuilding adds significant time during restore
  • Consistent backups across a sharded cluster are hard to guarantee
  • RPO is tight enough that real Point-in-Time Recovery is necessary

In these cases, the next step goes in two directions: filesystem snapshots (LVM/EBS) for block-level physical backup, and Percona Backup for MongoDB (PBM) as a distributed backup tool.


2. What Is a Filesystem Snapshot?

A filesystem snapshot is a physical backup method that captures an instant copy of a disk image at the block level. Unlike mongodump, which communicates directly with MongoDB and reads data document by document, snapshots operate at the storage layer of the operating system or cloud infrastructure.

The core mechanism is Copy-on-Write (CoW). At snapshot creation time, pointers are established between the live data and the snapshot volume. Only when the live data actually changes are the modified blocks copied into the snapshot volume. This makes snapshot creation extremely fast and keeps initial storage overhead low.
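One practical consequence of CoW: the snapshot volume only needs room for the blocks that change while it exists, and if that space fills up the snapshot is invalidated. With LVM, usage appears in the Data% column of `lvs`. A small guard over that value, as a sketch (the vg0/mdb-snap names are assumptions):

```shell
# CoW usage for a snapshot LV can be read with (assumed names):
#   sudo lvs --noheadings -o data_percent vg0/mdb-snap
# If Data% reaches 100 the snapshot is invalidated, so long archive jobs
# should check it. A guard over the reported percentage:
snap_usage_ok() {
  local pct="$1" limit="${2:-90}"   # true (exit 0) while usage is below the limit
  awk -v p="$pct" -v t="$limit" 'BEGIN { exit (p + 0 >= t) }'
}

# snap_usage_ok 42.15              # below the 90% default limit: succeeds
# snap_usage_ok 95 || echo "snapshot nearly full - grow it or finish the copy"
```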

Key advantages of snapshot backup:

  • Even multi-hundred-GB databases can have a snapshot created in seconds
  • Nearly zero CPU and memory overhead on the MongoDB instance
  • Block-level copy means indexes are preserved as-is — no rebuild needed after restore
  • Automatable at the infrastructure level in cloud environments (AWS, GCP, Azure)

3. WiredTiger and Snapshot Consistency

MongoDB's default storage engine, WiredTiger, pairs well with snapshot backups.

WiredTiger guarantees data durability through checkpoints. MongoDB configures WiredTiger to create checkpoints at 60-second intervals, writing in-memory data to disk in a consistent state at that point. When journaling is enabled, even if a snapshot is taken between two checkpoints, WiredTiger will automatically replay the journal on restart to bring the data to a consistent state.

There is one rule that must be followed strictly.

Critical: If the data directory (/data/db) and the journal directory live on separate volumes, or if journaling is disabled, lock writes with db.fsyncLock() and capture the related volumes within the same backup procedure. Taking the data volume and journal volume separately breaks the consistency guarantee.
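Checking whether two directories actually share a backing device is straightforward with GNU `df`; a sketch (the paths shown are common defaults, not requirements):

```shell
# Return success if two paths live on the same backing device
same_volume() {
  [ "$(df --output=source "$1" | tail -n 1)" = "$(df --output=source "$2" | tail -n 1)" ]
}

# Typical check before relying on a single-volume snapshot:
# same_volume /data/db /data/db/journal && echo "one volume - snapshot alone is consistent"
```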

db.fsyncLock() — When Is It Actually Needed?

When using LVM snapshots with WiredTiger, db.fsyncLock() is generally not required. With journaling enabled and the data and journal directories on a single volume, journal replay at startup brings the snapshot to a consistent state.

However, if the journal and data files are on separate volumes, if journaling is disabled, or if the snapshot tool does not guarantee atomic consistency across volumes, lock writes with db.fsyncLock(), take the snapshot, then release immediately with db.fsyncUnlock().

// Run in mongosh
// 1. Flush all data to disk and lock writes
db.adminCommand({ fsync: 1, lock: true })

// → Take the snapshot here using an external tool

// 2. Release the lock immediately
db.adminCommand({ fsyncUnlock: 1 })
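If the lock is needed, it should be released even when the snapshot step fails, because a lingering fsyncLock blocks every write on the instance. A shell sketch of that pattern (the connection string is an assumption):

```shell
# Run a command between fsyncLock and a guaranteed fsyncUnlock
with_fsync_lock() {
  local uri="$1"; shift
  mongosh "$uri" --quiet --eval 'db.getSiblingDB("admin").fsyncLock()'
  local rc=0
  "$@" || rc=$?   # run the snapshot step, capturing its exit status
  mongosh "$uri" --quiet --eval 'db.getSiblingDB("admin").fsyncUnlock()'
  return $rc
}

# Example (assumed volume names):
# with_fsync_lock "mongodb://localhost:27017" \
#   sudo lvcreate --size 20G --snapshot --name mdb-snap /dev/vg0/mongodb
```

The unlock runs whether or not the wrapped command succeeds, and the command's exit status is preserved for the caller.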

4. MongoDB Backup with LVM Snapshots

LVM (Logical Volume Manager) is the most widely used snapshot backup method for Linux on-premises MongoDB deployments.

4.1 Prerequisites

MongoDB data files must be running on an LVM volume.

# Verify volume group
sudo vgdisplay

# Verify logical volumes
sudo lvdisplay

# Confirm MongoDB data path (e.g. /dev/vg0/mongodb → /data/mongodb)
df -h /data/mongodb

4.2 Creating an LVM Snapshot (Backup)

#!/bin/bash
# LVM snapshot MongoDB backup script

set -euo pipefail

VG_NAME="vg0"
LV_NAME="mongodb"
SNAP_NAME="mdb-snap-$(date +%Y%m%d)"
SNAP_SIZE="20G"
BACKUP_DIR="/backup/mongodb-snapshots"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

echo "[$(date)] Starting LVM snapshot creation..."

# 1. Create snapshot (typically completes in under 1 second)
sudo lvcreate \
  --size "$SNAP_SIZE" \
  --snapshot \
  --name "$SNAP_NAME" \
  /dev/$VG_NAME/$LV_NAME

echo "[$(date)] Snapshot created: /dev/$VG_NAME/$SNAP_NAME"

# 2. Mount read-only
sudo mkdir -p /mnt/mongodb-snapshot
sudo mount -o ro /dev/$VG_NAME/$SNAP_NAME /mnt/mongodb-snapshot

# 3. Archive to compressed file
sudo tar -czf "$BACKUP_DIR/$DATE.tar.gz" \
  -C /mnt/mongodb-snapshot .

echo "[$(date)] Archive created: $BACKUP_DIR/$DATE.tar.gz"

# 4. Unmount and remove snapshot
sudo umount /mnt/mongodb-snapshot
sudo lvremove -f /dev/$VG_NAME/$SNAP_NAME

echo "[$(date)] LVM snapshot backup complete"
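To put the script above on a schedule, a cron entry along these lines works; the install path and timing are assumptions:

```shell
# /etc/cron.d/lvm-mongodb-backup -- script location is an assumption
0 3 * * * root /usr/local/bin/lvm-snapshot-backup.sh >> /var/log/lvm-backup.log 2>&1
```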

4.3 Restoring from an LVM Snapshot

#!/bin/bash
# Restore a MongoDB data directory from an LVM snapshot archive

set -euo pipefail

BACKUP_ARCHIVE="/backup/mongodb-snapshots/20260413_020000.tar.gz"
RESTORE_LV="mdb-new"
VG_NAME="vg0"
RESTORE_SIZE="100G"
RESTORE_MOUNT="/srv/mongodb"

# 1. Create a new logical volume and filesystem for the restore
#    (xfs shown here — match the filesystem of the original volume)
sudo lvcreate --size "$RESTORE_SIZE" --name "$RESTORE_LV" "$VG_NAME"
sudo mkfs.xfs /dev/$VG_NAME/$RESTORE_LV

# 2. Mount the new volume
sudo mkdir -p "$RESTORE_MOUNT"
sudo mount /dev/$VG_NAME/$RESTORE_LV "$RESTORE_MOUNT"

# 3. Extract the backup archive (the backup is a tar archive, not a raw disk image)
sudo tar -xzf "$BACKUP_ARCHIVE" -C "$RESTORE_MOUNT"

# 4. Remove the stale lock file if db.fsyncLock() was used during backup
sudo rm -f "$RESTORE_MOUNT/mongod.lock"

# 5. Start mongod pointing at the restored data path; with journaling enabled,
#    WiredTiger replays the journal automatically on startup
sudo chown -R mongodb:mongodb "$RESTORE_MOUNT"
sudo mongod --dbpath "$RESTORE_MOUNT"

echo "Restore complete"

Important: A snapshot can contain a stale mongod.lock file. If you used db.fsyncLock(), remove that file before restore; do not delete other WiredTiger files blindly. Use startup logs and MongoDB's official recovery procedure to decide the next step.


5. AWS EBS Snapshot Strategy

For MongoDB running in the cloud, AWS EBS snapshots are the most straightforward physical backup option. EBS snapshots are stored in S3 using an incremental method — after the first snapshot, only changed blocks are stored, making them highly cost-efficient.

Caveats for RAID Configurations

If multiple EBS volumes are striped with RAID, individual EBS snapshot tools alone cannot guarantee a consistent state across all disks. In that case, choose one of two approaches:

  1. LVM on top of RAID: use LVM snapshots to capture a consistent state in a single operation
  2. db.fsyncLock() before EBS snapshot: lock writes, then snapshot all volumes simultaneously
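The second approach can be sketched as a small function; the connection string and volume IDs are hypothetical. Since an EBS snapshot captures the volume as of the moment the create call is made, the lock only has to be held until every call returns, not until the snapshots finish:

```shell
# Sketch: lock writes, initiate a snapshot of every RAID member volume,
# then unlock. URI and volume IDs are hypothetical.
snapshot_raid_set() {
  local uri="$1"; shift
  mongosh "$uri" --quiet --eval 'db.getSiblingDB("admin").fsyncLock()'
  local v
  for v in "$@"; do
    # EBS captures state as of initiation; completion can happen after unlock
    aws ec2 create-snapshot --volume-id "$v" \
      --description "mongodb-raid-member-$v"
  done
  mongosh "$uri" --quiet --eval 'db.getSiblingDB("admin").fsyncUnlock()'
}

# snapshot_raid_set "mongodb://localhost:27017" vol-0aaa1111 vol-0bbb2222
```

AWS also offers `aws ec2 create-snapshots`, which captures all volumes attached to an instance as one crash-consistent set and, with journaling enabled, can remove the need for the lock.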

Automated EBS Snapshots with the AWS CLI

#!/bin/bash
# AWS EBS snapshot automation script (single-volume environment)

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
VOLUME_ID="vol-0123456789abcdef0"
DESCRIPTION="mongodb-backup-$INSTANCE_ID-$(date +%Y%m%d-%H%M%S)"
REGION="ap-northeast-2"
RETENTION_DAYS=7

echo "[$(date)] Starting EBS snapshot creation..."

SNAPSHOT_ID=$(aws ec2 create-snapshot \
  --region "$REGION" \
  --volume-id "$VOLUME_ID" \
  --description "$DESCRIPTION" \
  --tag-specifications "ResourceType=snapshot,Tags=[{Key=Name,Value=$DESCRIPTION},{Key=Retention,Value=$RETENTION_DAYS}]" \
  --query 'SnapshotId' \
  --output text)

echo "[$(date)] Snapshot request submitted: $SNAPSHOT_ID"

aws ec2 wait snapshot-completed \
  --region "$REGION" \
  --snapshot-ids "$SNAPSHOT_ID"

echo "[$(date)] Snapshot complete: $SNAPSHOT_ID"

# Delete snapshots beyond retention window
CUTOFF_DATE=$(date -d "$RETENTION_DAYS days ago" +%Y-%m-%dT%H:%M:%SZ)
OLD_SNAPSHOTS=$(aws ec2 describe-snapshots \
  --region "$REGION" \
  --filters "Name=tag:Retention,Values=$RETENTION_DAYS" \
  --query "Snapshots[?StartTime<'$CUTOFF_DATE'].SnapshotId" \
  --output text)

for snap in $OLD_SNAPSHOTS; do
  aws ec2 delete-snapshot --region "$REGION" --snapshot-id "$snap"
  echo "[$(date)] Deleted old snapshot: $snap"
done

echo "[$(date)] EBS snapshot backup complete"

6. Percona Backup for MongoDB (PBM) Deep Dive

mongodump excels in small environments; LVM/EBS snapshots shine for large single-node or replica set deployments. But when you need a consistent backup across multiple shards in a sharded cluster, both approaches hit a wall.

That is where Percona Backup for MongoDB (PBM) comes in. PBM is an open-source distributed backup solution for MongoDB replica sets and sharded clusters, offering enterprise-grade features at no cost. It is a separate tool from MongoDB's official Atlas backup.

Cluster-Wide Consistency

In a sharded cluster, each shard's backup may be captured at slightly different points in time. PBM uses idempotent Oplog updates to "fast-forward" every shard to the same timestamp, achieving cluster-wide consistency.

Supported Backup Types

Type                 | Description                  | Characteristics
---------------------|------------------------------|---------------------------------
Logical              | BSON data dump               | Portable, slow for large data
Physical (Hot)       | Direct WiredTiger file copy  | Very fast, no downtime
Selective            | Specific DB/collection only  | Storage-efficient, fast restore
Incremental Physical | Changed blocks only          | Cost-effective, shorter recovery

Storage backends include AWS S3, Azure Blob, Google Cloud Storage, and MinIO (on-premises S3-compatible). PBM also captures the Oplog continuously, enabling Point-in-Time Recovery when combined with a base snapshot.


7. Installing and Configuring PBM

7.1 Installation (Ubuntu/Debian)

# Add Percona repository and install PBM
wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
sudo dpkg -i percona-release_latest.generic_all.deb
sudo percona-release enable pbm release
sudo apt-get update
sudo apt-get install -y percona-backup-mongodb

7.2 Create a Dedicated PBM User in MongoDB

// Run in mongosh
use admin
db.createUser({
  user: "pbmUser",
  pwd: "pbmSecurePassword",
  roles: [
    { role: "readWrite",       db: "admin" },
    { role: "backup",          db: "admin" },
    { role: "restore",         db: "admin" },
    { role: "clusterMonitor",  db: "admin" },
    { role: "readAnyDatabase", db: "admin" }
  ]
})

7.3 PBM Configuration File (S3 Storage Example)

# /etc/pbm/pbm_config.yaml

storage:
  type: s3
  s3:
    region: ap-northeast-2
    bucket: my-mongodb-backups
    prefix: production-cluster
    credentials:
      access-key-id: YOUR_ACCESS_KEY
      secret-access-key: YOUR_SECRET_KEY
    serverSideEncryption:
      sseAlgorithm: aws:kms
      kmsKeyID: arn:aws:kms:ap-northeast-2:123456789:key/abcd-1234

pitr:
  enabled: true
  oplogSpanMin: 10    # Save an Oplog slice every 10 minutes

restore:
  batchSize: 500
  numInsertionWorkers: 10

# Apply and verify configuration
pbm config --file /etc/pbm/pbm_config.yaml
pbm config --list

7.4 Registering the pbm-agent Service

A pbm-agent must be running on every node in the replica set.

# Set environment variable (/etc/default/pbm-agent)
PBM_MONGODB_URI="mongodb://pbmUser:pbmSecurePassword@localhost:27017/?authSource=admin"

# Enable, start, and verify
sudo systemctl enable pbm-agent
sudo systemctl start pbm-agent
pbm status

8. PBM Backup and Restore Commands in Practice

8.1 Running a Backup

# Logical backup
pbm backup

# Physical (hot) backup — recommended for large DBs
pbm backup --type physical

# Incremental physical backup (only changed blocks after the first)
pbm backup --type incremental

# Selective backup for specific namespaces
pbm backup --ns="myDatabase.users,myDatabase.orders"

# List available backups
pbm list

8.2 Checking Backup Status

# Full cluster backup status
pbm status

# Detailed backup list
pbm list --full

# Example output:
# Snapshots:
#   2026-04-13T02:00:05Z [physical] <-- replset: rs0, rs1 | ok
#   2026-04-12T02:00:03Z [logical]  <-- replset: rs0, rs1 | ok
#
# PITR <on>:
#   2026-04-12T02:00:03Z - 2026-04-13T09:45:22Z

8.3 Running a Restore

# Restore to a specific snapshot
pbm restore 2026-04-13T02:00:05Z

# PITR — restore to a specific point in time (second-level precision)
# Example: a collection was accidentally dropped at 09:30:00 — restore to 1 minute before
pbm restore --time="2026-04-13T09:29:00"

# Selective restore for a specific namespace
pbm restore 2026-04-13T02:00:05Z --ns="myDatabase.users"

# Monitor restore progress
pbm logs --event restore

8.4 Automated Scheduled Backups with cron

# /etc/cron.d/pbm-backup

# Full physical backup every day at 2 AM
0 2 * * * root pbm backup --type physical >> /var/log/pbm-backup.log 2>&1

# PITR is always active via pitr.enabled: true in pbm_config.yaml — no separate cron needed

9. Backup Method Comparison — Choosing by Situation

The right question is not "which method is best" but "which situation am I in?"

Criteria           | mongodump                      | Filesystem Snapshot         | PBM
-------------------|--------------------------------|-----------------------------|----------------------------
Backup speed       | Slow (hours for large DBs)     | Fast (seconds to minutes)   | Fast (physical backup)
Restore speed      | Slow (index rebuild required)  | Fast                        | Fast
Consistency        | Moderate (--oplog improves it) | High (journaling enabled)   | Very high
Sharded cluster    | Limited                        | Difficult                   | Full support
PITR support       | Limited                        | Not standalone              | Full support
Storage efficiency | Moderate (gzip available)      | Incremental (EBS)           | Incremental physical
Setup complexity   | Easy                           | Moderate                    | Moderate to complex
Cost               | Free                           | Infrastructure cost         | Free (open-source)
Recommended for    | Small scale, migration         | Mid-large on-premises       | Mid-large, sharded clusters

10. Automation Pipeline: From Backup Execution to Verification and Alerting

The core of a solid backup strategy is automation. Manual procedures are vulnerable to human error and oversight, and a successful backup log is not proof that the data can actually be recovered.

10.1 Full PBM-Based Pipeline Flow

cron schedule → pbm backup --type physical → pbm status check → weekly restore to an isolated test instance → document-count verification → Slack/email alert on failure

10.2 Automated Backup Integrity Verification Script

This script restores the latest backup to an isolated test instance and confirms the document count.

#!/bin/bash
# MongoDB backup integrity verification script (recommended weekly)

set -euo pipefail

TEST_MONGO_URI="mongodb://admin:password@test-mongo:27017/?authSource=admin"
ALERT_EMAIL="ops-team@company.com"
LOG_FILE="/var/log/mongodb-backup-verify.log"

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

log "===== Backup integrity verification started ====="

# 1. Get the most recent backup ID (snapshot names are UTC timestamps,
#    so the lexicographic maximum is also the latest)
LATEST_BACKUP=$(pbm list --json 2>/dev/null | \
  python3 -c "import json,sys; print(max(s['name'] for s in json.load(sys.stdin)['snapshots']))")

log "Target backup: $LATEST_BACKUP"

# 2. Restore to the test instance
log "Restoring to test instance..."
PBM_MONGODB_URI="$TEST_MONGO_URI" pbm restore "$LATEST_BACKUP" --wait

# 3. Verify document count in a key collection
log "Verifying restore result..."
DOC_COUNT=$(mongosh "$TEST_MONGO_URI" --quiet --eval \
  "db.getSiblingDB('myDatabase').users.countDocuments({})")

if [ "$DOC_COUNT" -gt 0 ]; then
  log "Verification passed: users collection document count = $DOC_COUNT"
else
  log "Verification FAILED: document count = 0"
  echo "MongoDB backup verification failed: $LATEST_BACKUP" | \
    mail -s "[ALERT] MongoDB Backup Verify Failed" "$ALERT_EMAIL"
  exit 1
fi

log "===== Backup integrity verification complete ====="

10.3 Slack Notification Integration

#!/bin/bash
# Send PBM backup results to a Slack Webhook

SLACK_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
BACKUP_LOG=$(pbm logs --event backup --tail 20 2>&1)

if echo "$BACKUP_LOG" | grep -qi "error"; then
  STATUS="FAILED"
  COLOR="danger"
else
  STATUS="SUCCESS"
  COLOR="good"
fi

curl -s -X POST "$SLACK_WEBHOOK" \
  -H 'Content-type: application/json' \
  -d "{
    \"attachments\": [{
      \"color\": \"$COLOR\",
      \"title\": \"MongoDB Backup: $STATUS\",
      \"text\": \"$(date '+%Y-%m-%d %H:%M:%S')\",
      \"footer\": \"MongoDB Backup Monitor\"
    }]
  }"

11. Closing — What's Next in Part 3

Part 2 covered filesystem snapshots (LVM/EBS) for high-speed physical backup in large deployments, and Percona Backup for MongoDB (PBM) for solving consistency challenges in sharded clusters. It also walked through the decision criteria for choosing between methods and how to build an automation pipeline that extends to restore verification and alerting.

"An unautomated backup procedure will eventually fail."

Backup failures arrive without warning, and by then it is already too late. Scheduled execution alone is not enough — restore verification and alerting must be part of the pipeline before it qualifies as a real backup strategy.

Part 3 will cover MongoDB Atlas cloud backup for delegating all infrastructure management to the cloud, Point-in-Time Recovery (PITR) with second-level precision, and a practical disaster recovery (DR) checklist and Runbook based on real failure scenarios.
