Friday, April 17, 2026

MongoDB ACID — Part 4: Durability — journaling, checkpoints, and Write Concern

Durability — the D in ACID — asks whether a write that received a commit acknowledgment will still be there after power loss or a crash. Part 4 walks through the gap between memory and disk, how WAL-style journaling and WiredTiger checkpoints fit together, and how single-node crash recovery replays the journal from the last checkpoint. For replica sets, Write Concern’s w, j, and wtimeout determine when the driver may return success and what replication/journal bar you are actually waiting for — including why the default moved to majority, why hitting wtimeout does not mean the write was lost (and why an acknowledgment short of majority can still be rolled back), and why writeConcernMajorityJournalDefault matters. Secondary behavior and defaults vary by version and topology, so treat this as a conceptual map and verify against the manual for the build you run.

Table of contents

  1. Introduction
  2. Why durability is hard — the memory vs disk gap
  3. WAL — Write-Ahead Logging
  4. WiredTiger journal — single-node durability
  5. Checkpoints — periodic snapshots
  6. Journal + checkpoints — crash recovery
  7. Write Concern — durability in a distributed replica set
  8. Write Concern options — w, j, wtimeout
  9. Version-specific behavior (reference)
  10. Practical durability settings
  11. Failure scenarios and topology
  12. Closing

1. Introduction

In Part 3 we focused on isolation and Read Concern — what concurrent transactions can see. This article covers Durability: whether a write that received a successful commit response will still be there after events like power loss or process crashes.

Operationally, durability spans WiredTiger journaling and checkpoints on a single mongod, and replication with Write Concern in a replica set. In one sentence:

Before claiming “done,” the system first records changes in a log (WAL-style), then — depending on topology — uses Write Concern to decide how many nodes must acknowledge the write.

We start with single-node journaling and checkpoints, then turn to what writeConcern means in a replica set. Fine-grained behavior and defaults change across major versions; treat the following as a mental model and always cross-check the manual for your deployment.


2. Why durability is hard — the memory vs disk gap

CPUs and RAM are fast but volatile; SSDs and disks are slower but retain data without power. Databases keep hot working sets in RAM; if you acknowledge “persisted” while data is only in memory, you can lose work on failure.

If you delay responses until every change is fsync’d to disk, you are safer but latency rises. Durability engineering trades speed against safety; MongoDB relies heavily on append-only logs plus periodic checkpoints to get reasonable latency without giving up crash recovery.


3. WAL — Write-Ahead Logging

WAL (Write-Ahead Logging) boils down to:

Before mutating durable data files, append the change to a log (journal).

Data file updates may involve random I/O; appending to a log is closer to sequential I/O. After a crash, the server can replay the log to reconcile on-disk data and cache state.

A coarse write path looks like this:

  1. The client sends an update.
  2. The storage engine updates the in-memory cache.
  3. The change is written to the journal (WAL).
  4. Whether the client may receive “done” after step 3 depends on writeConcern and journal flush policy (§§7–8).
  5. A later checkpoint flushes dirty pages to data files.

So “we always reply right after step 3” is not a global default. Ack timing depends on w, j, wtimeout, and topology (standalone vs replica set). Use §§3–6 for structure; use §§7–8 plus the manual for actual guarantees.

The diagram below is illustrative. You can only treat “ack after durable journal” as guaranteed when j: true (and the rest of the write concern) is in play; default connection settings may differ.
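The append-before-apply rule can be sketched in a few lines of JavaScript. Everything here is a toy model — the Wal class and its write/recover methods are invented names for illustration; real WiredTiger journal records are binary, page-level, and batched:

```javascript
// Toy sketch of the WAL rule: append the change to a log FIRST, apply it
// to the volatile in-memory state second. After a "crash" (cache wiped),
// the state can be rebuilt by replaying the log in order.
class Wal {
  constructor() {
    this.log = [];   // stand-in for the durable journal/ files
    this.cache = {}; // stand-in for the volatile in-memory cache
  }
  write(key, value) {
    this.log.push({ key, value }); // step 1: record intent in the log
    this.cache[key] = value;       // step 2: apply to the cache
  }
  recover() {
    // After a crash the cache is gone; redo every logged change in order.
    this.cache = {};
    for (const rec of this.log) this.cache[rec.key] = rec.value;
    return this.cache;
  }
}
```

Because the log is append-only, the order of records preserves the order of updates, so the last write for each key wins during replay — the same property the real journal relies on.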


4. WiredTiger journal — single-node durability

MongoDB’s default storage engine WiredTiger uses a journal. Under the data directory you will see a journal/ path (or equivalent); file naming and rotation depend on version and settings.

4.1 What goes into the journal

Journal records carry enough information to redo page-level changes after a crash. Internal record layout is an implementation detail; operators usually care whether recovery can find the journal, whether disks are full, and how this interacts with backups.

4.2 Flush behavior and writeConcern

The journal may sit in a memory buffer before flushing to disk. Flush timing is influenced by parameters such as storage.journal.commitIntervalMs and by client j: true (wait until the journal is durable on disk).

| Factor | Summary |
| --- | --- |
| j: true | Ask to delay the response until the journal is durable on that member |
| journalCommitIntervalMs | How often the server may group journal commits (ms); lower values can mean more I/O |
| w: "majority" | In a replica set, combines with majority acknowledgment rules; meaning also depends on writeConcernMajorityJournalDefault |

⚠️ While buffered only: if power is lost before the journal hits disk, recent writes may be lost. If you need “safe” acknowledgments, design j and w explicitly.
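The size of that at-risk window can be modeled with simple arithmetic. This is a deliberately naive sketch — nextFlush and lostWrites are invented helpers, and the real flush schedule also reacts to j: true waiters and buffer pressure:

```javascript
// Toy model of journal group commit: buffered records are flushed together
// at each commitIntervalMs boundary, so a write's journal record is durable
// only once the boundary after it has passed.
function nextFlush(writeMs, commitIntervalMs) {
  return Math.floor(writeMs / commitIntervalMs) * commitIntervalMs + commitIntervalMs;
}

// Writes whose journal records were still buffered when power died at crashMs.
function lostWrites(writeTimesMs, commitIntervalMs, crashMs) {
  return writeTimesMs.filter((t) => nextFlush(t, commitIntervalMs) > crashMs);
}
```

With a 100 ms interval, writes at 10 ms and 95 ms are flushed at the 100 ms boundary, while a write at 120 ms is still buffered if power dies at 130 ms — exactly the window the warning above describes.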

4.3 Disabling journaling

Older versions allowed turning journaling off in some configurations. Recent major releases disallow or strongly restrict disabling journaling for normal replica set members. See Manage Journaling and release notes for the exact version you run.


5. Checkpoints — periodic snapshots

If the journal is a fast log of recent changes, a checkpoint is a consistent snapshot that flushes dirty cache pages to data files. MongoDB/WiredTiger runs checkpoints on a schedule; storage.syncPeriodSecs controls how often (seconds — check the manual for defaults).

After a checkpoint:

  • Data files can reflect state up to that point in a consistent way.
  • Older journal files may become eligible for cleanup (policy/version dependent).

If a crash happens during checkpoint creation, metadata updates are designed so the previous checkpoint remains valid and recovery can still proceed.


6. Journal + checkpoints — crash recovery

During normal operation:

  • Up to checkpoint T₀, data files are stable.
  • After T₀, newer changes live in the journal and cache until the next checkpoint.

If the process dies between T₀ and the next checkpoint, restart typically:

  1. Locates the last valid checkpoint.
  2. Replays journal records after that point in order.
  3. Opens for traffic.

The window where the journal is not yet on disk remains a risk; mitigate with j: true and/or appropriate w settings and server parameters.
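The three recovery steps above can be sketched as a pure function. The shapes are invented for illustration — checkpoint.image stands in for the data files and checkpoint.upTo for how much of the journal the checkpoint already covers; real WiredTiger recovery operates on pages and transaction IDs, not key/value pairs:

```javascript
// Minimal crash-recovery sketch: start from the last valid checkpoint image
// and redo only the journal records written after it, in order.
function recover(checkpoint, journal) {
  const state = { ...checkpoint.image };          // 1. locate last checkpoint
  for (const rec of journal.slice(checkpoint.upTo)) {
    state[rec.key] = rec.value;                   // 2. replay newer records in order
  }
  return state;                                   // 3. open for traffic
}
```

Note that replaying an already-checkpointed record would be harmless here (setting a key to the value it already has), which mirrors why redo-style recovery tolerates some overlap between checkpoint and journal.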


7. Write Concern — durability in a distributed replica set

On a replica set, one primary’s journal is not enough. If the primary acknowledges a write before it is replicated and then fails, a new primary may roll back writes that never reached a majority of data-bearing nodes.

Write Concern is the contract for how many members must acknowledge a write before the driver gets success. The main knobs are:

{
  w: "majority", // or a number, etc.
  j: true, // whether to require journal durability where applicable
  wtimeout: 5000, // ms; 0 can mean “wait indefinitely” in practice
}

How majority acknowledgment lines up with Primary / Secondaries varies by deployment. The diagram below is schematic only.


8. Write Concern options — w, j, wtimeout

8.1 w

  • w: 0: Effectively unacknowledged / fire-and-forget from the client’s perspective. Only for very narrow cases where loss is acceptable.
  • w: 1: Often “primary acknowledged,” but does not guarantee replication to secondaries. Writes that exist only on the former primary can be lost on failover.
  • w: "majority": Wait until a majority of voting members meet the concern. Since MongoDB 5.0 the implicit default write concern is majority for most topologies, which reduces surprise data loss compared with leaving w: 1 as the default.

PSA topologies (Primary–Secondary–Arbiter) deserve special care: a majority of voters is not the same as a majority of data-bearing nodes. Do not read “majority” the same way as in a three-data-node replica set.
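The PSA arithmetic is worth making explicit. A minimal sketch — majorityOf is an invented helper, and real elections also weigh member priorities and votes settings:

```javascript
// "majority" counts VOTING members — including arbiters, which hold no data.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Three-data-node set: 3 voters, majority = 2 — any 2 of the 3 data nodes
// can satisfy w: "majority".
// PSA set: also 3 voters, majority still 2, but the arbiter cannot
// acknowledge writes, so BOTH data-bearing members must be healthy for
// w: "majority" to ever be satisfied.
```

Same majority number, very different availability: losing one data node in PSA stalls majority writes, while a three-data-node set keeps accepting them.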

8.2 j

j: true asks that, for members participating in the concern, the write is journaled to durable storage before acknowledgment. If omitted, connection defaults and w combine to define behavior.

writeConcernMajorityJournalDefault: when true (the default in most clusters), w: "majority" also implies that the majority of members have journaled the write to disk; when false, acknowledgment may arrive before on-disk journaling. Check your mongod parameters before equating “majority” with any single fixed journal guarantee.

8.3 wtimeout

wtimeout bounds how long the client waits for the requested concern. Important nuances:

  • A timeout means “could not confirm the concern in time”, not necessarily “the write never hit the primary.” The write may already be on the primary while replication lags.
  • Conversely, if a write never reached majority and the primary fails, it may still be rolled back later. So timeout ≠ “no rollback” and acknowledgment ≠ “durable across failover” unless your concern actually matched your RPO goals.

Production setups usually set a finite wtimeout to avoid hanging forever when secondaries stall.
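The two bullets above can be condensed into a tiny classifier. This is purely illustrative — classifyAck and its status strings are invented; real drivers surface a timed-out concern as a write-concern error while the write may already be applied on the primary:

```javascript
// What a wtimeout outcome does and does not tell you.
// ackTimesMs[i]: when member i confirmed the write, in ms (Infinity = never);
// index 0 is the primary itself. needed: members required by w.
function classifyAck(ackTimesMs, needed, wtimeoutMs) {
  const confirmed = ackTimesMs.filter((t) => t <= wtimeoutMs).length;
  if (confirmed >= needed) return "acknowledged";
  // The concern was not met in time — but the write may already be on the
  // primary, where it can later replicate OR be rolled back on failover.
  return ackTimesMs[0] <= wtimeoutMs ? "wtimeout-but-on-primary" : "unknown";
}
```

classifyAck([5, 8000, Infinity], 2, 5000) returns "wtimeout-but-on-primary": the write exists on the primary, replication lagged past the timeout, and its final fate depends on what happens before the next election.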


9. Version-specific behavior (reference)

Major releases can change secondary apply order, majority acknowledgment paths, and performance characteristics. This article does not pin a specific internal pipeline to a version number.

For the build you run, start with the release notes and the Journaling and Write Concern pages of the MongoDB manual.


10. Practical durability settings

10.1 Closer to finance / payments

const wcFinancial = {
  w: "majority",
  j: true,
  wtimeout: 10000,
};

await db.collection("ledger").insertOne(doc, { writeConcern: wcFinancial });

Pair with readConcern: "majority" (from Part 3) when you want reads that avoid rolled-back data, subject to the same topology caveats.

10.2 Typical web services

Often the default majority plus a reasonable wtimeout is enough. If you have PSA, cross-region secondaries, or slow replicas, majority acknowledgments may time out — watch SLOs and replication lag.

10.3 Logs / analytics (loss tolerable)

await db.collection("events").insertOne(
  { type: "click", t: new Date() },
  { writeConcern: { w: 1, j: false } }
);

Use patterns like w: 0 only when you truly accept blind loss and have another observability story.

10.4 Server parameters (sketch)

Example mongod.conf fragment — verify values for your version:

storage:
  journal:
    commitIntervalMs: 100
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4

Lowering commitIntervalMs can increase journal flush frequency and I/O load.


11. Failure scenarios and topology

The table below is a rough guide; real behavior depends on member counts, arbiters, election rules, and network partitions.

| Scenario | w: 0 | w: 1, j: false | w: 1, j: true | w: "majority" (typical 3 data nodes) |
| --- | --- | --- | --- | --- |
| Primary process crash (single-node view) | very risky | risky | better if journaled | stronger when replication concern is met |
| Primary failover | unreplicated writes at risk | same class of risk | primary-only risk remains | mitigates many common single-node failures |

w: "majority" does not solve simultaneous loss of a majority of members or certain split-brain extremes — those belong to infrastructure design and RPO/RTO planning.


12. Closing

Single node (WiredTiger): memory changes → journal (WAL-style) → checkpoints to data files; after a crash, replay the journal from after the last checkpoint.

Replica set: Write Concern defines how many members must acknowledge a write; after failover, writes that never reached your chosen bar may be rolled back.

Part 5 will tie patterns, tuning, and anti-patterns together. For day-to-day operations, read readConcern and writeConcern alongside Part 2 and Part 3 — they are one operational bundle.


April 2026 — defaults and driver behavior change over time; cite the manual for the version you deploy.
