Skip to main content
This isn’t about replacing testing, it’s about catching the bugs testing misses. Race conditions. Distributed system failures. Edge cases you didn’t think to test. TLA+ finds them by exploring every possible sequence of events, not just the ones you thought to check. It’s computationally expensive, but it is worth it. This guide is for complete beginners. By the end, you’ll (hopefully, if I wrote this well) understand why TLA+ matters and have written your first specifications. For comprehensive syntax details, continue to the TLA+ Syntax Guide that I wrote.

What is TLA+?

TLA+ is a formal specification language created by Leslie Lamport. It’s not a programming language, you don’t compile and run it. Instead, you describe what a system should do, and a tool called TLC explores every possible behavior to check if your properties hold. Major companies use TLA+ for critical systems:
  • Amazon Web Services used it to find subtle bugs in S3, DynamoDB, and other core services
  • Microsoft uses it for Azure and Xbox services
  • MongoDB verified their replication protocol
These aren’t academic exercises. AWS engineers reported that TLA+ found bugs they’d never have found through testing, bugs that could have caused data loss in production.

The Mental Model

Before diving into syntax, let’s build intuition. State: A state is a snapshot of your system at one moment. Think of it as a photograph: the light is on, the balance is $100, the door is locked. A state captures the value of every variable at one instant. Behavior: A behavior is a sequence of states, like the individual frames on a filmstrip. The system starts in some initial state and moves through states over time. Each step changes something. Specification: A TLA+ specification describes which behaviors are valid. It says: “Here are the starting conditions. Here are the allowed transitions. Here are the properties that must always hold.” It doesn’t say how the system works—just what it’s allowed to do. Model Checker: TLC is the model checker for TLA+. You give it a specification, and it explores every reachable state, checking your properties. If a property can be violated, TLC shows you exactly how: the sequence of steps that leads to the bug.

Example 1: A Light Switch

Let’s write a complete specification. This one models something familiar with minimal states, a light switch, but it introduces all the core concepts.
---------------------------- MODULE LightSwitch ----------------------------
VARIABLE light

Init == light = "off"

TurnOn ==
    /\ light = "off"
    /\ light' = "on"

TurnOff ==
    /\ light = "on"
    /\ light' = "off"

Next == TurnOn \/ TurnOff

TypeOK == light \in {"on", "off"}

Spec == Init /\ [][Next]_light
=============================================================================
Let me explain each part: VARIABLE light - We have one variable called light. It holds the current state of the switch. Init == light = "off" -The system starts with the light off. This defines valid initial states. TurnOn - This is an action. The /\ means “and.” This action says: “If the light is currently off (light = "off"), then in the next state, the light is on (light' = "on").” The prime (') means “the value in the next state.” TurnOff - Another action: if the light is on, it can become off. Next == TurnOn \/ TurnOff - The \/ means “or.” At each step, either TurnOn or TurnOff can happen. This defines all valid transitions. TypeOK == light \in {"on", "off"} - This is an invariant: a property that should always be true. The light should always be either “on” or “off”—never “broken” or “exploded” or anything else. Spec == Init /\ [][Next]_light - This ties it together: start in an Init state, then repeatedly take Next steps. The [] means “always.” The _light subscript allows “stuttering” steps where nothing changes. This might seem odd, but it’s essential when composing specifications. For now, treat it as boilerplate: list all your variables after the underscore. That’s it. A complete, runnable specification in about 15 lines. You can paste this into MacTLA or the TLA+ Toolbox and verify that TypeOK always holds.
You’ve just written your first TLA+ specification. The light switch is trivial, but it introduced every concept you need: variables, initial states, actions, invariants, and the Spec formula.

Example 2: The Bank Transfer Bug

The light switch used a single variable with two values. Real systems are more interesting, they have multiple interacting pieces. This example introduces functions (like dictionaries), which let us track balances for multiple accounts with a single variable. Now let’s see TLA+ catch a real bug, one that testing would likely miss. Imagine a simple bank with two accounts. We want to transfer money between them. Here’s a first attempt:
------------------------------ MODULE BankTransfer ------------------------------
EXTENDS Integers

VARIABLES balance

Accounts == {"alice", "bob"}

Init == balance = [a \in Accounts |-> 100]  \* Everyone starts with $100

Transfer(from, to, amount) ==
    /\ from /= to
    /\ balance[from] >= amount              \* Sufficient funds
    /\ balance' = [balance EXCEPT
                    ![from] = balance[from] - amount,
                    ![to] = balance[to] + amount]

Next == \E from, to \in Accounts : \E amt \in 1..50 : Transfer(from, to, amt)

\* Invariant: total money is conserved
TotalConserved == balance["alice"] + balance["bob"] = 200

\* Invariant: no overdrafts
NoOverdraft == \A a \in Accounts : balance[a] >= 0

Spec == Init /\ [][Next]_balance
=============================================================================
New concepts here: [a \in Accounts |-> 100] - This creates a function (like a dictionary or map). Each account maps to 100. balance[from] - Access the balance for a specific account. EXCEPT - Update specific entries in a function. ![from] = ... updates the from entry. \E from, to \in Accounts - “There exists” quantifier. Some from and to in Accounts. Run TLC on this specification. It will verify that TotalConserved and NoOverdraft always hold. Money is neither created nor destroyed. But wait, this specification models atomic transfers. What if transfers aren’t atomic?

The Concurrent Bug

In real systems, a transfer might involve multiple steps: read both balances, compute new values, write them back. If two transfers happen concurrently, they can interleave/intersect badly. Let’s model this:
--------------------------- MODULE BankTransferBuggy ---------------------------
EXTENDS Integers

VARIABLES balance, pc, readAlice, readBob

Accounts == {"alice", "bob"}

Init ==
    /\ balance = [a \in Accounts |-> 100]
    /\ pc = "idle"
    /\ readAlice = 0
    /\ readBob = 0

\* Step 1: Read both balances
StartTransfer ==
    /\ pc = "idle"
    /\ readAlice' = balance["alice"]
    /\ readBob' = balance["bob"]
    /\ pc' = "read"
    /\ UNCHANGED balance

\* Step 2: Write updated balances (Alice sends 50 to Bob)
FinishTransfer ==
    /\ pc = "read"
    /\ balance' = [balance EXCEPT !["alice"] = readAlice - 50, !["bob"] = readBob + 50]
    /\ pc' = "idle"
    /\ UNCHANGED <<readAlice, readBob>>

Next == StartTransfer \/ FinishTransfer

TotalConserved == balance["alice"] + balance["bob"] = 200

Spec == Init /\ [][Next]_<<balance, pc, readAlice, readBob>>
=============================================================================
This models one transfer as two steps. TLC will still say TotalConserved holds. Why? Because there’s only one transfer in flight at a time. Now imagine two concurrent transfers. Transfer A reads, then Transfer B reads, then A writes, then B writes:
  1. A reads: Alice=100, Bob=100
  2. B reads: Alice=100, Bob=100
  3. A writes: Alice=50, Bob=150
  4. B writes: Alice=70, Bob=130 (B used stale values!)
The final balances are Alice=70, Bob=130. Total = 200. But what if both transfers moved money the same direction?
  1. A reads: Alice=100, Bob=100 (A will send 50 from Alice to Bob)
  2. B reads: Alice=100, Bob=100 (B will also send 50 from Alice to Bob)
  3. A writes: Alice=50, Bob=150
  4. B writes: Alice=50, Bob=150 (B overwrote A’s changes!)
Both transfers “succeeded,” but Alice only lost 50insteadof50 instead of 100. Money was created.
This is the classic read-modify-write race condition. TLA+ finds it by exploring all possible interleavings. Testing might run millions of times without hitting the exact interleaving that triggers the bug.
A proper fix uses transactions, locks, or compare-and-swap.
The specification above models one transfer at a time, so TLC won’t find the bug. To see TLC catch the race condition, you’d extend it to model two concurrent transfers with separate program counters and read variables for each. That’s a great exercise once you’ve finished the Syntax Guide.

Example 3: A Taste of Real Systems

Here’s a simplified version of Two-Phase Commit, a protocol used by distributed databases to ensure all nodes agree on whether to commit or abort a transaction.
-------------------------- MODULE TwoPhaseCommit --------------------------
CONSTANT Nodes                    \* Set of participating nodes

VARIABLES nodeState, coordinator

Init ==
    /\ nodeState = [n \in Nodes |-> "working"]
    /\ coordinator = "waiting"

\* A node votes to prepare
Prepare(n) ==
    /\ nodeState[n] = "working"
    /\ nodeState' = [nodeState EXCEPT ![n] = "prepared"]
    /\ UNCHANGED coordinator

\* Coordinator commits if all nodes prepared
Commit ==
    /\ coordinator = "waiting"
    /\ \A n \in Nodes : nodeState[n] = "prepared"
    /\ coordinator' = "committed"
    /\ nodeState' = [n \in Nodes |-> "committed"]

\* Coordinator aborts
Abort ==
    /\ coordinator = "waiting"
    /\ coordinator' = "aborted"
    /\ nodeState' = [n \in Nodes |-> "aborted"]

Next ==
    \/ \E n \in Nodes : Prepare(n)
    \/ Commit
    \/ Abort

\* Safety: nodes never disagree
Consistency ==
    ~(\E n1, n2 \in Nodes :
        nodeState[n1] = "committed" /\ nodeState[n2] = "aborted")

Spec == Init /\ [][Next]_<<nodeState, coordinator>>
=============================================================================
Don’t worry if you don’t fully understand this yet. The point is: this 30-line specification models a real protocol running in production databases worldwide. TLC can verify that Consistency holds for configurations you specify, let’s say its 3 or 5 nodes, exploring every possible interleaving of events. Once you’re comfortable with the basics, you’ll be able to read and modify specifications like this. You’ll find bugs in your designs before you write code.

Core Vocabulary

Let’s solidify the concepts you’ve learned:
TermMeaning
StateA snapshot: the values of all variables at one moment
BehaviorA sequence of states (the system’s “history”)
VariableSomething that can change between states
ActionA formula describing a valid state transition
InvariantA property that must be true in every reachable state
InitA formula describing valid initial states
NextA formula describing all valid transitions
SpecThe complete specification: Init plus allowed steps
TLCThe model checker that explores all behaviors
Key notation:
  • x' — The value of x in the next state
  • /\ — And (both must be true)
  • \/ — Or (either can be true)
  • == — Definition (this IS that)
  • = — Equality test (is this equal to that?)
  • [] — Always (in every state)
  • \A — For all
  • \E — There exists

Where to Go Next

You’ve learned the core concepts: states, actions, invariants, and how TLC finds bugs by exploring all possibilities. You’ve seen three examples, from trivial to realistic. Next steps:
  1. TLA+ Syntax Guide - Comprehensive coverage of sets, functions, temporal operators, and everything you need for real specifications. You’ll learn to write the concurrent bank transfer model and understand every line of Two-Phase Commit.
  2. MacTLA - Native macOS tool I wrote for running TLA specifications. Try the examples from this guide. See what happens.
  3. Learn TLA+ - Hillel Wayne’s excellent interactive tutorial with exercises.
  4. TLA+ Video Course - Leslie Lamport’s lectures explaining the theory.
The goal isn’t to formally verify every line of code. It’s to find the subtle bugs in your designs, the race conditions, the edge cases, the failure modes you didn’t think to test. Write a small specification, run TLC, and see what it finds.