What is TLA+?
TLA+ is a formal specification language created by Leslie Lamport. It’s not a programming language, you don’t compile and run it. Instead, you describe what a system should do, and a tool called TLC explores every possible behavior to check if your properties hold. Major companies use TLA+ for critical systems:- Amazon Web Services used it to find subtle bugs in S3, DynamoDB, and other core services
- Microsoft uses it for Azure and Xbox services
- MongoDB verified their replication protocol
The Mental Model
Before diving into syntax, let’s build intuition. State: A state is a snapshot of your system at one moment. Think of it as a photograph: the light is on, the balance is $100, the door is locked. A state captures the value of every variable at one instant. Behavior: A behavior is a sequence of states, like the individual frames on a filmstrip. The system starts in some initial state and moves through states over time. Each step changes something. Specification: A TLA+ specification describes which behaviors are valid. It says: “Here are the starting conditions. Here are the allowed transitions. Here are the properties that must always hold.” It doesn’t say how the system works—just what it’s allowed to do. Model Checker: TLC is the model checker for TLA+. You give it a specification, and it explores every reachable state, checking your properties. If a property can be violated, TLC shows you exactly how: the sequence of steps that leads to the bug.Example 1: A Light Switch
Let’s write a complete specification. This one models something familiar with minimal states, a light switch, but it introduces all the core concepts.VARIABLE light - We have one variable called light. It holds the current state of the switch.
Init == light = "off" -The system starts with the light off. This defines valid initial states.
TurnOn - This is an action. The /\ means “and.” This action says: “If the light is currently off (light = "off"), then in the next state, the light is on (light' = "on").” The prime (') means “the value in the next state.”
TurnOff - Another action: if the light is on, it can become off.
Next == TurnOn \/ TurnOff - The \/ means “or.” At each step, either TurnOn or TurnOff can happen. This defines all valid transitions.
TypeOK == light \in {"on", "off"} - This is an invariant: a property that should always be true. The light should always be either “on” or “off”—never “broken” or “exploded” or anything else.
Spec == Init /\ [][Next]_light - This ties it together: start in an Init state, then repeatedly take Next steps. The [] means “always.” The _light subscript allows “stuttering” steps where nothing changes. This might seem odd, but it’s essential when composing specifications. For now, treat it as boilerplate: list all your variables after the underscore.
That’s it. A complete, runnable specification in about 15 lines. You can paste this into MacTLA or the TLA+ Toolbox and verify that TypeOK always holds.
Example 2: The Bank Transfer Bug
The light switch used a single variable with two values. Real systems are more interesting, they have multiple interacting pieces. This example introduces functions (like dictionaries), which let us track balances for multiple accounts with a single variable. Now let’s see TLA+ catch a real bug, one that testing would likely miss. Imagine a simple bank with two accounts. We want to transfer money between them. Here’s a first attempt:[a \in Accounts |-> 100] - This creates a function (like a dictionary or map). Each account maps to 100.
balance[from] - Access the balance for a specific account.
EXCEPT - Update specific entries in a function. ![from] = ... updates the from entry.
\E from, to \in Accounts - “There exists” quantifier. Some from and to in Accounts.
Run TLC on this specification. It will verify that TotalConserved and NoOverdraft always hold. Money is neither created nor destroyed.
But wait, this specification models atomic transfers. What if transfers aren’t atomic?
The Concurrent Bug
In real systems, a transfer might involve multiple steps: read both balances, compute new values, write them back. If two transfers happen concurrently, they can interleave/intersect badly. Let’s model this:TotalConserved holds. Why? Because there’s only one transfer in flight at a time.
Now imagine two concurrent transfers. Transfer A reads, then Transfer B reads, then A writes, then B writes:
- A reads: Alice=100, Bob=100
- B reads: Alice=100, Bob=100
- A writes: Alice=50, Bob=150
- B writes: Alice=70, Bob=130 (B used stale values!)
- A reads: Alice=100, Bob=100 (A will send 50 from Alice to Bob)
- B reads: Alice=100, Bob=100 (B will also send 50 from Alice to Bob)
- A writes: Alice=50, Bob=150
- B writes: Alice=50, Bob=150 (B overwrote A’s changes!)
This is the classic read-modify-write race condition. TLA+ finds it by exploring all possible interleavings. Testing might run millions of times without hitting the exact interleaving that triggers the bug.
Example 3: A Taste of Real Systems
Here’s a simplified version of Two-Phase Commit, a protocol used by distributed databases to ensure all nodes agree on whether to commit or abort a transaction.Consistency holds for configurations you specify, let’s say its 3 or 5 nodes, exploring every possible interleaving of events.
Once you’re comfortable with the basics, you’ll be able to read and modify specifications like this. You’ll find bugs in your designs before you write code.
Core Vocabulary
Let’s solidify the concepts you’ve learned:| Term | Meaning |
|---|---|
| State | A snapshot: the values of all variables at one moment |
| Behavior | A sequence of states (the system’s “history”) |
| Variable | Something that can change between states |
| Action | A formula describing a valid state transition |
| Invariant | A property that must be true in every reachable state |
| Init | A formula describing valid initial states |
| Next | A formula describing all valid transitions |
| Spec | The complete specification: Init plus allowed steps |
| TLC | The model checker that explores all behaviors |
x'— The value of x in the next state/\— And (both must be true)\/— Or (either can be true)==— Definition (this IS that)=— Equality test (is this equal to that?)[]— Always (in every state)\A— For all\E— There exists
Where to Go Next
You’ve learned the core concepts: states, actions, invariants, and how TLC finds bugs by exploring all possibilities. You’ve seen three examples, from trivial to realistic. Next steps:- TLA+ Syntax Guide - Comprehensive coverage of sets, functions, temporal operators, and everything you need for real specifications. You’ll learn to write the concurrent bank transfer model and understand every line of Two-Phase Commit.
- MacTLA - Native macOS tool I wrote for running TLA specifications. Try the examples from this guide. See what happens.
- Learn TLA+ - Hillel Wayne’s excellent interactive tutorial with exercises.
- TLA+ Video Course - Leslie Lamport’s lectures explaining the theory.