Three Approaches to Binary Analysis
Before diving into formal methods specifically, it helps to understand where they fit in the broader landscape. Binary analysts typically work with three complementary approaches, each with distinct tradeoffs.

Static analysis examines code without executing it. Tools like Ghidra, IDA Pro, Binary Ninja, and radare2 disassemble binaries, recover control flow graphs, and identify patterns. Static analysis is fast and sees all code in the binary, but it can’t resolve values computed at runtime. Indirect jumps, dynamically unpacked code, and heavily obfuscated binaries can defeat it.

Dynamic analysis runs the program and observes its behavior. Debuggers (GDB, x64dbg), tracers (strace, ltrace), fuzzers (AFL, libFuzzer), and instrumentation frameworks (DynamoRIO, Frida) fall into this category. Dynamic analysis sees actual runtime values and naturally handles obfuscation: code has to do what it does when it runs. The tradeoff is that you only see paths that actually execute. Edge cases, error handlers, and paths requiring specific inputs may never run.

Formal methods use mathematical reasoning to analyze program behavior. Rather than running concrete inputs or statically pattern-matching, formal techniques like symbolic execution reason about what could happen across classes of inputs. Symbolic execution frameworks like angr and Triton, backed by SMT solvers like Z3, can answer questions like “does any input reach this code?” or “what input satisfies these constraints?” The cost is computational: path explosion limits scalability, and modeling external environments (syscalls, libraries) adds complexity.

| Aspect | Static | Dynamic | Formal |
|---|---|---|---|
| Execution | None | Concrete runs | Symbolic/abstract |
| Coverage | All visible code | Executed paths only | Explored paths |
| Scalability | Excellent | Good | Limited by path explosion |
| False positives | Higher | Lower | Varies |
| False negatives | Lower | Higher | Varies |
| Best for | Triage, structure | Behavior, values | Constraint solving, proofs |
The Core Idea
Formal modeling for binary analysis treats programs as mathematical objects. Rather than running a program with specific inputs and observing what happens, we analyze what could happen across all possible inputs, or ask the computer to find inputs that produce specific behaviors. Three techniques form the foundation of this approach.
Symbolic Execution
Symbolic execution runs a program with symbolic values instead of concrete inputs. Instead of saying “the user entered ‘password123’”, we say “the user entered some string X”. As the program executes, we track how X flows through computations and what constraints accumulate. When the program branches on a condition involving X, we explore both paths. One path records that the condition must be true; the other records that it must be false. These accumulated constraints form the path condition: a logical formula describing exactly what inputs reach each program point. The key insight: instead of testing one input at a time, symbolic execution reasons about entire classes of inputs simultaneously.
SMT Solving
Satisfiability Modulo Theories (SMT) solving takes those accumulated constraints and finds concrete values that satisfy them, or proves no such values exist. The “modulo theories” part is important. Unlike pure boolean satisfiability (SAT), SMT solvers understand the semantics of operations. A bit-vector theory knows that `0xFF + 1 = 0x00` for 8-bit values. An array theory understands indexing and updates. This domain knowledge makes SMT solvers far more powerful than raw SAT solvers for program analysis.
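As a minimal illustration of what that bit-vector knowledge buys, the sketch below mimics an SMT query over 8-bit values in plain Python, using brute-force search where a real solver such as Z3 would reason symbolically. The names (`bv_add`, `model`) are illustrative, not from any solver API.

```python
# Sketch only: brute force stands in for a bit-vector solver.
WIDTH = 8
MASK = (1 << WIDTH) - 1

def bv_add(a, b):
    """8-bit addition with wraparound, as a bit-vector theory defines it."""
    return (a + b) & MASK

# Query: is there an 8-bit value v with v + 1 == 0? A pure SAT encoding
# would need this spelled out bit by bit; the theory gives it directly.
model = next(v for v in range(1 << WIDTH) if bv_add(v, 1) == 0)
print(hex(model))  # 0xff, since 0xFF + 1 wraps to 0x00
```

The point is the semantics, not the search: the solver knows wraparound arithmetic as a theory, rather than rediscovering it one boolean variable at a time.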
Modern SMT solvers like Z3 can handle complex constraints over integers, bit-vectors, arrays, and strings. When symbolic execution discovers a path to interesting code, the SMT solver produces an actual input that reaches it.
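To make that pipeline concrete, here is a deliberately tiny hand-rolled sketch: it “symbolically executes” a program given as a list of branch predicates, forks a path state at each branch, then finds a concrete model for each path condition by brute-force search where an SMT solver would do real deduction. The names (`explore`, `program`) are illustrative and not from angr or Z3.

```python
def explore(branches):
    """Fork a path state at every branch, accumulating path conditions.

    Each path condition is a list of (predicate, expected_outcome) pairs;
    a real engine would build logical formulas over symbolic values.
    """
    paths = [[]]
    for branch in branches:
        paths = [cond + [(branch, taken)]
                 for cond in paths
                 for taken in (True, False)]
    return paths

# A toy "program": two branches on a symbolic input x.
program = [lambda x: x > 10, lambda x: x % 2 == 0]

for cond in explore(program):
    # Brute-force search stands in for the SMT solver: find one concrete
    # input satisfying every constraint on this path, or report None.
    model = next((x for x in range(-50, 51)
                  if all(pred(x) == taken for pred, taken in cond)), None)
    print([taken for _, taken in cond], "->", model)
    # e.g. the [True, True] path (x > 10 and x even) gets model 12
```

Two branches yield four path conditions; real programs fork at every branch in a loop body too, which is exactly where path explosion comes from.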
Abstract Interpretation
Abstract interpretation approximates program behavior to make analysis tractable for larger programs. Instead of tracking exact values, we track properties: “this variable is positive”, “this pointer points to heap memory”, “this value came from user input”. The critical property of abstract interpretation (when used for verification) is soundness: it provides over-approximations that never miss real behaviors. If abstract interpretation says a variable might be negative, it could be negative in some execution. If it says a variable is always positive, that’s guaranteed. The tradeoff is precision. Sound over-approximating analyses may report that something is possible when it actually isn’t (false positives), but they won’t miss real behaviors (no false negatives). This property makes abstract interpretation suitable for proving absence of bugs, not just finding them.
A Concrete Example
Consider a program that checks if an input matches a transformed password:

```
x = input XOR 0x005A
y = rotate_left(x, 3)
```

The success path requires `y == 0x1337`. Working backwards:

- `y == 0x1337` means `x = rotate_right(0x1337, 3) = 0xE266` (for a 16-bit rotation)
- `x = input XOR 0x005A` means `input = 0xE266 XOR 0x005A = 0xE23C`

So `input = 0xE23C`. An SMT solver performs this back-substitution automatically: no manual arithmetic needed, and unlike manual analysis, it scales to far more complex constraint systems.
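The back-substitution can be checked mechanically. The sketch below uses plain Python in place of a solver; the helper names `rotl16` and `rotr16` are introduced here for illustration and assume the 16-bit width stated above.

```python
MASK = 0xFFFF  # the example assumes 16-bit values

def rotl16(v, n):
    """16-bit rotate left."""
    return ((v << n) | (v >> (16 - n))) & MASK

def rotr16(v, n):
    """16-bit rotate right (the inverse of rotl16)."""
    return ((v >> n) | (v << (16 - n))) & MASK

# Invert the check y == 0x1337 one step at a time, as in the text:
x = rotr16(0x1337, 3)   # undo rotate_left: 0xE266
inp = x ^ 0x005A        # undo the XOR:     0xE23C

# Cross-check by running the transformation forwards:
assert rotl16(inp ^ 0x005A, 3) == 0x1337
print(hex(inp))  # 0xe23c
```

Because XOR and rotation are both invertible, this example solves by direct inversion; a solver handles the general case where constraints cannot simply be run backwards.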