Decompiler Construction: Chapter 9 - Coverage-Guided Fuzzing for Validation and Symbolic Execution
Assume all your passes are wrong and have never been proven.
The idea is to find failures before someone else does, pass fail at the smallest scale first, and snowball into catastrophic behaviours.
Some of these include:
- Mis-reconstructed control flow
- Incorrect SSA transformations
- Undefined behavior misinterpretation
- Incorrect type recovery
- Silent arithmetic mismatches
Standard tests do not always catch these; your best strategy is using coverage-guided fuzzing.
Core Idea
Instead of randomly generating inputs, a fuzzer:
- Generates an input program
- Runs it through the passes
- Measures coverage
- Keeps inputs that explore new paths
- Mutates them to go deeper
Over time, it systematically explores edge cases.
Minimum Viable Target
Your fuzzing loop should look like this:
If behavior differs -> Bug!!!!.
Symbolic Execution
Symbolic execution runs programs with symbols instead of concrete values, tracks constraints instead of actual data. It should be as optimal and fast as possible
- Inputs become variables (e.g.,
x = X) - Branches add constraints (e.g.
if (x > 10)->X > 10, else ->X <= 10 - Each path stores: path constraints, symbolic expressions for the state
A good symbolic execution library you could utilize: Z3.
Purpose
Solves for inputs that reach specific paths or behaviors by solving constraints.
In validation
Used to detect mismatches between original and transformed IR:
Diverging path conditions Impossible constraints after transformation Missing or extra execution paths
Limitation
Path explosion (exponential growth) makes full execution infeasible; it requires bounding or selective use. In practice, it is applied with bounded depth or selective path exploration.
Oracle
An oracle determines whether the results are correct.
This means comparing behaviors in this pseudo-code
1
2
3
4
5
6
7
State_1 = symbolic_execution(original_IR);
Transformed = original_IR;
For every pass as p:
Transformed = p(Transformed);
if not equivalent(State_1, symbollic_execution(Transformed)):
equivalence is defined over satisfiable path constraints and observable state
Inputs must avoid or filter undefined behavior, or the results are meaningless. Comparisons must define what differences are allowed (if any)
Additional Checks (Should consider)
SSA invariants hold CFG is valid (no broken edges or dominance violations) Type logic remains consistent
A weak oracle hides bugs. A strong oracle exposes bugs
Model
A program is modeled as an input-output system operating over an internal state.
An execution produces a trace:
T = (S, M, O)
- S: final machine state (registers)
- M: observable memory/abstracted states
- O: observable outputs (effects)
Equivalence between executions is defined over equality of O and agreement of S and M under allowed abstraction.
Symbolic execution operates over symbolic states and path constraints, where a path is valid only if its constraints are satisfiable.
All comparisons in the oracle are performed over these trace and constraint representations.
Next Chapter: Chapter 10 - Dead Code Elimination and IR Canonicalization
Prev Chapter: Chapter 8 - Designing Safe Optimization Passes