Post

Decompiler Construction: Chapter 9 - Coverage-Guided Fuzzing for Validation and Symbolic Execution

Decompiler Construction: Chapter 9 - Coverage-Guided Fuzzing for Validation and Symbolic Execution

Assume all your passes are wrong and have never been proven.

The idea is to find failures before someone else does, pass fail at the smallest scale first, and snowball into catastrophic behaviours.

Some of these include:

  • Mis-reconstructed control flow
  • Incorrect SSA transformations
  • Undefined behavior misinterpretation
  • Incorrect type recovery
  • Silent arithmetic mismatches

Standard tests do not always catch these; your best strategy is using coverage-guided fuzzing.

Core Idea

Instead of randomly generating inputs, a fuzzer:

  • Generates an input program
  • Runs it through the passes
  • Measures coverage
  • Keeps inputs that explore new paths
  • Mutates them to go deeper

Over time, it systematically explores edge cases.

Minimum Viable Target

Your fuzzing loop should look like this:

If behavior differs -> Bug!!!!.

Symbolic Execution

Symbolic execution runs programs with symbols instead of concrete values, tracks constraints instead of actual data. It should be as optimal and fast as possible

  • Inputs become variables (e.g., x = X)
  • Branches add constraints (e.g. if (x > 10) -> X > 10, else -> X <= 10
  • Each path stores: path constraints, symbolic expressions for the state

A good symbolic execution library you could utilize: Z3.

Purpose

Solves for inputs that reach specific paths or behaviors by solving constraints.

In validation

Used to detect mismatches between original and transformed IR:

Diverging path conditions Impossible constraints after transformation Missing or extra execution paths

Limitation

Path explosion (exponential growth) makes full execution infeasible; it requires bounding or selective use. In practice, it is applied with bounded depth or selective path exploration.

Oracle

An oracle determines whether the results are correct.

This means comparing behaviors in this pseudo-code

1
2
3
4
5
6
7
State_1 = symbolic_execution(original_IR);

Transformed = original_IR;
For every pass as p:
	Transformed = p(Transformed);
	if not equivalent(State_1, symbollic_execution(Transformed)):
	 	equivalence is defined over satisfiable path constraints and observable state

Inputs must avoid or filter undefined behavior, or the results are meaningless. Comparisons must define what differences are allowed (if any)

Additional Checks (Should consider)

SSA invariants hold CFG is valid (no broken edges or dominance violations) Type logic remains consistent

A weak oracle hides bugs. A strong oracle exposes bugs

Model

A program is modeled as an input-output system operating over an internal state.

An execution produces a trace:

T = (S, M, O)

  • S: final machine state (registers)
  • M: observable memory/abstracted states
  • O: observable outputs (effects)

Equivalence between executions is defined over equality of O and agreement of S and M under allowed abstraction.

Symbolic execution operates over symbolic states and path constraints, where a path is valid only if its constraints are satisfiable.

All comparisons in the oracle are performed over these trace and constraint representations.


Next Chapter: Chapter 10 - Dead Code Elimination and IR Canonicalization

Prev Chapter: Chapter 8 - Designing Safe Optimization Passes

This post is licensed under CC BY 4.0 by the author.