Post

Decompiler Construction: Chapter 8 - Designing Safe Optimization Passes

Decompiler Construction: Chapter 8 - Designing Safe Optimization Passes

Optimization passes transform IR into a simpler, more canonical form without altering program semantics. The core philosophy of a pass is to reduce code and expressions to their minimal canonical form, applying transformations iteratively until a fixed point is reached (i.e., no further changes occur). Any incorrect or unsafe transformations will corrupt the output. Every pass must be deterministic and convergent.

Pass Design Principles

All passes must obey the following: Semantic Preservation A transformation must not change behavior. Each transformation must strictly simplify or canonicalize the IR. Passes must not expand expressions or introduce instability. Determinism Given the same IR, a pass must always produce the same output, repeatable. Convergent Repeated application must terminate. Oscillating transformations are invalid.

Model

Passes are applied iteratively, obeying any flags it may have, pseudo-code:

1
2
3
4
5
6
7
8
repeat:
    changed = false;

    for each pass P:
        if P modifies IR and (flags(P) == true or flag_count(P) == 0):
            changed = true;

until changed == false;

Arithmetic Safety

Arithmetic transformations are not always safe due to undefined behavior and architecture-specific semantics, which may be lost from lifting to IL.

Pass environment flags must guard all arithmetic optimizations.

Unsafe Cases

Division by zero Floating-point precision changes Reordering non-associative operations

Example

  • (R1 + 0) -> R1; // SAFE
  • (R1 * 1) -> R1; // SAFE
  • (R1 - R1) -> 0; // Unsafe if NaN or overflow possible

If safety cannot be proven, the transformation is NOT SAFE.

Categories of Passes

Passes are organized into categories. Here are the most common ones:

Constant Folding

Calculate values

1
R1 = 2 + 3;

Must respect arithmetic safety rules.

1
R1 = 5;

Constant Propagation

Replace variables with known values (IF SAFE AND SUBEXPRESSIONS ARE SAFE)

1
2
R1 = 5;
R2 = R1 + 2;
1
R2 = 5 + 2;

Dead Code Elimination

Remove unused code

1
R1 = 5; // if R1 is never used -> remove

Must ensure no side effects are lost, MUST BE SAFE.

Copy Propagation

Eliminate redundant assignments, utilize SSA:

1
2
R2 = R1;
R3 = R2;
1
R3 = R1;

Control Flow Simplification

Simplify CFG structure:

  • Remove unreachable blocks
  • Collapse trivial branches
  • Inline single-entry/single-exit blocks

Correctness

A pass must refuse to transform if:

  • Type information is ambiguous
  • Safety cannot be proven
  • Control flow dependencies are unclear
  • Side effects present

When in doubt, do nothing.. Incorrect simplification is worse than no simplification.

Possible Failures

Poorly designed passes lead to:

  • Non-termination (oscillation between forms)
  • Incorrect output (semantic corruption)
  • Unsafe Over-simplification (loss of structure)

Example of oscillation: R1 << 1; <-> R1 * 2;


Next Chapter: Chapter 9 - Coverage-Guided Fuzzing for Validation and Symbollic Execution

Prev Chapter: Chapter 7 - IReconstructing Control Flow using Graph Theory, SSA, and Types

This post is licensed under CC BY 4.0 by the author.