Post

Decompiler Construction: Chapter 10 - Dead Code Elimination and IR Canonicalization

Decompiler Construction: Chapter 10 - Dead Code Elimination and IR Canonicalization

Before performing any transformation, you need to make sure every mutable statement and expression is safe, ALWAYS CHECK ENVIRONMENT FLAGS!!!

Dead Code Elimination

Removes statements that do not affect the observable program trace. This is equivalent to removing statements that do not contribute to any value that is reachable through a live path in the CFG.

Using the execution model T = (S, M, O), a statement is considered dead if its removal does not change O, and does not affect S or M under the equivalence relation defined in Chapter 9.

A statement is dead if:

  • Its result is not used by any live computation
  • It has no side effects on M or O

Common side effects include:

  • Memory read/writes
  • Unintended control flow changes
  • Unsafe abstraction elimination
  • Elimination of non-primitive statements with undertiministic behaviours

** ALWAYS CHECK FOR THESE. EACH TRANSFORMATION NEEDS TO BE 100% SAFE **

IR Canonicalization

Transforms equivalent programs into a normalized form while preserving semantics over: T = (S, M, O).

Canonicalization enforces a deterministic structure on expressions and statements. Examples of these include:

Expression ordering

  • r1 = 1 + r1 -> r1 = r1 + 1 (ANY ARITHMETIC TYPES SHOULD BE COMMUTATIVE)
  • r1 = (a + b) + c -> r1 = c + (a + b)

Boolean Simplification

  • (r1 == r2) ? true : false -> r1 == r2
  • (true) ? true : r1() -> true
  • (NO VOLATILE/UNSAFE EXPRESSIONS) ? r1() : r1() -> r1()

Branching simplification

Talked in detail in Chapter 11

Two IRs are equivalent if their canonical forms are proven identical under structural isomorphism.

Expression Equivalence Checking

This ties into Chapter 16 about deobfuscation. Use Z3 library for checking satisfiability. Example:

1
2
(a + b) + c
a + (b + c)

We will prove they are equivalent using Z3. We convert IR -> Z3 Symbollic Form.

1
(a + b) + c != a + (b + c)

Check if SAT

If UNSAT -> Equivalent

Constraints

All transformations must preserve equivalence over the execution trace model defined in Chapter 9. It is impossible to simplify every expression.


Next Chapter: Chapter 11 - Control Flow Recovery and Branch Simplification

Prev Chapter: Chapter 9 - Coverage-Guided Fuzzing for Validation and Symbollic Execution

This post is licensed under CC BY 4.0 by the author.