Decompiler Construction: Chapter 10 - Dead Code Elimination and IR Canonicalization
Before performing any transformation, you need to make sure every mutable statement and expression is safe, ALWAYS CHECK ENVIRONMENT FLAGS!!!
Dead Code Elimination
Removes statements that do not affect the observable program trace. This is equivalent to removing statements that do not contribute to any value that is reachable through a live path in the CFG.
Using the execution model T = (S, M, O), a statement is considered dead if its removal does not change O, and does not affect S or M under the equivalence relation defined in Chapter 9.
A statement is dead if:
- Its result is not used by any live computation
- It has no side effects on M or O
Common side effects include:
- Memory read/writes
- Unintended control flow changes
- Unsafe abstraction elimination
- Elimination of non-primitive statements with undertiministic behaviours
** ALWAYS CHECK FOR THESE. EACH TRANSFORMATION NEEDS TO BE 100% SAFE **
IR Canonicalization
Transforms equivalent programs into a normalized form while preserving semantics over: T = (S, M, O).
Canonicalization enforces a deterministic structure on expressions and statements. Examples of these include:
Expression ordering
r1 = 1 + r1->r1 = r1 + 1(ANY ARITHMETIC TYPES SHOULD BE COMMUTATIVE)r1 = (a + b) + c->r1 = c + (a + b)
Boolean Simplification
(r1 == r2) ? true : false->r1 == r2(true) ? true : r1()->true(NO VOLATILE/UNSAFE EXPRESSIONS) ? r1() : r1()->r1()
Branching simplification
Talked in detail in Chapter 11
Two IRs are equivalent if their canonical forms are proven identical under structural isomorphism.
Expression Equivalence Checking
This ties into Chapter 16 about deobfuscation. Use Z3 library for checking satisfiability. Example:
1
2
(a + b) + c
a + (b + c)
We will prove they are equivalent using Z3. We convert IR -> Z3 Symbollic Form.
1
(a + b) + c != a + (b + c)
Check if SAT
If UNSAT -> Equivalent
Constraints
All transformations must preserve equivalence over the execution trace model defined in Chapter 9. It is impossible to simplify every expression.
Next Chapter: Chapter 11 - Control Flow Recovery and Branch Simplification
Prev Chapter: Chapter 9 - Coverage-Guided Fuzzing for Validation and Symbollic Execution