Post

Decompiler Construction: Chapter 16 - Deobfuscation Mixed Boolean Arithmetic and Opaque Predicate Removal

Decompiler Construction: Chapter 16 - Deobfuscation Mixed Boolean Arithmetic and Opaque Predicate Removal

We will be utilizing Z3 SMT solver for this chapter.

All reasoning in this chapter assumes fixed-width bit-vector semantics (e.g., 32-bit, 64-bit) with wraparound behavior.*

Mixed Boolean Arithmetic

Mixed Boolean Arithmetic (MBA) obfuscation relies on expressing simple arithmetic operations using redundant combinations of bitwise and arithmetic identities while preserving semantics.

Because these transformations are not unique, simplification must be proven semantically rather than relying on pattern matching.

A minimal form is defined as a canonical representation that:

  • Uses the fewest operations
  • Minimizes expression depth
  • Prefers arithmetic operators over expanded boolean forms

Equivalence Checking

To safely simplify an expression E into F, we must prove:

1
E == F

This is done by encoding both expressions into Z3 and checking:

1
(E != F) is UNSAT

Only if this condition holds can the transformation be applied.

Common MBA Identities

While heuristics are unsafe, they are useful for generating minimal forms before SMT validation.

Linear Reconstruction (Addition Patterns)

Many MBAs expand addition into XOR + carries:

1
(x ^ y) + ((x & y) << 1)
1
x + y

Reduction strategy:

  • Detect XOR/AND carry decomposition patterns
  • Reconstruct the addition operator
  • Verify with SMT equivalence check

XOR Linear Forms

1
(x ^ y ^ x)
1
y

Reduction strategy:

  • Normalize XOR chains using associativity and self-cancellation:

    • x ^ x = 0
    • x ^ 0 = x

Constant Mask Folding

1
(x & 0xFF) | (x & ~0xFF)
1
x

Reduction strategy:

  • Perform bit-mask partition analysis
  • Ensure masks fully cover the domain
  • Validate with SMT when non-trivial

Boolean Collapse Patterns

1
(x | ~x)
1
-1
1
(x & ~x)
1
0

These must respect the bit-width of the expression.

Common MBA Patterns

  • x+(x*y) -> x*(y+1)
  • x+(y*x) -> x*(y+1)
  • (x*y)+x -> x*(y+1)
  • (y*x)+x -> x*(y+1)
  • (x*y)+(z*x) -> x*(y+z)
  • (y*x)+(x*z) -> x*(y+z)
  • x+(y+(x*w)) -> y+(x*(w+1))
  • (x*y)+((~x)*y) -> -y
  • (x|(~y))&((-x)|y) -> x
  • (~(x))+1 -> -x
  • ((x*y)+((~x)*(-y))) -> ((2*x)+1)*y

All non-trivial patterns must be verified with SMT before applying.

Opaque Predicate Removal

An opaque predicate is a condition that always evaluates to a constant (true or false) but is constructed to appear dynamic.

To remove an opaque predicate:

  1. Encode the condition C into Z3
  2. Check the satisfiability of both branches:
1
2
C == true
C == false
  1. If one branch is UNSAT:

    • Remove the unreachable branch
    • Simplify the CFG accordingly

Example:

1
2
3
if ((x ^ x) == 0) {
    // always taken
}

Since (x ^ x) == 0 is always true, the false branch is removed. Any volatile expression means that it is always unsafe.

Constraints

  • Heuristics are only used for candidate generation
  • SMT is the only authority for correctness
  • If equivalence cannot be proven, no transformation is applied
  • Bit-width must be preserved across all transformations

Next Chapter: Chapter 17 - Lowering IR to Readable and Executable Code

Prev Chapter: Chapter 15 - Recovering Virtual Functions

This post is licensed under CC BY 4.0 by the author.