Decompiler Construction: Chapter 16 - Deobfuscation Mixed Boolean Arithmetic and Opaque Predicate Removal
We will be utilizing Z3 SMT solver for this chapter.
All reasoning in this chapter assumes fixed-width bit-vector semantics (e.g., 32-bit, 64-bit) with wraparound behavior.*
Mixed Boolean Arithmetic
Mixed Boolean Arithmetic (MBA) obfuscation relies on expressing simple arithmetic operations using redundant combinations of bitwise and arithmetic identities while preserving semantics.
Because these transformations are not unique, simplification must be proven semantically rather than relying on pattern matching.
A minimal form is defined as a canonical representation that:
- Uses the fewest operations
- Minimizes expression depth
- Prefers arithmetic operators over expanded boolean forms
Equivalence Checking
To safely simplify an expression E into F, we must prove:
1
E == F
This is done by encoding both expressions into Z3 and checking:
1
(E != F) is UNSAT
Only if this condition holds can the transformation be applied.
Common MBA Identities
While heuristics are unsafe, they are useful for generating minimal forms before SMT validation.
Linear Reconstruction (Addition Patterns)
Many MBAs expand addition into XOR + carries:
1
(x ^ y) + ((x & y) << 1)
1
x + y
Reduction strategy:
- Detect XOR/AND carry decomposition patterns
- Reconstruct the addition operator
- Verify with SMT equivalence check
XOR Linear Forms
1
(x ^ y ^ x)
1
y
Reduction strategy:
Normalize XOR chains using associativity and self-cancellation:
x ^ x = 0x ^ 0 = x
Constant Mask Folding
1
(x & 0xFF) | (x & ~0xFF)
1
x
Reduction strategy:
- Perform bit-mask partition analysis
- Ensure masks fully cover the domain
- Validate with SMT when non-trivial
Boolean Collapse Patterns
1
(x | ~x)
1
-1
1
(x & ~x)
1
0
These must respect the bit-width of the expression.
Common MBA Patterns
x+(x*y)->x*(y+1)x+(y*x)->x*(y+1)(x*y)+x->x*(y+1)(y*x)+x->x*(y+1)(x*y)+(z*x)->x*(y+z)(y*x)+(x*z)->x*(y+z)x+(y+(x*w))->y+(x*(w+1))(x*y)+((~x)*y)->-y(x|(~y))&((-x)|y)->x(~(x))+1->-x((x*y)+((~x)*(-y)))->((2*x)+1)*y
All non-trivial patterns must be verified with SMT before applying.
Opaque Predicate Removal
An opaque predicate is a condition that always evaluates to a constant (true or false) but is constructed to appear dynamic.
To remove an opaque predicate:
- Encode the condition
Cinto Z3 - Check the satisfiability of both branches:
1
2
C == true
C == false
If one branch is UNSAT:
- Remove the unreachable branch
- Simplify the CFG accordingly
Example:
1
2
3
if ((x ^ x) == 0) {
// always taken
}
Since (x ^ x) == 0 is always true, the false branch is removed. Any volatile expression means that it is always unsafe.
Constraints
- Heuristics are only used for candidate generation
- SMT is the only authority for correctness
- If equivalence cannot be proven, no transformation is applied
- Bit-width must be preserved across all transformations
Next Chapter: Chapter 17 - Lowering IR to Readable and Executable Code