Post

Decompiler Construction: Chapter 11 - Control Flow Recovery and Branch Simplification

Decompiler Construction: Chapter 11 - Control Flow Recovery and Branch Simplification

Branch simplification and recovery are not complicated conceptually, but doing them correctly without breaking semantics is where things can get fairly weird.

At the low level, control flow is just jumps. The goal here is to reduce these to a more conceptual understanding of what is going on at the IR level while safely preserving the same logic.

When simplifying IR control flow heavily, you also have the bonus of usually making SSA and CFG construction faster.

Transformations are only safe if they present no side effects and no syntactical violations occur with the IR.

Here are some of the most common ones:

Basic Branch Merging

1
2
if (?) then goto L; end
if (?) then goto L; end
1
if (? or ?) then goto L; end

Sequential Folding

1
2
3
if (?) then
    if (?) then goto L; end
end
1
if (? and ?) then goto L; end

Branch Normalization (Inversion + Fall)

1
2
3
if (!?) then goto L1; end
goto L2;
L1:
1
if (?) then goto L2; end

Loop recovery

1
2
3
4
5
L1:
....
If (?) then goto L2; end
Goto L1:
L2:
1
2
3
Do {
....
} while (not ?);

Jump Threading

1
2
3
if (?) then goto L1; end
L1:
goto L2;
1
if (?) then goto L2; end

Empty Block Removal

1
2
L1:
goto L2;

This block has no value and should be eliminated entirely by redirecting all incoming edges to L2.

Notes

All of these transformations seem trivial; any control flow transformations on the IR you must check if they do not have any structural violations (e.g., “break or continue not inside of a loop”). Any wrong transformation WILL break control flow integrity.


Next Chapter: Chapter 12 - Abstract Modeling: Pages, Regions, Self-Modifying Code, and Indirect Calls

Prev Chapter: Chapter 10 - Dead Code Elimination and IR Canonicalization

This post is licensed under CC BY 4.0 by the author.