Decompiler Construction: Chapter 5 - Lifting IL to IR
The purpose of the IL is to simplify lifting from different architectures into the IR. An added benefit is that it enables a multi-architecture decompiler: only one lifting pipeline is needed to support targets such as ARM and x86.
Because lifting is the first stage of the IR pipeline, no optimizations should be applied at this level. If the IL and IR are designed correctly, each IL instruction should lift directly to IR statements with minimal transformation.
At this stage, the goal is direct semantic translation rather than simplification. Builders and helper utilities should be used to keep IR node construction simple.
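The snippet below is a sketch of what such a builder might look like. It assumes a tiny hypothetical IR with `Reg`, `Const`, and `Assign` nodes and an `IRBuilder` helper; none of these names come from a real library. The builder only records statements in emission order, which matches the rule that lifting performs no simplification.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical IR node types; the chapter does not define a concrete IR API.
@dataclass(frozen=True)
class Reg:
    name: str

@dataclass(frozen=True)
class Const:
    value: int

Operand = Union[Reg, Const]

@dataclass(frozen=True)
class Assign:
    dst: Reg
    src: Operand

@dataclass
class IRBuilder:
    """Collects IR statements in emission order without rewriting them."""
    stmts: List[Assign] = field(default_factory=list)

    def assign(self, dst: str, src: Union[str, int]) -> Assign:
        # Ints become constants, strings are treated as register names.
        operand = Const(src) if isinstance(src, int) else Reg(src)
        stmt = Assign(Reg(dst), operand)
        self.stmts.append(stmt)
        return stmt
```

Keeping node construction behind small methods like `assign` means each IL handler stays a one-liner and the node layout can change in one place.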
Process
Label Mapping
Before lifting anything, map every IL label to its IR equivalent. This way branch targets, including forward references to labels that have not been reached yet, can be resolved and emitted correctly during lifting.
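A minimal sketch of this first pass, assuming a hypothetical IL encoded as tuples in which a `LABEL` pseudo-instruction marks each jump target. The `map_labels` function and the `block_N` naming scheme are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical IL instruction: (opcode, operands...). "LABEL" marks a jump target.
ILInstr = Tuple[str, ...]

def map_labels(il: List[ILInstr]) -> Dict[str, str]:
    """First pass: assign every IL label an IR label before any lifting.

    Doing this up front lets a forward branch (one that targets a label
    appearing later in the stream) be resolved at the point it is lifted.
    """
    label_map: Dict[str, str] = {}
    for instr in il:
        if instr[0] == "LABEL":
            il_label = instr[1]
            if il_label in label_map:
                raise ValueError(f"duplicate IL label {il_label!r}")
            label_map[il_label] = f"block_{len(label_map)}"
    return label_map
```

Because the map is complete before lifting starts, a `JMP L_end` that appears before `L_end` can still be translated to its IR label.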
IL Traversal
Since assembly has already been lifted into IL, the IL can be traversed directly without re-parsing or lexing. The IL should already be provided in decoded form to make lifting simpler.
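The traversal itself can then be a single in-order loop that dispatches on the opcode. This sketch assumes the same kind of hypothetical tuple encoding for IL; `lift_function` and the handler table are illustrative names, not part of any existing tool.

```python
from typing import Callable, Dict, List, Tuple

ILInstr = Tuple[str, ...]
# A handler turns one decoded IL instruction into zero or more IR statements.
Handler = Callable[[ILInstr], List[str]]

def lift_function(il: List[ILInstr], handlers: Dict[str, Handler]) -> List[str]:
    """Walk already-decoded IL in order; no text parsing or lexing is needed."""
    ir: List[str] = []
    for instr in il:
        opcode = instr[0]
        handler = handlers.get(opcode)
        if handler is None:
            # Fail loudly rather than silently dropping an instruction's semantics.
            raise NotImplementedError(f"no lifter for IL opcode {opcode}")
        ir.extend(handler(instr))
    return ir
```

Raising on an unknown opcode, rather than skipping it, is a deliberate choice in this sketch: a silently dropped instruction could corrupt every later analysis.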
Indirect Calls
Indirect calls should be resolved using information recorded when the assembly was lifted to IL. This data should be preserved in the IL and interpreted during analysis.
Once resolved, an indirect call can be modeled as conditional control flow over its possible targets.
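One way to express this, assuming the resolved targets are available as a list of addresses, is to emit one guarded jump per target followed by a fallback jump for any value that was not resolved. The function name, the `loc_<addr>` label format, and the fallback label are all hypothetical, and the output is shown as IR text for readability.

```python
from typing import List

def lower_indirect(target_reg: str, resolved: List[int], fallback: str) -> List[str]:
    """Model an indirect transfer through `target_reg` as a chain of
    conditional gotos over its resolved target addresses.

    `resolved` is assumed to come from data recorded during the
    assembly-to-IL lift; `fallback` catches any unresolved value.
    """
    stmts = [
        f"if ({target_reg} == {addr:#x}) goto loc_{addr:x}"
        for addr in resolved
    ]
    stmts.append(f"goto {fallback}")
    return stmts
```

For example, `lower_indirect("r5", [0x401000, 0x401020], "unresolved_target")` yields two guarded gotos followed by `goto unresolved_target`.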
Note: Actual assembly-level calls should not be treated as high-level function calls. They are control-flow transfers that may also modify the stack state depending on the calling convention and preceding instructions.
To model this safely, emit a dedicated IR control-flow operation, such as:
page_goto(LABEL)
Because indirect calls, jumps, and similar transfers are very common, it is reasonable to give them their own abstract IR type.
This ensures control flow is explicitly represented in the IR, making analysis and reconstruction more reliable.
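A sketch of giving this operation its own node type, assuming a Python-based IR built from frozen dataclasses. `PageGoto` mirrors `page_goto(LABEL)`; the `Call` node is hypothetical and included only to show that the two are kept distinct.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical IR node types. PageGoto mirrors page_goto(LABEL):
# a raw control-flow transfer, not a high-level function call.
@dataclass(frozen=True)
class PageGoto:
    target: str  # IR label produced by the label-mapping pass

@dataclass(frozen=True)
class Call:
    callee: str

Stmt = Union[PageGoto, Call]

def is_control_transfer(stmt: Stmt) -> bool:
    # Analyses can match on the node type instead of guessing at
    # calling conventions from the operands.
    return isinstance(stmt, PageGoto)

def render(stmt: Stmt) -> str:
    if isinstance(stmt, PageGoto):
        return f"page_goto({stmt.target})"
    return f"call {stmt.callee}"
```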
Emissions
Most instructions should map directly to one or more IR statements. Some instructions may also require emitting additional statements or metadata, such as flag updates or other implicit side effects, depending on their semantics.
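For example, assuming a hypothetical IL `ADD dst, a, b` whose semantics also update a zero flag `zf` and a sign flag `sf`, a single IL instruction would expand into three IR statements. The flag names and their exact semantics here are illustrative; a real lifter would follow the IL's own definition.

```python
from typing import List, Tuple

ILInstr = Tuple[str, ...]

def lift_add(instr: ILInstr) -> List[str]:
    """Lift a hypothetical `ADD dst, a, b` whose IL semantics also update
    zero (zf) and sign (sf) flags: one IL instruction, three IR statements.
    """
    _, dst, a, b = instr
    return [
        f"{dst} = {a} + {b}",   # the primary effect
        f"zf = ({dst} == 0)",   # implicit side effect: zero flag
        f"sf = ({dst} < 0)",    # implicit side effect: sign flag (signed view)
    ]
```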
Examples
A simple lift from IL to IR.
IL:
LOADINT r251, 0
MOVE r250, r251
MOVE r250, r35
IR:
r251 = 0
r250 = r251
r250 = r35
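The example above can be reproduced with a direct one-to-one translation. This sketch assumes the IL is available as decoded tuples and handles only the two opcodes used in the example; the function name `lift` is illustrative.

```python
from typing import List, Tuple

ILInstr = Tuple[str, ...]

def lift(il: List[ILInstr]) -> List[str]:
    """Translate each IL instruction one-to-one into an IR assignment.

    No simplification is performed: the redundant store to r250 is kept,
    because cleanup belongs to later passes.
    """
    ir: List[str] = []
    for op, *args in il:
        if op == "LOADINT":
            dst, imm = args
            ir.append(f"{dst} = {imm}")
        elif op == "MOVE":
            dst, src = args
            ir.append(f"{dst} = {src}")
        else:
            raise NotImplementedError(f"unsupported IL opcode {op}")
    return ir

example = [
    ("LOADINT", "r251", "0"),
    ("MOVE", "r250", "r251"),
    ("MOVE", "r250", "r35"),
]
print("\n".join(lift(example)))
```

Running it prints the three IR lines shown above. `r250 = r251` is kept even though it is immediately overwritten; removing such dead stores is left to later passes, consistent with the rule that lifting performs no optimizations.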
Next Chapter: Chapter 6 - IR Semantics: Edge Cases, Undefined Behavior, and Special Handling
Prev Chapter: Chapter 4 - Designing an Architecture-Agnostic IR