Decompiler Construction: Chapter 12 - Abstract Modeling Pages, Regions, Self-Modifying Code, and Indirect Calls

Posted May 27, 2026 Updated Jun 17, 2026

By Daniel

The idea of modeling abstract concepts is to represent them in a form that is easy to analyze and transform, while still preserving all semantics necessary to handle side effects.

Architecture-Specific Instructions

As in the early chapters on lifting to IL, you will encounter instructions that are highly CPU-specific. One example is PAUSE on x86.

Instructions like this do not meaningfully modify state (registers, memory, or control flow), but instead provide hints to the processor (e.g., improving spin-wait behavior). They should be modeled explicitly as special operations, typically as intrinsics or pseudo-calls, with their side-effect semantics attached.

In this case, PAUSE can be treated as an operation with no effect on registers, memory, or control flow. That makes it safe to move or remove during later passes, as long as timing-sensitive behavior is not a concern. Where timing does matter, the surrounding code is modeled as a critical region and the hint is left in place.

Do not drop these instructions blindly. Model them precisely, then decide how aggressively you want to optimize them.
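One way to make "model precisely, then decide" concrete is to attach an explicit effect set to each intrinsic and let later passes query it. The sketch below is only an illustration: `Effect`, `Intrinsic`, the `x86_pause` name, and `can_remove` are hypothetical names, not part of the IL defined in earlier chapters.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Effect(Flag):
    """Hypothetical effect flags attached to every lifted operation."""
    NONE = 0
    READS_MEMORY = auto()
    WRITES_MEMORY = auto()
    CONTROL_FLOW = auto()
    TIMING_HINT = auto()


@dataclass(frozen=True)
class Intrinsic:
    name: str
    effects: Effect


# Assumed lookup table: CPU-specific mnemonics that lift to intrinsics.
INTRINSICS = {
    "pause": Intrinsic("x86_pause", Effect.TIMING_HINT),
}


def lift(mnemonic):
    """Return the Intrinsic for a known hint instruction, or None."""
    return INTRINSICS.get(mnemonic.lower())


def can_remove(op, in_critical_region):
    """A hint may be dropped only if it touches no state and timing is irrelevant."""
    state_effects = Effect.READS_MEMORY | Effect.WRITES_MEMORY | Effect.CONTROL_FLOW
    if op.effects & state_effects:
        return False
    return not in_critical_region
```

With this shape, an optimization pass never special-cases PAUSE by name. It asks `can_remove(op, in_critical_region)` and the answer follows from the declared effects.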

Pages / Regions

Executable code is often organized into logical regions (or pages), especially in a binary that relies on dynamic dispatch, trampolines, or runtime-generated code.

Instead of treating every address as unrelated, we model these as higher-level regions. A region represents a contiguous block of executable behavior that can be targeted by calls or jumps.

In the IR, these regions are represented as pages. When a call or branch targets such a region, we rewrite it to refer to the page rather than to a bare address.

This abstraction becomes crucial when:

Multiple entry points target the same logical code
Code is dynamically generated or relocated
Control flow cannot be resolved statically

For now, the goal is not to optimize pages, but to model them correctly. More aggressive transformations come later.
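As a rough illustration of the page abstraction, the sketch below maps branch targets onto contiguous address ranges and records every entry point seen for a page. `Page`, `PageMap`, and `record_branch` are hypothetical names, and the sketch assumes pages do not overlap.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Page:
    start: int                       # first address in the region
    end: int                         # one past the last address (exclusive)
    entry_points: set = field(default_factory=set)


class PageMap:
    """Looks up the page containing an address; assumes non-overlapping pages."""

    def __init__(self, pages):
        self.pages = sorted(pages, key=lambda p: p.start)
        self.starts = [p.start for p in self.pages]

    def find(self, addr):
        # Index of the last page whose start is <= addr, if any.
        i = bisect_right(self.starts, addr) - 1
        if i >= 0 and addr < self.pages[i].end:
            return self.pages[i]
        return None

    def record_branch(self, target):
        """Attach a call or jump target to its page as an entry point."""
        page = self.find(target)
        if page is not None:
            page.entry_points.add(target)
        return page
```

Two branches that land at different offsets inside the same range resolve to the same `Page` object, which is the property the "multiple entry points" case relies on. Targets outside every known page return `None` and stay unresolved rather than being guessed.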

Self-Modifying Code (SMC)

Using the DBI data from earlier, along with the logical PC we built, we can detect regions of self-modifying code (SMC) and group each one into contiguous statements.

Instead of modeling each mutation independently, we treat the observed execution as mutually exclusive paths depending on the runtime memory state.

Example pseudo-IR (observed execution):

R1 = 1;
// or
R2 = 8;
R3 = 9;
// (determined to be contiguous by logical PC)

Modeled abstractly:

if (memread[executes R1]) {
    R1 = 1;
} else if (memread[executes R2]) {
    R2 = 8;
    R3 = 9;
}

Because the region is contiguous (as determined by the logical PC), we only need to guard on the first instruction. The remainder of the block is assumed to follow deterministically once the path is selected.

This abstraction allows us to:

Preserve correctness under mutation
Avoid duplicating analysis across variants
Treat SMC as structured control flow rather than randomness

However, this model is only valid if:

The region boundaries are accurate
The execution paths are truly mutually exclusive
No interleaving mutations violate the assumed structure

This is all determined by the logical PC.
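The guarded form can be generated mechanically from the observed variants. The sketch below is a minimal illustration that assumes each variant has already been grouped into one contiguous block by the logical PC. `model_smc` and the `(destination, value)` tuple encoding are hypothetical, chosen only to keep the example small.

```python
def model_smc(variants):
    """Emit an if / else-if chain with one arm per observed variant.

    `variants` is a list of blocks; each block is a list of
    (destination, value) assignments observed for one memory state.
    Only the first statement of each block is used as the guard.
    """
    lines = []
    for i, block in enumerate(variants):
        if not block:
            raise ValueError("each variant needs at least one statement")
        first_dest = block[0][0]
        keyword = "if" if i == 0 else "} else if"
        lines.append(f"{keyword} (memread[executes {first_dest}]) {{")
        for dest, value in block:
            lines.append(f"    {dest} = {value};")
    if lines:
        lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(model_smc([[("R1", 1)], [("R2", 8), ("R3", 9)]]))
```

The chain deliberately has no final `else` arm: a memory state that was never observed is left unmodeled instead of being folded into one of the known variants.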

Indirect Calls

Using information gathered from DBI, we can resolve possible targets of indirect calls.

We model the indirect read as a virtual variable so it does not introduce unintended side effects on other variables.

Example:

An indirect call through R2 may reach either hi or hello. This can be modeled in pseudo-IR as:

virt_var1 = memread(R2);
if (virt_var1 == hi) {
    hi();
} else if (virt_var1 == hello) {
    hello();
}

Do not model indirect calls as switch statements. An if-else chain is generally safer, as it preserves order and avoids introducing incorrect assumptions about exhaustiveness. It also prevents unnecessary pollution of the CFG with artificial jump tables.
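The if-else chain can be built directly from the targets recorded by DBI. The sketch below is only illustrative: `model_indirect_call` is a hypothetical helper, and it assumes the targets arrive as symbol names in the order they were first observed.

```python
def model_indirect_call(pointer_reg, observed_targets, virt_name="virt_var1"):
    """Emit pseudo-IR for an indirect call as an ordered if / else-if chain."""
    # Drop repeated observations while keeping first-seen order.
    targets = list(dict.fromkeys(observed_targets))

    # The pointer is read once into a virtual variable, so the comparisons
    # below do not repeat the memory read.
    lines = [f"{virt_name} = memread({pointer_reg});"]
    for i, target in enumerate(targets):
        keyword = "if" if i == 0 else "} else if"
        lines.append(f"{keyword} ({virt_name} == {target}) {{")
        lines.append(f"    {target}();")
    if targets:
        lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(model_indirect_call("R2", ["hi", "hello", "hi"]))
```

No default arm is emitted, so the output makes no claim that the observed targets are exhaustive. A target seen in a later DBI run can be appended as another arm without reordering the existing ones.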

Next Chapter: Chapter 13 - Safe Page-Level Optimization

Prev Chapter: Chapter 11 - Control Flow Recovery and Branch Simplification

This post is licensed under CC BY 4.0 by the author.
