Decompiler Construction: Chapter 13 - Safe Page-Level Optimization

Page starts are typically known or inferred correctly. However, page end boundaries are often incomplete, incorrect, or missing entirely.

Because of this, page termination must be inferred through analysis rather than assumed.

You must determine page ends safely by analyzing control flow and how the code actually executes. This usually requires a set of helper functions that validate whether a page is complete and well-formed.

Useful information to evaluate includes:

  • Whether execution can fall out of the page (fall-through into unrelated regions)
  • Presence of jump-outs (branches exiting the page boundary)
  • Sub-pages or nested regions within the page
  • Whether external code can fall into the page (incoming edges from outside the expected region)

These determine whether a page boundary is valid or if it must be expanded, split, or merged with adjacent regions.
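
The checks above can be sketched as a single pass over the control-flow edges. This is a minimal sketch, not a definitive implementation: the `Edge` record, the half-open address range `[start, end)`, and the "fall"/"jump" edge kinds are all assumptions made for illustration. It also assumes a jump that lands exactly on the page start is the ordinary entry and does not count as a violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: int
    dst: int
    kind: str  # "fall" (sequential flow) or "jump" (explicit branch)


def classify_page_edges(edges, start, end):
    """Sort the CFG edges that cross the page [start, end) into four buckets."""
    def inside(addr):
        return start <= addr < end

    report = {"fall_out": [], "jump_out": [], "fall_in": [], "jump_in": []}
    for e in edges:
        if inside(e.src) and not inside(e.dst):
            report[e.kind + "_out"].append(e)
        elif not inside(e.src) and inside(e.dst):
            # Assumption: a jump to the first instruction is the normal entry.
            if e.kind == "jump" and e.dst == start:
                continue
            report[e.kind + "_in"].append(e)
    return report


edges = [
    Edge(0, 4, "fall"),   # stays inside the page
    Edge(4, 8, "fall"),   # falls out past the page end
    Edge(4, 20, "jump"),  # branch leaving the page
    Edge(12, 4, "jump"),  # external code jumping into the middle
]
report = classify_page_edges(edges, start=0, end=8)
```

A page whose four buckets are all empty can keep its boundary. Any non-empty bucket is a hint that the page has to be expanded, split, or merged.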

Determining Ends

To determine a page end, you must ensure that the region does not intersect another page and that it forms a valid scope: it must not cut through any control-flow construct, for example by leaving a continue or break statement orphaned from its loop.

You need a function that can verify whether a block satisfies these constraints without violations.
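
One way such a check could look for the orphaned break/continue case is shown below. It is only a sketch under stated assumptions: loops are given as half-open address ranges keyed by a hypothetical loop id, and each break or continue is recorded as an `(address, loop_id)` pair. Neither representation comes from the original IR.

```python
def orphaned_exits(start, end, loops, exits):
    """Return addresses of break/continue statements the page would orphan.

    loops: dict mapping loop_id -> (lo, hi), a half-open address range.
    exits: list of (addr, loop_id) for every break/continue statement.
    An exit is orphaned when it lies inside the page [start, end) but its
    loop is not fully contained in the page.
    """
    orphans = []
    for addr, loop_id in exits:
        lo, hi = loops[loop_id]
        in_page = start <= addr < end
        loop_in_page = start <= lo and hi <= end
        if in_page and not loop_in_page:
            orphans.append(addr)
    return orphans


loops = {"loop_0": (4, 16)}
exits = [(8, "loop_0"), (12, "loop_0")]

too_short = orphaned_exits(0, 10, loops, exits)   # page end cuts the loop
whole_loop = orphaned_exits(0, 16, loops, exits)  # page covers the loop
```

A candidate end is acceptable for this constraint only when the returned list is empty.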

Intersecting Pages

If a page intersects another page, it is generally best to move the page end up so that it sits just above the start of the region it intersects.
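
As a rough sketch of that rule, the helper below pulls a candidate end back to the first other page start that falls inside the region. The function name and the idea of passing the other pages as a list of start addresses are assumptions for illustration.

```python
def clamp_page_end(start, end, other_starts):
    """Pull `end` back to the first other page start inside (start, end)."""
    blockers = [s for s in other_starts if start < s < end]
    return min(blockers) if blockers else end


clamped = clamp_page_end(0, 40, [24, 32, 64])   # 24: first page start inside
untouched = clamp_page_end(0, 20, [24, 32, 64])  # 20: nothing intersects
```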

Falls

A fall (execution running off the end of one region and into a page) must be treated as a jump into that page rather than as a direct call.

Model fall-through the same way as an explicit jump, but insert a label at the point where control flow continues in the original IR. Labels that turn out to be dead can be removed later during cleanup.
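
A minimal sketch of this rewrite, assuming a toy IR where each page is a name plus a list of instruction strings and where only `return` and an unconditional `goto` end a page. The `_entry` label naming is hypothetical.

```python
TERMINATORS = ("return", "goto ")


def falls_through(body):
    """A page falls through when its last instruction is not a terminator."""
    return not body or not body[-1].startswith(TERMINATORS)


def make_falls_explicit(pages):
    """pages: ordered list of (name, [instr, ...]); returns a rewritten copy.

    Every implicit fall into the next page becomes an explicit goto, and the
    next page gets a label at the point where control flow continues.
    """
    out = []
    for i, (name, body) in enumerate(pages):
        new_body = list(body)
        if i > 0 and falls_through(pages[i - 1][1]):
            new_body.insert(0, f"{name}_entry:")
        if i + 1 < len(pages) and falls_through(body):
            new_body.append(f"goto {pages[i + 1][0]}_entry")
        out.append((name, new_body))
    return out


pages = [
    ("page_1", ["a1 += 8"]),
    ("page_2", ["a1 += 2", "return a1"]),
]
rewritten = make_falls_explicit(pages)
```

Here `page_1` gains a trailing `goto page_2_entry` and `page_2` gains a leading `page_2_entry:` label. If a later pass proves the label is never targeted, cleanup can drop it.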

Jumps Into

To handle jumps into a page, use SSA analysis to determine which registers are used in that region. Then introduce a virtual variable, such as a controller, which determines the entry label.

You then route execution through the parent page based on this controller value.

Example:

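/* The controller selects which label inside the page receives control. */
int page_1(int controller, int a1) {
    if (controller == 1) {
        goto label_1;   /* entry for code that originally jumped to label_1 */
    }
    a1 += 8;            /* normal entry: any other controller value starts here */
label_1:
    a1 += 2;
    return a1;
}

int main(void) {
    /* The original jump into label_1 with a1 = 2 becomes a call; it returns 4. */
    return page_1(1, 2) == 4 ? 0 : 1;
}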

Sub-pages

Sub-pages nested inside another page are unsafe to leave in place and should be treated as independent pages. Move their code to the bottom of the program and handle fall-through behavior accordingly.

Merging Pages

A page can be merged into the code that references it when it is referenced (jumped into) from only a single location. In that case, inlining it at the reference site is typically safe.
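
Finding those candidates is mostly bookkeeping. The sketch below assumes references have already been collected as `(site, target_page)` pairs, where the site strings are a hypothetical naming scheme, and returns each page that has exactly one distinct reference site.

```python
def merge_candidates(references):
    """Map each page referenced from exactly one site to that site.

    references: iterable of (site, target_page) pairs.
    """
    sites = {}
    for site, target in references:
        sites.setdefault(target, set()).add(site)
    return {
        target: next(iter(found))
        for target, found in sites.items()
        if len(found) == 1
    }


references = [
    ("page_1:12", "page_2"),  # the only reference to page_2
    ("page_1:30", "page_3"),
    ("page_4:8", "page_3"),   # page_3 has two reference sites
]
candidates = merge_candidates(references)  # {"page_2": "page_1:12"}
```

The result only says where inlining is worth attempting. The boundary checks described earlier still have to pass for the merged region.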

Args and Results

You can infer page arguments by analyzing SSA and identifying values that are used before definition within the page. These values represent incoming dependencies from outside the region and should be treated as implicit parameters of the page.

For results, you identify values that are defined within the page and later used outside of it.

Once identified, these results must be propagated through all outgoing control-flow edges, including calls and jumps, using SSA def-use chains. If the value crosses region boundaries through multiple paths, it must be carried along each path.

This turns a page into a function-like abstraction:

  • Inputs = SSA uses whose reaching definitions lie outside the page
  • Outputs = SSA definitions used outside the page

However, this abstraction is only valid if dominance and reachability conditions confirm that no hidden redefinitions occur across the page.
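
To make the input/output rule concrete, here is a deliberately simplified sketch. It assumes a straight-line page with no branches, written as `(dest, sources)` pairs over plain register names, and it assumes the set of names read after the page is already known. A real implementation would work on SSA form and use dominance information instead.

```python
def infer_signature(instrs, used_after):
    """Infer (args, results) for a straight-line page.

    instrs: list of (dest, sources) in execution order; dest may be None.
    used_after: names that code outside the page reads after it runs.
    """
    defined, args = set(), []
    for dest, sources in instrs:
        for src in sources:
            # Used before any definition in the page: an implicit parameter.
            if src not in defined and src not in args:
                args.append(src)
        if dest is not None:
            defined.add(dest)
    # Defined in the page and read outside it: a result.
    results = sorted(defined & set(used_after))
    return args, results


page = [
    ("t0", ["a1"]),        # t0 = f(a1)
    ("a1", ["t0", "a2"]),  # a1 = g(t0, a2)
    ("t1", ["a1"]),        # t1 = h(a1)
]
args, results = infer_signature(page, used_after={"a1", "a3"})
```

For this page the arguments come out as `a1` and `a2`, and the only result is `a1`. The temporaries `t0` and `t1` never leave the page, and `a3` is not a result because the page never defines it.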


Next Chapter: Chapter 14 - Modeling Inlined Functions as Virtual Function

Prev Chapter: Chapter 12 - Abstract Modeling Pages, Regions, Self-Modifying Code, and Indirect Calls

This post is licensed under CC BY 4.0 by the author.