Decompiler Construction: Chapter 7 - Reconstructing Control Flow using Graph Theory, SSA, and Types

Posted May 27, 2026 Updated Jun 17, 2026

By Daniel

6 min read

Now that the IR is ready to go, we can begin transforming it into readable code.

Pass Methodology

This chapter and the preceding chapters (up to the final one) define transformation passes. Each pass operates on the IR and may modify it.

Passes must be applied iteratively until a fixed point is reached i.e., until no pass produces any changes.

After any modification, all passes must be re-run.
Termination is achieved only when a full pass cycle results in no changes.

Each transformation must reduce or simplify the IR toward a more canonical form.

Emitted statements and expressions should be as minimal as possible.
Transformations must be convergent (i.e., avoid oscillation or expansion).

Failure to enforce minimal and convergent transformations may result in non-termination.

Graphing IR

Because a CFG is not maintained directly, a helper function must be used to construct one from the IR. A CFG allows us a data structure to easily and simply follow and analyze control flow.

Node Construction

IR statements are linear. CFG nodes (basic blocks) are constructed by grouping contiguous statements into ranges [start, end], where start and end are statement indices.

A new node begins at:

The entry point of the IR
Any statement that is the target of a control flow edge (e.g., jump/branch target)
Any statement immediately following a control flow terminator

A node ends at:

Any control flow terminator (e.g., jump, conditional branch, return)
The statement immediately before a new node begins

Each node must:

Contain only sequential statements with no internal control flow splits
Have a single entry point (start)
End with at most one control flow terminator

Edges

This process analyzes control flow constructs and emits edges between nodes:

Conditional statements (if/else) create branching edges.
Loop constructs (while, do-while) introduce back-edges and cycles.
Page calls and any abstract jumps create edges.

The resulting CFG must:

Accurately represent all possible execution paths.
Ensure every block is reachable or explicitly marked unreachable.
Resolve all jump targets into valid graph edges.

CFG construction must be deterministic and repeatable, as it is relied upon by many analysis and transformation passes.

Example Graph

Lets refer back to this simple IR

int32_t hello = 0;
while (hello < 100) {
	hello = hello + (hello * hello);
}
printf("%d, %d\n", hello, hello);

Download GraphViz Example

Static Single Assignment

Static Single Assignment (SSA) assigns each variable a unique version (internal register) for every write.

If a variable can have multiple possible definitions at a merge point in a CFG, a φ (phi) node is used to select the correct version based on the incoming path.

Algorithm

The standard approach for constructing SSA form is the Cytron et al. algorithm using dominance frontiers.

It operates by first computing the CFG and dominator tree, then using dominance frontiers to determine where φ (phi) definitions are required.

Phi-nodes are placed at all CFG join points that are in the dominance frontier of a definition, guaranteeing that all possible reaching definitions are correctly merged.

Time complexity:

V = number of variables
B = number of basic blocks
E = number of CFG edges

With Lengauer-Tarjan Imidiate Dominators O(V * E + E (E, B))

Psuedo-code:

INPUT:
    CFG with basic blocks
    defsites[v] = set of blocks where variable v is assigned

OUTPUT:
    CFG in SSA form (with phi nodes + renamed variables)

STEP 1: Compute Dominators
{
computeDominators(CFG)
    for each block b:
        dom[b] = all blocks initially

    dom[start] = {start}

    repeat until no change:
        for each block b != start:
            dom[b] = intersection of dom[p] for all predecessors p of b
                     union {b}
}

STEP 2: Compute Immediate Dominators
{
idom = buildImmediateDominators(dom)

build dominator tree from idom
}

STEP 3: Compute Dominance Frontiers
{
computeDF(b):

    DF[b] = empty set
    for each successor s of b:
        if idom[s] != b:
            add s to DF[b]

    for each child c in dominator tree:
        for each w in DF[c]:
            if idom[w] != b:
                add w to DF[b]
}

STEP 4: Place φ-Def Sites
{
for each variable v:
    worklist = defsites[v]
    hasPhi = empty set

    while worklist not empty:
        b = pop(worklist)

        for each d in DF[b]:
            if d not in hasPhi:
                insert φ(v) in block d
                hasPhi.add(d)

                if v not defined in d:
                    worklist.add(d)
}

STEP 5: Rename Variables (SSA enforcement)
{
rename(block b):
    for each instruction in b:
        replace uses with current version on stack
        replace defs with new version:
            v_i = new version counter(v)
            push v_i onto stack[v]

    for each successor s:
        for each φ in s:
            set φ argument for edge b -> s
            to current stack version of that variable

    for each child c in dominator tree:
        rename(c)

    pop all variables defined in b from stack
}

start:
    initialize empty stacks for all variables
    rename(start_block)

Example SSA Form

We use R as the prefix here for there SSA version, given:

int32_t hello = 0;
while (hello < 100) {
	hello = hello + (hello * hello);
}
printf("%d, %d\n", hello, hello);

SSA:

int32_t hello(R1) = 0;  
while (hello(PHI[R1, R2]) < 100) { 
	hello(R2) = hello(PHI[R1, R2]) + (hello(PHI[R1, R2]) * hello(PHI[R1, R2])); 
} 
printf("%d, %d\n", hello(PHI[R1, R2]), hello(PHI[R1, R2]));

Types and Abstract Types using SSA

Type information originates during Assembly -> IL lifting, where instructions use bitcast to redundantly cast each register use, etc. These types must be preserved in the IR, even if redundant, as they provide the foundation for later refinement.

However, lifted types are not authoritarian; they are often incomplete, overly specific, and overly redundant. The type system must utilize SSA to help refine and simplify them.

Using SSA

Each SSA value is associated with a set of possible types, rather than a single fixed type. Initially, this set may contain only the lifted type, but it is progressively refined through usage. A direct assignment:

R1 = ...

assigns R1 an initial type from lifting. A phi-node:

R3 = PHI(R1, R2)

represents a merge of possible values and must also merge their types:

type(R3) = merge(type(R1), type(R2))

Type Merging

Type merging must be explicitly defined and deterministic:

Identical types -> preserved
Compatible types -> promoted (e.g., int + float -> numeric)
Incompatible/abstract types -> USER HANDLED

Dominance must not be used to arbitrarily select a type. Doing so discards valid paths and produces incorrect results.

Types are refined by propagating from SSA uses:

Arithmetic operations -> numeric types
Memory access (*R) -> pointer types, obey region size
Collapse equivalent types
Eliminate unused SAFE abstractions
Lower SAFE abstract types into primitive types when possible

Premature simplification must be avoided, as it can destroy information required for correct reconstruction.

Next Chapter: Chapter 8 - Designing Safe Optimization Passes

Prev Chapter: Chapter 6 - IR Semantics Edge Cases, Undefined Behavior, and Special Handling

Decompiler Construction, Chapter 7 - Reconstructing Control Flow using Graph Theory, SSA, and Types

This post is licensed under CC BY 4.0 by the author.

Decompiler Construction: Chapter 7 - Reconstructing Control Flow using Graph Theory, SSA, and Types

Pass Methodology

Graphing IR

Node Construction

Edges

Example Graph

Static Single Assignment

Algorithm

Example SSA Form

Types and Abstract Types using SSA

Using SSA

Type Merging

Refinement

Trending Tags