Compiler Optimization Techniques: Loop Invariant Code Motion and Static Single Assignment Form
Abstract
This report presents a comprehensive analysis of two fundamental compiler optimization techniques: Loop Invariant Code Motion (LICM) and Static Single Assignment (SSA) Form. We explore the theoretical foundations, implementation algorithms, and practical applications of these techniques in modern optimizing compilers. The report demonstrates how SSA form provides an elegant framework for implementing LICM and other dataflow optimizations efficiently.
1. Introduction
Modern compilers employ sophisticated optimization techniques to transform high-level source code into efficient machine code. Among these techniques, Loop Invariant Code Motion and Static Single Assignment Form stand out as particularly powerful and widely adopted approaches. LICM reduces redundant computations within loops, while SSA form provides a program representation that simplifies many optimization algorithms.
The relationship between these techniques is particularly interesting: SSA form’s explicit use-def chains and dominance properties make LICM implementation both more elegant and more efficient. This report examines both techniques in detail, providing formal definitions, algorithms, and implementation considerations.
2. Loop Invariant Code Motion
2.1 Formal Definition
Let us define a control flow graph $G = (V, E)$ where $V$ represents basic blocks and $E$ represents control flow edges. A loop $L$ in $G$ is a strongly connected component with a unique entry node called the header.
Definition 2.1 (Loop Invariant). An instruction $i$ is loop invariant with respect to loop $L$ if and only if:
$$\forall \text{ executions } e_1, e_2 \text{ of } i \text{ within } L: \text{value}(i, e_1) = \text{value}(i, e_2)$$
We define the set of loop invariant instructions inductively:
Base case: An instruction $i$ is loop invariant if all its operands are either:
- Constants
- Defined outside loop $L$
Inductive case: An instruction $i$ is loop invariant if all its operands are either:
- Constants
- Defined outside loop $L$
- Defined by instructions already proven to be loop invariant
Formally, the set of loop invariant instructions is the least fixed point:
$$\text{Invariant}(L) = \text{lfp}\bigl(\lambda S.\ \{i \in L : \forall \text{op} \in \text{operands}(i),\ \text{op} \notin L \lor \text{op} \in S\}\bigr)$$
where $\text{lfp}$ denotes the least fixed point operator.
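As a concrete illustration, consider a small loop in which a value is recomputed on every iteration even though its operands never change inside the loop. The function and variable names below are chosen purely for the example:
// Hypothetical example: `limit * scale` is recomputed on every iteration before LICM
int sum_scaled(const int* data, int n, int limit, int scale) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        int threshold = limit * scale;   // operands defined outside the loop -> invariant (base case)
        int capped = threshold + 1;      // depends only on an invariant value -> invariant (inductive case)
        if (data[i] < capped) {
            sum += data[i];
        }
    }
    return sum;
}
// After LICM, both invariant computations are hoisted to the preheader:
//   int threshold = limit * scale;
//   int capped = threshold + 1;
//   for (int i = 0; i < n; ++i) { ... }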
2.2 Mathematical Framework
The LICM optimization can be formulated as a dataflow analysis problem. We define the following lattice:
Let $\mathcal{L} = (\mathcal{P}(\text{Instructions}), \subseteq)$ be the powerset lattice of instructions, where the dataflow value at each program point represents the set of loop invariant instructions.
The transfer function for a basic block $B$ is:
$$f_B(X) = \text{Gen}_B \cup (X - \text{Kill}_B)$$
where:
- $\text{Gen}_B$ = instructions in $B$ whose operands are all defined outside the loop or in $X$
- $\text{Kill}_B$ = instructions whose results are redefined in $B$
2.3 The LICM Algorithm
The complete LICM algorithm consists of three phases: identification, safety analysis, and code motion.
Phase 1: Invariant Identification
// Algorithm 1: Loop Invariant Identification
auto identify_invariants(Loop* L, DominatorTree* DT) -> std::set<Instruction*> {
    auto invariants = std::set<Instruction*> {};
    auto changed = true;
    // Fixed-point iteration
    while (changed) {
        changed = false;
        for (auto* BB : L->blocks()) {
            for (auto& I : BB->instructions()) {
                // Check if all operands are loop invariant
                if (!invariants.contains(&I) && is_invariant_candidate(&I, L, invariants)) {
                    invariants.insert(&I);
                    changed = true;
                }
            }
        }
    }
    return invariants;
}
The correctness of this algorithm follows from the monotonicity of the transfer function:
Theorem 2.1. The invariant identification algorithm terminates and computes exactly the set $\text{Invariant}(L)$ of loop invariant instructions defined above.
Proof. The algorithm computes the least fixed point of the equation:
$$\text{Inv} = \{i \in L : \forall \text{op} \in \text{operands}(i),\ \text{op} \notin L \lor \text{op} \in \text{Inv}\}$$
Since $\mathcal{P}(\text{Instructions})$ forms a finite complete lattice and the defining function is monotone, the Knaster-Tarski theorem guarantees that a least fixed point exists; because the lattice is finite, iterating from the empty set reaches that fixed point in finitely many steps, which also establishes termination.
Phase 2: Safety Analysis
An instruction $i$ can be safely hoisted if and only if:
$$\text{Safe}(i) \iff \text{NoSideEffects}(i) \land \text{DominatesUses}(i, L) \land \text{SpeculationSafe}(i) \land \text{NoAliasing}(i, L)$$
The dominance condition ensures that the instruction would have been executed on all paths through the loop:
Definition 2.2 (Safe to Hoist - Refined). An instruction $i$ in loop $L$ is safe to hoist if:
- $i$ has no side effects
- For all uses $u$ of $i$ within $L$: $\text{Dom}(i, u)$ (i dominates all its uses in the loop)
- Either:
- $i$ is guaranteed to execute on every iteration of $L$ (i.e., $\text{BB}(i)$ dominates every exit block of $L$), OR
- $i$ does not throw exceptions and does not access memory
- $\forall j \in L$ where $j$ may access memory: $\neg\text{MayAlias}(i, j)$
This refined condition permits hoisting an instruction that does not execute on every path through the loop, provided it is safe to execute speculatively; the partial LICM techniques of Section 2.4 handle the remaining cases.
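Algorithm 2 below relies on a safety predicate embodying Definition 2.2. A minimal sketch of such a check, using the same hypothetical Loop/DominatorTree interfaces as the other listings (mayWriteToMemory, hasOtherSideEffects, mayThrow, mayReadFromMemory, exit_blocks, and may_alias_memory_in_loop are assumed helpers, not a specific compiler's API), could look as follows:
// Sketch of the safety predicate from Definition 2.2 (illustrative interfaces).
bool is_safe_to_hoist(Instruction* I, Loop* L, DominatorTree* DT) {
    // Condition 1: no side effects
    if (I->mayWriteToMemory() || I->hasOtherSideEffects()) {
        return false;
    }
    // Condition 2 (i dominates its uses) is assumed to hold already,
    // as it does automatically once the program is in SSA form.
    // Condition 3: either the instruction executes on every iteration
    // (its block dominates all loop exits), or it is safe to speculate.
    bool executes_every_iteration = true;
    for (auto* exit : L->exit_blocks()) {
        if (!DT->dominates(I->parent(), exit)) {
            executes_every_iteration = false;
            break;
        }
    }
    if (!executes_every_iteration && (I->mayThrow() || I->mayReadFromMemory())) {
        return false;
    }
    // Condition 4: no memory instruction in the loop may alias with I
    return !may_alias_memory_in_loop(I, L);
}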
Phase 3: Code Motion
The actual code motion must preserve program semantics. We insert instructions into the loop preheader:
// Algorithm 2: Code Motion with Dependency Ordering
void perform_code_motion(Loop* L, std::set<Instruction*>& invariants,
                         DominatorTree* DT) {
    // Compute dependency graph
    auto dep_graph = build_dependency_graph(invariants);
    // Topological sort respecting dependencies
    auto sorted = topological_sort(dep_graph);
    auto* preheader = L->preheader();
    for (auto* I : sorted) {
        if (is_safe_to_hoist(I, L, DT)) {
            I->moveBefore(preheader->terminator());
        }
    }
}
2.4 Partial Loop Invariant Code Motion
When an invariant instruction doesn’t dominate all loop exits but is otherwise safe to speculate, we can perform partial hoisting. This technique is particularly beneficial for instructions in conditionally executed blocks within loops.
The key insight is to hoist the computation while preserving the original control flow semantics:
// Algorithm 2b: Partial LICM
bool perform_partial_licm(Loop* L, Instruction* I, DominatorTree* DT) {
    // Check if partial hoisting is profitable
    if (!is_profitable_to_partially_hoist(I, L)) {
        return false;
    }
    // Create a flag to track if the instruction would have executed
    auto* preheader = L->preheader();
    auto* flag = create_flag_variable(preheader, false);
    // Set flag to true on paths where I executes
    for (auto* pred : I->parent()->predecessors()) {
        if (L->contains(pred)) {
            insert_flag_update(pred, flag, true);
        }
    }
    // Hoist the computation, guarded by the flag
    auto* hoisted = I->clone();
    create_guard(preheader, flag, hoisted);
    // Replace original instruction with hoisted result
    I->replaceAllUsesWith(hoisted);
    I->eraseFromParent();
    return true;
}
This approach enables optimization of loops with complex control flow while maintaining correctness.
2.5 Complexity Analysis
The time complexity of LICM is $O(n^2)$ in the worst case, where $n$ is the number of instructions in the loop. However, in practice, the algorithm exhibits near-linear behavior:
Theorem 2.2. For a loop with $n$ instructions and maximum nesting depth $d$, LICM has complexity $O(n \cdot d \cdot \alpha(n))$, where $\alpha$ is the inverse Ackermann function from union-find operations in dominance queries.
3. Static Single Assignment Form
3.1 Formal Definition
Definition 3.1 (SSA Form). A program is in SSA form if and only if:
- Each variable is assigned exactly once in the program text
- Each use of a variable is dominated by its definition
Formally, for a CFG $G = (V, E)$ with variables $\text{Vars}$:
$$\forall v \in \text{Vars}: |\{d : d \text{ defines } v\}| = 1$$
$$\forall \text{ use } u \text{ of } v: \exists!\ \text{def } d \text{ of } v : \text{Dom}(d, u)$$
3.2 Phi Functions
To maintain single assignment at control flow merge points, we introduce $\phi$-functions:
Definition 3.2 ($\phi$-function). A $\phi$-function at the beginning of block $B$ with $k$ predecessors is defined as:
$$v_i = \phi(v_{i_1}: B_1, v_{i_2}: B_2, \ldots, v_{i_k}: B_k)$$
where $v_{i_j}$ is the value of $v$ coming from predecessor $B_j$.
The semantics of $\phi$-functions can be formalized using the following operational rule:
$$\frac{\text{CFG edge } B_j \rightarrow B \text{ is taken, } v_{i_j} = c}{v_i = \phi(\ldots, v_{i_j}: B_j, \ldots) \text{ evaluates to } c}$$
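As a concrete illustration (block and variable names are chosen only for this example), a conditional assignment is translated into SSA form by introducing a single $\phi$-function at the join block:
// Before SSA construction
if (p) { x = a + b; } else { x = a - b; }
y = x * 2;
// After SSA construction: one phi at the join block selects the value
if (p) { x1 = a0 + b0; } else { x2 = a0 - b0; }
x3 = phi(x1: then_block, x2: else_block)
y1 = x3 * 2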
3.3 SSA Construction Algorithm
SSA construction involves two main phases: $\phi$-function placement and variable renaming.
Phase 1: Computing Dominance Frontiers
The dominance frontier of a node $n$ is:
$$\text{DF}(n) = \{m : n \text{ dominates a predecessor of } m \text{ but does not strictly dominate } m\}$$
More formally:
$$\text{DF}(n) = \{m : \exists p \in \text{pred}(m),\ n \in \text{Dom}(p) \land n \notin \text{Dom}(m) \setminus \{m\}\}$$
// Algorithm 3: Dominance Frontier Computation
auto compute_dominance_frontier(const CFG& cfg, const DominatorTree& dt)
    -> std::map<BasicBlock*, std::set<BasicBlock*>> {
    auto df = std::map<BasicBlock*, std::set<BasicBlock*>> {};
    for (auto* b : cfg.blocks()) {
        for (auto* p : b->predecessors()) {
            auto* runner = p;
            // Walk up the dominator tree until we reach b's immediate dominator
            while (runner != dt.idom(b)) {
                df[runner].insert(b);
                runner = dt.idom(runner);
            }
        }
    }
    return df;
}
Note on Efficient Implementation: While Algorithm 3 clearly illustrates the concept of dominance frontiers, production compilers typically use Cytron et al.'s algorithm, which computes all dominance frontiers in a single bottom-up (post-order) traversal of the dominator tree. Its running time is proportional to the total size of the frontier sets, which is $O(n^2)$ in the worst case but close to linear on typical control flow graphs:
// Sketch of Cytron's efficient algorithm
auto compute_df_cytron(const DominatorTree& dt)
    -> std::map<BasicBlock*, std::set<BasicBlock*>> {
    auto df = std::map<BasicBlock*, std::set<BasicBlock*>> {};
    // Bottom-up (post-order) traversal of the dominator tree
    for (auto* b : post_order(dt)) {
        // Local contribution: CFG successors not immediately dominated by b
        for (auto* succ : b->successors()) {
            if (dt.idom(succ) != b) {
                df[b].insert(succ);
            }
        }
        // Up contribution: frontiers of b's dominator-tree children
        for (auto* child : dt.children(b)) {
            for (auto* w : df[child]) {
                if (dt.idom(w) != b) {
                    df[b].insert(w);
                }
            }
        }
    }
    return df;
}
This algorithm exploits the property that dominance frontiers can be computed bottom-up in the dominator tree, achieving better performance for large CFGs.
Phase 2: Phi Function Placement
Given a set of blocks $S$ that contain definitions of variable $v$, we must place $\phi$-functions at the iterated dominance frontier:
$$\text{DF}^+(S) = \bigcup_{i=1}^{\infty} \text{DF}^i(S)$$
where $\text{DF}^1(S) = \text{DF}(S)$ and $\text{DF}^{i+1}(S) = \text{DF}(S \cup \text{DF}^i(S))$.
Theorem 3.1 (Minimal Phi Placement). The set $\text{DF}^+(S)$ represents the minimal set of blocks requiring $\phi$-functions for variable $v$ defined in blocks $S$.
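A standard worklist formulation computes the iterated dominance frontier directly. The sketch below assumes the dominance-frontier map produced by Algorithm 3 and the same hypothetical BasicBlock interface used elsewhere in this report:
// Sketch: phi placement at the iterated dominance frontier DF+(S) of the
// definition sites of a variable v (worklist formulation).
auto place_phi_functions(const std::set<BasicBlock*>& def_blocks,
                         const std::map<BasicBlock*, std::set<BasicBlock*>>& df)
    -> std::set<BasicBlock*> {
    auto needs_phi = std::set<BasicBlock*> {};
    auto worklist = std::vector<BasicBlock*>(def_blocks.begin(), def_blocks.end());
    while (!worklist.empty()) {
        auto* b = worklist.back();
        worklist.pop_back();
        auto it = df.find(b);
        if (it == df.end()) continue;
        for (auto* m : it->second) {
            if (needs_phi.insert(m).second) {
                // m now defines v (via the new phi), so its own frontier
                // must be processed as well -- this yields DF+(S).
                worklist.push_back(m);
            }
        }
    }
    return needs_phi;
}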
Phase 3: Variable Renaming
The renaming phase performs a depth-first traversal of the dominator tree:
// Algorithm 4: SSA Variable Renaming
class SSARenamer {
    std::map<std::string, std::stack<std::string>> var_stacks_;
    std::map<std::string, int32_t> var_counters_;
    auto rename_block(BasicBlock* bb, DominatorTree* dt) -> void {
        auto saved_sizes = std::map<std::string, size_t> {};
        // Save stack sizes for restoration
        for (const auto& [var, stack] : var_stacks_) {
            saved_sizes[var] = stack.size();
        }
        // Process phi functions
        for (auto& phi : bb->phis()) {
            auto new_name = get_fresh_name(phi.var);
            phi.result = new_name;
            var_stacks_[phi.var].push(new_name);
        }
        // Process instructions
        for (auto& inst : bb->instructions()) {
            // Rename uses
            for (auto& operand : inst.operands()) {
                if (is_variable(operand) && !var_stacks_[operand].empty()) {
                    operand = var_stacks_[operand].top();
                }
            }
            // Rename definition
            if (inst.defines_variable()) {
                auto new_name = get_fresh_name(inst.result);
                var_stacks_[inst.result].push(new_name);
                inst.result = new_name;
            }
        }
        // Fill in phi operands in successors
        for (auto* succ : bb->successors()) {
            for (auto& phi : succ->phis()) {
                if (!var_stacks_[phi.var].empty()) {
                    phi.add_operand(var_stacks_[phi.var].top(), bb);
                }
            }
        }
        // Recursively process dominated blocks
        for (auto* child : dt->children(bb)) {
            rename_block(child, dt);
        }
        // Restore stacks
        for (auto& [var, stack] : var_stacks_) {
            while (stack.size() > saved_sizes[var]) {
                stack.pop();
            }
        }
    }
};
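The renamer above relies on a get_fresh_name helper that Algorithm 4 leaves undefined. A minimal sketch, shown as it would appear inside the class body and using the var_counters_ map declared above, is:
// Inside SSARenamer: minimal fresh-name helper used by Algorithm 4.
// It appends a monotonically increasing version number to the base name,
// e.g. "x" -> "x_0", "x_1", ...
auto get_fresh_name(const std::string& var) -> std::string {
    auto version = var_counters_[var]++;
    return var + "_" + std::to_string(version);
}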
3.4 Properties of SSA Form
SSA form exhibits several important properties that facilitate optimization:
Property 3.1 (Unique Reaching Definition). For every use of a variable $v$ in SSA form, there exists exactly one reaching definition.
Property 3.2 (Dominance Property). Every use of a variable $v$ is dominated by its unique definition.
These properties enable efficient implementation of many optimizations:
- Constant Propagation: $O(n)$ using sparse conditional constant propagation
- Dead Code Elimination: $O(n)$ using a simple mark-and-sweep algorithm (see the sketch after this list)
- Global Value Numbering: $O(n \log n)$ using partition refinement
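To illustrate why such passes become simple on SSA form, the following sketch implements the mark-and-sweep dead code elimination mentioned above; Function, Instruction, and the helper methods are the same hypothetical interfaces used in the earlier listings, and instructions() is assumed to yield a snapshot that is safe to mutate during the sweep:
// Sketch: SSA-based dead code elimination (mark and sweep).
// Because every value has a single definition, liveness propagates
// directly along use-def edges with no iterative dataflow analysis.
void eliminate_dead_code(Function* F) {
    auto live = std::set<Instruction*> {};
    auto worklist = std::vector<Instruction*> {};
    // Mark: seed with instructions that are observably useful
    for (auto* I : F->instructions()) {
        if (I->has_side_effects() || I->is_terminator()) {
            live.insert(I);
            worklist.push_back(I);
        }
    }
    // Propagate liveness backwards along use-def chains
    while (!worklist.empty()) {
        auto* I = worklist.back();
        worklist.pop_back();
        for (auto* op : I->operands()) {
            if (auto* def = dynamic_cast<Instruction*>(op)) {
                if (live.insert(def).second) {
                    worklist.push_back(def);
                }
            }
        }
    }
    // Sweep: delete everything that was never marked
    for (auto* I : F->instructions()) {
        if (!live.contains(I)) {
            I->eraseFromParent();
        }
    }
}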
4. LICM in SSA Form
The combination of LICM with SSA form yields particularly elegant algorithms. The explicit use-def chains in SSA eliminate the need for iterative dataflow analysis.
4.1 SSA-Based Loop Invariant Detection
In SSA form, loop invariance can be determined by a simple graph traversal:
// Algorithm 5: SSA-based Loop Invariant Detection
bool is_loop_invariant_ssa(Value* v, Loop* L) {
    if (auto* inst = dynamic_cast<Instruction*>(v)) {
        // Check if instruction is in loop
        if (!L->contains(inst->parent())) {
            return true; // Defined outside loop
        }
        // Check if it's a phi at loop header
        if (auto* phi = dynamic_cast<PhiNode*>(inst)) {
            if (phi->parent() == L->header()) {
                // Loop-carried dependency - not invariant
                return false;
            }
        }
        // Recursively check operands
        for (auto* operand : inst->operands()) {
            if (!is_loop_invariant_ssa(operand, L)) {
                return false;
            }
        }
        return true;
    }
    // Constants and function arguments are invariant
    return true;
}
4.2 Advantages of SSA-Based LICM
The SSA representation provides several advantages for LICM:
- Direct Use-Def Chains: No need for reaching definitions analysis
- Simplified Dependency Analysis: Dependencies are explicit in the SSA graph
- Efficient Safety Checks: Dominance relationships are preserved in SSA
Theorem 4.1. LICM in SSA form has complexity $O(n \cdot \alpha(n))$ where $n$ is the number of instructions and $\alpha$ is the inverse Ackermann function.
Proof sketch. Each instruction is visited once, and dominance queries using preprocessed dominator trees have near-constant time complexity.
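Putting the pieces together, a hoisting driver on SSA form can simply walk the loop body in dominator-tree order, combining Algorithm 5 with the safety predicate sketched in Section 2.3. The listing below reuses those hypothetical interfaces; dom_tree_order is an assumed helper that yields blocks so that dominators are visited before the blocks they dominate:
// Sketch: LICM driver on SSA form. Visiting blocks in dominator-tree
// order guarantees that an instruction's invariant operands have already
// been considered (and hoisted) before the instruction itself.
void licm_on_ssa(Loop* L, DominatorTree* DT) {
    auto* preheader = L->preheader();
    auto to_hoist = std::vector<Instruction*> {};
    for (auto* BB : DT->dom_tree_order(L->blocks())) {
        for (auto& I : BB->instructions()) {
            if (is_loop_invariant_ssa(&I, L) && is_safe_to_hoist(&I, L, DT)) {
                to_hoist.push_back(&I);
            }
        }
    }
    // Move candidates after the scan to avoid mutating blocks mid-iteration
    for (auto* I : to_hoist) {
        I->moveBefore(preheader->terminator());
    }
}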
5. Advanced Topics
5.1 Memory SSA
Traditional SSA only handles scalar variables. Memory SSA extends the concept to memory operations, addressing the challenge of applying SSA concepts when memory locations are not statically known and may alias.
Key Concepts:
- Memory Definitions (MemoryDef): Store operations that modify memory state
- Memory Uses (MemoryUse): Load operations that read memory
- Memory Phi (MemoryPhi): Merge points for memory states at control flow joins
// Memory SSA representation
class MemorySSA {
    // Memory states are versioned like variables
    using MemoryVersion = int32_t;
    struct MemoryDef {
        Instruction* inst;
        MemoryVersion version;
        MemoryVersion uses; // Previous version
    };
    struct MemoryUse {
        Instruction* inst;
        MemoryVersion version; // Version being used
    };
    struct MemoryPhi {
        BasicBlock* block;
        MemoryVersion result;
        std::vector<std::pair<MemoryVersion, BasicBlock*>> operands;
    };
    // Walking algorithm for Memory SSA optimization.
    // Note: this walker assumes accessor-style memory access objects
    // (getDefiningAccess, getMemoryInst, getValue) rather than the plain
    // structs above; the names are illustrative.
    Value* optimize_memory_access(MemoryAccess* MA, AliasAnalysis* AA) {
        if (auto* MU = dyn_cast<MemoryUse>(MA)) {
            auto* DefiningAccess = MU->getDefiningAccess();
            // Walk up the memory SSA graph
            while (auto* MD = dyn_cast<MemoryDef>(DefiningAccess)) {
                if (AA->mustAlias(MU->getMemoryInst(), MD->getMemoryInst())) {
                    // Found exact definition
                    return MD->getValue();
                }
                if (!AA->mayAlias(MU->getMemoryInst(), MD->getMemoryInst())) {
                    // This def doesn't affect our use, skip it
                    DefiningAccess = MD->getDefiningAccess();
                    continue;
                }
                // May alias - conservative assumption
                break;
            }
        }
        return nullptr;
    }
};
Memory SSA enables precise alias analysis and memory-based optimizations within the SSA framework. For example, it can identify redundant loads:
// Before Memory SSA optimization
x = load ptr1
store ptr2, 42
y = load ptr1 // Is this redundant?
// With Memory SSA, if ptr1 and ptr2 don't alias:
x = load ptr1
store ptr2, 42
y = x // Load eliminated!
This representation is particularly powerful for optimizations like:
- Load/store forwarding
- Dead store elimination
- Memory-aware LICM (see the sketch after this list)
- Alias-aware redundancy elimination
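As an example of the memory-aware LICM item above, the sketch below hoists a loop-invariant load only when alias analysis proves that no store in the loop can modify the loaded location. LoadInst, AliasAnalysis, and the helper methods are the same illustrative interfaces used in the earlier listings, not a particular compiler's API:
// Sketch: hoisting a load out of a loop using alias information.
bool try_hoist_load(LoadInst* load, Loop* L, AliasAnalysis* AA,
                    DominatorTree* DT) {
    // The address being loaded from must itself be loop invariant
    if (!is_loop_invariant_ssa(load->address(), L)) {
        return false;
    }
    // No write in the loop may alias the loaded location
    for (auto* BB : L->blocks()) {
        for (auto& I : BB->instructions()) {
            if (I.mayWriteToMemory() && AA->mayAlias(&I, load)) {
                return false;
            }
        }
    }
    // The load must execute on every iteration (it may fault if speculated)
    for (auto* exit : L->exit_blocks()) {
        if (!DT->dominates(load->parent(), exit)) {
            return false;
        }
    }
    load->moveBefore(L->preheader()->terminator());
    return true;
}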
5.2 Gated SSA and Predicated SSA
Extensions to basic SSA form include:
- Gated SSA (GSA): Includes predicate information at $\phi$-functions
- Predicated SSA: Explicitly represents control dependencies
These forms enable more aggressive optimizations but increase representation complexity.
6. Implementation Considerations
6.1 Engineering Trade-offs
Real-world implementations must balance several factors:
- Construction Time vs. Optimization Power: Pruned SSA reduces construction overhead
- Memory Usage: Semi-pruned SSA offers a middle ground
- Destruction Overhead: Out-of-SSA translation requires careful handling of $\phi$-functions
6.2 Integration with Other Optimizations
SSA form serves as a foundation for numerous optimizations:
- Sparse Conditional Constant Propagation (SCCP)
- Partial Redundancy Elimination (PRE)
- Strength Reduction
- Induction Variable Analysis
7. Conclusion
Loop Invariant Code Motion and Static Single Assignment Form represent fundamental techniques in modern compiler optimization. SSA’s elegant representation of program data flow significantly simplifies the implementation of LICM and other optimizations. The mathematical foundations presented in this report demonstrate the theoretical soundness of these approaches, while the algorithms show their practical applicability.
The synergy between LICM and SSA exemplifies how good intermediate representations can dramatically simplify optimization algorithms. As compiler technology continues to evolve, these foundational techniques remain central to achieving high-performance code generation.
Future research directions include extending these techniques to handle modern programming paradigms such as concurrent and parallel programming, where traditional dominance-based analyses require adaptation to account for non-deterministic execution orders.
References
The algorithms and theoretical foundations presented in this report are based on seminal works in compiler construction, including contributions from Cytron et al. on SSA construction, and extensive research on loop optimizations in production compilers such as LLVM and GCC.