Compiler Optimization Techniques: Loop Invariant Code Motion and Static Single Assignment Form
Abstract
This report presents a comprehensive analysis of two fundamental compiler optimization techniques: Loop Invariant Code Motion (LICM) and Static Single Assignment (SSA) Form. We explore the theoretical foundations, implementation algorithms, and practical applications of these techniques in modern optimizing compilers. The report demonstrates how SSA form provides an elegant framework for implementing LICM and other dataflow optimizations efficiently.
1. Introduction
Modern compilers employ sophisticated optimization techniques to transform high-level source code into efficient machine code. Among these techniques, Loop Invariant Code Motion and Static Single Assignment Form stand out as particularly powerful and widely adopted approaches. LICM reduces redundant computations within loops, while SSA form provides a program representation that simplifies many optimization algorithms.
The relationship between these techniques is particularly interesting: SSA form’s explicit use-def chains and dominance properties make LICM implementation both more elegant and more efficient. This report examines both techniques in detail, providing formal definitions, algorithms, and implementation considerations.
2. Loop Invariant Code Motion
2.1 Formal Definition
Let us define a control flow graph $G = (V, E)$ where $V$ represents basic blocks and $E$ represents control flow edges. A loop $L$ in $G$ is a strongly connected component with a unique entry node called the header.
Definition 2.1 (Loop Invariant). An instruction $i$ is loop invariant with respect to loop $L$ if and only if:
$$\forall \text{ executions } e_1, e_2 \text{ of } i \text{ within } L: \text{value}(i, e_1) = \text{value}(i, e_2)$$
We define the set of loop invariant instructions inductively:
Base case: An instruction $i$ is loop invariant if all its operands are either:
- Constants
- Defined outside loop $L$
Inductive case: An instruction $i$ is loop invariant if all its operands are either:
- Constants
- Defined outside loop $L$
- Defined by instructions already proven to be loop invariant
Formally, the set of loop invariant instructions is the least fixed point:
$$\text{Invariant}(L) = \text{lfp}\bigl(\lambda S.\ \{i \in L : \forall \text{op} \in \text{operands}(i),\ \text{op} \notin L \lor \text{op} \in S\}\bigr)$$
where $\text{lfp}$ denotes the least fixed point operator.
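As a concrete illustration, consider a small loop in which a value is recomputed on every iteration even though its operands never change inside the loop. The function and variable names below are chosen purely for the example:
// Hypothetical example: `limit * scale` is recomputed on every iteration before LICM
int sum_scaled(const int* data, int n, int limit, int scale) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        int threshold = limit * scale;   // operands defined outside the loop -> invariant (base case)
        int capped = threshold + 1;      // depends only on an invariant value -> invariant (inductive case)
        if (data[i] < capped) {
            sum += data[i];
        }
    }
    return sum;
}
// After LICM, both invariant computations are hoisted to the preheader:
//   int threshold = limit * scale;
//   int capped = threshold + 1;
//   for (int i = 0; i < n; ++i) { ... }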
2.2 Mathematical Framework
The LICM optimization can be formulated as a dataflow analysis problem. We define the following lattice:
Let $\mathcal{L} = (\mathcal{P}(\text{Instructions}), \subseteq)$ be the powerset lattice of instructions, where the dataflow value at each program point represents the set of loop invariant instructions.
The transfer function for a basic block $B$ is:
$$f_B(X) = \text{Gen}_B \cup (X - \text{Kill}_B)$$
where:
- $\text{Gen}_B$ = instructions in $B$ whose operands are all defined outside the loop or in $X$
- $\text{Kill}_B$ = instructions whose results are redefined in $B$
2.3 The LICM Algorithm
The complete LICM algorithm consists of three phases: identification, safety analysis, and code motion.
Phase 1: Invariant Identification
// Algorithm 1: Loop Invariant Identification
auto identify_invariants(Loop* L, DominatorTree* DT) -> std::set<Instruction*> {
    auto invariants = std::set<Instruction*> {};
    auto changed = true;
    // Fixed-point iteration
    while (changed) {
        changed = false;
        for (auto* BB : L->blocks()) {
            for (auto& I : BB->instructions()) {
                // Check if all operands are loop invariant
                if (!invariants.contains(&I) && is_invariant_candidate(&I, L, invariants)) {
                    invariants.insert(&I);
                    changed = true;
                }
            }
        }
    }
    return invariants;
}
The correctness of this algorithm follows from the monotonicity of the transfer function:
Theorem 2.1. The invariant identification algorithm terminates and computes exactly the set $\text{Invariant}(L)$ of loop invariant instructions defined above.
Proof. The algorithm computes the least fixed point of the equation:
$$\text{Inv} = \{i \in L : \forall \text{op} \in \text{operands}(i),\ \text{op} \notin L \lor \text{op} \in \text{Inv}\}$$
Since $\mathcal{P}(\text{Instructions})$ forms a finite complete lattice and the defining function is monotone, the Knaster-Tarski theorem guarantees that a least fixed point exists; because the lattice is finite, iterating from the empty set reaches that fixed point in finitely many steps, which also establishes termination.
Phase 2: Safety Analysis
An instruction $i$ can be safely hoisted if and only if:
$$\text{Safe}(i) \iff \text{NoSideEffects}(i) \land \text{DominatesUses}(i, L) \land \text{SpeculationSafe}(i) \land \text{NoAliasing}(i, L)$$
The dominance condition ensures that the instruction would have been executed on all paths through the loop:
Definition 2.2 (Safe to Hoist - Refined). An instruction $i$ in loop $L$ is safe to hoist if:
- $i$ has no side effects
- For all uses $u$ of $i$ within $L$: $\text{Dom}(i, u)$ (i dominates all its uses in the loop)
- Either:
- $i$ is guaranteed to execute on every iteration of $L$ (i.e., $\text{BB}(i)$ dominates every exit block of $L$), OR
- $i$ does not throw exceptions and does not access memory
- $\forall j \in L$ where $j$ may access memory: $\neg\text{MayAlias}(i, j)$
This refined condition permits hoisting an instruction that does not execute on every path through the loop, provided it is safe to execute speculatively; the partial LICM techniques of Section 2.4 handle the remaining cases.
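Algorithm 2 below relies on a safety predicate embodying Definition 2.2. A minimal sketch of such a check, using the same hypothetical Loop/DominatorTree interfaces as the other listings (mayWriteToMemory, hasOtherSideEffects, mayThrow, mayReadFromMemory, exit_blocks, and may_alias_memory_in_loop are assumed helpers, not a specific compiler's API), could look as follows:
// Sketch of the safety predicate from Definition 2.2 (illustrative interfaces).
bool is_safe_to_hoist(Instruction* I, Loop* L, DominatorTree* DT) {
    // Condition 1: no side effects
    if (I->mayWriteToMemory() || I->hasOtherSideEffects()) {
        return false;
    }
    // Condition 2 (i dominates its uses) is assumed to hold already,
    // as it does automatically once the program is in SSA form.
    // Condition 3: either the instruction executes on every iteration
    // (its block dominates all loop exits), or it is safe to speculate.
    bool executes_every_iteration = true;
    for (auto* exit : L->exit_blocks()) {
        if (!DT->dominates(I->parent(), exit)) {
            executes_every_iteration = false;
            break;
        }
    }
    if (!executes_every_iteration && (I->mayThrow() || I->mayReadFromMemory())) {
        return false;
    }
    // Condition 4: no memory instruction in the loop may alias with I
    return !may_alias_memory_in_loop(I, L);
}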
Phase 3: Code Motion
The actual code motion must preserve program semantics. We insert instructions into the loop preheader:
// Algorithm 2: Code Motion with Dependency Ordering
void perform_code_motion(Loop* L, std::set<Instruction*>& invariants,
                         DominatorTree* DT) {
    // Compute dependency graph
    auto dep_graph = build_dependency_graph(invariants);
    // Topological sort respecting dependencies
    auto sorted = topological_sort(dep_graph);
    auto* preheader = L->preheader();
    for (auto* I : sorted) {
        if (is_safe_to_hoist(I, L, DT)) {
            I->moveBefore(preheader->terminator());
        }
    }
}
2.4 Partial Loop Invariant Code Motion
When an invariant instruction doesn’t dominate all loop exits but is otherwise safe to speculate, we can perform partial hoisting. This technique is particularly beneficial for instructions in conditionally executed blocks within loops.
The key insight is to hoist the computation while preserving the original control flow semantics:
// Algorithm 2b: Partial LICM
bool perform_partial_licm(Loop* L, Instruction* I, DominatorTree* DT) {
    // Check if partial hoisting is profitable
    if (!is_profitable_to_partially_hoist(I, L)) {
        return false;
    }
    // Create a flag to track if the instruction would have executed
    auto* preheader = L->preheader();
    auto* flag = create_flag_variable(preheader, false);
    // Set flag to true on paths where I executes
    for (auto* pred : I->parent()->predecessors()) {
        if (L->contains(pred)) {
            insert_flag_update(pred, flag, true);
        }
    }
    // Hoist the computation, guarded by the flag
    auto* hoisted = I->clone();
    create_guard(preheader, flag, hoisted);
    // Replace original instruction with hoisted result
    I->replaceAllUsesWith(hoisted);
    I->eraseFromParent();
    return true;
}
This approach enables optimization of loops with complex control flow while maintaining correctness.
2.5 Complexity Analysis
The time complexity of LICM is $O(n^2)$ in the worst case, where $n$ is the number of instructions in the loop. However, in practice, the algorithm exhibits near-linear behavior:
Theorem 2.2. For a loop with $n$ instructions and maximum nesting depth $d$, LICM has complexity $O(n \cdot d \cdot \alpha(n))$, where $\alpha$ is the inverse Ackermann function from union-find operations in dominance queries.
3. Static Single Assignment Form
3.1 Formal Definition
Definition 3.1 (SSA Form). A program is in SSA form if and only if:
- Each variable is assigned exactly once in the program text
- Each use of a variable is dominated by its definition
Formally, for a CFG $G = (V, E)$ with variables $\text{Vars}$:
$$\forall v \in \text{Vars}: |\{d : d \text{ defines } v\}| = 1$$
$$\forall \text{ use } u \text{ of } v: \exists!\ \text{def } d \text{ of } v : \text{Dom}(d, u)$$
3.2 Phi Functions
To maintain single assignment at control flow merge points, we introduce $\phi$-functions:
Definition 3.2 ($\phi$-function). A $\phi$-function at the beginning of block $B$ with $k$ predecessors is defined as:
$$v_i = \phi(v_{i_1}: B_1, v_{i_2}: B_2, \ldots, v_{i_k}: B_k)$$
where $v_{i_j}$ is the value of $v$ coming from predecessor $B_j$.
The semantics of $\phi$-functions can be formalized using the following operational rule:
$$\frac{\text{CFG edge } B_j \rightarrow B \text{ is taken, } v_{i_j} = c}{v_i = \phi(\ldots, v_{i_j}: B_j, \ldots) \text{ evaluates to } c}$$
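As a concrete illustration (block and variable names are chosen only for this example), a conditional assignment is translated into SSA form by introducing a single $\phi$-function at the join block:
// Before SSA construction
if (p) { x = a + b; } else { x = a - b; }
y = x * 2;
// After SSA construction: one phi at the join block selects the value
if (p) { x1 = a0 + b0; } else { x2 = a0 - b0; }
x3 = phi(x1: then_block, x2: else_block)
y1 = x3 * 2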
3.3 SSA Construction Algorithm
SSA construction involves two main phases: $\phi$-function placement and variable renaming.
Phase 1: Computing Dominance Frontiers
The dominance frontier of a node $n$ is:
$$\text{DF}(n) = \{m : n \text{ dominates a predecessor of } m \text{ but does not strictly dominate } m\}$$
More formally:
$$\text{DF}(n) = \{m : \exists p \in \text{pred}(m),\ n \in \text{Dom}(p) \land n \notin \text{Dom}(m) \setminus \{m\}\}$$
// Algorithm 3: Dominance Frontier Computation
auto compute_dominance_frontier(const CFG& cfg, const DominatorTree& dt)
    -> std::map<BasicBlock*, std::set<BasicBlock*>> {
    auto df = std::map<BasicBlock*, std::set<BasicBlock*>> {};
    for (auto* b : cfg.blocks()) {
        for (auto* p : b->predecessors()) {
            auto* runner = p;
            // Walk up the dominator tree until we reach b's immediate dominator
            while (runner != dt.idom(b)) {
                df[runner].insert(b);
                runner = dt.idom(runner);
            }
        }
    }
    return df;
}
Note on Efficient Implementation: While Algorithm 3 clearly illustrates the concept of dominance frontiers, production compilers typically use Cytron et al.'s algorithm, which computes all dominance frontiers in a single bottom-up (post-order) traversal of the dominator tree. Its running time is proportional to the total size of the frontier sets, which is $O(n^2)$ in the worst case but close to linear on typical control flow graphs:
// Sketch of Cytron's efficient algorithm
auto compute_df_cytron(const DominatorTree& dt)
    -> std::map<BasicBlock*, std::set<BasicBlock*>> {
    auto df = std::map<BasicBlock*, std::set<BasicBlock*>> {};
    // Bottom-up (post-order) traversal of the dominator tree
    for (auto* b : post_order(dt)) {
        // Local contribution: CFG successors not immediately dominated by b
        for (auto* succ : b->successors()) {
            if (dt.idom(succ) != b) {
                df[b].insert(succ);
            }
        }
        // Up contribution: frontiers of b's dominator-tree children
        for (auto* child : dt.children(b)) {
            for (auto* w : df[child]) {
                if (dt.idom(w) != b) {
                    df[b].insert(w);
                }
            }
        }
    }
    return df;
}
This algorithm exploits the property that dominance frontiers can be computed bottom-up in the dominator tree, achieving better performance for large CFGs.
Phase 2: Phi Function Placement
Given a set of blocks $S$ that contain definitions of variable $v$, we must place $\phi$-functions at the iterated dominance frontier:
$$\text{DF}^+(S) = \bigcup_{i=1}^{\infty} \text{DF}^i(S)$$
where $\text{DF}^1(S) = \text{DF}(S)$ and $\text{DF}^{i+1}(S) = \text{DF}(S \cup \text{DF}^i(S))$.
Theorem 3.1 (Minimal Phi Placement). The set $\text{DF}^+(S)$ represents the minimal set of blocks requiring $\phi$-functions for variable $v$ defined in blocks $S$.
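A standard worklist formulation computes the iterated dominance frontier directly. The sketch below assumes the dominance-frontier map produced by Algorithm 3 and the same hypothetical BasicBlock interface used elsewhere in this report:
// Sketch: phi placement at the iterated dominance frontier DF+(S) of the
// definition sites of a variable v (worklist formulation).
auto place_phi_functions(const std::set<BasicBlock*>& def_blocks,
                         const std::map<BasicBlock*, std::set<BasicBlock*>>& df)
    -> std::set<BasicBlock*> {
    auto needs_phi = std::set<BasicBlock*> {};
    auto worklist = std::vector<BasicBlock*>(def_blocks.begin(), def_blocks.end());
    while (!worklist.empty()) {
        auto* b = worklist.back();
        worklist.pop_back();
        auto it = df.find(b);
        if (it == df.end()) continue;
        for (auto* m : it->second) {
            if (needs_phi.insert(m).second) {
                // m now defines v (via the new phi), so its own frontier
                // must be processed as well -- this yields DF+(S).
                worklist.push_back(m);
            }
        }
    }
    return needs_phi;
}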
Phase 3: Variable Renaming
The renaming phase performs a depth-first traversal of the dominator tree:
// Algorithm 4: SSA Variable Renaming
class SSARenamer {
    std::map<std::string, std::stack<std::string>> var_stacks_;
    std::map<std::string, int32_t> var_counters_;
    auto rename_block(BasicBlock* bb, DominatorTree* dt) -> void {
        auto saved_sizes = std::map<std::string, size_t> {};
        // Save stack sizes for restoration
        for (const auto& [var, stack] : var_stacks_) {
            saved_sizes[var] = stack.size();
        }
        // Process phi functions
        for (auto& phi : bb->phis()) {
            auto new_name = get_fresh_name(phi.var);
            phi.result = new_name;
            var_stacks_[phi.var].push(new_name);
        }
        // Process instructions
        for (auto& inst : bb->instructions()) {
            // Rename uses
            for (auto& operand : inst.operands()) {
                if (is_variable(operand) && !var_stacks_[operand].empty()) {
                    operand = var_stacks_[operand].top();
                }
            }
            // Rename definition
            if (inst.defines_variable()) {
                auto new_name = get_fresh_name(inst.result);
                var_stacks_[inst.result].push(new_name);
                inst.result = new_name;
            }
        }
        // Fill in phi operands in successors
        for (auto* succ : bb->successors()) {
            for (auto& phi : succ->phis()) {
                if (!var_stacks_[phi.var].empty()) {
                    phi.add_operand(var_stacks_[phi.var].top(), bb);
                }
            }
        }
        // Recursively process dominated blocks
        for (auto* child : dt->children(bb)) {
            rename_block(child, dt);
        }
        // Restore stacks
        for (auto& [var, stack] : var_stacks_) {
            while (stack.size() > saved_sizes[var]) {
                stack.pop();
            }
        }
    }
};
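The renamer above relies on a get_fresh_name helper that Algorithm 4 leaves undefined. A minimal sketch, shown as it would appear inside the class body and using the var_counters_ map declared above, is:
// Inside SSARenamer: minimal fresh-name helper used by Algorithm 4.
// It appends a monotonically increasing version number to the base name,
// e.g. "x" -> "x_0", "x_1", ...
auto get_fresh_name(const std::string& var) -> std::string {
    auto version = var_counters_[var]++;
    return var + "_" + std::to_string(version);
}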
3.4 Properties of SSA Form
SSA form exhibits several important properties that facilitate optimization:
Property 3.1 (Unique Reaching Definition). For every use of a variable $v$ in SSA form, there exists exactly one reaching definition.
Property 3.2 (Dominance Property). Every use of a variable $v$ is dominated by its unique definition.
These properties enable efficient implementation of many optimizations:
- Constant Propagation: $O(n)$ using sparse conditional constant propagation
- Dead Code Elimination: $O(n)$ using a simple mark-and-sweep algorithm (see the sketch after this list)
- Global Value Numbering: $O(n \log n)$ using partition refinement
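To illustrate why such passes become simple on SSA form, the following sketch implements the mark-and-sweep dead code elimination mentioned above; Function, Instruction, and the helper methods are the same hypothetical interfaces used in the earlier listings, and instructions() is assumed to yield a snapshot that is safe to mutate during the sweep:
// Sketch: SSA-based dead code elimination (mark and sweep).
// Because every value has a single definition, liveness propagates
// directly along use-def edges with no iterative dataflow analysis.
void eliminate_dead_code(Function* F) {
    auto live = std::set<Instruction*> {};
    auto worklist = std::vector<Instruction*> {};
    // Mark: seed with instructions that are observably useful
    for (auto* I : F->instructions()) {
        if (I->has_side_effects() || I->is_terminator()) {
            live.insert(I);
            worklist.push_back(I);
        }
    }
    // Propagate liveness backwards along use-def chains
    while (!worklist.empty()) {
        auto* I = worklist.back();
        worklist.pop_back();
        for (auto* op : I->operands()) {
            if (auto* def = dynamic_cast<Instruction*>(op)) {
                if (live.insert(def).second) {
                    worklist.push_back(def);
                }
            }
        }
    }
    // Sweep: delete everything that was never marked
    for (auto* I : F->instructions()) {
        if (!live.contains(I)) {
            I->eraseFromParent();
        }
    }
}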
4. LICM in SSA Form
The combination of LICM with SSA form yields particularly elegant algorithms. The explicit use-def chains in SSA eliminate the need for iterative dataflow analysis.
4.1 SSA-Based Loop Invariant Detection
In SSA form, loop invariance can be determined by a simple graph traversal:
// Algorithm 5: SSA-based Loop Invariant Detection
bool is_loop_invariant_ssa(Value* v, Loop* L) {
    if (auto* inst = dynamic_cast<Instruction*>(v)) {
        // Check if instruction is in loop
        if (!L->contains(inst->parent())) {
            return true; // Defined outside loop
        }
        // Check if it's a phi at loop header
        if (auto* phi = dynamic_cast<PhiNode*>(inst)) {
            if (phi->parent() == L->header()) {
                // Loop-carried dependency - not invariant
                return false;
            }
        }
        // Recursively check operands
        for (auto* operand : inst->operands()) {
            if (!is_loop_invariant_ssa(operand, L)) {
                return false;
            }
        }
        return true;
    }
    // Constants and function arguments are invariant
    return true;
}
4.2 Advantages of SSA-Based LICM
The SSA representation provides several advantages for LICM:
- Direct Use-Def Chains: No need for reaching definitions analysis
- Simplified Dependency Analysis: Dependencies are explicit in the SSA graph
- Efficient Safety Checks: Dominance relationships are preserved in SSA
Theorem 4.1. LICM in SSA form has complexity $O(n \cdot \alpha(n))$ where $n$ is the number of instructions and $\alpha$ is the inverse Ackermann function.
Proof sketch. Each instruction is visited once, and dominance queries using preprocessed dominator trees have near-constant time complexity.
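Putting the pieces together, a hoisting driver on SSA form can simply walk the loop body in dominator-tree order, combining Algorithm 5 with the safety predicate sketched in Section 2.3. The listing below reuses those hypothetical interfaces; dom_tree_order is an assumed helper that yields blocks so that dominators are visited before the blocks they dominate:
// Sketch: LICM driver on SSA form. Visiting blocks in dominator-tree
// order guarantees that an instruction's invariant operands have already
// been considered (and hoisted) before the instruction itself.
void licm_on_ssa(Loop* L, DominatorTree* DT) {
    auto* preheader = L->preheader();
    auto to_hoist = std::vector<Instruction*> {};
    for (auto* BB : DT->dom_tree_order(L->blocks())) {
        for (auto& I : BB->instructions()) {
            if (is_loop_invariant_ssa(&I, L) && is_safe_to_hoist(&I, L, DT)) {
                to_hoist.push_back(&I);
            }
        }
    }
    // Move candidates after the scan to avoid mutating blocks mid-iteration
    for (auto* I : to_hoist) {
        I->moveBefore(preheader->terminator());
    }
}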
5. Advanced Topics
5.1 Memory SSA
Traditional SSA only handles scalar variables. Memory SSA extends the concept to memory operations, addressing the challenge of applying SSA concepts when memory locations are not statically known and may alias.
Key Concepts:
- Memory Definitions (MemoryDef): Store operations that modify memory state
- Memory Uses (MemoryUse): Load operations that read memory
- Memory Phi (MemoryPhi): Merge points for memory states at control flow joins
// Memory SSA representation
class MemorySSA {
    // Memory states are versioned like variables
    using MemoryVersion = int32_t;
    struct MemoryDef {
        Instruction* inst;
        MemoryVersion version;
        MemoryVersion uses; // Previous version
    };
    struct MemoryUse {
        Instruction* inst;
        MemoryVersion version; // Version being used
    };
    struct MemoryPhi {
        BasicBlock* block;
        MemoryVersion result;
        std::vector<std::pair<MemoryVersion, BasicBlock*>> operands;
    };
    // Walking algorithm for Memory SSA optimization.
    // Note: this walker assumes accessor-style memory access objects
    // (getDefiningAccess, getMemoryInst, getValue) rather than the plain
    // structs above; the names are illustrative.
    Value* optimize_memory_access(MemoryAccess* MA, AliasAnalysis* AA) {
        if (auto* MU = dyn_cast<MemoryUse>(MA)) {
            auto* DefiningAccess = MU->getDefiningAccess();
            // Walk up the memory SSA graph
            while (auto* MD = dyn_cast<MemoryDef>(DefiningAccess)) {
                if (AA->mustAlias(MU->getMemoryInst(), MD->getMemoryInst())) {
                    // Found exact definition
                    return MD->getValue();
                }
                if (!AA->mayAlias(MU->getMemoryInst(), MD->getMemoryInst())) {
                    // This def doesn't affect our use, skip it
                    DefiningAccess = MD->getDefiningAccess();
                    continue;
                }
                // May alias - conservative assumption
                break;
            }
        }
        return nullptr;
    }
};
Memory SSA enables precise alias analysis and memory-based optimizations within the SSA framework. For example, it can identify redundant loads:
// Before Memory SSA optimization
x = load ptr1
store ptr2, 42
y = load ptr1 // Is this redundant?
// With Memory SSA, if ptr1 and ptr2 don't alias:
x = load ptr1
store ptr2, 42
y = x // Load eliminated!
This representation is particularly powerful for optimizations like:
- Load/store forwarding
- Dead store elimination
- Memory-aware LICM (see the sketch after this list)
- Alias-aware redundancy elimination
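As an example of the memory-aware LICM item above, the sketch below hoists a loop-invariant load only when alias analysis proves that no store in the loop can modify the loaded location. LoadInst, AliasAnalysis, and the helper methods are the same illustrative interfaces used in the earlier listings, not a particular compiler's API:
// Sketch: hoisting a load out of a loop using alias information.
bool try_hoist_load(LoadInst* load, Loop* L, AliasAnalysis* AA,
                    DominatorTree* DT) {
    // The address being loaded from must itself be loop invariant
    if (!is_loop_invariant_ssa(load->address(), L)) {
        return false;
    }
    // No write in the loop may alias the loaded location
    for (auto* BB : L->blocks()) {
        for (auto& I : BB->instructions()) {
            if (I.mayWriteToMemory() && AA->mayAlias(&I, load)) {
                return false;
            }
        }
    }
    // The load must execute on every iteration (it may fault if speculated)
    for (auto* exit : L->exit_blocks()) {
        if (!DT->dominates(load->parent(), exit)) {
            return false;
        }
    }
    load->moveBefore(L->preheader()->terminator());
    return true;
}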
5.2 Gated SSA and Predicated SSA
Extensions to basic SSA form include:
- Gated SSA (GSA): Includes predicate information at $\phi$-functions
- Predicated SSA: Explicitly represents control dependencies
These forms enable more aggressive optimizations but increase representation complexity.
6. Implementation Considerations
6.1 Engineering Trade-offs
Real-world implementations must balance several factors:
- Construction Time vs. Optimization Power: Pruned SSA reduces construction overhead
- Memory Usage: Semi-pruned SSA offers a middle ground
- Destruction Overhead: Out-of-SSA translation requires careful handling of $\phi$-functions
6.2 Integration with Other Optimizations
SSA form serves as a foundation for numerous optimizations:
- Sparse Conditional Constant Propagation (SCCP)
- Partial Redundancy Elimination (PRE)
- Strength Reduction
- Induction Variable Analysis
7. Conclusion
Loop Invariant Code Motion and Static Single Assignment Form represent fundamental techniques in modern compiler optimization. SSA’s elegant representation of program data flow significantly simplifies the implementation of LICM and other optimizations. The mathematical foundations presented in this report demonstrate the theoretical soundness of these approaches, while the algorithms show their practical applicability.
The synergy between LICM and SSA exemplifies how good intermediate representations can dramatically simplify optimization algorithms. As compiler technology continues to evolve, these foundational techniques remain central to achieving high-performance code generation.
Future research directions include extending these techniques to handle modern programming paradigms such as concurrent and parallel programming, where traditional dominance-based analyses require adaptation to account for non-deterministic execution orders.
References
The algorithms and theoretical foundations presented in this report are based on seminal works in compiler construction, including contributions from Cytron et al. on SSA construction, and extensive research on loop optimizations in production compilers such as LLVM and GCC.