Syntax, Semantics, and Type Inference: A Practical Implementation in C++23

Introduction

In our previous explorations of programming language theory—from the philosophical underpinnings in "A Cross-Disciplinary Analysis," to the rigorous mathematical frameworks in "A Formal Perspective," to the practical implementations in "From Theory to Practice"—we have progressively bridged the gap between abstract theory and concrete implementation. This essay continues that journey by focusing on one of the most powerful features in modern type systems: type inference.

Type inference stands at the intersection of syntax and semantics, allowing programming languages to deduce types without explicit annotations while maintaining strong guarantees about program behavior. It exemplifies how sophisticated logical systems can enhance programmer productivity without sacrificing safety—what Cardelli called "the best of both worlds." This mechanism has evolved from academic research in the 1970s with Hindley-Milner type inference to mainstream adoption in languages like Haskell, ML, Rust, and increasingly in C++ through its auto keyword and template deduction.

In this essay, we implement a subset of Hindley-Milner type inference in modern C++23. By constructing this system, we create a direct bridge between formal type theory and practical programming, demonstrating how abstract concepts like substitution, unification, and polymorphism manifest in executable code. Our implementation leverages C++23's latest features—including improved template metaprogramming, concepts, and monadic error handling—to create a compile-time type inference engine that mirrors the theoretical algorithms while remaining practical and extensible.

This exploration not only illuminates the inner workings of type inference but also provides insight into how programming language features evolve from theoretical foundations to everyday tools. As we will see, the journey from lambda calculus to modern C++ template metaprogramming reveals the enduring value of formal semantics in creating robust and expressive programming systems.

Theoretical Foundations of Type Inference

The Hindley-Milner Type System

The Hindley-Milner type system (also known as Damas-Milner or Algorithm W) provides the theoretical foundation for type inference in many functional programming languages. It combines parametric polymorphism with automatic type inference, allowing programmers to write generic code without explicit type annotations while maintaining strong static type guarantees.

At its core, Hindley-Milner is based on simply typed lambda calculus extended with polymorphic types. The key insight is that it's possible to automatically infer the most general type for an expression through constraint solving. The formal syntax of types in Hindley-Milner can be defined as:

$$\begin{align}\tau ::= \ & \alpha & \text{(type variable)} \\ & | \ \tau \rightarrow \tau & \text{(function type)} \\ & | \ T & \text{(base type)} \\ & | \ T \ \tau_1 \ldots \tau_n & \text{(type application)} \end{align}$$

A key distinction in Hindley-Milner is between monotypes (types without quantifiers) and polytypes (quantified types), which can be formalized as:

$$\sigma ::= \ \tau \ | \ \forall \alpha. \sigma$$

This distinction enables parametric polymorphism—the ability to define functions that work on values of many different types, without depending on their specific implementations.

Type Inference Rules

The Hindley-Milner type system is defined by a set of inference rules that determine how types are assigned to expressions. These rules establish judgments of the form $\Gamma \vdash e : \tau$, meaning "in context $\Gamma$, expression $e$ has type $\tau$."

Some key rules include:

  1. Variable Rule:
    $$\frac{x : \sigma \in \Gamma}{\Gamma \vdash x : \sigma}$$

    This rule states that if a variable $x$ is bound to a polymorphic type $\sigma$ in the environment $\Gamma$, then we can immediately conclude that $x$ has type $\sigma$. In other words, the type of a variable is determined directly from its declaration in the type environment.

  2. Abstraction Rule:
    $$\frac{\Gamma, x : \tau_1 \vdash e : \tau_2}{\Gamma \vdash \lambda x.e : \tau_1 \rightarrow \tau_2}$$

    This rule applies to lambda abstractions (anonymous functions). It says that if, by extending the environment $\Gamma$ with a binding for $x$ of type $\tau_1$, you can derive that the expression $e$ has type $\tau_2$, then the lambda abstraction $\lambda x.e$ itself has the function type $\tau_1 \rightarrow \tau_2$. Essentially, it constructs the function’s type from the type of its parameter and the type of its body.

  3. Application Rule:
    $$\frac{\Gamma \vdash e_1 : \tau_1 \rightarrow \tau_2 \quad \Gamma \vdash e_2 : \tau_1}{\Gamma \vdash e_1 \; e_2 : \tau_2}$$

    This rule covers function application. It indicates that if an expression $e_1$ is a function with type $\tau_1 \rightarrow \tau_2$ and another expression $e_2$ has the matching type $\tau_1$, then applying $e_1$ to $e_2$ yields a result of type $\tau_2$. This ensures that the function's parameter type and the argument's type align correctly.

  4. Generalization Rule:
    $$\frac{\Gamma \vdash e : \tau \quad \alpha \not\in \text{FV}(\Gamma)}{\Gamma \vdash e : \forall \alpha. \tau}$$

    This rule deals with polymorphism. It states that if an expression $e$ has type $\tau$ in the environment $\Gamma$ and a type variable $\alpha$ does not appear free in $\Gamma$, then $e$ can be assigned the more general (polymorphic) type $\forall \alpha. \tau$. This generalization allows the expression to be used in multiple type contexts by abstracting over the specific type variable.

  5. Instantiation Rule:
    $$\frac{\Gamma \vdash e : \forall \alpha. \tau}{\Gamma \vdash e : \tau[\tau'/\alpha]}$$

    This rule is used for instantiating polymorphic types. When an expression $e$ has a polymorphic type $\forall \alpha. \tau$, it can be specialized to a specific type by substituting a particular type $\tau'$ for the type variable $\alpha$. The result is a more concrete type $\tau[\tau'/\alpha]$, which is then used in further type-checking.

These rules form the basis of the type checking process, but for type inference, we need an algorithm that can determine types without explicit annotations.

Algorithm W and Unification

The classic algorithm for Hindley-Milner type inference is Algorithm W, developed by Damas and Milner. The key components of this algorithm are:

  1. Type Variable Generation: Fresh type variables are created for expressions with unknown types.
  2. Constraint Generation: As the algorithm processes expressions, it generates equality constraints between types.
  3. Unification: These constraints are solved using unification, resulting in substitutions that make the types equal.
  4. Generalization: Free type variables are quantified to create polymorphic types when binding expressions to names.

The unification algorithm is particularly important, as it resolves type equations like $\tau_1 = \tau_2$. The basic unification rules can be specified as:

  1. $\text{unify}(\alpha, \alpha) = \emptyset$ (empty substitution)
  2. $\text{unify}(\alpha, \tau) = [\alpha \mapsto \tau]$ if $\alpha \not\in \text{FV}(\tau)$
  3. $\text{unify}(\tau, \alpha) = [\alpha \mapsto \tau]$ if $\alpha \not\in \text{FV}(\tau)$
  4. $\text{unify}(\tau_1 \rightarrow \tau_2, \tau_3 \rightarrow \tau_4) = \text{unify}(\tau_1, \tau_3) \circ \text{unify}(\tau_2[\theta], \tau_4[\theta])$ where $\theta = \text{unify}(\tau_1, \tau_3)$
  5. $\text{unify}(T, T) = \emptyset$ for base type $T$
  6. $\text{unify}(T \; \tau_1 \ldots \tau_n, T \; \tau_1^{\prime} \ldots \tau_n^{\prime}) = \text{unify}(\tau_1, \tau_1^{\prime}) \circ \ldots \circ \text{unify}(\tau_n[\theta_{n-1}], \tau_n^{\prime}[\theta_{n-1}])$ where $\theta_i$ is the composition of all previous substitutions

If none of these rules apply, unification fails, indicating a type error. The beauty of Algorithm W is that it not only detects type errors but also infers the most general type for an expression, allowing maximum flexibility in how that expression can be used.

Implementing Type Inference in C++23

Type Representation and Operations

To implement our type inference system in C++23, we first need to represent types and type variables. We'll define a type system with basic types, function types, and type variables:

// Type variable representation
struct TypeVar {
  std::string name;

  explicit TypeVar(std::string n) : name(std::move(n)) {}

  auto operator==(const TypeVar& other) const -> bool {
    return name == other.name;
  }
};

// Forward declaration of the Type class hierarchy
class Type;
using TypePtr = std::shared_ptr<Type>;

// Fresh type variable generator
class TypeVarGenerator {
private:
  int32_t counter = 0;

public:
  auto fresh() const -> TypeVar {
    return TypeVar("t" + std::to_string(++counter));
  }
};

// Base abstract type class
class Type : public std::enable_shared_from_this<Type> {
public:
  virtual ~Type() = default;

  virtual auto operator==(const Type& other) const -> bool = 0;
  virtual auto to_string() const -> std::string = 0;
  virtual auto substitute(const TypeVar& var,
                          const TypePtr& replacement) const
    -> std::expected<TypePtr, std::string> = 0;
  virtual auto free_vars() const -> std::set<TypeVar> = 0;
};

// Base concrete types like Int, Bool, etc.
class BaseType : public Type {
private:
  std::string name;

public:
  explicit BaseType(std::string n) : name(std::move(n)) {}

  auto operator==(const Type& other) const -> bool override {
    if (auto* base_type = dynamic_cast<const BaseType*>(&other)) {
      return name == base_type->name;
    }
    return false;
  }

  auto to_string() const -> std::string override { return name; }

  auto substitute(const TypeVar& var,
                  const TypePtr& replacement) const
      -> std::expected<TypePtr, std::string> override {
    return shared_from_this(); // Base types contain no variables
  }

  auto free_vars() const -> std::set<TypeVar> override {
    return {}; // Base types have no free variables
  }
};

// Type variables within the type system
class VarType : public Type {
private:
  TypeVar var;

public:
  explicit VarType(TypeVar v) : var(std::move(v)) {}

  auto operator==(const Type& other) const -> bool override {
    if (auto* var_type = dynamic_cast<const VarType*>(&other)) {
      return var == var_type->var;
    }
    return false;
  }

  auto to_string() const -> std::string override {
    return var.name;
  }

  auto substitute(const TypeVar& v,
                  const TypePtr& replacement) const
      -> std::expected<TypePtr, std::string> override {
    if (var == v) {
      return replacement;
    }
    return shared_from_this();
  }

  auto free_vars() const -> std::set<TypeVar> override {
    return {var};
  }

  auto get_var() const -> const TypeVar& { return var; }
};

// Function types: T1 -> T2
class FunctionType : public Type {
private:
  TypePtr from;
  TypePtr to;

public:
  FunctionType(TypePtr f, TypePtr t)
    : from(std::move(f)), to(std::move(t)) {}

  auto operator==(const Type& other) const -> bool override {
    if (auto* func_type = dynamic_cast<const FunctionType*>(&other)) {
      return *from == *(func_type->from) && *to == *(func_type->to);
    }
    return false;
  }

  auto to_string() const -> std::string override {
    // Add parentheses around function types on the left side
    auto from_str = from->to_string();
    if (dynamic_cast<const FunctionType*>(from.get())) {
      from_str = "(" + from_str + ")";
    }
    return from_str + " -> " + to->to_string();
  }

  auto substitute(const TypeVar& var,
                  const TypePtr& replacement) const
      -> std::expected<TypePtr, std::string> override {
    auto new_from = from->substitute(var, replacement);
    if (!new_from) {
      return std::unexpected {new_from.error()};
    }

    auto new_to = to->substitute(var, replacement);
    if (!new_to) {
      return std::unexpected {new_to.error()};
    }

    return std::make_shared<FunctionType>(*new_from, *new_to);
  }

  auto free_vars() const -> std::set<TypeVar> override {
    auto from_vars = from->free_vars();
    auto to_vars = to->free_vars();
    from_vars.insert(to_vars.begin(), to_vars.end());
    return from_vars;
  }

  auto get_from() const -> const TypePtr& { return from; }
  auto get_to() const -> const TypePtr& { return to; }
};

// Helper functions to create types
inline auto make_base_type(const std::string& name) -> TypePtr {
  return std::make_shared<BaseType>(name);
}

inline auto make_var_type(const TypeVar& var) -> TypePtr {
  return std::make_shared<VarType>(var);
}

inline auto make_function_type(const TypePtr& from, const TypePtr& to)
    -> TypePtr {
  return std::make_shared<FunctionType>(from, to);
}

// Common base types
inline auto int_type() -> TypePtr {
  return make_base_type("Int");
}
inline auto bool_type() -> TypePtr {
  return make_base_type("Bool");
}
inline auto string_type() -> TypePtr {
  return make_base_type("String");
}

This code defines our core type system with base types, type variables, and function types. Each type supports operations needed for the inference algorithm, such as substitution (replacing a type variable with a type) and finding free variables.

Substitution and Unification

Next, we implement substitution and unification, the core operations of type inference:

// Substitution represents a mapping from type variables to types
class Substitution {
private:
  std::map<TypeVar, TypePtr> mappings;

public:
  Substitution() = default;

  // Add a new mapping
  void add(const TypeVar& var, const TypePtr& type) {
    mappings[var] = type;
  }

  // Apply this substitution to a type
  auto apply(const TypePtr& type) const
      -> std::expected<TypePtr, std::string> {
    if (!type) {
      return std::unexpected {"Null type pointer"};
    }

    // Check if it's a type variable that we have a mapping for
    if (auto var_type = std::dynamic_pointer_cast<VarType>(type)) {
      auto it = mappings.find(var_type->get_var());
      if (it != mappings.end()) {
        // Apply recursively to the mapped type
        return apply(it->second);
      }
      return type;  // No mapping for this variable
    }

    // For function types, apply substitution to both sides
    if (auto func_type = std::dynamic_pointer_cast<FunctionType>(type)) {
      auto new_from = apply(func_type->get_from());
      if (!new_from) {
        return std::unexpected {new_from.error()};
      }

      auto new_to = apply(func_type->get_to());
      if (!new_to) {
        return std::unexpected {new_to.error()};
      }

      return std::make_shared<FunctionType>(*new_from, *new_to);
    }

    // Base types are unchanged
    return type;
  }

  // Compose this substitution with another one
  auto compose(const Substitution& other) const
      -> std::expected<Substitution, std::string> {
    auto result = Substitution {other};

    // Apply this substitution to all mappings in the other
    // substitution
    for (const auto& [var, type] : other.mappings) {
      auto new_type = apply(type);
      if (!new_type) {
        return std::unexpected {new_type.error()};
      }
      result.mappings[var] = *new_type;
    }

    // Add all mappings from this substitution
    for (const auto& [var, type] : mappings) {
      if (other.mappings.find(var) == other.mappings.end()) {
        result.mappings[var] = type;
      }
    }

    return result;
  }
};

// The unification algorithm
auto unify(const TypePtr& t1, const TypePtr& t2)
    -> std::expected<Substitution, std::string> {
  // Base case: both are the same type
  if (*t1 == *t2) {
    return Substitution {};
  }

  // Case: t1 is a type variable
  if (auto var_type = std::dynamic_pointer_cast<VarType>(t1)) {
    const auto& var = var_type->get_var();

    // Check for occurs check (prevents infinite types)
    if (t2->free_vars().contains(var)) {
      return std::unexpected {"Occurs check failed: " + var.name +
                              " occurs in " + t2->to_string()};
    }

    auto subst = Substitution {};
    subst.add(var, t2);
    return subst;
  }

  // Case: t2 is a type variable
  if (auto var_type = std::dynamic_pointer_cast<VarType>(t2)) {
    const auto& var = var_type->get_var();

    // Check for occurs check
    if (t1->free_vars().contains(var)) {
      return std::unexpected {"Occurs check failed: " + var.name +
                              " occurs in " + t1->to_string()};
    }

    auto subst = Substitution {};
    subst.add(var, t1);
    return subst;
  }

  // Case: both are function types
  auto func1 = std::dynamic_pointer_cast<FunctionType>(t1);
  auto func2 = std::dynamic_pointer_cast<FunctionType>(t2);

  if (func1 && func2) {
    // Unify the parameter types
    auto s1 = unify(func1->get_from(), func2->get_from());
    if (!s1) {
      return std::unexpected {s1.error()};
    }

    // Apply that substitution to the return types and unify them
    auto new_t1_to = s1->apply(func1->get_to());
    if (!new_t1_to) {
      return std::unexpected {new_t1_to.error()};
    }

    auto new_t2_to = s1->apply(func2->get_to());
    if (!new_t2_to) {
      return std::unexpected {new_t2_to.error()};
    }

    auto s2 = unify(*new_t1_to, *new_t2_to);
    if (!s2) {
      return std::unexpected {s2.error()};
    }

    // Compose the two substitutions
    return s1->compose(*s2);
  }

  // Types don't unify
  return std::unexpected {"Types don't unify: " + t1->to_string() +
                          " and " + t2->to_string()};
}

The Substitution class represents a mapping from type variables to types, and provides operations to apply substitutions to types and compose substitutions. The unify function implements the unification algorithm, which finds a substitution that makes two types equal if possible.

Type Environment and Inference Algorithm

Now we can implement the type environment and the core inference algorithm:

// Type scheme represents polymorphic types with quantified variables
class TypeScheme {
private:
  std::vector<TypeVar> quantified_vars;
  TypePtr type;

public:
  TypeScheme(std::vector<TypeVar> vars, TypePtr t)
    : quantified_vars(std::move(vars)), type(std::move(t)) {}

  // Instantiate this scheme by replacing quantified variables
  // with fresh ones
  auto instantiate(TypeVarGenerator& gen) const -> TypePtr {
    auto subst = Substitution {};
    for (const auto& var : quantified_vars) {
      subst.add(var, make_var_type(gen.fresh()));
    }
    auto result = subst.apply(type);
    assert(result); // This should never fail
    return *result;
  }

  auto to_string() -> std::string const {
    if (quantified_vars.empty()) {
      return type->to_string();
    }

    auto result = std::string {"forall "};
    for (auto i = 0uz; i < quantified_vars.size(); ++i) {
      result += quantified_vars[i].name;
      if (i < quantified_vars.size() - 1) {
        result += ", ";
      }
    }
    result += ". " + type->to_string();
    return result;
  }
};

// Type environment maps variable names to type schemes
class TypeEnv {
private:
  std::map<std::string, TypeScheme> env;

public:
  // Look up a variable in the environment
  auto lookup(const std::string& name) const
      -> std::expected<TypeScheme, std::string> {
    auto it = env.find(name);
    if (it == env.end()) {
      return std::unexpected {"Unbound variable: " + name};
    }
    return it->second;
  }

  // Extend the environment with a new binding
  void extend(const std::string& name, const TypeScheme& scheme) {
    env[name] = scheme;
  }

  // Apply a substitution to all types in the environment
  auto apply(const Substitution& subst) const
      -> std::expected<TypeEnv, std::string> {
    auto new_env = TypeEnv {};
    for (const auto& [name, scheme] : env) {
      auto new_type = subst.apply(scheme.type);
      if (!new_type) {
        return std::unexpected {new_type.error()};
      }
      new_env.extend(
        name, TypeScheme {
          scheme.quantified_vars,
          *new_type
        }
      );
    }
    return new_env;
  }

  // Find the free type variables in the environment
  auto free_vars() const -> std::set<TypeVar> {
    auto result = std::set<TypeVar> {};
    for (const auto& [name, scheme] : env) {
      auto type_vars = scheme.type->free_vars();
      for (const auto& var : type_vars) {
        // Only include variables that aren't quantified
        if (std::find(scheme.quantified_vars.begin(),
                      scheme.quantified_vars.end(), var) ==
            scheme.quantified_vars.end()) {
          result.insert(var);
        }
      }
    }
    return result;
  }
};

// Abstract syntax tree nodes for our simple language
class Expr {
public:
  virtual ~Expr() = default;

  virtual auto to_string() const -> std::string = 0;
};

// Integer literal
class IntLit : public Expr {
private:
  int32_t value;

public:
  explicit IntLit(int32_t v) : value(v) {}

  auto to_string() const -> std::string override {
    return std::to_string(value);
  }
  auto get_value() const -> int32_t { return value; }
};

// Boolean literal
class BoolLit : public Expr {
private:
  bool value;

public:
  explicit BoolLit(bool v) : value(v) {}

  auto to_string() const -> std::string override {
    return value ? "true" : "false";
  }
  auto get_value() const -> bool { return value; }
};

// Variable reference
class Var : public Expr {
private:
  std::string name;

public:
  explicit Var(std::string n) : name(std::move(n)) {}

  auto to_string() const -> std::string override {
    return name;
  }
  auto get_name() const -> const std::string& {
    return name;
  }
};

// Lambda abstraction
class Lambda : public Expr {
private:
  std::string param;
  std::shared_ptr<Expr> body;

public:
  Lambda(std::string p, std::shared_ptr<Expr> b)
      : param(std::move(p)), body(std::move(b)) {}

  auto to_string() const -> std::string override {
    return "λ" + param + "." + body->to_string();
  }
  auto get_param() const -> const std::string& {
    return param;
  }
  auto get_body() const -> const std::shared_ptr<Expr>& {
    return body;
  }
};

// Function application
class Apply : public Expr {
private:
  std::shared_ptr<Expr> func;
  std::shared_ptr<Expr> arg;

public:
  Apply(std::shared_ptr<Expr> f, std::shared_ptr<Expr> a)
      : func(std::move(f)), arg(std::move(a)) {}

  auto to_string() const -> std::string override {
    return "(" + func->to_string() + " " + arg->to_string() + ")";
  }
  auto get_func() const -> const std::shared_ptr<Expr>& {
    return func;
  }
  auto get_arg() const -> const std::shared_ptr<Expr>& {
    return arg;
  }
};

// Let binding
class Let : public Expr {
private:
  std::string name;
  std::shared_ptr<Expr> value;
  std::shared_ptr<Expr> body;

public:
  Let(std::string n, std::shared_ptr<Expr> v, std::shared_ptr<Expr> b)
    : name(std::move(n)), value(std::move(v)), body(std::move(b)) {}

  auto to_string() const -> std::string override {
    return "let " + name + " = " + value->to_string()
           + " in " + body->to_string();
  }
  auto get_name() const -> const std::string& {
    return name;
  }
  auto get_value() const -> const std::shared_ptr<Expr>& {
    return value;
  }
  auto get_body() const -> const std::shared_ptr<Expr>& {
    return body;
  }
};

// Infer algorithm based on Algorithm W
auto infer(TypeEnv& env, TypeVarGenerator& gen,
           const std::shared_ptr<Expr>& expr)
    -> std::expected<std::pair<Substitution, TypePtr>, std::string> {
  // Integer literal
  if (auto int_lit = std::dynamic_pointer_cast<IntLit>(expr)) {
    return std::pair {Substitution {}, int_type()};
  }

  // Boolean literal
  if (auto bool_lit = std::dynamic_pointer_cast<BoolLit>(expr)) {
    return std::pair {Substitution {}, bool_type()};
  }

  // Variable reference
  if (auto var = std::dynamic_pointer_cast<Var>(expr)) {
    auto scheme_result = env.lookup(var->get_name());
    if (!scheme_result) {
      return std::unexpected {scheme_result.error()};
    }

    // Instantiate the type scheme to get a fresh type
    auto type = scheme_result->instantiate(gen);
    return std::pair {Substitution {}, type};
  }

  // Lambda abstraction
  if (auto lambda = std::dynamic_pointer_cast<Lambda>(expr)) {
    // Create a fresh type variable for the parameter
    auto param_var = gen.fresh();
    auto param_type = make_var_type(param_var);

    // Extend the environment with the parameter
    auto new_env = env;
    new_env.extend(lambda->get_param(),
                   TypeScheme {
                     std::vector<TypeVar> {}, param_type
                   });

    // Infer the type of the body
    auto body_result = infer(new_env, gen, lambda->get_body());
    if (!body_result) {
        return std::unexpected {body_result.error()};
    }

    auto [body_subst, body_type] = *body_result;

    // Apply the substitution to the parameter type
    auto new_param_type = body_subst.apply(param_type);
    if (!new_param_type) {
        return std::unexpected {new_param_type.error()};
    }

    // Return the function type
    auto func_type = make_function_type(*new_param_type, body_type);
    return std::pair {body_subst, func_type};
  }

  // Function application
  if (auto apply = std::dynamic_pointer_cast<Apply>(expr)) {
    // Infer the type of the function
    auto func_result = infer(env, gen, apply->get_func());
    if (!func_result) {
      return std::unexpected {func_result.error()};
    }
    auto [func_subst, func_type] = *func_result;

    // Apply the substitution to the environment
    auto new_env_result = env.apply(func_subst);
    if (!new_env_result) {
      return std::unexpected {new_env_result.error()};
    }

    // Infer the type of the argument in the updated environment
    auto arg_result = infer(*new_env_result, gen, apply->get_arg());
    if (!arg_result) {
      return std::unexpected {arg_result.error()};
    }
    auto [arg_subst, arg_type] = *arg_result;

    // Create a fresh type variable for the result
    auto result_var = gen.fresh();
    auto result_type = make_var_type(result_var);

    // Apply the argument substitution to the function type
    auto new_func_type = arg_subst.apply(func_type);
    if (!new_func_type) {
      return std::unexpected {new_func_type.error()};
    }

    // Unify the function type with (arg_type -> result_type)
    auto expected_func_type = make_function_type(arg_type, result_type);
    auto unify_result = unify(*new_func_type, expected_func_type);
    if (!unify_result) {
      return std::unexpected {unify_result.error()};
    }

    // Compose all the substitutions
    auto compose_result = func_subst.compose(arg_subst);
    if (!compose_result) {
      return std::unexpected {compose_result.error()};
    }
    auto composed_subst = compose_result->compose(*unify_result);
    if (!composed_subst) {
      return std::unexpected {composed_subst.error()};
    }

    // Apply the final substitution to the result type
    auto final_type = composed_subst->apply(result_type);
    if (!final_type) {
      return std::unexpected {final_type.error()};
    }

    return std::pair {*composed_subst, *final_type};
  }

  // Let binding
  if (auto let = std::dynamic_pointer_cast<Let>(expr)) {
    // Infer the type of the value
    auto value_result = infer(env, gen, let->get_value());
    if (!value_result) {
      return std::unexpected {value_result.error()};
    }
    auto [value_subst, value_type] = *value_result;

    // Apply the substitution to the environment
    auto new_env_result = env.apply(value_subst);
    if (!new_env_result) {
      return std::unexpected {new_env_result.error()};
    }
    auto new_env = *new_env_result;

    // Generalize the type to create a type scheme
    auto env_vars = new_env.free_vars();
    auto type_vars = value_type->free_vars();

    // Find variables in the type that are not in the environment
    auto quantified_vars = std::vector<TypeVar> {};
    for (const auto& var : type_vars) {
      if (env_vars.find(var) == env_vars.end()) {
        quantified_vars.push_back(var);
      }
    }

    // Create the type scheme and extend the environment
    auto scheme = TypeScheme {quantified_vars, value_type};
    new_env.extend(let->get_name(), scheme);

    // Infer the type of the body in the extended environment
    auto body_result = infer(new_env, gen, let->get_body());
    if (!body_result) {
      return std::unexpected {body_result.error()};
    }
    auto [body_subst, body_type] = *body_result;

    // Compose the substitutions
    auto compose_result = value_subst.compose(body_subst);
    if (!compose_result) {
      return std::unexpected {compose_result.error()};
    }

    return std::pair {*compose_result, body_type};
  }

  return std::unexpected {"Unknown expression type"};
}

// Helper functions to create expressions
inline auto make_int(int value) -> std::shared_ptr<Expr> {
  return std::make_shared<IntLit>(value);
}

inline auto make_bool(bool value) -> std::shared_ptr<Expr> {
  return std::make_shared<BoolLit>(value);
}

inline auto make_var(const std::string& name) -> std::shared_ptr<Expr> {
    return std::make_shared<Var>(name);
}

inline auto make_lambda(const std::string& param,
                        const std::shared_ptr<Expr>& body)
    -> std::shared_ptr<Expr> {
  return std::make_shared<Lambda>(param, body);
}

inline auto make_apply(const std::shared_ptr<Expr>& func,
                       const std::shared_ptr<Expr>& arg)
    -> std::shared_ptr<Expr> {
  return std::make_shared<Apply>(func, arg);
}

inline auto make_let(const std::string& name,
                     const std::shared_ptr<Expr>& value,
                     const std::shared_ptr<Expr>& body)
    -> std::shared_ptr<Expr> {
  return std::make_shared<Let>(name, value, body);
}

// Infer the type of an expression and return it as a string
auto infer_type_string(const std::shared_ptr<Expr>& expr)
    -> std::expected<std::string, std::string> {
  auto env = TypeEnv {};
  auto gen = TypeVarGenerator {};
  auto result = infer(env, gen, expr);
  if (!result) {
    return std::unexpected {result.error()};
  }
  auto [subst, type] = *result;
  auto final_type = subst.apply(type);
  if (!final_type) {
    return std::unexpected {final_type.error()};
  }
  return (*final_type)->to_string();
}

This implementation includes a TypeScheme class to represent polymorphic types, a TypeEnv class to manage variable bindings, and a set of expression classes for our simple language. The infer function implements Algorithm W to infer types for expressions, and the infer_type_string function provides a convenient way to get the inferred type as a string.

Example Usage and Tests

To demonstrate our type inference system, let's use it on a variety of examples:

void run_example(const std::string& name, const std::shared_ptr<Expr>& expr) {
  std::cout << "Example: " << name << std::endl;
  std::cout << "Expression: " << expr->to_string() << std::endl;

  auto result = infer_type_string(expr);
  if (result) {
    std::cout << "Inferred type: " << *result << std::endl;
  } else {
    std::cout << "Type error: " << result.error() << std::endl;
  }
  std::cout << std::endl;
}

auto main() -> int32_t {
  // Example 1: Identity function
  auto identity = make_lambda("x", make_var("x"));
  run_example("Identity function", identity);

  // Example 2: Constant function
  auto const_func = make_lambda("x", make_lambda("y", make_var("x")));
  run_example("Constant function", const_func);

  // Example 3: Application
  auto app = make_apply(identity, make_int(42));
  run_example("Application of identity to int", app);

  // Example 4: Let polymorphism
  auto let_poly = make_let("id", identity,
                          make_apply(
                            make_apply(
                              make_var("id"), make_var("id")
                            ),
                            make_int(42)
                          ));
  run_example("Let polymorphism", let_poly);

  // Example 5: Type error (applying int to int)
  auto type_error = make_apply(make_int(42), make_int(42));
  run_example("Type error", type_error);

  return 0;
}

This program demonstrates the type inference system on several examples, including identity functions, constant functions, function applications, polymorphic let bindings, and type errors.

Extending with C++23 Features

Our implementation already uses C++23 features like std::expected for error handling. Let's enhance it further with additional C++23 features:

Concepts for Constraints

We can use C++23 concepts to define constraints on types:

// Concept for types that can be substituted
template<typename T>
concept Substitutable = requires(
  T t, const TypeVar& var, const TypePtr& replacement
) {
  {
    t.substitute(var, replacement)
  } -> std::convertible_to<std::expected<TypePtr, std::string>>;
};

// Concept for types that have free variables
template<typename T>
concept HasFreeVars = requires(T t) {
  { t.free_vars() } -> std::convertible_to<std::set<TypeVar>>;
};

// Enhanced unification with concepts
template<Substitutable T1, Substitutable T2>
  requires HasFreeVars<T1> && HasFreeVars<T2>
auto unify_generic(
  const std::shared_ptr<T1>& t1, const std::shared_ptr<T2>& t2
) -> std::expected<Substitution, std::string> {
  // Implementation similar to unify, but using concepts
}

These concepts ensure that types used in our system provide the necessary operations for substitution and finding free variables.

Pattern Matching with inspect Statements

Although pattern matching is not yet standard in C++23, we can demonstrate how it might simplify our code:

// Pseudo-code for pattern matching in future C++
auto infer_with_pattern_matching(
  TypeEnv& env, TypeVarGenerator& gen, const std::shared_ptr<Expr>& expr
) -> std::expected<std::pair<Substitution, TypePtr>, std::string> {
  inspect (expr) {
    IntLit(int32_t value) => return std::pair {Substitution {}, int_type()};

    BoolLit(bool value) => return std::pair {Substitution {}, bool_type()};

    Var(std::string name) => {
      auto scheme_result = env.lookup(name);
      if (!scheme_result) {
        return std::unexpected {scheme_result.error()};
      }
      auto type = scheme_result->instantiate(gen);
      return std::pair {Substitution {}, type};
    }

    Lambda(std::string param, std::shared_ptr<Expr> body) => {
      // Implementation for lambda
    }

    // Other cases...
  }
}

This demonstrates how pattern matching could make our code more concise and readable in future C++ versions.

Coroutines for Lazy Computation

We can use C++23 coroutines for lazy computation of types:

// A lazy type computation using coroutines
struct LazyTypeComputation {
  struct promise_type {
    TypePtr value;

    auto get_return_object() const -> LazyTypeComputation {
      return LazyTypeComputation {
        std::coroutine_handle<promise_type>::from_promise(*this)
      };
    }

    auto initial_suspend() { return std::suspend_always {}; }
    auto final_suspend() noexcept { return std::suspend_always {}; }

    void return_value(TypePtr v) { value = std::move(v); }

    void unhandled_exception() { std::terminate(); }
  };

  std::coroutine_handle<promise_type> handle;

  LazyTypeComputation(std::coroutine_handle<promise_type> h) : handle(h) {}

  ~LazyTypeComputation() {
    if (handle) {
      handle.destroy();
    }
  }

  auto get() const -> TypePtr {
    if (!handle.done()) {
      handle.resume();
    }
    return handle.promise().value;
  }
};

// Example of lazy type computation
auto compute_type_lazily(
  const std::shared_ptr<Expr>& expr, TypeEnv& env, TypeVarGenerator& gen
) -> LazyTypeComputation {
  auto result = infer(env, gen, expr);
  if (!result) {
    co_return make_base_type("Error");
  }
  auto [subst, type] = *result;
  auto final_type = subst.apply(type);
  if (!final_type) {
    co_return make_base_type("Error");
  }
  co_return *final_type;
}

This allows us to compute types lazily, which can be useful for large expressions where we may not need the full type information immediately.

Applications and Case Studies

Type Inference for a C++ Template Library

One practical application of our type inference system is to enhance template libraries in C++. For example, we could use it to infer the types of complex template expressions:

// A simple vector template
template<typename T>
class Vec {
public:
  auto map(auto f) const -> Vec {
    // Implementation details
  }

  template<typename U>
  auto fold(U init, auto f) const -> U {
    // Implementation details
  }
};

// Type inference for templates
auto infer_template_expression() -> TypePtr {
  auto env = TypeEnv {};
  auto gen = TypeVarGenerator {};

  // Define base template types
  auto a = gen.fresh();
  auto b = gen.fresh();
  auto vec_a = make_base_type("Vec<" + a.name + ">");

  // Define map and fold operations
  env.extend("map", TypeScheme {
    std::vector<TypeVar> {a, b},
    make_function_type(
      make_function_type(make_var_type(a), make_var_type(b)),
      make_function_type(vec_a, make_base_type("Vec<" + b.name + ">"))
    )
  });

  // Example expression: v.map(f).fold(0, g)
  auto expr = /* ... */;

  auto result = infer(env, gen, expr);
  // Process and return the result
}

This demonstrates how our type inference system can help with understanding complex template expressions in C++.

Domain-Specific Language with Inferred Types

Another application is to create a domain-specific language (DSL) with type inference:

// DSL for data processing

// DSL expressions
struct Expression {
  virtual ~Expression() = default;
};

struct DataSource : Expression {
  std::string source_name;
};

struct Filter : Expression {
  std::shared_ptr<Expression> input;
  std::function<bool(const std::any&)> predicate;
};

struct Map : Expression {
  std::shared_ptr<Expression> input;
  std::function<std::any(const std::any&)> transform;
};

// Type inference for DSL
auto infer_dsl_type(const std::shared_ptr<Expression>& expr) -> TypePtr {
  // Map DSL expressions to our type system
  // and perform type inference
}

This shows how our type inference system can be used to provide strong typing for DSLs, improving error detection and developer experience.

Connections to Programming Language Theory

Our implementation provides a concrete manifestation of several key concepts from programming language theory:

Curry-Howard Correspondence

The Curry-Howard correspondence establishes a deep connection between types and propositions in logic. In our implementation, this manifests in how we represent function types:

class FunctionType : public Type {
  TypePtr from;
  TypePtr to;
  // ...
};

This structure mirrors the logical implication $A \implies B$, where a function type $A \rightarrow B$ corresponds to a proof that from premise $A$ we can derive conclusion $B$. Our type inference algorithm thus not only ensures type safety but also validates the logical consistency of the program.

Principal Types and Unification

The Hindley-Milner type system guarantees that if a program is well-typed, it has a principal type—a most general type from which all other valid types can be derived by substitution. Our unification algorithm implements this principle:

auto unify(const TypePtr& t1, const TypePtr& t2)
    -> std::expected<Substitution, std::string> {
  // Implementation that finds the most general unifier
}

By finding the most general unifier, we ensure that our type inference produces the most flexible type for an expression, providing maximum reusability.

Parametric Polymorphism and Type Generalization

Our implementation of let-polymorphism demonstrates how parametric polymorphism allows functions to work on values of many different types:

// Generalize the type to create a type scheme
auto env_vars = new_env.free_vars();
auto type_vars = value_type->free_vars();

// Find variables in the type that are not in the environment
auto quantified_vars = std::vector<TypeVar> {};
for (const auto& var : type_vars) {
  if (env_vars.find(var) == env_vars.end()) {
    quantified_vars.push_back(var);
  }
}

// Create the type scheme
auto scheme = TypeScheme {quantified_vars, value_type};

This generalization process is what enables let-polymorphism, allowing a function defined in a let binding to be used with different types in different contexts.

Conclusion

In this exploration of type inference, we've bridged the gap between formal theory and practical implementation by constructing a Hindley-Milner type inference system in modern C++23. This implementation not only demonstrates the power of algorithm W and unification but also showcases how theoretical concepts from programming language theory can be realized in a mainstream language.

The resulting system provides insights into how type inference works in languages like Haskell, ML, and Rust, and offers a foundation for extending C++ with more sophisticated type inference capabilities. By implementing key concepts like substitution, unification, generalization, and instantiation, we've made abstract theoretical notions concrete and executable.

Our journey through "Syntax, Semantics, and Type Inference" complements the earlier explorations in this series by focusing on how meaning (semantics) can be derived from structure (syntax) in a systematic and provable way. Type inference exemplifies this connection, using structural analysis to derive semantic properties that ensure program correctness.

As programming languages continue to evolve, the principles of type inference will remain fundamental, helping programmers write code that is both flexible and reliable. By understanding these principles at both theoretical and implementation levels, we gain deeper insight into how programming languages bridge the gap between human intent and machine execution—the ultimate goal of programming language design.

Read more

The Convergence of Language, Understanding, and Consciousness: A Philosophical Inquiry into Human and Artificial Cognition

1. Introduction The advent of Large Language Models (LLMs) has prompted a reconsideration of fundamental philosophical questions concerning language, understanding, and consciousness. This essay examines the intersection of Wittgensteinian language philosophy, computational theories of mind, and emergent theories of consciousness to argue that the apparent distinction between human and artificial

By J

Beyond Document Lists: Extending the Unified Query Algebra to Aggregations and Hierarchical Data

Abstract This essay extends the unified query algebra framework by incorporating two critical capabilities missing from the original formulation: general aggregation operations and hierarchical data structures. We demonstrate that while posting lists provide a powerful abstraction for many scenarios, they impose restrictions that prevent the framework from handling certain important

By J