Syntax, Semantics, and Computation: LLMs and Their Computational Boundaries
Introduction
Our previous discussions on syntax, semantics, and segfaults have traversed the spectrum from cross-disciplinary analysis to formal mathematical treatments to practical implementations. In "Syntax, Semantics, and Segfaults: A Cross-Disciplinary Analysis," we explored how structured rules of language (syntax) and their interpretation (semantics) govern both human communication and program execution. "Syntax, Semantics, and Segfaults: A Formal Perspective" delved into the rigorous mathematical frameworks underlying these concepts, while "Syntax, Semantics, and Segfaults: From Theory to Practice - Rust's Type System in C++" demonstrated their concrete implementation. This progression has led us to a natural extension: the computational capabilities of Large Language Models (LLMs), which represent a novel intersection of linguistics, formal systems, and computation.
LLMs stand at an intriguing crossroads of computational theory. Unlike traditional programming languages with explicitly defined syntax and semantics, these models learn to manipulate symbols through statistical patterns derived from vast corpora. This raises profound questions about their computational power: What formal class of computational systems do LLMs represent? Can they achieve Turing completeness, or do they exhibit a weaker form of computation? And if they generate code that executes on Turing-complete systems, what form of "indirect" computational capability does this represent?
These questions are not merely theoretical—they have significant practical implications. Understanding the computational boundaries of LLMs helps us predict when they might fail on certain tasks, explains why they sometimes produce inconsistent outputs when handling complex logical problems, and guides the development of techniques to overcome their inherent limitations.
This essay explores these questions through the lens of formal computation theory, analytical philosophy, and information theory. We will examine how the representational limitations of LLMs compare to universal computation, propose a framework for understanding their "weak" or "indirect" Turing completeness, and consider the implications for both theoretical computer science and practical applications. In doing so, we bridge abstract theories of computation with emerging AI capabilities, continuing our exploration of how structured symbolic manipulation creates meaning—whether in human languages, programming languages, or the statistical patterns learned by neural networks.
The Formal Foundations of Computation
Classical Computability Theory
To understand the computational boundaries of LLMs, we must first revisit the foundations of computation itself. In the 1930s, several equivalent formalisms for computation emerged—most notably Alan Turing's theoretical machines and Alonzo Church's lambda calculus. The Church-Turing thesis posits that these models capture the intuitive notion of what is "effectively calculable." A Turing machine consists of:
- A finite set of states $Q$
- A finite input alphabet $\Sigma$
- A finite tape alphabet $\Gamma \supseteq \Sigma$
- A transition function $\delta: Q \times \Gamma \rightarrow Q \times \Gamma \times \lbrace L, R \rbrace$
- An initial state $q_0 \in Q$
- A set of accepting states $F \subseteq Q$
The machine operates on an infinite tape, reading and writing symbols according to its transition function, which specifies the next state, the symbol to write, and the direction to move the head. This simple model can carry out any procedure that is effectively calculable—a property known as Turing completeness.
Formal Computation with Turing Machines
Let us examine Turing machines more rigorously. A configuration of a Turing machine $M$ can be represented as a triple $(q, w, i)$ where $q \in Q$ is the current state, $w \in \Gamma^*$ is the contents of the tape, and $i$ is the position of the read/write head. The computation step relation $\vdash_M$ between configurations is defined as follows:
For configurations $(q, w, i)$ and $(q', w', i')$, we have $(q, w, i) \vdash_M (q', w', i')$ if and only if:
- If $\delta(q, w_i) = (q', a, L)$, then $w' = w_1 \ldots w_{i-1} \cdot a \cdot w_{i+1} \ldots w_n$ and $i' = i-1$
- If $\delta(q, w_i) = (q', a, R)$, then $w' = w_1 \ldots w_{i-1} \cdot a \cdot w_{i+1} \ldots w_n$ and $i' = i+1$
where $w_i$ represents the symbol at position $i$ in string $w$.
We denote the reflexive transitive closure of $\vdash_M$ as $\vdash_M^*$, meaning zero or more computation steps. Formally, a language $L \subseteq \Sigma^*$ is recognized by a Turing machine $M$ if and only if:
$$L = \lbrace w \in \Sigma^* \mid (q_0, w, 1) \vdash_M^* (q_f, w', i) \text{ for some } q_f \in F, w' \in \Gamma^*, i \geq 1 \rbrace$$
This defines the set of all strings that cause $M$ to reach an accepting state after some finite number of steps.
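To make these definitions concrete, here is a minimal Python sketch of a deterministic Turing machine simulator built directly on the 7-tuple and the step relation $\vdash_M$. The example machine (scan right and accept on reading a '1'), the dictionary encoding, and the step budget are all illustrative choices rather than anything fixed by the formal treatment above; the head index is 0-based in code, unlike the 1-based subscripts in the prose.

```python
BLANK = "_"

# A Turing machine as a 7-tuple-like record; delta maps (state, symbol)
# to (next_state, written_symbol, head_direction).
tm = {
    "states": {"scan", "accept"},
    "input_alphabet": {"0", "1"},
    "tape_alphabet": {"0", "1", BLANK},
    "delta": {
        ("scan", "0"): ("scan", "0", "R"),    # keep scanning right over 0s
        ("scan", "1"): ("accept", "1", "R"),  # found a 1: enter the accepting state
        # no rule for ("scan", BLANK): the machine halts without accepting
    },
    "start": "scan",
    "accepting": {"accept"},
}

def step(config, machine):
    """One application of the step relation |-_M to a configuration (q, tape, i)."""
    q, tape, i = config
    symbol = tape[i] if 0 <= i < len(tape) else BLANK
    if (q, symbol) not in machine["delta"]:
        return None  # no applicable transition: the machine halts here
    q2, write, move = machine["delta"][(q, symbol)]
    tape = tape[:i] + [write] + tape[i + 1:]   # write the new symbol at the head
    i2 = i + 1 if move == "R" else i - 1       # move the head
    if i2 == len(tape):
        tape = tape + [BLANK]                  # extend the tape with blanks on demand
    return (q2, tape, max(i2, 0))

def accepts(machine, w, max_steps=1000):
    """Run |-_M^* from the initial configuration and report acceptance.
    A finite step budget stands in for 'some finite number of steps'."""
    config = (machine["start"], list(w) or [BLANK], 0)
    for _ in range(max_steps):
        if config[0] in machine["accepting"]:
            return True
        config = step(config, machine)
        if config is None:
            return False
    return False

print(accepts(tm, "0001"))  # True: a '1' is eventually read
print(accepts(tm, "0000"))  # False: the machine halts on the trailing blank
```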
The power of Turing machines stems from their ability to simulate any effective computational procedure. This includes complex algorithms like searching, sorting, and even simulating other Turing machines. We can formalize the notion of one Turing machine simulating another through the concept of a Universal Turing Machine (UTM), denoted $U$, which takes as input a description of a Turing machine $\langle M \rangle$ encoded as a string, along with an input string $w$, and simulates $M$ on $w$:
$$U(\langle M \rangle, w) = M(w)$$
This simulation property is crucial for understanding the limitations of computation, most notably in the proof of the Halting Problem. Let us denote the Halting Problem as the language:
$$HALT = \lbrace \langle M, w \rangle \mid \text{Turing machine } M \text{ halts on input } w \rbrace$$
Turing proved that $HALT$ is not decidable—there exists no Turing machine that can correctly determine, for all possible inputs $\langle M, w \rangle$, whether $M$ halts on $w$. The proof proceeds by contradiction: assuming such a decider $H$ exists, we can construct a Turing machine $D$ that behaves as follows on input $\langle M \rangle$:
- $D$ simulates $H$ on input $\langle M, \langle M \rangle \rangle$
- If $H$ accepts (meaning $M$ halts on $\langle M \rangle$), then $D$ enters an infinite loop
- If $H$ rejects (meaning $M$ does not halt on $\langle M \rangle$), then $D$ halts
Now we ask: does $D$ halt on input $\langle D \rangle$? If $D$ halts on $\langle D \rangle$, then by construction, $H$ must have determined that $D$ does not halt on $\langle D \rangle$—a contradiction. Conversely, if $D$ does not halt on $\langle D \rangle$, then $H$ must have determined that $D$ halts on $\langle D \rangle$—also a contradiction. This diagonalization argument demonstrates the fundamental limits of computation: not all well-defined problems can be algorithmically solved.
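The diagonalization can also be written down as (deliberately impossible) code. In the sketch below, `halts` is the hypothetical decider $H$ assumed for the sake of contradiction; no total, correct implementation of it can exist, which is exactly what the argument shows.

```python
def halts(source: str, input_data: str) -> bool:
    """Hypothetical decider H: claimed to return True iff the program whose
    source code is `source` halts on `input_data`. Assumed only for contradiction."""
    raise NotImplementedError("no such total, correct decider can exist")

def d(source: str) -> None:
    """The machine D from the proof: it does the opposite of whatever H predicts
    about running the given program on its own source."""
    if halts(source, source):
        while True:      # H says "halts", so D loops forever
            pass
    else:
        return           # H says "loops", so D halts immediately

# Feeding D its own source yields the contradiction: whatever H answers about
# d(d_source), D behaves the other way, so H cannot be correct on this input.
```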
For comparing computational models, we often use the concept of reduction. A problem $A$ is reducible to a problem $B$ (written $A \leq_m B$) if there exists a computable function $f$ such that:
$$x \in A \iff f(x) \in B$$
If $B$ is decidable and $A \leq_m B$, then $A$ is also decidable. Conversely, if $A$ is undecidable and $A \leq_m B$, then $B$ is also undecidable. This allows us to establish hierarchies of computational problems based on their relative difficulty.
These formal properties of Turing machines provide the mathematical foundation for analyzing the computational capabilities of other systems, including LLMs, which we will explore in subsequent sections.
From a formal perspective, the key insight is that computation can be understood as the manipulation of symbols according to fixed rules—a viewpoint that connects directly to our earlier discussions of syntax and semantics in "Syntax, Semantics, and Segfaults: A Cross-Disciplinary Analysis." Just as programming languages have formal grammars and well-defined semantics, computation itself can be characterized as following structured transformation rules on symbolic inputs.
The lambda calculus, which we explored in "Syntax, Semantics, and Segfaults: A Formal Perspective," provides an alternative formulation of computation based purely on function abstraction and application. Its core operations are:
- Variable reference: $x$
- Abstraction: $\lambda x.M$ (defining a function with parameter $x$ and body $M$)
- Application: $M N$ (applying function $M$ to argument $N$)
With just these primitives, lambda calculus achieves Turing completeness through the beta-reduction rule:
$$(\lambda x.M)N \to_\beta M[N/x]$$
This rule captures the essence of computation as substitution—replacing formal parameters with actual arguments—a process that mirrors how meaning is composed in natural languages according to Frege's principle of compositionality, which we discussed in our cross-disciplinary analysis.
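To see beta-reduction as substitution in action, here is a tiny term representation with a single outermost reduction step. The substitution is deliberately naive—it ignores variable capture (no alpha-renaming)—which is enough for the simple example shown but not for arbitrary terms.

```python
from dataclasses import dataclass

@dataclass
class Var:          # variable reference: x
    name: str

@dataclass
class Lam:          # abstraction: lambda x. M
    param: str
    body: object

@dataclass
class App:          # application: M N
    func: object
    arg: object

def substitute(term, name, value):
    """Naive M[N/x]: replace free occurrences of `name` by `value`.
    Ignores variable capture, so it is only safe for simple examples."""
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, Lam):
        if term.param == name:   # `name` is rebound here, so stop
            return term
        return Lam(term.param, substitute(term.body, name, value))
    return App(substitute(term.func, name, value),
               substitute(term.arg, name, value))

def beta_step(term):
    """One beta-reduction at the outermost redex: (lambda x. M) N -> M[N/x]."""
    if isinstance(term, App) and isinstance(term.func, Lam):
        return substitute(term.func.body, term.func.param, term.arg)
    return term

# (lambda x. x) y  reduces in one step to  y
print(beta_step(App(Lam("x", Var("x")), Var("y"))))  # Var(name='y')
```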
These formalisms provide a framework for discussing the limits of computation, chief among them the halting problem proved above: no algorithm can determine, for arbitrary programs and inputs, whether the program will terminate or run forever. This and other undecidability results establish fundamental boundaries on what can be computed, regardless of available resources.
Formal Languages and Analytical Philosophy
The connection between computation and language has been a central concern of analytical philosophy since its inception. Gottlob Frege's distinction between sense and reference, Bertrand Russell's theory of types, and Ludwig Wittgenstein's picture theory of meaning in the Tractatus all sought to establish how symbolic structures create meaning—a project that parallels the development of formal languages and computation.
Wittgenstein's later work on language games is particularly relevant to our discussion of LLMs. He proposed that meaning arises not from abstract correspondence between symbols and reality, but from how language is used in specific contexts or "games" with implicit rules. This perspective helps us understand LLMs as systems that have implicitly learned the rules of various language games from their training data, without necessarily having access to the underlying reality those games describe. As Wittgenstein famously noted, "the limits of my language mean the limits of my world"—a notion that aptly describes the boundaries of what LLMs can meaningfully express based on their training.
Similarly, John Searle's thought experiment of the Chinese Room raises questions about whether symbol manipulation alone constitutes understanding. Searle imagines a person in a room following rules to manipulate Chinese symbols without understanding Chinese. The room produces appropriate outputs that appear to demonstrate understanding to external observers, despite the person inside having no comprehension of the meaning. This parallels modern debates about whether LLMs truly "understand" language or merely manipulate symbols according to statistical patterns without semantic grounding.
Particularly relevant is Rudolf Carnap's work on formal semantics and logical syntax, which attempted to establish a precise relationship between symbolic structures and their interpretations. Carnap's logical positivism sought to reduce meaningful statements to those verifiable through logical analysis or empirical observation—a project with striking similarities to formal semantics in programming languages, where meaning is defined through precise evaluation rules.
This philosophical tradition converges with computation theory in the work of Alfred Tarski, whose semantic theory of truth formalized how symbolic statements relate to their models or interpretations. Tarski's work demonstrated that for sufficiently powerful formal systems, truth cannot be defined within the system itself—a result closely tied to Gödel's incompleteness theorems that established limitations on what formal systems can express about themselves.
Information Theory and Computational Complexity
Claude Shannon's information theory provides another crucial perspective on computation. Shannon defined information in terms of entropy and surprise, formalizing how symbols carry information through their probability distributions:
$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)$$
This entropy formula quantifies the uncertainty or unpredictability in a distribution—high entropy indicates high unpredictability (the distribution is close to uniform), while low entropy indicates predictability (a few outcomes carry most of the probability mass). For LLMs, this formulation is directly relevant because they essentially model the conditional entropy of the next token given all previous tokens:
$$H(X_{t+1} | X_{1:t}) = -\sum_{x_{t+1}} P(x_{t+1} | x_{1:t}) \log_2 P(x_{t+1} | x_{1:t})$$
LLMs are trained to minimize this conditional entropy on their training data, effectively learning to predict the most likely continuations of any given text. This perspective reveals LLMs as statistical models of language rather than logical reasoning systems, which has profound implications for their computational capabilities and limitations.
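Both quantities are straightforward to compute once a next-token distribution is in hand. The probabilities below are made-up illustrative numbers, not the output of any particular model; the point is only that a confident continuation has low conditional entropy and an open-ended one has high conditional entropy.

```python
import math

def entropy_bits(distribution):
    """Shannon entropy H(X) in bits for a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# Hypothetical P(x_{t+1} | x_{1:t}) after "The capital of France is":
# the model is nearly certain, so the conditional entropy is low.
confident = {" Paris": 0.97, " a": 0.02, " the": 0.01}

# Hypothetical distribution after "My favorite color is": many continuations
# are plausible, so the conditional entropy is much higher.
open_ended = {" blue": 0.30, " red": 0.25, " green": 0.20, " purple": 0.15, " teal": 0.10}

print(f"{entropy_bits(confident):.2f} bits")   # ~0.22 bits
print(f"{entropy_bits(open_ended):.2f} bits")  # ~2.23 bits
```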
Shannon's work established fundamental limits on lossless data compression and channel capacity, showing that information processing, regardless of its implementation, faces inherent constraints. These information-theoretic bounds help explain why LLMs, despite their size, cannot perfectly capture all the regularities in their training data, especially for rare patterns or complex logical dependencies that span long contexts.
Complementing these limits, computational complexity theory categorizes problems based on the resources (time, space) required to solve them. The classes P, NP, PSPACE, and others form a hierarchy of increasingly demanding computational tasks. The famous P vs. NP problem asks whether problems whose solutions can be verified in polynomial time can also be solved in polynomial time—a question with profound implications for computation and verification.
For a concrete example of complexity theory applied to LLMs, consider the task of determining whether a logical formula is satisfiable (the SAT problem, which is NP-complete). While traditional algorithms can solve SAT systematically (albeit potentially requiring exponential time), LLMs must rely on pattern recognition from seen examples, making them inherently probabilistic and prone to errors on complex instances. This highlights a fundamental difference between algorithmic computation and the statistical pattern matching performed by LLMs.
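The contrast is easy to exhibit. A brute-force SAT check like the sketch below is systematic and always correct, at the cost of enumerating up to $2^n$ assignments; an LLM asked the same question instead produces a statistically informed guess with no such guarantee.

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Decide satisfiability of a CNF formula by exhaustive enumeration.
    Each clause is a list of non-zero ints: k means x_k, -k means NOT x_k."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True, assignment      # found a satisfying assignment
    return False, None                   # all 2^n assignments falsify some clause

# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
print(brute_force_sat([[1, -2], [2, 3], [-1, -3]], num_vars=3))
# (True, {1: False, 2: False, 3: True}) -- the first satisfying assignment found
```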
These frameworks—computability theory, analytical philosophy, information theory, and complexity theory—collectively establish what computation is, how it relates to language and meaning, and what its fundamental limitations are. They provide the theoretical foundation for examining LLMs as computational systems and evaluating their capabilities against the benchmark of universal computation.
LLMs as Computational Systems
Formal Characterization of LLMs
Large Language Models like GPT, Claude, or LLaMA can be formally characterized as parameterized probability distributions over sequences of tokens. Given a sequence of tokens $x_{1:t} = (x_1, x_2, ..., x_t)$, an LLM defines a conditional probability distribution over the next token:
$$P(x_{t+1} | x_{1:t}; \theta)$$
where $\theta$ represents the model's parameters (weights). The model generates text by repeatedly sampling from this distribution and appending the sampled token to the sequence. This process can be formalized as a stochastic transition system where the state is the current context $x_{1:t}$ and the transitions are governed by the learned probability distribution.
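The generation loop itself is simple. In the sketch below, `next_token_distribution` is a hypothetical stand-in for a real model's forward pass (the role played by $P(x_{t+1} \mid x_{1:t}; \theta)$); only the sampling-and-append structure is the point.

```python
import random

def next_token_distribution(context):
    """Hypothetical placeholder for a trained model's forward pass: it should
    return P(x_{t+1} | x_{1:t}; theta). Here it is a fixed toy distribution."""
    return {"the": 0.5, "a": 0.3, "<eos>": 0.2}

def generate(prompt_tokens, max_new_tokens=20):
    """Autoregressive sampling: draw a token from the conditional distribution,
    append it to the context, and repeat until <eos> or the budget runs out."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(tokens)
        candidates, weights = zip(*dist.items())
        token = random.choices(candidates, weights=weights, k=1)[0]
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens

print(generate(["Once", "upon", "a", "time"]))
```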
This probabilistic nature fundamentally distinguishes LLMs from traditional algorithmic computation. While deterministic algorithms follow prescribed steps to produce specific outputs, LLMs generate outputs based on learned statistical patterns, making them inherently non-deterministic without additional constraints. This stochasticity is both a strength—enabling creativity and flexibility—and a limitation when precise, deterministic computation is required.
From a computational perspective, LLMs implement a form of finite-state transduction with an extremely large state space. The Transformer architecture that underlies most modern LLMs processes input through a series of self-attention and feed-forward layers, with each layer representing a transformation of the input representation:
$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
$$\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, ..., \text{head}_h)W^O$$
These operations, while complex, are fundamentally matrix multiplications and non-linear transformations—operations that fall within the class of efficiently computable functions. Each forward pass through the model has a fixed computational budget, regardless of the complexity of the reasoning required, which imposes significant constraints on the model's capabilities.
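A minimal NumPy rendering of the single-head attention equation makes its fixed computational shape visible—no masking, batching, or learned projection matrices, so this is a sketch of the formula above rather than of any particular library's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for a single head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # query-key similarities
    weights = softmax(scores, axis=-1)   # each row: a distribution over positions
    return weights @ V                   # weighted mixture of value vectors

# Toy example: 4 token positions with dimension 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(attention(Q, K, V).shape)  # (4, 8): fixed work per forward pass
```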
To understand LLMs in terms of traditional computational models, we can position them within the Chomsky hierarchy of formal languages. While regular languages are recognized by finite automata (DFAs/NFAs), context-free languages by pushdown automata (PDAs), and recursively enumerable languages by Turing machines, LLMs occupy an interesting middle ground. Their fixed-length context window makes them theoretically less powerful than Turing machines, but their massive parameter space allows them to approximate aspects of more powerful computational models within certain bounds.
Specifically, within their context window, LLMs can sometimes simulate aspects of context-sensitive language recognition (like a linear bounded automaton), but they cannot guarantee precise recognition of arbitrarily complex patterns that would require unbounded memory. This places LLMs somewhere between pushdown automata and linear bounded automata in the formal language hierarchy—more powerful than simple finite-state machines but less powerful than full Turing machines.
Computational Limitations of LLMs
Despite their impressive capabilities, LLMs have several formal limitations as computational systems:
- Bounded Context Window: LLMs process a finite context of tokens (e.g., 32K or 100K tokens), which means they cannot directly process arbitrarily long inputs. This contrasts with Turing machines, which have an infinite tape for computation. This limitation manifests in practice when LLMs need to reason about information spread across very long documents or maintain consistency across extended generations. For example, an LLM might fail to detect contradictions between statements made thousands of tokens apart simply because it cannot "see" both statements simultaneously.
- Fixed Parameter Space: Once trained, an LLM's parameters $\theta$ are fixed, limiting its representational capacity to what can be encoded in these parameters. This constrains its ability to learn new patterns or rules on the fly. While fine-tuning can adjust these parameters, the fundamental limit remains—there are only so many patterns that can be stored in a fixed number of parameters. This explains why LLMs struggle with rare or novel problem structures that weren't well-represented in their training data.
- Lack of External Memory: Standard LLMs cannot write to or read from persistent memory during inference, unlike Turing machines that can read and write to their tape. This restricts their ability to implement algorithms requiring significant working memory. Consider a complex mathematical proof that requires maintaining and systematically modifying a set of intermediate results—LLMs often struggle with such tasks because they cannot reliably store and manipulate this intermediate state.
- Probabilistic Output: LLMs generate outputs probabilistically, which makes deterministic computation challenging without additional mechanisms to control randomness. This becomes particularly problematic for tasks requiring exact answers, like arithmetic or formal logic, where even a single error can invalidate the entire computation.
- No Guaranteed Convergence: Unlike algorithms with provable termination and correctness properties, LLMs provide no guarantees about convergence to correct solutions. This makes them unreliable for safety-critical applications where formal verification is essential.
To illustrate these limitations concretely, consider the following examples:
- Long-Context Reasoning: When asked to analyze inconsistencies in a long document that exceeds its context window, an LLM might miss contradictions simply because the relevant sections cannot be processed simultaneously. This demonstrates the bounded context limitation.
- Complex Recursive Computation: When implementing a recursive algorithm like computing the nth Fibonacci number, LLMs often make errors for larger values of n, either through arithmetic mistakes or by losing track of the recursive state. This highlights the lack of reliable working memory (for contrast, see the algorithmic sketch after this list).
- Novel Problem Structures: When presented with puzzle types or logical structures absent from their training data, LLMs often fall back on superficially similar patterns, leading to incorrect approaches. This shows the limitation of fixed parameters optimized for patterns in the training distribution.
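For contrast, the exact computation an external interpreter would run is short and deterministic; an iterative formulation keeps the entire working state in two integers.

```python
def fibonacci(n: int) -> int:
    """Exact nth Fibonacci number (F(0) = 0, F(1) = 1), computed iteratively so
    the working memory is just two integers rather than a deep call stack."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci(10))   # 55
print(fibonacci(100))  # 354224848179261915075 -- exact for arbitrarily large n
```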
These limitations place LLMs squarely outside the class of Turing-complete systems. Formally, they are closer to bounded-memory finite state machines with stochastic transitions, a strictly weaker computational model than Turing machines.
LLMs and Gödel's Incompleteness
The limitations of LLMs as computational systems can be illuminated through the lens of Gödel's incompleteness theorems. Kurt Gödel demonstrated that in any consistent formal system capable of encoding basic arithmetic, there exist statements that are true but unprovable within that system. His proof involved constructing a self-referential statement that essentially says "This statement is not provable within the system."
LLMs face a parallel limitation. Trained on a fixed corpus and constrained by finite parameters, they cannot transcend certain boundaries in their reasoning capacity. Consider the statement:
"This is a statement that LLM [X] cannot correctly evaluate as true."
If the LLM evaluates it as true, the statement becomes false; if it evaluates it as false, the statement becomes true. This self-referential paradox mirrors Gödel's construction and highlights the fundamental limitations of any fixed symbolic system, including LLMs.
This limitation isn't merely theoretical—it manifests in practical scenarios where LLMs encounter logical paradoxes or self-referential problems. For instance, when asked to reason about their own capabilities or limitations, LLMs often produce inconsistent or circular reasoning. Similarly, when tasked with problems that require going beyond pattern recognition to establish fundamental truths (like certain mathematical proofs), LLMs may struggle without external guidance or verification.
Furthermore, Gödel's second incompleteness theorem shows that sufficiently powerful consistent systems cannot prove their own consistency. Similarly, LLMs cannot provide guaranteed bounds on their own reliability or accuracy—a limitation evident in phenomena like hallucinations or inconsistent reasoning. When LLMs attempt to assess the reliability of their own outputs, they face the same fundamental limitations as formal systems trying to prove their own consistency.
Drawing on analytical philosophy, we might say that LLMs operate within what Wittgenstein called a "language game" with implicit rules learned from training data, but they cannot stand outside that game to evaluate or modify its rules. This mirrors Wittgenstein's observation that "the limits of my language mean the limits of my world"—LLMs are bounded by the patterns and structures implicit in their training data and architectural constraints.
Weak and Indirect Turing Completeness
Defining Weak Turing Completeness
Given the formal limitations of LLMs, how should we characterize their computational capabilities? One useful concept is weak Turing completeness, which describes systems that can simulate Turing machines under certain constraints or with additional resources.
The term "weak Turing completeness" indicates a system that can theoretically simulate any computation a Turing machine can perform, but with important practical limitations that prevent it from being fully Turing-complete in the classical sense. For LLMs, these limitations stem from their bounded context window, fixed parameter space, and lack of external memory—constraints that make unbounded computation impossible without additional mechanisms.
To be more precise, we can contrast this with strong or full Turing completeness, which requires:
- The ability to process inputs of arbitrary length
- Access to unbounded memory
- Deterministic execution
- Guaranteed termination for any computation that would terminate on a standard Turing machine
LLMs fail to meet these criteria because they:
- Have a fixed context window (typically 32K-100K tokens)
- Cannot persistently write to or read from memory beyond their parameter space
- Generate outputs probabilistically
- Cannot guarantee correct execution of arbitrarily complex computations
We can formalize weak Turing completeness for an LLM $L$ as follows:
For any Turing machine $M$ and input $x$, there exists a prompt $p_{M,x}$ such that $L(p_{M,x})$ produces an output describing the computation of $M$ on $x$, but with the following constraints:
- The simulation may be limited to a bounded number of steps of $M$
- The simulation may have a non-zero probability of error
- The complexity of $p_{M,x}$ may grow with the complexity of $M$ and $x$
This formulation captures the intuition that LLMs can simulate computational processes, but with inherent limitations in accuracy, scale, and reliability. Weak Turing completeness acknowledges that LLMs can reason about and describe arbitrarily complex computations, without claiming they can perfectly implement them.
To illustrate this concept, consider how an LLM might simulate a simple Turing machine that adds two numbers. Given a properly formatted prompt describing the machine's states, transition function, and input, the LLM can walk through the computational steps and produce the expected output. However, if we increase the size of the numbers or the complexity of the machine, the LLM's simulation becomes increasingly prone to errors. For sufficiently complex computations, the LLM might lose track of the machine's state, make arithmetic errors, or simply reach the limits of its context window.
This example highlights the "weak" aspect of LLMs' computational capabilities: they can approximate Turing machine behavior for simple cases but lack the guaranteed correctness and unbounded resources that characterize true Turing completeness. The weaknesses stem not from theoretical limitations in what can be expressed, but from practical constraints on how reliably and extensively the computation can be carried out.
Indirect Turing Completeness Through Code Generation
While LLMs themselves are not Turing-complete, they can generate code that, when executed by a Turing-complete interpreter or compiler, performs arbitrary computation. This capability represents a form of indirect Turing completeness.
Formally, we can define indirect Turing completeness as follows:
For any computable function $f$, there exists a prompt $p_f$ such that when given to an LLM $L$, the output $L(p_f)$ is a program $P$ in some Turing-complete language, and when $P$ is executed by an appropriate interpreter $I$, the composed function $I(L(p_f))$ approximates $f$.
This formulation highlights that the LLM serves as a bridge between natural language specifications and executable code, enabling a form of computation that transcends the model's internal limitations. However, this capability is probabilistic and depends on:
- The LLM's understanding of the computational task
- Its ability to express that task in a formal programming language
- The availability of an external execution environment
The indirection is crucial: the LLM itself doesn't compute the function $f$, but rather produces a specification that, when interpreted by another system, computes $f$. This is analogous to how humans use programming languages to express computations that are then executed by machines—we rely on the Turing completeness of the programming environment, not our own cognitive processes.
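The composition $I(L(p_f))$ can be sketched as a two-stage pipeline. Everything below is schematic: `ask_llm_for_code` is a hypothetical placeholder (here returning a canned program so the sketch runs), and executing generated code via `exec` is shown only to make the structure of the indirection visible, not as a safe practice for untrusted output.

```python
def ask_llm_for_code(prompt: str) -> str:
    """Hypothetical stand-in for L(p_f): a call to some model that returns
    source code. A canned answer keeps this sketch self-contained."""
    return (
        "def solve(numbers):\n"
        "    return sorted(numbers)\n"
    )

def run_generated_program(prompt: str, argument):
    """The composition I(L(p_f)): probabilistic synthesis by the model,
    followed by deterministic execution by the Turing-complete interpreter."""
    source = ask_llm_for_code(prompt)   # L(p_f): the model writes a program
    namespace = {}
    exec(source, namespace)             # I(...): Python executes it
    return namespace["solve"](argument)

print(run_generated_program("Sort a list of numbers in ascending order.", [3, 1, 2]))
# [1, 2, 3] -- the model never sorted anything itself; the interpreter did.
```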
In practice, this indirect Turing completeness manifests when users ask LLMs to solve computational problems by generating code. For example, when asked to implement a complex algorithm like QuickSort or to solve a mathematical optimization problem, LLMs often generate functional code that, when executed, correctly solves the problem—even if the LLM itself couldn't perform the computation directly within its parameter space.
This capability creates an interesting hybrid system where natural language serves as a high-level programming interface to computational resources. Users can describe what they want to compute in informal terms, and the LLM translates this into formal code that achieves the desired result. This process capitalizes on the LLM's strength in understanding and generating language while compensating for its weaknesses in performing precise, deterministic computation.
However, this indirect approach has limitations. The code generation process inherits the probabilistic nature of LLMs, leading to potential bugs, inefficiencies, or complete failures for sufficiently complex tasks. Real-world examples include LLMs generating code with off-by-one errors, incorrect boundary condition handling, or logical flaws that only become apparent during execution. These limitations highlight that indirect Turing completeness provides computational power but not computational reliability.
A Formal Framework for Understanding LLM Computation
To better understand the computational capabilities of LLMs, we can develop a formal framework based on interactive computation and oracle machines. In classical computation theory, an oracle machine is a Turing machine augmented with an oracle—a black box that can answer certain questions in a single step. We can model the interaction between an LLM and a computational environment as a system where the LLM serves as an oracle for program synthesis:
$$M^L = \langle Q, \Sigma, \Gamma, \delta, q_0, F, L \rangle$$
where $M^L$ is a Turing machine with access to LLM $L$ as an oracle. The machine can query $L$ with a prompt $p$ and receive program code that it can then execute. This framework captures the hybrid nature of computation with LLMs—combining the statistical pattern recognition of neural networks with the deterministic execution of traditional programs.
In the context of automata theory, we can view LLMs as implementing a form of augmented finite-state transduction. While a standard finite-state transducer maps input strings to output strings based on a fixed set of states and transitions, LLMs implement this mapping with an enormous implicit state space (encoded in their parameters) and context-dependent transitions. This makes them more powerful than traditional finite automata but still bounded compared to Turing machines.
To position LLMs more precisely within the Chomsky hierarchy:
- Regular Languages (Type 3) are recognized by finite automata. LLMs clearly exceed this computational class, as they can recognize patterns that no finite automaton could (like matching nested parentheses).
- Context-Free Languages (Type 2) are recognized by pushdown automata. LLMs can recognize many context-free patterns, such as properly nested structures, though they may struggle with deeply nested constructs beyond a certain depth (a stack-based recognizer for the canonical example is sketched after this list).
- Context-Sensitive Languages (Type 1) are recognized by linear bounded automata. Within their context window, LLMs can sometimes approximate this class, recognizing patterns that require looking at the entire input simultaneously.
- Recursively Enumerable Languages (Type 0) are recognized by Turing machines. LLMs fall short of this class due to their bounded resources.
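To make the boundary between Types 3 and 2 tangible, here is the canonical context-free example: recognizing balanced brackets requires unbounded counting, which no finite automaton provides, yet it is a few lines for a pushdown-style recognizer.

```python
def balanced(s: str) -> bool:
    """Recognize the context-free language of balanced '(' and ')'.
    The counter plays the role of a pushdown stack: it tracks nesting depth
    exactly, at any depth, which no fixed finite-state machine can do."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a closing bracket with nothing to match
                return False
    return depth == 0

print(balanced("((()))()"))  # True
print(balanced("(()"))       # False -- an LLM's verdict here is only statistical
```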
This positioning helps explain why LLMs excel at certain language tasks but struggle with others. They can handle many natural language patterns effectively (which often fall within Types 2-3) but face challenges with tasks requiring unbounded memory or precise recursion (more characteristic of Type 0).
This relationship to classical computational models helps explain both the capabilities and limitations of LLMs. For tasks that can be effectively captured by finite-state methods (like many natural language processing tasks), LLMs excel because their architecture is well-suited to learning complex state transitions. For tasks requiring unbounded memory or precise recursive structure (like mathematical proofs or deep nested computations), LLMs struggle because these tasks exceed the capabilities of finite-state models.
This interactive view relates to dialogical logic in analytical philosophy, which understands meaning through structured exchanges between participants. In our case, the dialogue is between the human specifying a task, the LLM interpreting and formalizing that task, and the computational environment executing the formalized solution. Each participant has different capabilities and limitations, and the composite system achieves capabilities beyond any individual component.
From an information-theoretic perspective, we can quantify the contribution of the LLM to this computational process. If we define the Kolmogorov complexity $K(P)$ of a program $P$ as the length of the shortest description that produces $P$, then an LLM effectively reduces the description length required from the human. For a task requiring program $P$, the human needs only provide a natural language description $D$ with length $|D| < |P|$, and the LLM bridges the gap between $|D|$ and $|P|$ through its learned parameters. This compression of description length represents a genuine computational contribution, even if the LLM itself isn't Turing-complete.
Modern extensions to base LLM architectures demonstrate practical applications of this framework. Techniques like chain-of-thought prompting enable more complex reasoning by encouraging the model to break down problems into steps, effectively using the generated text itself as a form of external memory. Similarly, retrieval-augmented generation (RAG) systems extend the model's effective context by incorporating relevant retrieved information, partially addressing the bounded context limitation. These approaches don't fundamentally change the computational class of LLMs, but they do expand the range of computations they can reliably perform.
Theoretical and Practical Implications
Connections to Metamathematics and Logic
The computational boundaries of LLMs connect to fundamental questions in metamathematics and logic. Alfred Tarski's undefinability theorem showed that the concept of truth for a sufficiently expressive formal language cannot be defined within that language itself. Similarly, LLMs trained on text data cannot fully formalize their own semantic foundations—they operate within the patterns they've learned but cannot completely characterize the boundaries or reliability of those patterns.
In practical terms, this limitation manifests when LLMs attempt to reason about their own outputs or assess their own reliability. When asked to evaluate whether their response to a complex question is correct, LLMs often display overconfidence or inconsistency because they lack access to an external truth standard. This parallels Tarski's insight that truth cannot be fully defined within the system itself—a definitive assessment requires stepping outside the system.
This limitation relates to Douglas Hofstadter's concept of "strange loops" in Gödel, Escher, Bach, where self-reference creates paradoxes and transcendent capabilities in formal systems. LLMs exhibit a form of strange loop when they reason about their own capabilities or limitations, but this self-reference is necessarily incomplete and potentially misleading.
From a philosophical perspective, this connects to Saul Kripke's work on naming and necessity. Kripke argued that names function as "rigid designators" that refer to the same entity across all possible worlds. LLMs, however, lack rigid designation—their symbolic associations are flexible and contextual, leading to phenomena like hallucinations where references become untethered from ground truth. This illuminates a fundamental difference between formal logical systems (with fixed semantics) and statistical language models (with probabilistic, context-dependent semantics).
For example, when an LLM generates content about historical figures or technical concepts, it may inadvertently blend attributes or create non-existent entities that sound plausible but have no referent in reality. This happens because the model operates on statistical associations rather than rigidly designated references, lacking what philosophers call "causal-historical" connections to the entities being discussed.
Information Complexity and Neural Scaling
The relationship between LLMs and computation also raises questions about information complexity and neural scaling. Recent work on scaling laws suggests that model performance improves predictably with increased parameters, data, and compute, following power laws:
$$\text{Loss} \approx C \cdot (N_\text{params})^{-\alpha} \cdot (N_\text{data})^{-\beta} \cdot (N_\text{compute})^{-\gamma}$$
This empirical relationship has profound implications for the computational capabilities of LLMs. If performance scales predictably with model size, how does this affect the computational class of increasingly large models? Does scale eventually overcome the fundamental limitations we've identified?
This raises a profound question: as we scale models further, do they approach Turing completeness asymptotically? The formal answer appears to be no—the architectural constraints remain regardless of scale. A Transformer with a finite context window and fixed parameters cannot become Turing-complete merely by increasing its size, as it still lacks the unbounded memory and guaranteed precision required for universal computation.
However, the practical capabilities may converge toward universal computation in the limit of infinite scale, creating an interesting tension between theoretical limitations and practical capabilities. As models grow, they can approximate more complex computations with greater reliability, even if they cannot guarantee correctness in all cases.
This phenomenon is evident in how larger models handle tasks like arithmetic, logic puzzles, or code generation. While small models make frequent errors on these tasks, larger models become increasingly reliable—not because their computational class has changed, but because they've memorized more patterns and can perform more extensive in-context computation within their architectural constraints.
For example, consider how different LLMs perform on a complex arithmetic problem like computing $3^{17}$:
- Small models (e.g., early GPT versions) often produce completely incorrect results
- Medium-sized models might attempt digit-by-digit calculation but make errors in carrying or multiplication
- Large models (e.g., GPT-4 or Claude) can often get the correct answer by applying memorized algorithms more reliably
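For reference, the exact value is trivial to obtain algorithmically—exponentiation by squaring needs only a handful of exact integer multiplications, with no opportunity for a dropped carry.

```python
def power(base: int, exponent: int) -> int:
    """Exponentiation by squaring: O(log exponent) exact integer multiplications."""
    result = 1
    while exponent > 0:
        if exponent % 2 == 1:     # fold in the current square when this bit is set
            result *= base
        base *= base
        exponent //= 2
    return result

print(power(3, 17))              # 129140163
print(power(3, 17) == 3 ** 17)   # True
```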
This progression shows how scale improves computational reliability without changing fundamental capabilities—an important distinction when evaluating LLMs as computational systems.
This tension connects to Hilary Putnam's functionalism in the philosophy of mind, which argues that mental states are defined by their functional roles rather than their physical implementation. Under this view, if an LLM can functionally simulate any computation (even indirectly), it may satisfy a functionalist definition of universal computation, regardless of its architectural constraints.
Practical Applications and Future Directions
The indirect Turing completeness of LLMs through code generation has profound practical implications. It enables:
- Program synthesis from natural language: Translating informal requirements into executable code, bridging the semantic gap between human intention and formal specification. For example, a data scientist might describe a statistical analysis in plain language, and the LLM generates Python code that implements this analysis using appropriate libraries and techniques.
- Interactive programming assistance: Providing context-aware suggestions, explanations, and debugging help that adapts to the programmer's needs and skill level. This includes identifying bugs in existing code, suggesting optimizations, or explaining how complex code works through natural language descriptions.
- Automated reasoning about code: Analyzing code for correctness, efficiency, and potential bugs using a combination of pattern recognition and formal reasoning. While not as rigorous as formal verification, this approach can identify many common issues and suggest improvements.
- Computational reasoning augmentation: Combining LLMs with external tools like calculators, search engines, or specialized algorithms to overcome their inherent limitations. For instance, a system might use an LLM to interpret a mathematical problem, convert it to a formal representation, and then use a specialized solver to compute the result.
These capabilities are transforming software development, making programming more accessible to non-experts and enhancing the productivity of experienced developers. However, they also raise important questions about verification, reliability, and the division of cognitive labor between humans and AI systems.
To address the limitations of current LLMs, several practical approaches have emerged:
- Tool use and augmentation: Integrating LLMs with external tools that compensate for their limitations. For example, calculator tools address arithmetic weaknesses, while web search tools provide access to up-to-date information beyond the training data.
- Reinforcement Learning from Human Feedback (RLHF): Training models to better align with human expectations and reduce hallucinations or logical errors. This approach uses human feedback to refine model outputs, potentially improving reliability on computational tasks.
- Chain-of-thought and step-by-step reasoning: Encouraging models to break down complex problems into manageable steps, effectively creating an external memory through the generated text itself. This approach has shown significant improvements on mathematical reasoning and logical problem-solving.
- Hybrid symbolic-neural systems: Combining the pattern recognition capabilities of LLMs with the precision of symbolic reasoning systems. These hybrid approaches leverage the strengths of both paradigms while mitigating their individual weaknesses.
Unlike purely neural approaches like LLMs, hybrid systems can offer more explicit guarantees about correctness. For example, a neural-symbolic system might use an LLM to generate a solution strategy but rely on a symbolic reasoner to verify each step's correctness. This addresses the "weak" aspect of LLMs' computational capabilities by providing stronger verification mechanisms.
Comparing LLMs to other AI approaches further illuminates their unique computational properties:
- Reinforcement Learning vs. LLMs: RL systems learn through interaction with environments, developing goal-directed behaviors through feedback. This differs fundamentally from LLMs, which learn from static text data without interactive feedback. RL excels at sequential decision-making and planning—tasks where LLMs often struggle due to their lack of environmental interaction.
- Neural-Symbolic AI vs. LLMs: Neural-symbolic approaches explicitly combine neural networks with symbolic reasoning components. Unlike pure LLMs, these systems can perform explicit logical inference with guaranteed correctness properties. However, they typically lack the linguistic flexibility and generality of LLMs.
- Differentiable Programming vs. LLMs: Systems like Neural Turing Machines or Differentiable Neural Computers explicitly incorporate memory and algorithmic structures into neural architectures. These approaches directly address the memory limitations of standard LLMs but have not yet achieved the same scale or generality.
Future research directions include:
- Formal verification of LLM-generated code: Developing methods to guarantee the correctness of code produced by LLMs, potentially using interactive theorem proving or automated verification techniques. This would address the reliability concerns of indirect computation through code generation.
- Augmented architectures for enhanced computation: Designing systems that combine LLMs with external memory, symbolic reasoning components, or formal verification tools to overcome the inherent limitations of pure transformer architectures. For example, architectures that integrate transformers with explicit memory modules or reasoning engines.
- Information-theoretic bounds on LLM capabilities: Establishing theoretical limits on what patterns LLMs can learn and represent, similar to how Shannon's noisy-channel coding theorem established bounds on error correction. This would provide a more rigorous understanding of what computational tasks are fundamentally beyond the reach of statistical language models.
- Integration of LLMs with programming language theory: Developing new programming languages and paradigms that are specifically designed to leverage the strengths of LLMs while mitigating their weaknesses. This might include languages with formal verification capabilities or semantics that align naturally with the probabilistic reasoning of neural networks.
Conclusion: Beyond Classical Computation
Our exploration of LLMs and their computational boundaries reveals a fascinating tension between classical theories of computation and emerging AI capabilities. LLMs are not Turing-complete in the traditional sense—they lack the infinite memory, deterministic execution, and guaranteed termination properties that characterize universal computation. Yet through their ability to generate code and interface with traditional programming environments, they achieve a form of indirect or weak Turing completeness that challenges our understanding of computation itself.
This tension echoes the philosophical debates between formalism and pragmatism in the foundations of mathematics. The formalist tradition, exemplified by David Hilbert, sought rigorous proofs and clear boundaries for mathematical objects and operations. The pragmatist tradition, represented by thinkers like Charles Sanders Peirce, focused on the practical consequences and applications of mathematical ideas, sometimes at the expense of absolute rigor.
LLMs embody this pragmatic turn in computation—they achieve remarkable practical results through statistical approximation rather than formal guarantee. They operate in what we might call a "post-Turing" computational paradigm, where the boundaries between specification and implementation, between syntax and semantics, become increasingly blurred. In this paradigm, computation is no longer just the mechanical execution of explicit instructions but also includes the interpretation of ambiguous specifications, the recognition of implicit patterns, and the generation of novel solutions.
When we position LLMs within the traditional hierarchy of computational models, they occupy an intriguing middle ground. They exceed the capabilities of finite automata through their vast parameter space and context window, yet fall short of the unbounded memory and precision of Turing machines. This positions them as a novel computational class that combines aspects of finite-state transducers (transforming input sequences to output sequences) with statistical learning. Unlike traditional automata that operate on exact patterns, LLMs work with fuzzy pattern recognition that allows them to generalize across similar inputs—a capability that sometimes compensates for their formal limitations but introduces new challenges in reliability and verification.
This evolution connects back to our earlier discussions of syntax and semantics in the trilogy of "Syntax, Semantics, and Segfaults" essays. Just as programming languages evolved from low-level machine instructions to high-level abstractions with rich semantics, the integration of LLMs into computational systems represents a further step toward computation that understands and generates meaning, not just manipulates symbols according to fixed rules. The "segfaults" in this new paradigm are not just memory access violations but semantic disconnections between intended meaning and generated outputs—hallucinations, misinterpretations, and logical inconsistencies that result from the probabilistic nature of these systems.
These semantic failures manifest in concrete ways: when an LLM confidently asserts a non-existent fact, incorrectly solves a mathematical problem, or generates code with subtle logical errors. Just as traditional segfaults indicate a mismatch between program logic and memory constraints, these "semantic segfaults" indicate a mismatch between the statistical patterns learned by the model and the logical structure required by the task. Understanding and addressing these failures requires insights from both traditional computer science and the emerging field of AI alignment.
As we continue to develop and refine LLM-based computational systems, the interplay between formal guarantees and statistical approximation will remain a central challenge. Finding the right balance—leveraging the flexibility and creativity of neural language models while maintaining the reliability and verifiability of traditional computation—will define the next frontier in programming languages, software engineering, and artificial intelligence.
In this evolving landscape, the boundaries established by Turing, Church, Gödel, and other pioneers of computation theory remain relevant but require reinterpretation. The indirect Turing completeness of LLMs reminds us that computation is not just a formal mathematical concept but a practical tool for extending human capabilities. Just as Wittgenstein observed that language games evolve with use and context, our understanding of computation must evolve to encompass these new modalities of symbol manipulation and meaning creation.
The journey from syntax to semantics now continues beyond traditional computation, into systems that blur the boundaries between formal and natural languages, between explicit programming and implicit learning, and between human and machine intelligence. In this expanded computational universe, the traditional questions of correctness, meaning, and computational power take on new dimensions—dimensions that will require both rigorous formal analysis and creative philosophical reimagining to fully understand.