The rapid recent advances in the natural language processing capabilities of large language models (LLMs) have stirred considerable excitement about their potential to achieve human-level intelligence. Their ability to produce remarkably coherent text and engage in dialogue after exposure to vast datasets seems to point towards flexible, general-purpose reasoning skills.
However, a growing chorus of voices urges caution against unchecked optimism, highlighting fundamental blind spots that limit neural approaches. LLMs still frequently make basic logical and mathematical mistakes that reveal a lack of systematicity behind their responses. Their knowledge remains intrinsically statistical, without deeper semantic structure.
More complex reasoning tasks further expose these limitations. LLMs struggle with causal, counterfactual, and compositional reasoning challenges that require going beyond surface pattern recognition. Unlike humans, who learn abstract schemas and flexibly recombine modular concepts, neural networks memorize correlations between co-occurring terms. This results in brittle generalization outside narrow training distributions.
This gap underscores how human cognition employs structured symbolic representations to enable systematic compositionality, and causal models to conceptualize dynamics. We reason by manipulating modular symbolic concepts according to valid inference rules, chaining logical dependencies, leveraging mental simulations, and postulating mechanisms that relate variables. The inherently statistical nature of neural networks precludes the development of such structured reasoning.
It remains mysterious how symbolic-like phenomena emerge in LLMs despite their subsymbolic substrate. But clearer acknowledgement of this “hybridity gap” is imperative. True progress requires embracing complementary strengths, combining the flexibility of neural approaches with structured knowledge representations and causal reasoning techniques, to create integrated reasoning systems.
We first outline the growing body of analyses exposing neural networks’ lack of systematicity, causal comprehension, and compositional generalization, underscoring differences from innate human faculties.
Next, we detail salient facets of the “reasoning gap”, including struggles with modular skill orchestration, unraveling dynamics, and counterfactual simulation. We surface innate human capacities…