Reasoning Primitives in Hybrid and Non-Hybrid LLMs: Do Architectural Differences Yield Advantages in State-Tracking and Recall?

Shivam Rawat, Lucie Flek, Florian Mai, Nicholas Kluge Corrêa

Published May 27, 2026 | Featured #6 | In the daily list Apr 25, 2026


Daily score: 68.3
Editorial review: 7.5
Relevance: 0.451
Freshness: 0.722

Why It Matters

What makes this one worth your time

Treating reasoning as a set of simpler operations, such as recall and state-tracking, makes it easier to pinpoint where performance gains come from, which can inform model architecture choices.

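This study finds that reasoning-token augmentation, rather than hybrid architecture, is the dominant driver of accuracy on state-based recall tasks; hybrid advantages are narrower, task-dependent, and possibly more about inference efficiency than overall capability.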

Summary

The paper investigates reasoning in large language models (LLMs) by analyzing two primitives, recall and state-tracking, through five controlled task families centered on state-based recall. It compares matched transformer (attention-only) and hybrid architectures, each with and without reasoning augmentation.
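To make the two primitives concrete, here is a minimal toy sketch. The functions `recall`, `track_state`, and `state_based_recall`, the key-value context format, and the `add`/`mul` update operations are hypothetical illustrations chosen here; they are not the paper's task families or released code.

```python
# Hypothetical toy versions of the two primitives (not the paper's task code).
from typing import Dict, List, Tuple

Fact = Tuple[str, int]
Update = Tuple[str, int]


def recall(facts: List[Fact], query: str) -> int:
    """Recall: retrieve the value bound to `query` from key-value facts in context."""
    table: Dict[str, int] = dict(facts)  # a later fact for the same key overwrites an earlier one
    return table[query]


def track_state(initial: int, updates: List[Update]) -> int:
    """State-tracking: fold an ordered sequence of updates into one running state."""
    state = initial
    for op, arg in updates:
        if op == "add":
            state += arg
        elif op == "mul":
            state *= arg
        else:
            raise ValueError(f"unknown op: {op}")
    return state


def state_based_recall(facts: List[Fact], query: str, updates: List[Update]) -> int:
    """Both primitives together: recall a starting value, then track it through updates."""
    return track_state(recall(facts, query), updates)


facts = [("a", 3), ("b", 7)]
print(recall(facts, "b"))                                        # 7
print(track_state(2, [("add", 3), ("mul", 4)]))                  # 20
print(state_based_recall(facts, "a", [("mul", 2), ("add", 1)]))  # 7
```

Recall is order-insensitive lookup, while state-tracking depends on applying every update in order; a task that needs both forces a model to do each correctly.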

Key contributions

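Controlled comparison of matched hybrid and transformer models, with and without reasoning augmentation, across five state-based recall task families.
Evidence that recall and state-tracking are useful primitives for explaining reasoning gains, with reasoning-token augmentation as the dominant factor.
A task-structure account of architectural differences: the hybrid model is more robust on strictly sequential chained updates, the transformer on flat multi-hop retrieval.
Release of the codebase and data needed to reproduce the results.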

Notable insights

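Hybrid models do not show a uniform accuracy advantage once reasoning tokens are available; their edge is limited to strictly sequential chained updates, while the transformer is more robust on flat multi-hop retrieval.
Reasoning-augmented variants substantially outperform instruction-only variants, often by large margins, consistent with reasoning traces carrying intermediate state forward in token space.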
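The abstract links the gains from reasoning augmentation to the State over Tokens view, in which externalized reasoning traces carry the intermediate state forward in token space. The sketch below only illustrates that data flow and does not model a neural network. `answer_direct` stands in for an instruction-only response and `answer_with_trace` for a reasoning-augmented one; both names, the trace format, and the update operations are hypothetical.

```python
# Hypothetical illustration of the "State over Tokens" view (not the paper's code).
from typing import List, Tuple

Update = Tuple[str, int]


def apply_update(state: int, update: Update) -> int:
    """Apply one hypothetical update operation to an integer state."""
    op, arg = update
    if op == "add":
        return state + arg
    if op == "mul":
        return state * arg
    raise ValueError(f"unknown op: {op}")


def answer_direct(initial: int, updates: List[Update]) -> str:
    """Instruction-only style: only the final answer is written out."""
    state = initial
    for u in updates:
        state = apply_update(state, u)  # intermediate states stay implicit
    return f"answer={state}"


def answer_with_trace(initial: int, updates: List[Update]) -> str:
    """Reasoning-augmented style: the state after every update is written as text.

    Each step re-reads the current state from the last trace line, so the
    intermediate state is carried forward in the emitted text itself.
    """
    trace = [f"state={initial}"]
    for op, arg in updates:
        prev = int(trace[-1].split("=")[-1])  # read the state back from the trace
        trace.append(f"{op} {arg} -> state={apply_update(prev, (op, arg))}")
    trace.append(f"answer={trace[-1].split('=')[-1]}")
    return "\n".join(trace)


updates = [("add", 3), ("mul", 4)]
print(answer_direct(2, updates))      # answer=20
print(answer_with_trace(2, updates))  # four lines, ending with answer=20
```

With the trace, each step needs only the last written state and the next update, rather than holding the whole chain implicitly.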

Possible limitations

The study is based on a limited set of models and tasks, which may not generalize to broader applications.

Abstract

arXiv:2604.21454v2

Reasoning in large language models is often discussed as a single capability, but some of its gains may stem from simpler underlying operations. We examine two such primitives, recall and state-tracking, through five controlled task families centered on state-based recall, and compare matched transformer and hybrid architectures with and without reasoning augmentation. Across the suite, reasoning-augmented variants substantially outperform instruction-only variants, often by large margins. This pattern is consistent with the State over Tokens view: externalized reasoning traces help because they carry the intermediate state forward in token space. By contrast, hybrid inductive bias does not yield a uniform advantage in accuracy once reasoning tokens are available. When architectural differences do appear, they follow task structure: the hybrid Think model is more robust on strictly sequential chained updates, whereas the transformer Think model is more robust on flat multi-hop retrieval. We therefore cast the main contribution of this study as a descriptive account of what drives performance on state-based recall tasks: reasoning-token augmentation appears to be the dominant factor, while hybrid advantages are narrower, task-dependent, and potentially more about inference efficiency than overall capability. We also release the codebase and data required to reproduce these results.
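The abstract contrasts two task structures: strictly sequential chained updates and flat multi-hop retrieval. The generators below are a hypothetical reading of that distinction, not the authors' prompt formats or data (those are in their released codebase). `chained_updates`, `flat_multihop`, and `resolve_multihop`, along with the `x_i = x_{i-1} op arg` and `a -> b` line formats, are assumptions made for illustration.

```python
# Hypothetical toy generators for the two task structures (not the paper's data).
import random
from typing import Callable, Dict, List, Tuple

# Supported update ops: name -> (printed symbol, function on (value, arg)).
OPS: Dict[str, Tuple[str, Callable[[int, int], int]]] = {
    "add": ("+", lambda v, a: v + a),
    "mul": ("*", lambda v, a: v * a),
}


def chained_updates(start: int, steps: List[Tuple[str, int]]) -> Tuple[List[str], int]:
    """Strictly sequential: line i can only be resolved after line i-1."""
    lines = [f"x0 = {start}"]
    value = start
    for i, (op, arg) in enumerate(steps, start=1):
        symbol, fn = OPS[op]  # KeyError for an unknown op
        lines.append(f"x{i} = x{i - 1} {symbol} {arg}")
        value = fn(value, arg)
    return lines, value


def flat_multihop(chain: List[str], value: int, seed: int = 0) -> Tuple[List[str], int]:
    """Flat retrieval: independent pointer facts listed in shuffled order.

    Each fact can be looked up by key on its own; only the hop order matters.
    """
    facts = [f"{a} -> {b}" for a, b in zip(chain, chain[1:])]
    facts.append(f"{chain[-1]} -> {value}")
    random.Random(seed).shuffle(facts)
    return facts, value


def resolve_multihop(facts: List[str], start: str) -> int:
    """Follow pointers from `start` until reaching a node with no outgoing fact."""
    links: Dict[str, str] = dict(f.split(" -> ") for f in facts)
    node = start
    while node in links:  # assumes the toy chain has no cycles
        node = links[node]
    return int(node)


lines, answer = chained_updates(5, [("add", 2), ("mul", 3)])
print("\n".join(lines))  # prints x0 = 5, x1 = x0 + 2, x2 = x1 * 3 on separate lines
print(answer)            # 21

facts, answer = flat_multihop(["a", "b", "c"], 9)
print(facts)                         # order depends on the seed
print(resolve_multihop(facts, "a"))  # 9
```

In the chained form every intermediate value must be computed in order, while in the flat form each fact is retrievable independently and the difficulty lies in locating the next hop.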