RE: LeoThread 2026-01-29 13-44
#ai Continues to Get Better
Smaller models are great for smaller players that have limited hardware + energy.
https://inleo.io/threads/view/khaleelkazi/re-khaleelkazi-ba1apx
!summarize
Part 1/11:
Hierarchical and Recursive Models: A New Paradigm in AI Research
In recent discussions across the AI community, a prevailing consensus has emerged: scaling AI—making models bigger and bigger—is the key to achieving artificial general intelligence (AGI). Hundreds of billions of dollars are being poured into this approach, with the hope that increasing parameters and training data will eventually lead to models capable of human-like understanding and reasoning.
Part 2/11:
Yet, despite the impressive capabilities of large language models (LLMs), they still predominantly excel at what are essentially logical puzzles—a far cry from the versatile, human-like intelligence many aspire to. Interestingly, a new line of research challenges this current "scale or die" narrative, proposing that smaller, more efficient models with internal recursive and hierarchical reasoning structures might outperform sprawling architectures.
The Emergence of Hierarchical Reasoning Models
Part 3/11:
This shift began with the development of Hierarchical Reasoning Models (HRM), a novel architecture that scores surprisingly well on logical puzzles such as those in the ARC AGI benchmark. Despite having only 27 million parameters, a tiny fraction of the size of models like GPT-4 (reported to exceed a trillion parameters), HRM achieved 32% on the ARC AGI benchmark, outperforming far larger models and solving tasks such as Sudoku puzzles that remain challenging for traditional large-scale language models.
What makes HRM remarkable isn't just its performance but its size. Its ability to compete with giants suggests that effective reasoning might be more about model architecture than sheer scale.
How HRM and TRM Work
The Inner Workings
Part 4/11:
HRM employs a recursive reasoning approach inspired by how the brain might approach complex problems. Instead of attempting to solve an entire puzzle in one pass, HRM refines its internal reasoning iteratively. It maintains an internal 'latent state'—a sort of reasoning scratchpad—that it updates multiple times before producing an answer.
This process involves two interconnected transformer networks operating at different speeds:
Fast network: Updates the scratchpad in rapid micro-steps, refining details.
Slow network: Periodically intervenes, setting strategic direction and assessing whether the reasoning is sufficient to halt.
This two-timescale process mimics aspects of human thought, akin to how our brains operate on different processing speeds across cortical regions.
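To make the two-timescale idea concrete, here is a minimal sketch of such a loop. It is only an illustration of the description above, not the authors' HRM implementation; the module choices, sizes, and halting rule are assumptions.

```python
# Minimal sketch of a two-timescale recursive reasoning loop (PyTorch).
# Illustrative only: the GRU cells, dimensions, and halting threshold are assumptions.
import torch
import torch.nn as nn

class TwoTimescaleReasoner(nn.Module):
    def __init__(self, dim=64, fast_steps=4, max_slow_steps=8):
        super().__init__()
        self.fast = nn.GRUCell(dim, dim)        # fast net: small updates to the scratchpad
        self.slow = nn.GRUCell(dim, dim)        # slow net: periodic strategic updates
        self.halt = nn.Linear(dim, 1)           # decides whether reasoning can stop
        self.readout = nn.Linear(dim, dim)      # maps the final latent state to an answer
        self.fast_steps = fast_steps
        self.max_slow_steps = max_slow_steps

    def forward(self, x):
        z = torch.zeros_like(x)                 # fast latent "scratchpad"
        h = torch.zeros_like(x)                 # slow strategic state
        for _ in range(self.max_slow_steps):
            for _ in range(self.fast_steps):    # several fast micro-steps...
                z = self.fast(x + h, z)         # ...conditioned on the slow state
            h = self.slow(z, h)                 # one slow update sets a new direction
            if torch.sigmoid(self.halt(h)).mean() > 0.9:
                break                           # halt once the model is "confident"
        return self.readout(z)

model = TwoTimescaleReasoner()
answer = model(torch.randn(2, 64))              # batch of 2 toy inputs
print(answer.shape)                             # torch.Size([2, 64])
```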
Part 5/11:
Biological Inspiration and Critiques
Initially, HRM was thought to mimic biological processes—drawing parallels with cortical dynamics in the mouse brain. However, many researchers have questioned this analogy's strength; the connections are more metaphorical than mechanistic. The hierarchical structure in HRM is primarily heuristic, with the model designed to parse latent features into hierarchical reasoning without explicit neurobiological grounding.
Limitations and Optimizations
Even with strong empirical results, the HRM approach has some weaknesses, most notably its reliance on the assumption that its inner reasoning loops settle into a stable equilibrium state, along with a hierarchical structure that is more heuristic than principled.
Part 6/11:
These limitations prompted further research and refinement.
The Birth of TRM: Tiny Recursive Models
Enter TRM (Tiny Recursion Model), a radically simplified, more robust successor developed by researcher Alexia Jolicoeur-Martineau. Instead of relying on biological analogies or equilibrium assumptions, TRM trains directly through a fixed number of recursive updates, dropping the requirement that the inner loops settle into a stable state.
Design Differences
Where HRM pairs two networks operating at different speeds, TRM collapses the design into a single tiny network, on the order of 7 million parameters, that alternately refines a latent reasoning state and its current answer over the unrolled recursion.
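The sketch below shows what such a fixed-step recursive loop might look like. It is a simplified illustration of the idea, not the published TRM code; the layer sizes, step count, and loss are assumptions.

```python
# Minimal sketch of a TRM-style loop with a fixed number of recursive updates (PyTorch).
# Illustrative only: the tiny network, step count, and loss below are assumptions.
import torch
import torch.nn as nn

class TinyRecursiveSolver(nn.Module):
    def __init__(self, dim=64, steps=6):
        super().__init__()
        self.net = nn.Sequential(               # one small shared network, reused every step
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.to_answer = nn.Linear(dim, dim)
        self.steps = steps

    def forward(self, x):
        z = torch.zeros_like(x)                 # latent reasoning state
        y = torch.zeros_like(x)                 # current answer guess
        for _ in range(self.steps):             # fixed recursion depth, no equilibrium assumption
            z = self.net(torch.cat([x, y, z], dim=-1))  # refine the latent state
            y = self.to_answer(z)                        # revise the answer from it
        return y                                # gradients flow through the unrolled steps

solver = TinyRecursiveSolver()
x = torch.randn(4, 64)
target = torch.randn(4, 64)
loss = nn.functional.mse_loss(solver(x), target)
loss.backward()                                 # trained directly through the recursion
```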
Part 7/11:
Performance and Capabilities
Despite its smaller size, TRM has achieved impressive results:
40% on ARC AGI 1, surpassing models like Gemini 2.5 Pro.
6.2% on ARC AGI 2, close to GPT-5 Medium.
Excels in logical puzzles such as Sudoku and maze problems, where it can refine answers through recursive reasoning until confidence is high.
Part 8/11:
An intriguing aspect is that TRM sometimes performs better using MLPs alone—without attention mechanisms—in tasks with limited context, such as 9x9 Sudoku puzzles. Attention, however, becomes advantageous in larger, more complex grids like 30x30 mazes.
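As a rough illustration of that trade-off, the sketch below contrasts attention with a plain MLP that mixes tokens across a fixed, small grid; the layer sizes here are assumptions, not TRM's actual configuration.

```python
# Sketch contrasting attention with a pure-MLP token mixer over a fixed, small grid (PyTorch).
# Illustrative only: sizes are assumptions, not the TRM configuration.
import torch
import torch.nn as nn

seq_len, dim = 81, 64                           # e.g. a flattened 9x9 Sudoku grid

attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

# When the context is small and fixed, token mixing can be a plain MLP
# applied across the sequence dimension instead of attention.
token_mlp = nn.Sequential(nn.Linear(seq_len, seq_len), nn.ReLU(), nn.Linear(seq_len, seq_len))

x = torch.randn(2, seq_len, dim)                # (batch, cells, features)
attn_out, _ = attn(x, x, x)                     # content-dependent mixing
mlp_out = token_mlp(x.transpose(1, 2)).transpose(1, 2)  # fixed positional mixing
print(attn_out.shape, mlp_out.shape)            # both: torch.Size([2, 81, 64])
```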
Insights from Scaling and Model Size
A surprising discovery during development was that scaling the number of layers in TRM decreased performance, a stark contrast to conventional wisdom in large language models. More layers led to overfitting and worse generalization, whereas fewer layers combined with more recursive steps yielded better results.
Part 9/11:
This suggests that for small, task-specific models, recursive reasoning and clever architecture can outperform larger, monolithic models. The recursive process turns complex problems into sequences of small, manageable edits, reducing the need for vast parameters.
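A quick back-of-the-envelope comparison, with illustrative numbers rather than figures from the paper, shows how recursion buys effective depth without adding parameters:

```python
# Rough parameter arithmetic: reusing a small network recursively increases
# "effective depth" without increasing parameter count. Numbers are illustrative.
def transformer_block_params(dim, ff_mult=4):
    attn = 4 * dim * dim                    # rough count: Q, K, V, output projections
    ffn = 2 * dim * dim * ff_mult           # two feed-forward projections
    return attn + ffn

dim = 256
configs = {
    "shallow+recursive": {"layers": 2, "recursions": 8},  # small net, many passes
    "deep":              {"layers": 8, "recursions": 2},  # bigger net, few passes
}

for name, cfg in configs.items():
    params = cfg["layers"] * transformer_block_params(dim)
    effective_depth = cfg["layers"] * cfg["recursions"]
    print(f"{name}: {params:,} params, effective depth {effective_depth}")
# Both reach effective depth 16, but the recursive variant holds 4x fewer parameters,
# which is one way to read why fewer layers generalized better in these experiments.
```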
Implications for AI Development
These findings challenge some core assumptions in the field:
The success of HRM and TRM indicates that structural ingenuity can rival or surpass brute-force scaling, especially for logical and rule-based tasks.
Evidence points to better generalization and robustness in small, recursive models, as they are less prone to overfitting and data memorization.
Part 10/11:
The future may lie in hierarchical, recursive, and multi-timescale models that approach reasoning more like humans—refining answers multiple times rather than predicting in one shot.
Conclusion: Rethinking AI Paradigms
The advent of HRM and TRM shows that small, thoughtfully designed recursive models are a promising path forward. They challenge the prevalent narrative that more parameters and bigger models equate to smarter AI. Instead, architecture, recursive reasoning, and efficient internal representations can unlock impressive reasoning capabilities within a fraction of the typical computational footprint.
Part 11/11:
As AI research continues to evolve, exploring these alternative paradigms may bring us closer to truly intelligent machines—without requiring impossibly large models. This approach not only democratizes AI development but also encourages innovation grounded in how our own brains might solve problems.
What do you think? Are recursive and hierarchical models the future of AI? Share your thoughts in the comments.
Oh, this model is very specialized though, and can't be used for text generation... I can't wait to see whether we can figure out how to get ChatGPT-level text generation at that size, though.
ChatGPT and its competitors all have multiple models that get automatically selected based on the prompt. Having a few specialized tiny models could be a game changer.
I meant, this model is too specialized to be useful to most people. It can't even generate text...
But I'm still excited... Instead of 5-10 sub-models like they're doing now, I imagine in the future AI tools will contain thousands of sub-models that are each "too specialized" to be usable on their own, but are insanely small in size.
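A toy sketch of what routing a prompt across many tiny specialists might look like; the registry, router logic, and model names are purely hypothetical:

```python
# Toy sketch of routing a prompt across many tiny specialist models.
# Entirely hypothetical: the registry, router, and specialists are invented for illustration.
from typing import Callable, Dict

specialists: Dict[str, Callable[[str], str]] = {
    "sudoku": lambda p: "solve the grid with a tiny recursive solver",
    "maze": lambda p: "plan a path with a tiny recursive planner",
    "chat": lambda p: "fall back to a general text model",
}

def route(prompt: str) -> str:
    # A real system would use a learned classifier; keyword matching stands in here.
    for name, model in specialists.items():
        if name in prompt.lower():
            return f"[{name}] " + model(prompt)
    return "[chat] " + specialists["chat"](prompt)

print(route("Can you finish this Sudoku puzzle?"))
print(route("What's the weather like?"))
```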