RE: LeoThread 2026-01-29 13-44


#ai Continues to Get Better

Smaller models are great for smaller players that have limited hardware + energy.

https://inleo.io/threads/view/khaleelkazi/re-khaleelkazi-ba1apx

!summarize



Part 1/11:

Hierarchical and Recursive Models: A New Paradigm in AI Research

In recent discussions across the AI community, a prevailing consensus has emerged: scaling AI—making models bigger and bigger—is the key to achieving artificial general intelligence (AGI). Hundreds of billions of dollars are being poured into this approach, with the hope that increasing parameters and training data will eventually lead to models capable of human-like understanding and reasoning.


Part 2/11:

Yet, despite the impressive capabilities of large language models (LLMs), they still predominantly excel at what are essentially logical puzzles—a far cry from the versatile, human-like intelligence many aspire to. Interestingly, a new line of research challenges this current "scale or die" narrative, proposing that smaller, more efficient models with internal recursive and hierarchical reasoning structures might outperform sprawling architectures.

The Emergence of Hierarchical Reasoning Models


Part 3/11:

This shift began with the development of Hierarchical Reasoning Models (HRM), a novel architecture that, astonishingly, scores highly on logical puzzles such as those in the ARC AGI benchmark. Despite having only 27 million parameters, a tiny fraction of the size of models like GPT-4 (reportedly over a trillion), HRM achieved 32% on the ARC AGI benchmark, outperforming larger models and solving tasks such as Sudoku puzzles that are challenging for traditional large-scale language models.

What makes HRM remarkable isn't just its performance but its size. Its ability to compete with giants suggests that effective reasoning might be more about model architecture than sheer scale.

How HRM and TRM Work

The Inner Workings


Part 4/11:

HRM employs a recursive reasoning approach inspired by how the brain might approach complex problems. Instead of attempting to solve an entire puzzle in one pass, HRM refines its internal reasoning iteratively. It maintains an internal 'latent state'—a sort of reasoning scratchpad—that it updates multiple times before producing an answer.

This process involves two interconnected transformer networks operating at different speeds:

  • Fast network: Updates the scratchpad in rapid micro-steps, refining details.

  • Slow network: Periodically intervenes, setting strategic direction and assessing whether the reasoning is sufficient to halt.

This two-timescale process mimics aspects of human thought, akin to how our brains operate on different processing speeds across cortical regions.
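
To make the two-timescale idea concrete, here is a minimal, hypothetical PyTorch sketch of a fast inner loop refining a latent scratchpad while a slow outer loop periodically redirects it. The module sizes, step counts, and update rules are illustrative assumptions, not HRM's actual implementation.

```python
import torch
import torch.nn as nn

class TwoTimescaleReasoner(nn.Module):
    """Toy sketch of a fast/slow recursive reasoner (illustrative, not HRM's code)."""

    def __init__(self, dim=128, fast_steps=6, slow_steps=3):
        super().__init__()
        fast_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        slow_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.fast = nn.TransformerEncoder(fast_layer, num_layers=1)  # rapid micro-updates
        self.slow = nn.TransformerEncoder(slow_layer, num_layers=1)  # strategic updates
        self.readout = nn.Linear(dim, dim)
        self.fast_steps, self.slow_steps = fast_steps, slow_steps

    def forward(self, x_embed):
        # z is the latent "scratchpad"; y is the slower, high-level state.
        z = torch.zeros_like(x_embed)
        y = torch.zeros_like(x_embed)
        for _ in range(self.slow_steps):          # slow outer loop
            for _ in range(self.fast_steps):      # fast inner loop refines details
                z = self.fast(z + y + x_embed)
            y = self.slow(y + z)                  # slow module sets a new direction
        return self.readout(y)                    # decode an answer from the final state
```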


Part 5/11:

Biological Inspiration and Critiques

Initially, HRM was thought to mimic biological processes—drawing parallels with cortical dynamics in the mouse brain. However, many researchers have questioned this analogy's strength; the connections are more metaphorical than mechanistic. The hierarchical structure in HRM is primarily heuristic, with the model designed to parse latent features into hierarchical reasoning without explicit neurobiological grounding.

Limitations and Optimizations

Even with strong empirical results, the HRM approach has some weaknesses. Notably:

  • Assumption of stabilization: HRM assumes that its inner loop reaches a stable state for gradient calculation, which isn't always true in practice (a minimal sketch of this shortcut appears below).

Part 6/11:

  • Heuristics: Many design choices are heuristic, making some conclusions more conjectural than experimentally verified.

These limitations prompted further research and refinement.
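
To illustrate the stabilization point, here is a generic toy example of the kind of fixed-point gradient shortcut being described: the inner loop runs without gradient tracking and only the final step is backpropagated, which is only justified if the latent state has genuinely settled. The update rule, shapes, and step count are invented for illustration; this is not HRM's actual training code.

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 8, requires_grad=True)    # toy learnable parameter

def f(z, x):
    """One recursive refinement step (toy update rule)."""
    return torch.tanh(z @ W + x)

def solve_with_fixed_point_assumption(x, steps=16):
    z = torch.zeros_like(x)
    with torch.no_grad():                    # inner loop: no gradient tracking
        for _ in range(steps):
            z = f(z, x)
    # Backpropagate through only the final step, implicitly assuming z has
    # settled at a fixed point z* = f(z*, x). If it hasn't, the gradient is biased.
    return f(z.detach(), x)

x = torch.randn(4, 8)
loss = solve_with_fixed_point_assumption(x).pow(2).mean()
loss.backward()                              # gradient flows through one step only
```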

The Birth of TRM: Tiny Recursive Models

Enter TRM (Tiny Recursion Model), a radically simplified, more robust successor developed by researcher Alexia Jolicoeur-Martineau. Instead of relying on biological analogies or equilibrium assumptions, TRM trains directly on a fixed number of recursive updates, dropping the assumption that the inner loop must settle into a stable state.

Design Differences

  • Simpler architecture: With only 7 million parameters (roughly a quarter of HRM's 27 million), TRM is far more computationally efficient.

Part 7/11:

  • Recursion without equilibrium: Instead of assuming the inner loop converges, TRM explicitly trains on the actual recursive steps it performs, leading to more stable and predictable training dynamics.
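
By contrast with the fixed-point shortcut sketched earlier, here is a tiny hypothetical sketch of the fixed-step approach: the same update is unrolled a set number of times and every step stays inside the autograd graph, so no equilibrium assumption is needed. Dimensions and the update rule are illustrative assumptions, not TRM's actual architecture.

```python
import torch
import torch.nn as nn

class TinyRecursiveNet(nn.Module):
    """Toy fixed-step recursive model (illustrative sketch, not the real TRM)."""

    def __init__(self, dim=64, steps=8):
        super().__init__()
        self.step_fn = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(),
                                     nn.Linear(dim, dim))
        self.head = nn.Linear(dim, dim)
        self.steps = steps

    def forward(self, x):
        z = torch.zeros_like(x)
        for _ in range(self.steps):
            # Every refinement step stays in the autograd graph: no equilibrium
            # assumption, gradients flow through the full unrolled recursion.
            z = z + self.step_fn(torch.cat([z, x], dim=-1))
        return self.head(z)
```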

Performance and Capabilities

Despite its smaller size, TRM has achieved impressive results:

  • 40% on ARC AGI 1, surpassing models like Gemini 2.5 Pro.

  • 6.2% on ARC AGI 2, close to GPT-5 Medium.

  • Excels in logical puzzles such as Sudoku and maze problems, where it can refine answers through recursive reasoning until confidence is high.
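
The "refine until confidence is high" behaviour can be pictured with a simple halting loop like the hypothetical one below, where a small confidence head decides when to stop refining. The threshold, step cap, and heads are assumptions for illustration; the real models use their own, more involved halting schemes.

```python
import torch
import torch.nn as nn

class RefineUntilConfident(nn.Module):
    """Toy confidence-gated refinement loop (illustrative, not the real model)."""

    def __init__(self, dim=64, max_steps=16, threshold=0.95):
        super().__init__()
        self.refine = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.confidence = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.max_steps, self.threshold = max_steps, threshold

    @torch.no_grad()
    def forward(self, x):                      # inference-time refinement
        z = torch.zeros_like(x)
        for step in range(self.max_steps):
            z = z + self.refine(z + x)         # propose a small edit to the answer
            if self.confidence(z).mean() > self.threshold:
                break                          # halt once the model is confident
        return z, step + 1
```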


Part 8/11:

An intriguing aspect is that TRM sometimes performs better using MLPs alone—without attention mechanisms—in tasks with limited context, such as 9x9 Sudoku puzzles. Attention, however, becomes advantageous in larger, more complex grids like 30x30 mazes.
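
A rough sketch of that trade-off, with hypothetical module choices: on a small, fixed grid an MLP mixing information across cells can stand in for self-attention, while attention is kept for larger grids. The sizes and the switch below are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GridReasonerBlock(nn.Module):
    """One reasoning block; attention is optional (illustrative sketch)."""

    def __init__(self, n_cells: int, dim: int = 64, use_attention: bool = False):
        super().__init__()
        self.use_attention = use_attention
        if use_attention:
            # Larger grids (e.g. a 30x30 maze, 900 cells): self-attention mixes cells.
            self.mixer = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        else:
            # Small fixed grids (e.g. 9x9 Sudoku, 81 cells): a linear map over the
            # cell axis mixes information between cells without any attention.
            self.mixer = nn.Linear(n_cells, n_cells)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):                      # x: (batch, n_cells, dim)
        if self.use_attention:
            mixed, _ = self.mixer(x, x, x)
        else:
            mixed = self.mixer(x.transpose(1, 2)).transpose(1, 2)
        x = x + mixed
        return x + self.mlp(x)
```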

Insights from Scaling and Model Size

A surprising discovery during development was that scaling the number of layers in TRM decreased performance, a stark contrast to conventional wisdom in large language models. More layers led to overfitting and worse generalization, whereas fewer layers combined with more recursive steps yielded better results.
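
A back-of-the-envelope way to see why this can work, using made-up numbers: recursion reuses the same weights, so effective depth (how many times the state gets transformed) grows with layers times recursion steps, while the parameter count grows only with layers.

```python
# Hypothetical accounting (numbers invented for illustration).
def n_params(layers, params_per_layer=1_000_000):
    return layers * params_per_layer

def effective_depth(layers, recursion_steps):
    return layers * recursion_steps

deep_net = (n_params(16), effective_depth(16, 1))   # 16M params, effective depth 16
tiny_net = (n_params(2),  effective_depth(2, 16))   # 2M params,  effective depth 32

print(deep_net, tiny_net)   # the recursive net is "deeper" with 1/8 the parameters
```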


Part 9/11:

This suggests that for small, task-specific models, recursive reasoning and clever architecture can outperform larger, monolithic models. The recursive process turns complex problems into sequences of small, manageable edits, reducing the need for vast parameters.

Implications for AI Development

These findings raise critical questions:

  • Is scaling always the right approach?

The success of HRM and TRM indicates that structural ingenuity can rival or surpass brute-force scaling, especially for logical and rule-based tasks.

  • Can small models with recursive reasoning generalize better?

Evidence points to better generalization and robustness in small, recursive models, as they are less prone to overfitting and data memorization.


Part 10/11:

  • What is the future of AI architecture?

The future may lie in hierarchical, recursive, and multi-timescale models that approach reasoning more like humans—refining answers multiple times rather than predicting in one shot.

Conclusion: Rethinking AI Paradigms

The advent of HRM and TRM shows that small, thoughtfully designed recursive models are a promising path forward. They challenge the prevalent narrative that more parameters and bigger models equate to smarter AI. Instead, architecture, recursive reasoning, and efficient internal representations can unlock impressive reasoning capabilities within a fraction of the typical computational footprint.


Part 11/11:

As AI research continues to evolve, exploring these alternative paradigms may bring us closer to truly intelligent machines—without requiring impossibly large models. This approach not only democratizes AI development but also encourages innovation grounded in how our own brains might solve problems.


What do you think? Are recursive and hierarchical models the future of AI? Share your thoughts in the comments.


Oh, this model is very specialized though, and can't be used for text generation... I can't wait to see if we can figure out how to get ChatGPT-level text generation at that size, though.


ChatGPT and its competitors all have multiple models that get automatically selected based on the prompt. Having a few specialized tiny models could be a game changer.


I meant, this model is too specialized to be useful to most people. It can't even generate text...

But I'm still excited... Instead of 5-10 sub-models like they're doing now, I imagine in the future AI tools will contain thousands of sub-models that are each "too specialized" to be usable on their own, but are insanely small in size.
