
Long-Form Thoughts
Coming Soon - Thinking Machines: A Dual-System Framework for Metacognitive Control and Learning
A central challenge for artificial intelligence is not task-level performance, but the absence of architectural principles that support fast adaptive behavior, structured abstraction, and integrative learning across multiple temporal scales, capabilities that are all hallmarks of human cognition. While large-scale models achieve superhuman performance in narrow domains, they lack the mechanistic organization required for general intelligence, specifically energy-efficient computation, embodied goal-directed control, and multi-timescale memory consolidation. Building on the theoretical foundations established in Part I, we introduce a neuro-inspired dual-loop architecture that decomposes cognition into two interacting timescales. A fast inner loop enables real-time sensorimotor adaptation and online learning, drawing on hierarchical Wilson–Cowan dynamics, Hopfield attractors, and Kalman-based predictive control to support rapid, embodied interaction with the environment. A slow outer loop supports abstraction, planning, and long-horizon reasoning through Wilson–Cowan consolidation dynamics, modern Hopfield attractors for compositional memory, graph-structured dependencies with spectral operators for high- and low-dimensional reasoning, Lyapunov-stable dynamics for attractor convergence, and goal-oriented policy refinement for temporal coherence across timescales. Both loops operate under a shared constraint inspired by the Free Energy Principle, aligning perception, prediction, and action through energy minimization and uncertainty reduction. Together, these mechanisms transform deep learning systems from static function approximators into self-organizing dynamical systems, offering a path towards general-purpose agents that learn continuously, reason compositionally, and generalize flexibly across tasks, contexts, and environments.
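To make the fast inner loop concrete, here is a minimal Python sketch, assuming a single excitatory-inhibitory Wilson–Cowan pair driven by a scalar Kalman filter; the coupling weights, time constants, and noise levels are illustrative placeholders rather than values from the framework itself.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def kalman_step(x_est, p_est, z, q=0.01, r=0.1):
    # One predict/update cycle of a scalar Kalman filter (random-walk model).
    p_pred = p_est + q                 # predict: variance grows by process noise q
    k = p_pred / (p_pred + r)          # Kalman gain trades prior vs. measurement
    x_new = x_est + k * (z - x_est)    # update: correct estimate toward observation z
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

def wilson_cowan_step(E, I, drive, dt=0.01, tau_e=0.02, tau_i=0.04):
    # Euler step of one excitatory-inhibitory Wilson-Cowan pair; `drive` is
    # the filtered sensory estimate feeding the excitatory population.
    dE = (-E + sigmoid(12.0 * E - 10.0 * I + drive)) / tau_e
    dI = (-I + sigmoid(10.0 * E - 2.0 * I)) / tau_i
    return E + dt * dE, I + dt * dI

rng = np.random.default_rng(0)
E, I, x_est, p_est = 0.1, 0.1, 0.0, 1.0
for _ in range(500):
    z = 1.0 + 0.3 * rng.standard_normal()   # noisy sensory observation
    x_est, p_est = kalman_step(x_est, p_est, z)
    E, I = wilson_cowan_step(E, I, drive=x_est)
print(f"estimate={x_est:.3f}, E={E:.3f}, I={I:.3f}")

The Kalman estimate stands in for predictive control (uncertainty reduction), while the attractor dynamics of the E-I pair stand in for rapid, embodied state adaptation; in the full architecture these would be hierarchical and coupled to action.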
Despite significant advances in artificial intelligence, contemporary systems remain limited in their capacity to integrate fast adaptive behavior with structured abstract reasoning, capabilities that in humans emerge through continuous coordination across brain systems. While modern models achieve superhuman proficiency in narrow domains, they continue to falter at compositional reasoning, causal abstraction, and embodied understanding. In this first part of a two-part investigation, we analyze the foundational mechanisms underlying general intelligence through three interdependent dimensions: energy-efficient computation, embodied goal-directed control, and multi-timescale memory consolidation. Drawing on principles from neuroscience and systems theory, we argue that intelligence emerges not from monolithic optimization, but from the dynamic communication amongst processes operating across multiple temporal and representational hierarchies. We introduce a dual-loop computational framework, in which a fast inner loop supports real-time embodied adaptation, while a slower outer loop integrates memory, abstraction, and long-horizon planning. The architecture promotes energy efficiency through sparse, recurrent dynamics, maintains goal-directed behavior through predictive feedback and utility optimization, and constructs internal world models unifying perception, action, and inference. Together, these mechanisms outline a coherent foundation for scalable, adaptive, and energy-efficient general intelligence. The second part of this series extends these principles into a neuro-inspired framework for metacognitive control and self-reflective learning.
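As a rough illustration of the two-timescale coordination described above, the following Python sketch runs a fast loop that adapts an action every step while a slow loop consolidates buffered experience into a goal estimate every K steps; the update rules, learning rates, and variable names (goal, action, buffer) are hypothetical stand-ins, not the paper's equations.

import numpy as np

K = 50                       # fast steps per slow consolidation step
eta_fast, eta_slow = 0.2, 0.3

rng = np.random.default_rng(1)
target = 2.0                 # latent regularity in the environment
goal = 0.0                   # slow loop's abstract goal / world-model summary
action = 0.0                 # fast loop's online control variable
buffer = []                  # experience gathered between consolidations

for step in range(1, 2001):
    obs = target + 0.5 * rng.standard_normal()
    action += eta_fast * (goal - action)   # fast loop: immediate correction
    buffer.append(obs)
    if step % K == 0:                      # slow loop: consolidate abstraction
        goal += eta_slow * (np.mean(buffer) - goal)
        buffer.clear()

print(f"goal={goal:.3f}, action={action:.3f}, target={target}")

The separation of learning rates and update frequencies is the point of the sketch: the fast loop stays responsive to the current goal, while the slow loop averages over noise to extract the stable structure the fast loop should pursue.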
The field of artificial intelligence has seen explosive growth and exponential success. The last phase of development showcased deep learning's ability to solve a variety of difficult problems across a multitude of domains. Many of these networks met and exceeded human benchmarks by becoming experts in the domains in which they were trained. Though the successes of artificial intelligence have begun to overshadow its failures, there is still much that separates current artificial intelligence tools from the exceptional general learners that humans are. In this paper, we identify the ten commandments upon which human intelligence is systematically and hierarchically built. We believe these commandments work collectively as the essential ingredients that lead to the emergence of higher-order cognition and intelligence. This paper discusses a computational framework that could house these ten commandments and suggests new architectural modifications that could lead to the development of smarter, more explainable, and generalizable artificial systems inspired by a neuromorphic approach.
We present a theory and neural network model of the neural mechanisms underlying human decision-making. We propose a detailed model of the interaction between brain regions, under a proposer-predictor-actor-critic framework. This theory is based on detailed animal data and theories of action-selection. Those theories are adapted to serial operation to bridge levels of analysis and explain human decision-making. Task-relevant areas of cortex propose a candidate plan using fast, model-free, parallel neural computations. Other areas of cortex and medial temporal lobe can then predict likely outcomes of that plan in this situation. This optional prediction- (or model-) based computation can produce better accuracy and generalization, at the expense of speed. Next, linked regions of basal ganglia act to accept or reject the proposed plan based on its reward history in similar contexts. If that plan is rejected, the process repeats to consider a new option. The reward-prediction system acts as a critic to determine the value of the outcome relative to expectations and produce dopamine as a training signal for cortex and basal ganglia. By operating sequentially and hierarchically, the same mechanisms previously proposed for animal action-selection could explain the most complex human plans and decisions. We discuss explanations of model-based decisions, habitization, and risky behavior based on the computational model.
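A hedged sketch of the serial cycle described here, assuming a toy set of plans: each trial serially proposes plans, "predicts" an outcome from a learned value, lets a Go/NoGo threshold accept or reject the proposal, and trains the values with a dopamine-like reward-prediction error. The plan names, reward values, and threshold are invented for illustration, not taken from the model.

import random

plans = ["plan_a", "plan_b", "plan_c"]           # hypothetical candidate plans
true_reward = {"plan_a": 0.2, "plan_b": 0.9, "plan_c": 0.5}
q = {p: 0.5 for p in plans}     # learned plan values (basal-ganglia Go/NoGo)
alpha, threshold = 0.1, 0.55    # critic learning rate and acceptance threshold

random.seed(0)
for trial in range(200):
    chosen = None
    for plan in random.sample(plans, len(plans)):   # serial consideration
        predicted = q[plan]        # stand-in for model-based outcome prediction
        if predicted >= threshold: # Go: basal ganglia accepts the proposal
            chosen = plan
            break                  # otherwise NoGo: repeat with a new option
    if chosen is None:             # every proposal rejected: act on best guess
        chosen = max(q, key=q.get)
    outcome = true_reward[chosen] + random.gauss(0, 0.1)
    rpe = outcome - q[chosen]      # critic: dopamine-like prediction error
    q[chosen] += alpha * rpe       # trains the Go/NoGo evaluation
print({p: round(v, 2) for p, v in q.items()})

Over trials the low-value plan sinks below the acceptance threshold and is reliably rejected, while the high-value plan is accepted early in the serial loop, mirroring how reward history shapes the accept/reject dynamics in the theory.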
We describe a neurobiologically informed computational model of phasic dopamine signaling to account for a wide range of findings, including many considered inconsistent with the simple reward prediction error (RPE) formalism. The central feature of this PVLV framework is a distinction between a primary value (PV) system for anticipating primary rewards (Unconditioned Stimuli [USs]), and a learned value (LV) system for learning about stimuli associated with such rewards (Conditioned Stimuli [CSs]). The LV system represents the amygdala, which drives phasic bursting in midbrain dopamine areas, while the PV system represents the ventral striatum, which drives shunting inhibition of dopamine for expected USs (via direct inhibitory projections) and phasic pausing for omitted USs (via the lateral habenula). Our model accounts for data supporting the separability of these systems, including individual differences in CS-based (sign-tracking) versus US-based learning (goal-tracking). Both systems use competing opponent-processing pathways representing evidence for and against specific USs, which can explain data dissociating the processes involved in acquisition versus extinction conditioning. Further, opponent processing proved critical in accounting for the full range of conditioned inhibition phenomena, and the closely related paradigm of second-order conditioning. Finally, we show how additional separable pathways representing aversive USs, largely mirroring those for appetitive USs, also have important differences from the positive valence case, allowing the model to account for several important phenomena in aversive conditioning. Overall, accounting for all of these phenomena strongly constrains the model, thus providing a well-validated framework for understanding phasic dopamine signaling.
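The PV/LV division of labor can be caricatured in a few lines of Python, assuming simple trial-level delta rules in place of the full PVLV learning equations; the variables lv, pv, the learning rate, and the US magnitude are illustrative only.

lv = 0.0              # learned value of the CS (amygdala-like LV system)
pv = 0.0              # expectation of the US (ventral-striatum-like PV system)
alpha, us_magnitude = 0.2, 1.0

for trial in range(30):
    da_cs = lv                         # phasic burst at CS onset, driven by LV
    da_us = us_magnitude - pv          # US burst shrinks as PV learns to expect it
    pv += alpha * (us_magnitude - pv)  # PV trained toward the delivered US
    lv += alpha * (us_magnitude - lv)  # LV trained by CS-US pairing

da_omission = 0.0 - pv    # expected US withheld: habenula-driven pause (dip)
print(f"CS burst={lv:.2f}, US response={us_magnitude - pv:.2f}, "
      f"omission dip={da_omission:.2f}")

Even this caricature reproduces the signature pattern: as learning proceeds, the dopamine burst migrates from the US to the CS, and omitting an expected US yields a negative dip, the behavior the separate PV and LV pathways are posited to implement.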
We address the distinction between habitual/automatic vs. goal-directed/controlled behavior, from the perspective of a computational model of the frontostriatal loops. The model exhibits a continuum of behavior between these poles, as a function of the interactive dynamics among different functionally specialized brain areas, operating iteratively over multiple sequential steps, and having multiple nested loops of similar decision-making circuits. This framework blurs the lines between these traditional distinctions in many ways. For example, although habitual actions have traditionally been considered purely automatic, the outer loop must first decide to allow such habitual actions to proceed. Furthermore, because the part of the brain that generates proposed action plans is common across habitual and controlled/goal-directed behavior, the key differences are instead in how many iterations of sequential decision-making are taken, and to what extent various forms of predictive (model-based) processes are engaged. At the core of every iterative step in our model, the basal ganglia provides a “model-free” dopamine-trained Go/NoGo evaluation of the entire distributed plan/goal/evaluation/prediction state. This evaluation serves as the fulcrum of serializing otherwise parallel neural processing. Goal-based inputs to the nominally model-free basal ganglia system are among several ways in which the popular model-based vs. model-free framework may not capture the most behaviorally and neurally relevant distinctions in this area.
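As a toy illustration of this continuum, the sketch below runs the same propose-and-gate machinery for one iteration with cached values (habitual) or several iterations with model-based predictions (controlled); the action set, cached values, and "world model" are invented for the example.

actions = ["habit", "alt_1", "alt_2"]                        # hypothetical plans
cached_value = {"habit": 0.6, "alt_1": 0.5, "alt_2": 0.4}    # model-free values
world_model = {"habit": 0.3, "alt_1": 0.8, "alt_2": 0.5}     # predicted outcomes

def decide(n_iterations, use_prediction):
    # Same propose/gate machinery throughout; the only knobs are how many
    # serial iterations run and whether model-based prediction is engaged.
    best, best_val = None, -1.0
    for proposal in actions[:n_iterations]:   # one candidate plan per iteration
        value = world_model[proposal] if use_prediction else cached_value[proposal]
        if value > best_val:                  # Go/NoGo keeps the better plan
            best, best_val = proposal, value
    return best

print("habitual:  ", decide(n_iterations=1, use_prediction=False))  # -> habit
print("controlled:", decide(n_iterations=3, use_prediction=True))   # -> alt_1

The habitual setting stops after the first, cached-value proposal even when the world model would disagree, while the controlled setting iterates long enough for prediction to overturn the habit, which is the continuum the abstract describes rather than a binary model-based/model-free switch.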