Release of the Brain-Inspired Spiking Large Model “SpikingBrain-1.0” Based on Endogenous Complexity
Recently, building on a series of original works on the theory of endogenous complexity, Prof. Xu Bo's team at the Institute of Automation, Chinese Academy of Sciences, and Prof. Li Guoqi's team at the Laboratory of Brain Atlas and Brain-Inspired Intelligence, in collaboration with MetaX, developed the brain-inspired spiking large model "SpikingBrain-1.0." Full-process training and inference were completed on a domestic GPU platform at thousand-card scale, delivering orders-of-magnitude improvements in efficiency and speed for ultra-long-sequence reasoning and demonstrating the feasibility of building a domestically controlled ecosystem for a new non-Transformer large-model architecture. The team has open-sourced the SpikingBrain-1.0-7B model, provided a test website for SpikingBrain-1.0-76B, and released Chinese and English technical reports on SpikingBrain-1.0, a brain-inspired spiking large model validated at industrial scale.
At present, large models based on the Transformer architecture, driven by the scaling law, improve their level of intelligence by increasing network size, compute resources, and data volume, while the model's basic computational unit remains a simple point-neuron model. This pathway to general intelligence can be described as one based on "exogenous complexity." The Transformer architecture has inherent drawbacks: training cost grows quadratically with sequence length, and memory consumption during inference grows linearly with sequence length. These constitute major resource-consumption bottlenecks that limit its ability to handle ultra-long sequences.
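To make the scaling argument concrete, the sketch below (added here for illustration and not taken from the report; the model width, layer count, and simplified cost formulas are assumptions) tallies how the attention score computation grows quadratically with sequence length while the inference-time key/value cache grows linearly.

# Rough back-of-envelope model of standard self-attention cost. For sequence
# length L, width d, and n layers: the QK^T score matrix has L*L entries, so
# training compute scales ~O(L^2 * d); the KV cache stores 2*L*d values per
# layer at inference, scaling ~O(L). Figures are illustrative only.
def attention_cost(seq_len: int, d_model: int, n_layers: int):
    train_flops = n_layers * seq_len * seq_len * d_model   # O(L^2) score computation
    kv_cache_vals = n_layers * 2 * seq_len * d_model       # O(L) key/value cache
    return train_flops, kv_cache_vals

for L in (4_096, 1_000_000, 4_000_000):
    flops, cache = attention_cost(L, d_model=4096, n_layers=32)
    print(f"L={L:>9,}: score FLOPs ~{flops:.2e}, KV-cache values ~{cache:.2e}")

At 1 million tokens the score term alone is roughly 60,000 times larger than at 4k tokens, which is the quadratic blow-up the paragraph above refers to.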
Inspired by the complex internal working mechanisms of brain neurons, the research team proposed a large-model architecture based on "endogenous complexity" and developed the brain-inspired spiking large model "SpikingBrain-1.0." Theoretically, they established a connection between the endogenous dynamics of spiking neurons and linear attention models, showing that existing linear attention mechanisms are a simplified special case of dendritic computation and thereby pointing to a feasible new path for continuously increasing model complexity and performance. On this basis, the team built and open-sourced new brain-inspired foundation models based on spiking neurons, including a linear-complexity model (SpikingBrain-1.0-7B) and a hybrid-linear-complexity model (SpikingBrain-1.0-76B, with 12B activated parameters), and developed an efficient training and inference framework for domestic GPUs (a MetaX Xiyun C550 cluster), a Triton operator library, model parallelism strategies, and cluster communication primitives.
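As a rough illustration of the theoretical link described above, the following sketch (our own, not the team's implementation; the feature map, decay factor, and normalization are assumptions) writes causal linear attention as a recurrence over a fixed-size state. The leaky-integrator-style update of that state is the structural parallel to the endogenous membrane/dendritic dynamics of a spiking neuron, and it is also why per-token cost and memory stay constant in sequence length.

import numpy as np

def linear_attention_step(S, z, q, k, v, decay=1.0):
    # One token of causal linear attention with feature map phi(x) = elu(x) + 1
    # (a common choice; the feature map actually used in SpikingBrain may differ).
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    qf, kf = phi(q), phi(k)
    S = decay * S + np.outer(kf, v)    # accumulate key-value outer products
    z = decay * z + kf                 # running normalizer
    out = S.T @ qf / (z @ qf + 1e-6)   # read out with the current query
    return S, z, out

d = 8
S, z = np.zeros((d, d)), np.zeros(d)
for _ in range(5):                     # stream tokens one at a time
    q, k, v = (np.random.randn(d) for _ in range(3))
    S, z, y = linear_attention_step(S, z, q, k, v)
print(y.shape)                         # fixed-size state, constant per-token memory

Setting decay below 1 turns the state update into the same kind of leaky integration a point neuron performs, which is the sense in which linear attention appears as a simplified special case of richer dendritic dynamics.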
SpikingBrain-1.0 achieves breakthroughs in several core aspects of performance.

First, highly efficient training with extremely small data volumes. During training, the model exhibits linear or near-linear complexity, significantly improving long-sequence training efficiency. Relying on an efficient conversion-based training paradigm, it reaches performance comparable to many open-source Transformer models on multi-task language understanding (MMLU), Chinese multi-task language understanding (CMMLU, C-Eval), and commonsense reasoning (ARC, HS) tasks with only about 2% of the pre-training data required by mainstream large models.

Second, orders-of-magnitude improvement in inference efficiency. At inference time, by exploiting the event-driven characteristics of spiking neurons, SpikingBrain achieves constant or partially constant compute and storage complexity. The SpikingBrain-7B model attains a 26.5× speedup in TTFT (time to first token) over Transformer architectures at a sequence length of 1 million tokens, and more than a 100× speedup at 4 million tokens. On a mobile CPU, at sequence lengths of 64k, 128k, and 256k, decoding speed improves by 4.04×, 7.52×, and 15.39× respectively over Llama3.2 models of the same size, demonstrating orders-of-magnitude gains in efficiency and speed for ultra-long-sequence processing.

Third, construction of a domestically controlled brain-inspired large-model ecosystem. SpikingBrain has adapted its efficient training and inference frameworks, Triton operator libraries, model parallelism strategies, and communication primitives to domestic GPU clusters, demonstrating the feasibility of building a domestically controlled ecosystem for a new non-Transformer large-model architecture.

Fourth, a multi-scale sparsity mechanism based on dynamic-threshold spiking. A two-stage, fine-grained dynamic-threshold spiking strategy was designed and combined with a coarse-grained mixture-of-experts (MoE) scheme, achieving over 69.15% sparsity in the 7B model with a long-sequence spike ratio of about 1.85%, providing strong support for low-power operation of brain-inspired large models (a minimal illustrative sketch of the spiking step follows below).
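The dynamic-threshold spiking mechanism mentioned in the fourth point can be pictured with the minimal sketch below (an illustration under our own assumptions, not the released code; the threshold rule and spike-count quantization are stand-ins). Dense activations are converted into sparse integer spike counts against a threshold derived from the activation statistics themselves, and the resulting sparsity is what allows downstream compute to be event-driven.

import numpy as np

def dynamic_threshold_spikes(x: np.ndarray, k: float = 1.0):
    # Data-dependent ("dynamic") threshold: scale with the mean magnitude of
    # the current activations, so quieter layers spike even less.
    theta = k * np.abs(x).mean() + 1e-8
    # Non-negative integer spike counts; anything below threshold emits nothing.
    counts = np.floor(np.clip(x, 0.0, None) / theta).astype(np.int32)
    return counts, theta

x = np.random.randn(4096) * 0.5        # stand-in for one layer's activations
spikes, theta = dynamic_threshold_spikes(x)
sparsity = 1.0 - np.count_nonzero(spikes) / spikes.size
print(f"threshold = {theta:.3f}, spike sparsity = {sparsity:.1%}")

In the released models this fine-grained sparsity is further combined with coarse-grained MoE routing, so only a fraction of experts and a fraction of spikes are active for any given token.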
This is the first time in China that a large-scale brain-inspired linear foundation-model architecture has been proposed, and the first time that a training and inference framework for a brain-inspired spiking large model has been built on a domestic GPU cluster. The proposed models address the performance degradation that large-scale brain-inspired models suffer under spike-driven constraints. Their ability to process ultra-long sequences offers significant potential efficiency advantages in scenarios such as legal and medical document analysis, complex multi-agent simulation, high-energy particle physics experiments, DNA sequence analysis, and molecular dynamics trajectories. The release of this model provides a new non-Transformer architectural pathway for the development of next-generation artificial intelligence and is expected to inspire new theories of low-power neuromorphic computing and new chip designs.
For detailed content, please refer to the technical report.
Related Links:
1) Online trial access
2) Chinese technical report
3) English technical report
4) Model code