Slim-Llama: An Energy-Efficient LLM ASIC Processor Supporting 3 Billion Parameters at Just 4.69mW
Researchers at KAIST have unveiled Slim-Llama, an ASIC processor designed for energy-efficient deployment of large language models with up to 3 billion parameters while drawing as little as 4.69mW.
Large Language Models (LLMs) have become a cornerstone of artificial intelligence, driving advancements in natural language processing and decision-making tasks. However, their extensive power demands significantly hinder scalability, especially in energy-constrained environments like edge devices. The high computational overhead and reliance on external memory escalate costs and limit accessibility, highlighting the urgent need for energy-efficient solutions to support billion-parameter models.
To address these limitations, researchers at the Korea Advanced Institute of Science and Technology (KAIST) developed Slim-Llama, an ASIC processor that uses binary/ternary quantization to reduce weight precision from full-precision real values to 1 or 2 bits. This cuts memory and computational demands, while a Sparsity-aware Look-up Table (SLT) handles sparse data efficiently so that performance is maintained. Slim-Llama is fabricated in Samsung’s 28nm CMOS technology, occupies a die area of 20.25mm², and includes 500KB of on-chip SRAM, which removes the energy cost associated with external memory. The processor draws as little as 4.69mW, making real-time processing for large-scale AI applications feasible in energy-constrained settings.
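To make the quantization step concrete, here is a minimal Python sketch of what reducing weights to 1 or 2 bits can look like, along with the storage arithmetic for a 3-billion-parameter model. It is an illustration only: the function names (`binarize`, `ternarize`, `weight_storage_bytes`) and the mean-absolute-value scaling rule are assumptions chosen for clarity, not the scheme or data path implemented in Slim-Llama.

```python
def binarize(weights):
    """1-bit sketch: keep only the sign of each weight, plus one shared scale.

    The mean-absolute-value scale is an assumed, commonly used choice.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    codes = [1 if w >= 0 else -1 for w in weights]
    return codes, scale


def ternarize(weights):
    """Ternary sketch: map each weight to -1, 0, or +1, plus one shared scale.

    Three levels need 2 bits of storage per weight.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    # Divide by the scale, round to the nearest integer, clamp to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def weight_storage_bytes(num_params, bits_per_weight):
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bits_per_weight // 8


if __name__ == "__main__":
    example = [0.9, -0.05, -1.2, 0.4]  # made-up weights for illustration
    print(binarize(example)[0])   # [1, -1, -1, 1]
    print(ternarize(example)[0])  # [1, 0, -1, 1]

    params = 3_000_000_000
    for bits in (16, 2, 1):
        print(bits, "bits:", weight_storage_bytes(params, bits) / 1e6, "MB")
```

In this sketch, 3 billion weights take 6,000 MB at 16 bits, 750 MB at 2 bits, and 375 MB at 1 bit. The small weight in the ternary case becomes exactly zero, and zeros of this kind are what a sparsity-aware design can, in principle, skip rather than compute.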