Slim-Llama: An Energy-Efficient LLM ASIC Processor Supporting 3-Billion Parameters at Just 4.69mW
Large Language Models (LLMs) have become a cornerstone of artificial intelligence, driving advancements in natural language processing and decision-making tasks. However, their extensive power demands, stemming from high computational overhead and frequent external memory access, significantly hinder scalability and deployment, particularly in energy-constrained environments like edge devices. These demands raise operational costs and limit accessibility, necessitating energy-efficient solutions capable of effectively managing billion-parameter models.
In response to these challenges, researchers at the Korea Advanced Institute of Science and Technology (KAIST) have developed Slim-Llama, a highly efficient ASIC that optimizes LLM deployment through binary and ternary quantization. This approach reduces the precision of model weights to 1 or 2 bits, sharply lowering memory and computational demands while maintaining performance. By employing a Sparsity-aware Look-up Table (SLT) for efficient sparse data management, Slim-Llama eliminates the reliance on external memory, a primary energy drain in traditional systems, while supporting bandwidths of up to 1.6 GB/s at 200 MHz. The processor achieves a peak performance of 4.92 TOPS at an energy efficiency of 1.31 TOPS/W, addressing the urgent need for sustainable AI hardware.
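To make the idea concrete, here is a minimal Python sketch of ternary weight quantization and the zero-skipping arithmetic it enables. The scale choice (per-tensor mean absolute value) and the function names are illustrative assumptions; KAIST's actual quantizer and SLT datapath are hardware designs whose details are not reproduced here.

```python
def ternary_quantize(weights):
    """Quantize float weights to {-1, 0, +1} plus one scale factor.

    Per-tensor mean-absolute-value scaling is one common choice and an
    assumption here, not Slim-Llama's published scheme.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def ternary_dot(q_weights, scale, activations):
    """Dot product with ternary weights: zero weights are skipped
    entirely, and the surviving terms need only additions and
    subtractions, the kind of work a sparsity-aware lookup can exploit
    in hardware."""
    acc = 0.0
    for q, a in zip(q_weights, activations):
        if q == 0:
            continue  # zero weight: no fetch, no arithmetic
        acc += a if q == 1 else -a
    return acc * scale
```

Because every stored weight fits in 2 bits and zeros contribute nothing, both memory traffic and multiply count drop sharply, which is the software-level analogue of the energy savings the ASIC realizes in silicon.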