Slim-Llama: An Energy-Efficient LLM ASIC Processor Supporting 3-Billion Parameters at Just 4.69 mW
Slim-Llama revolutionizes AI hardware with its high efficiency, enabling the support of large language models with minimal power consumption.
Large Language Models (LLMs) have become a cornerstone of artificial intelligence, driving advancements in natural language processing and decision-making tasks. However, their extensive power demands significantly hinder their scalability, especially in energy-constrained environments like edge devices. This calls for new energy-efficient approaches that can handle billion-parameter models while remaining accessible and cost-effective.
To tackle these challenges, researchers at the Korea Advanced Institute of Science and Technology (KAIST) have developed Slim-Llama, a highly efficient Application-Specific Integrated Circuit (ASIC). The processor uses binary and ternary quantization to cut memory requirements dramatically, storing each model weight in just 1 or 2 bits. By integrating a Sparsity-aware Look-up Table (SLT), Slim-Llama improves data management and exploits the zeros that ternary quantization introduces, achieving a peak performance of 4.92 TOPS while drawing as little as 4.69 mW at its optimal operating frequency. With its compact design and efficient architecture, Slim-Llama represents a breakthrough in deploying energy-efficient AI models for real-time applications.
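To see how ternary quantization shrinks weight storage, here is a minimal software sketch of the general idea. This is an illustration only, not Slim-Llama's hardware logic; the `threshold_ratio` knob and the scale-fitting heuristic are assumptions for the example, not parameters from the KAIST design.

```python
import numpy as np

def ternary_quantize(weights, threshold_ratio=0.7):
    """Map float weights to {-1, 0, +1} plus one per-tensor scale.

    Each weight then needs only 2 bits of storage; the zeros give the
    sparsity that a sparsity-aware lookup scheme can skip over.
    threshold_ratio is a hypothetical tuning knob, not a Slim-Llama parameter.
    """
    delta = threshold_ratio * np.mean(np.abs(weights))  # zero-out threshold
    ternary = np.where(weights > delta, 1,
                       np.where(weights < -delta, -1, 0)).astype(np.int8)
    # Fit the scale to the magnitude of the surviving (non-zero) weights
    nonzero = ternary != 0
    scale = float(np.abs(weights[nonzero]).mean()) if nonzero.any() else 0.0
    return ternary, scale

# Example: five float32 weights collapse to five 2-bit codes and one scale
w = np.array([0.9, -0.05, 0.4, -0.8, 0.02])
q, s = ternary_quantize(w)
# Dequantized approximation is q * s
```

Binary quantization is the same idea with only {-1, +1} codes (1 bit per weight), trading a little more accuracy for another 2x reduction in weight memory.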