Slim-Llama: An Energy-Efficient LLM ASIC Processor Supporting 3-Billion Parameters at Just 4.69mW
Slim-Llama is a highly energy-efficient ASIC for deploying large language models, operating at just 4.69mW while supporting models with up to 3 billion parameters.
Large Language Models (LLMs) have become a cornerstone of artificial intelligence, driving advances in natural language processing and decision-making tasks. However, their extensive power demands, resulting from high computational overhead and frequent external memory access, significantly hinder their scalability and deployment, especially in energy-constrained environments such as edge devices. These demands raise operating costs and limit access to LLMs, which calls for energy-efficient approaches capable of handling billion-parameter models.
To address the growing energy demands of LLMs, researchers at the Korea Advanced Institute of Science and Technology (KAIST) have introduced Slim-Llama, an Application-Specific Integrated Circuit (ASIC) that optimizes the deployment of these models. Utilizing binary and ternary quantization techniques, Slim-Llama drives down the model weight precision to 1 or 2 bits, effectively minimizing memory and computation requirements without compromising performance. Furthermore, its compact design, fabricated using Samsung’s 28nm CMOS technology, eliminates dependency on external memory, thereby enhancing energy efficiency significantly while supporting real-time applications with minimal latency.
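To make the idea concrete, here is a minimal NumPy sketch of what binary and ternary weight quantization can look like. This is an illustrative assumption, not Slim-Llama's actual circuitry or algorithm: the threshold ratio and mean-absolute-value scaling follow common schemes from the ternary/binary weight literature (TWN- and XNOR-Net-style), and all function names are hypothetical.

```python
import numpy as np

def ternary_quantize(w, threshold_ratio=0.7):
    """Quantize a float weight tensor to {-1, 0, +1} plus one per-tensor scale.

    Hypothetical illustration: the threshold and scale follow common
    ternary-weight schemes, not Slim-Llama's exact method.
    """
    delta = threshold_ratio * np.mean(np.abs(w))  # magnitude threshold
    q = np.zeros_like(w)
    q[w > delta] = 1.0
    q[w < -delta] = -1.0
    mask = q != 0
    # scale approximating the surviving weights' average magnitude
    alpha = float(np.mean(np.abs(w[mask]))) if mask.any() else 0.0
    return q, alpha

def binary_quantize(w):
    """1-bit quantization: the sign of each weight times a single scale."""
    alpha = float(np.mean(np.abs(w)))
    return np.sign(w), alpha

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, alpha = ternary_quantize(w)
w_hat = alpha * q  # dequantized approximation used at compute time
```

The appeal for hardware is that each ternary weight needs only 2 bits of storage (1 bit for binary) instead of 16 or 32, and multiplications against {-1, 0, +1} reduce to additions, subtractions, and skips, which is what makes keeping a billion-parameter model on-chip plausible.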