Slim-Llama: An Energy-Efficient LLM ASIC Processor Supporting 3-Billion Parameters at Just 4.69mW
Slim-Llama is a groundbreaking ASIC processor designed for energy-efficient deployment of large language models, running a 3-billion-parameter model at just 4.69mW of power.
As large language models (LLMs) continue to advance the field of artificial intelligence, their significant energy demands pose challenges for scalability, especially in resource-constrained environments such as edge devices. These demands limit accessibility and call for innovative solutions that can run billion-parameter models without high operational costs or energy consumption.
To tackle these challenges, researchers at the Korea Advanced Institute of Science and Technology (KAIST) have developed Slim-Llama, a highly efficient application-specific integrated circuit (ASIC) tailored to deploying LLMs with a minimal energy footprint. By employing binary and ternary quantization, Slim-Llama reduces model weight precision to just 1 or 2 bits, cutting memory and computational demands while maintaining model quality. Its architecture pairs a Sparsity-aware Look-up Table (SLT) for managing sparse data with optimizations such as output reuse and vector indexing, enabling an efficient data flow.

The processor is fabricated in Samsung's 28nm CMOS technology, with a compact 20.25mm² die and 500KB of on-chip SRAM. This streamlined design eliminates the external memory dependency of conventional accelerators, further enhancing energy efficiency.
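To make the quantization idea concrete, the sketch below shows one common way to map floating-point weights to 1-bit (binary) and 2-bit (ternary) values. KAIST has not published Slim-Llama's exact quantizer in this context, so the threshold rule and the single shared scale here are illustrative assumptions, not the chip's actual method.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, threshold: float = 0.7):
    """Quantize float weights to {-1, 0, +1} plus a shared scale.

    Illustrative only: the threshold rule (0.7 * mean|w|, a common
    heuristic from ternary weight networks) and the per-matrix scale
    are assumptions, not Slim-Llama's documented quantizer.
    """
    delta = threshold * np.abs(w).mean()
    q = np.zeros_like(w, dtype=np.int8)
    q[w > delta] = 1      # strong positive weights become +1
    q[w < -delta] = -1    # strong negative weights become -1
    nonzero = np.abs(w[q != 0])
    scale = float(nonzero.mean()) if nonzero.size else 0.0
    return q, scale

def binary_quantize(w: np.ndarray):
    """1-bit case: keep only the sign plus a shared scale."""
    return np.where(w >= 0, 1, -1).astype(np.int8), float(np.abs(w).mean())
```

Either way, a full-precision matrix collapses to a small integer tensor and one scalar, which is where the memory savings come from.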
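Ternary weights also explain why a sparsity-aware design pays off: every zero weight is work the hardware can skip, and the surviving ±1 weights reduce multiplication to addition and subtraction. The software analogue below only illustrates that idea; the SLT and vector-indexing machinery in Slim-Llama are hardware mechanisms whose internals this sketch does not reproduce.

```python
import numpy as np

def ternary_matvec(q: np.ndarray, scale: float, x: np.ndarray) -> np.ndarray:
    """Multiply-free matrix-vector product with ternary weights.

    Each output element is a sum of signed activations: +1 weights add,
    -1 weights subtract, and 0 weights are skipped entirely. This mimics,
    in software, the savings a sparsity-aware lookup scheme exploits in
    hardware (an analogy, not the SLT itself).
    """
    y = np.zeros(q.shape[0], dtype=np.float64)
    for i, row in enumerate(q):
        # Touch only the nonzero weights (vector-indexing analogue).
        y[i] = scale * (x[row == 1].sum() - x[row == -1].sum())
    return y

# Usage, reusing ternary_quantize from the sketch above:
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal(8).astype(np.float32)
q, s = ternary_quantize(W)
print(ternary_matvec(q, s, x))  # rough approximation of W @ x
```

The more zeros the quantizer produces, the less work remains per output, which is why sparsity management and extreme quantization reinforce each other in a design like this.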