ModernBERT: A Significant Leap in NLP with Enhanced Speed and Accuracy
LightOn and Answer.ai announce ModernBERT, a new family of encoder-only transformer models that improves on BERT in both speed and accuracy, addressing key limitations of earlier encoder models.
The landscape of natural language processing (NLP) has changed dramatically since BERT's introduction in 2018. While BERT laid the foundation for efficient retrieval and classification with its encoder-only transformer architecture, its limitations have become increasingly apparent: with a sequence length capped at 512 tokens and an aging architectural design, BERT struggles with long-context tasks and cannot take full advantage of modern hardware. To address these challenges, researchers from LightOn, Answer.ai, Johns Hopkins University, NVIDIA, and Hugging Face have unveiled ModernBERT, a new family of encoder models that extends the sequence length to 8,192 tokens and improves computational efficiency through a series of architectural updates.
ModernBERT incorporates modern architectural features such as Flash Attention 2 and rotary positional embeddings (RoPE) to improve speed and memory efficiency. Trained on a diverse dataset of 2 trillion tokens, the model family, released in base (149M parameters) and large (395M parameters) configurations, outperforms earlier encoders across a range of benchmarks, including the General Language Understanding Evaluation (GLUE) suite and retrieval tasks. It is particularly strong on long-context and code-related tasks, consistently surpassing established models such as RoBERTa and DeBERTa. Released under the Apache 2.0 license, ModernBERT is freely available to researchers and practitioners building NLP applications ranging from semantic search to code retrieval.
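For practitioners who want to try the released checkpoints, the sketch below shows one way to load ModernBERT through the Hugging Face transformers library for masked-token prediction. The Hub model ID "answerdotai/ModernBERT-base" and the use of a transformers version recent enough to include ModernBERT support are assumptions here, not details taken from the announcement itself.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"  # assumed Hugging Face Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Encoder-only masked language modelling: the model predicts the token hidden
# behind [MASK] using context from both directions.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the highest-scoring vocabulary entry.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

The same checkpoints can be used as a drop-in encoder for downstream tasks such as classification or embedding-based retrieval, with the extended 8,192-token context applying without any change to this loading code.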