In the evolving landscape of artificial intelligence, the race among tech giants to create the largest language models is beginning to shift. A new trend is emerging: smaller language models (SLMs) are gaining prominence, challenging the long-held belief that “bigger is better.”
On August 21, Microsoft and Nvidia each released their latest small language models: Phi-3.5-mini-instruct and Mistral-NeMo-Minitron 8B, respectively. Both strike a balance between computational efficiency and performance, with some SLMs even rivaling far larger models in specific areas.
Clément Delangue, CEO of AI startup Hugging Face, highlighted that SLMs can address up to 99% of use cases, predicting that 2024 will be “the year of the SLM.” Major tech companies like Meta, Microsoft, and Google have already released nine small models this year alone, signaling a broader industry shift.
The Shift to Smaller Models: Efficiency and Cost-Effectiveness
The rise of SLMs can be attributed to the challenges associated with large language models (LLMs), particularly in terms of diminishing performance improvements and increasing resource demands.
A performance comparison conducted in April by AI startups Vellum and Hugging Face revealed that the gap between leading LLMs is rapidly narrowing, especially on tasks such as multiple-choice questions, reasoning, and math problems. For example, top models such as Claude 3 Opus, GPT-4, and Gemini Ultra all scored above 83% on multiple-choice question benchmarks.
Gary Marcus, former head of Uber AI, noted that recent research on LLMs indicates that most are now in the same performance league as GPT-4, with only marginal improvements.
Meanwhile, the costs associated with training LLMs continue to soar. These models require massive datasets and billions, even trillions, of parameters, leading to exorbitant resource consumption. The computational power and energy required to train and deploy LLMs are staggering, making it difficult for smaller organizations or individuals to participate in core LLM development.
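To get a rough sense of that scale, a commonly cited rule of thumb approximates training compute as about 6 x N x D floating-point operations, where N is the parameter count and D is the number of training tokens. The sketch below applies it to a published open model, Meta's Llama 2 70B, which was reportedly trained on about 2 trillion tokens; the GPU throughput and utilization figures are illustrative assumptions, not exact accounting.

```python
# Back-of-envelope estimate of training compute (rough rule of thumb, not exact accounting).
# Approximation: total training FLOPs ~ 6 * N * D,
# where N = parameter count and D = number of training tokens.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total floating-point operations for one training run."""
    return 6 * n_params * n_tokens

# Llama 2 70B was reportedly trained on roughly 2 trillion tokens.
flops = training_flops(70e9, 2e12)              # ~8.4e23 FLOPs

# Illustrative assumption: one A100 GPU, ~312 TFLOPS peak (BF16), ~40% utilization.
effective_flops_per_gpu = 312e12 * 0.40
gpu_seconds = flops / effective_flops_per_gpu
gpu_years = gpu_seconds / (3600 * 24 * 365)

print(f"~{flops:.1e} FLOPs, roughly {gpu_years:.0f} GPU-years on a single A100")
```

On those assumptions, a single training run works out to a couple of hundred GPU-years for a 70-billion-parameter model, which is why training at this scale remains out of reach for most organizations.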
The International Energy Agency estimates that by 2026, the electricity consumption related to data centers, cryptocurrencies, and AI will be roughly equivalent to Japan’s total electricity consumption.
Sam Altman, CEO of OpenAI, disclosed that training GPT-4 cost at least $100 million. Dario Amodei, CEO of Anthropic, projected that the cost of training future models could reach an astounding $100 billion.
In addition, the complexity of the tools and techniques required to use LLMs presents a steep learning curve for developers, slowing down the overall development process. Another significant issue is LLMs' propensity to hallucinate: generating outputs that appear plausible but are factually incorrect. This is a byproduct of how they are designed, predicting the next most likely word from statistical patterns in their training data rather than understanding the content.
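The next-word mechanism is easy to observe directly. The sketch below is a minimal illustration using the small GPT-2 model from Hugging Face's transformers library (not one of the models discussed here): it prints the most probable next tokens for a prompt, and nothing in the process checks whether a continuation is factually true.

```python
# Minimal illustration of next-token prediction, using the small GPT-2 model
# (chosen only because it is tiny and freely available).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token only

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)

# The model ranks continuations purely by statistical likelihood;
# no step here verifies factual accuracy.
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id)):>12s}  p={prob.item():.3f}")
```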
The Advantages of Smaller Language Models
Concerns over the enormous energy consumption of LLMs and the market opportunity to provide more diverse AI options have shifted tech companies’ focus toward SLMs.
SLMs, being smaller and more efficient, require less data and training time. This makes them more cost-effective and easier to deploy on smaller devices, such as smartphones, without the need for supercomputing resources.
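A back-of-envelope memory estimate shows why a few billion parameters can fit on consumer hardware. The sketch below compares the approximate weight footprint of a 3.8-billion-parameter model with a 70-billion-parameter one under common numeric precisions; the figures count weights only and ignore activations, caches, and runtime overhead.

```python
# Rough weight-memory footprint by precision (weights only; ignores activations and caches).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(n_params: float, precision: str) -> float:
    """Approximate size of the model weights in gigabytes."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

for name, n_params in [("3.8B-parameter SLM", 3.8e9), ("70B-parameter LLM", 70e9)]:
    sizes = ", ".join(f"{p}: ~{weight_gb(n_params, p):.1f} GB" for p in BYTES_PER_PARAM)
    print(f"{name:20s} {sizes}")
```

At 4-bit precision the smaller model needs on the order of 2 GB for its weights, which is within reach of a modern smartphone, while the 70-billion-parameter model still requires tens of gigabytes.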
Another significant advantage of SLMs is their specialization. By focusing on specific tasks or domains, SLMs can be more effective in practical applications. This customization allows businesses to develop models that precisely meet their unique needs.
Despite their smaller size, SLMs can match larger models in certain areas. For instance, Microsoft's Phi-3.5-mini-instruct, with only 3.8 billion parameters, outperforms some models with significantly more parameters on specific benchmarks.
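As a concrete illustration of how lightweight such a model is to run, the sketch below loads Phi-3.5-mini-instruct through Hugging Face's transformers library. It assumes the checkpoint is published under the microsoft/Phi-3.5-mini-instruct model ID and that the tokenizer ships a chat template; the dtype and device settings are illustrative.

```python
# Illustrative sketch: running a small instruct model locally with Hugging Face transformers.
# Assumes the checkpoint is available as "microsoft/Phi-3.5-mini-instruct".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half-precision weights keep memory use modest
    device_map="auto",            # spread layers across available GPU/CPU automatically
)

messages = [{"role": "user", "content": "Summarize why small language models are useful."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(reply)
```

Because the full set of weights fits comfortably in half precision on a single mid-range GPU, a script like this needs no distributed setup or specialized serving stack.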
Key Takeaways
- The shift towards smaller language models is driven by the need for cost-effectiveness, efficiency, and specialization.
- SLMs offer a balance between performance and resource consumption, making them suitable for a broader range of applications.
- While SLMs have limitations, such as a narrower scope of knowledge, they present a promising alternative to larger, more resource-intensive models.