For quite some time, the prevailing wisdom in artificial intelligence has been that bigger, more intricate models yield superior results: larger models, trained on vast amounts of data, were assumed to guarantee better AI performance.
The current market tells a different story. Companies adopting generative AI today are finding that they don’t necessarily need models with a trillion parameters, or even the hundreds of billions of parameters that frontier large language models (LLMs) boast. Instead, many organizations are turning to small language models (SLMs) tailored to specific tasks.
These smaller models, ranging from roughly one hundred million to one hundred billion parameters, are designed to run efficiently on personal computers and, at the smaller end of that range, even on smartphones. The potential applications broaden by the day; who’s to say SLMs won’t eventually be at the heart of virtual reality headsets?
The typical journey of experimentation goes something like this: companies first deploy LLMs to build proofs of concept, then discover they can achieve comparable outcomes at significantly lower cost with smaller models. Giants like Microsoft, Meta, and Google, along with startups such as Hugging Face, Mistral, and Anthropic, are leading the charge in this direction.
By pairing SLMs with techniques such as fine-tuning and retrieval-augmented generation (RAG), which refine outputs using proprietary data, companies are beginning to automate processes such as document retrieval and customer-service data analysis to align with consumer behaviors, as noted by The Wall Street Journal. The range of applications continues to expand; a minimal sketch of the RAG pattern follows below.
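To make the RAG idea concrete, here is a minimal sketch in Python. It assumes the sentence-transformers and transformers packages; the embedding model, the small instruct model, and the sample documents are illustrative placeholders under those assumptions, not a recommended production stack.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Assumes sentence-transformers and transformers are installed;
# model names and documents are illustrative, not prescriptive.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# 1. Embed the proprietary documents once, ahead of time.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Support hours are 9am-5pm Eastern, Monday through Friday.",
]
doc_vectors = embedder.encode(docs, convert_to_tensor=True)

# 2. At query time, retrieve the most relevant passage.
question = "How long do customers have to request a refund?"
query_vector = embedder.encode(question, convert_to_tensor=True)
best = int(util.cos_sim(query_vector, doc_vectors).argmax())

# 3. Ground the SLM's answer in the retrieved context.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
prompt = f"Context: {docs[best]}\n\nQuestion: {question}\nAnswer:"
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```

The key design point is that the model never answers from its weights alone: the retrieval step injects the company’s own data into the prompt, which is what lets a small general-purpose model produce grounded, business-specific answers.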
Reaping Big Benefits by Going Small
The consideration of cost weighs heavily on the minds of IT leaders contemplating investments in emerging technologies like generative AI. Even OpenAI, a leading frontier model developer whose ChatGPT has sparked widespread interest in GenAI, recently introduced a smaller, more cost-effective model.
But lowering expenses isn’t the sole advantage of using SLMs.
Boosting Speed and Efficiency: LLMs typically demand multiple GPUs in a data center, and every request makes a round trip to that remote infrastructure. SLMs, by contrast, can run on local machines, delivering prompt responses without needing a cloud connection (see the sketch after this list).
Reducing Latency: Fewer parameters generally translate to quicker responses to prompts. An SLM may not match the capabilities of a frontier model like GPT-4o anytime soon, but depending on an organization’s requirements or specific use cases, that level of performance might not be necessary.
Domain Specificity: Because they are typically trained on domain-specific data, SLMs can deliver more pertinent results than LLMs, which strive for general applicability. That specificity is especially valuable when corporate intellectual property is embedded within the model.
Limiting Errors: Concerns about AI models producing inaccurate or biased output persist. Industry consensus suggests that a smaller, more curated training set, combined with RAG and human-in-the-loop review, can minimize inaccuracies, helping protect intellectual property and corporate reputation.
Enhancing Sustainability: Opting for SLMs over LLMs also benefits the environment. Generating tokens with large models consumes massive amounts of power, and shrinking that energy footprint is crucial for meeting corporate sustainability goals.
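As a rough illustration of the speed and locality points above, the following Python sketch runs a small model entirely on a local machine via the Hugging Face transformers library. The TinyLlama model and the prompt are illustrative assumptions, and actual latency depends heavily on hardware.

```python
# Sketch: answering a prompt with a small model that runs entirely
# locally, so no request ever leaves the machine.
# Assumes the transformers package; the model choice is illustrative.
import time
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # ~1.1B parameters
    device_map="auto",  # use a local GPU if available, else CPU
)

start = time.perf_counter()
result = generator(
    "List three benefits of running AI models on-device:",
    max_new_tokens=64,
)
print(result[0]["generated_text"])
print(f"Local inference took {time.perf_counter() - start:.2f} seconds")
```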
Observers tracking the rising popularity of SLMs have made several high-level observations on this trend towards downsizing.
As Andrej Karpathy, a former OpenAI engineer, explains, models had to grow large before they could be condensed, through techniques such as distillation and quantization, into formats that are more manageable and efficient. Developer Drew Breunig highlights that model creators are becoming more discerning about the training data they use to craft these smaller, more efficient models.
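To give a flavor of that condensing, here is a hedged sketch of loading a model in a compressed 4-bit format, one common way a model’s memory footprint is shrunk. It assumes the transformers and bitsandbytes packages and a CUDA-capable GPU; the Qwen model name is an illustrative placeholder.

```python
# Sketch: loading a language model with 4-bit quantization, which
# shrinks its memory footprint roughly 4x versus 16-bit weights.
# Assumes transformers + bitsandbytes and a CUDA GPU; the model
# name is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16, store in 4-bit
)

model_id = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Small language models are", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```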
Given the dwindling availability of data to train LLMs, it appears that smaller models are not just a practical necessity but also a boon for IT budgets and overall efficiency.
Harnessing the Dell AI Factory
Regardless of your chosen model path, bringing a proof-of-concept to life can seem overwhelming. Fortunately, it need not be.
Dell Technologies lays out a clear roadmap for AI model and infrastructure deployment, whether you’re integrating an SLM, LLM, or anything in between.
The Dell AI Factory aids organizations in navigating these challenges, providing guidance on preparing corporate data and selecting AI-enabled infrastructure. Through partnerships within the open ecosystem and access to professional services, the Dell AI Factory equips you with the necessary use cases and tools for streamlined AI deployment.
The AI era embraces models of varying scales. Remember, achieving significant business outcomes does not always necessitate going big.
Discover more about the Dell AI Factory in this webinar.