  • December 19, 2023
  • Hiba Moideen
Phi-2: Unveiling the Potency of Small Language Models in the Realm of AI

In the rapidly evolving world of large language models (LLMs) such as GPT-4 and Bard, Microsoft has taken a surprising leap with the introduction of Phi-2, a small language model (SLM) that packs a punch with its 2.7 billion parameters. Breaking away from the convention of size equating to capability, Phi-2 emerges as a powerful contender, outperforming larger models like Llama-2 and Mistral, and matching or beating Google's Gemini Nano 2, in various generative AI benchmark tests. Here we delve into the details of Phi-2, exploring its capabilities, training methodology, and the intriguing insights it brings to the forefront.

The Genesis of Phi-2:

Phi-2 made its debut following the announcement by Satya Nadella at Ignite 2023, positioning itself as a testament to Microsoft's commitment to pushing the boundaries of AI research. Developed by the Microsoft research team, Phi-2 is presented as an upgraded version of Phi-1.5, boasting "common sense," "language understanding," and "logical reasoning" capabilities. Despite its relatively modest size in the world of language models, Phi-2 has managed to achieve feats that surpass models 25 times its size, as claimed by Microsoft.

Training and Capabilities:

Microsoft has equipped Phi-2 with a comprehensive training regimen, leveraging "textbook-quality" data that includes synthetic datasets, general knowledge, theory of mind, and daily activities. As a transformer-based model trained with a next-word prediction objective, Phi-2 stands out for its training efficiency: it was trained on 96 A100 GPUs in just 14 days. That makes it a far more cost-effective and streamlined alternative to the formidable GPT-4, which is estimated to require tens of thousands of A100 Tensor Core GPUs and roughly 90-100 days of training.
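The next-word prediction objective mentioned above can be sketched concretely. The snippet below is a toy illustration of the causal language-modeling loss (cross-entropy against the true next token) that models like Phi-2 are trained with; the random `logits` stand in for a transformer's per-position output, which is not reproduced here:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def next_token_loss(logits, targets):
    """Mean cross-entropy between the model's predicted next-token
    distribution and the actual next token, averaged over positions."""
    probs = softmax(logits)
    # Probability the model assigned to each true next token.
    picked = probs[np.arange(len(targets)), targets]
    return -np.log(picked).mean()

# Toy setup: a 3-token sequence over an 8-word vocabulary. At each
# position the model must predict the id of the following token.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 8))   # stand-in for transformer outputs
targets = np.array([2, 5, 7])      # true next-token ids
loss = next_token_loss(logits, targets)
print(f"next-token loss: {loss:.4f}")
```

Training consists of minimizing this loss over the corpus; everything else about the model (its 2.7B parameters, its "textbook-quality" data) shapes how well it does so.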

Beyond traditional language understanding, Phi-2 exhibits a flair for solving complex mathematical equations and physics problems; remarkably, it can even identify errors in students' calculations. This multifaceted capability extends to benchmarks covering commonsense reasoning, language understanding, mathematics, and coding, where Phi-2 consistently outperforms the 13B Llama-2 and the 7B Mistral. Strikingly, on multi-step reasoning tasks it even surpasses the 70B Llama-2, and it outshines Google's Gemini Nano 2, a 3.25B model that runs natively on the Google Pixel 8 Pro.

Advantages of Small Language Models:

The significance of Phi-2's achievements becomes even more pronounced when considering the advantages of small language models. Unlike their larger counterparts, smaller models like Phi-2 offer cost-effective solutions with lower power and computing requirements. These models are not only more accessible but also hold the potential to be trained for specific tasks, running natively on devices and reducing output latency. The economic viability and efficiency of small language models position them as formidable players in the AI landscape.

Phi-2 Evaluation:

To gauge Phi-2's performance, Microsoft conducted extensive evaluations across academic benchmarks, covering categories like Big Bench Hard, commonsense reasoning, language understanding, math, and coding. With a modest 2.7 billion parameters, Phi-2 outshines Mistral and Llama-2 models with 7B and 13B parameters on aggregated benchmarks.
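The comparison above rests on aggregating per-category scores into a single number, which can be sketched as a simple unweighted mean. The category names follow the evaluation described here, but the scores below are placeholders for illustration only, not Microsoft's published results:

```python
# Benchmark categories from the evaluation described above.
CATEGORIES = [
    "Big Bench Hard",
    "commonsense reasoning",
    "language understanding",
    "math",
    "coding",
]

def aggregate(scores: dict) -> float:
    """Unweighted mean score over the benchmark categories."""
    return sum(scores[c] for c in CATEGORIES) / len(CATEGORIES)

# PLACEHOLDER scores (fraction correct) -- illustration only,
# NOT the published benchmark numbers.
models = {
    "Phi-2 (2.7B)":   dict.fromkeys(CATEGORIES, 0.60),
    "Mistral (7B)":   dict.fromkeys(CATEGORIES, 0.50),
}

for name, scores in models.items():
    print(f"{name}: {aggregate(scores):.2f}")
```

Real leaderboards often weight categories differently or report them separately, which is one reason Microsoft also stresses concrete use cases over any single aggregate figure.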

Particularly noteworthy is its superior performance compared to the 25x larger Llama-2-70B model on multi-step reasoning tasks, such as coding and math. Furthermore, Phi-2 proves its mettle by matching or outperforming the recently-announced Google Gemini Nano 2, despite its smaller size.

Acknowledging the challenges associated with model evaluation, Microsoft emphasizes the importance of concrete use cases as the ultimate test for language models. In alignment with this philosophy, Phi-2's evaluation extends beyond public benchmarks to include Microsoft's internal proprietary datasets and tasks, consistently showcasing its superiority over Mistral-7B and various Llama-2 models.

Phi-2 emerges as a revelation in the world of language models, challenging the notion that bigger is always better. Microsoft's innovative approach to developing a small language model with remarkable capabilities opens new doors for AI research and application. The advantages of cost-effectiveness, efficiency, and specific task-oriented training make small language models like Phi-2 not just contenders but preferred choices in certain scenarios. As we witness the surprising power packed into Phi-2, it prompts us to rethink the dynamics of language models, paving the way for a future where size may no longer be the sole determinant of AI prowess.