#Meta has launched two new artificial intelligence (AI) models, Llama 3 8B and Llama 3 70B, with 8 billion and 70 billion parameters respectively; the company says the larger model beats Google’s Gemini 1.5 Pro on several benchmarks.
New Delhi: Meta introduced its latest AI models, Llama 3 8B and 70B, on Thursday, claiming enhanced capabilities over their predecessors. The company also implemented new training techniques to optimize model efficiency. Notably, whereas the largest Llama 2 model had 70 billion parameters, Meta now plans for its larger Llama 3 models to exceed 400 billion parameters. Last week, a report indicated that Meta would unveil smaller AI models in April, followed by larger ones in the summer.
Meta Llama 3 availability
Meta is adopting a community-first approach with Llama 3, making the new foundation models open source, like its previous models. According to Meta’s blog post, “Llama 3 models will soon be available on various platforms including AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, with hardware support from AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.”
These partnerships cover major cloud, hosting, and hardware platforms, which should make the models easier to access for developers and AI enthusiasts. Furthermore, Meta has integrated Llama 3 into its Meta AI assistant, which is accessible via Facebook Messenger, Instagram, and WhatsApp in supported regions.
Meta Llama 3 performance and architecture
Regarding performance, Meta shared benchmark scores for Llama 3’s pre-trained and instruct models. The pre-trained Llama 3 70B model outperformed Google’s Gemini 1.0 Pro on the MMLU (79.5 vs. 71.8), BIG-Bench Hard (81.3 vs. 75.0), and DROP (79.7 vs. 74.1) benchmarks. Additionally, the 70B Instruct model surpassed Gemini 1.5 Pro on the MMLU, HumanEval, and GSM-8K benchmarks, according to the company’s own data.
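To put the pre-trained scores in perspective, the short Python sketch below computes the point margin on each benchmark. It uses only the figures quoted in this article; the variable names are illustrative.

```python
# Scores quoted in this article: (Llama 3 70B pre-trained, Gemini 1.0 Pro)
scores = {
    "MMLU": (79.5, 71.8),
    "BIG-Bench Hard": (81.3, 75.0),
    "DROP": (79.7, 74.1),
}

# Point margin per benchmark, rounded to one decimal to hide float noise
margins = {
    name: round(llama - gemini, 1)
    for name, (llama, gemini) in scores.items()
}

for name, margin in margins.items():
    print(f"{name}: +{margin} points for Llama 3 70B")
```

By these company-reported numbers, the lead ranges from 5.6 points on DROP to 7.7 points on MMLU.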
Meta has adopted a decoder-only transformer architecture for the new AI models, with several improvements over Llama 2. Llama 3 now uses a tokenizer with a vocabulary of 128K tokens and incorporates grouped query attention (GQA) to improve inference efficiency. In GQA, groups of query heads share a single key and value head, which shrinks the key-value cache the model has to keep in memory while generating text. Meta claims to have pre-trained the models on more than 15 trillion tokens sourced from publicly available data.
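To make the efficiency argument for GQA concrete, here is a minimal Python sketch of how query heads map onto shared key/value heads, and how that sharing affects the size of the key-value cache. The layer count, head counts, head dimension, and sequence length are illustrative assumptions rather than figures from the article, and the helper names are hypothetical.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Return the index of the key/value head that a query head reads from."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Key/value cache size for one sequence (2 bytes per value = 16-bit)."""
    # Factor of 2: every layer stores one tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Illustrative configuration (assumed values, not taken from the article)
N_LAYERS, N_Q_HEADS, N_KV_HEADS = 32, 32, 8
HEAD_DIM, SEQ_LEN = 128, 8192

mapping = [
    kv_head_for_query_head(q, N_Q_HEADS, N_KV_HEADS) for q in range(N_Q_HEADS)
]
# Standard multi-head attention keeps one KV head per query head.
mha_bytes = kv_cache_bytes(N_LAYERS, N_Q_HEADS, HEAD_DIM, SEQ_LEN)
gqa_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN)

print("KV head used by query heads 0-7:", mapping[:8])
print(f"Multi-head cache: {mha_bytes / 2**30:.1f} GiB")
print(f"Grouped-query cache: {gqa_bytes / 2**30:.1f} GiB")
```

Under these assumed settings, four query heads share each key/value head, so the cache is a quarter of the size it would be with one key/value head per query head.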