  • March 19, 2024
  • Sayana Chandran
Apple Unveils MM1: First Family of Multimodal Large Language Models

Apple has introduced its first family of multimodal large language models (MLLMs) under the banner of MM1. Published under the title "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training," the paper details Apple's entry into MLLMs with the MM1 series. MM1 performs strongly on image captioning, visual question answering (VQA), and natural language inference. The research team attributes this success to careful curation of image-caption pairs, which yields strong performance, particularly in scenarios requiring few-shot learning.

What distinguishes MM1 from its counterparts is its ability to follow instructions across multiple images and to reason about complex scenes. Housing up to 30 billion parameters, MM1 is said to surpass GPT-4V, the vision component of OpenAI's GPT-4, while boasting three times the parameter count.

MM1 underwent extensive multimodal pretraining on a colossal dataset of 500 million interleaved image-text documents, encompassing 1 billion images and 500 billion text tokens. This vast and diverse pretraining regimen enables MM1 to deliver strong in-context predictions and to adapt to custom output formatting from only a handful of few-shot examples.

While developing MM1, the researchers found that the key to its performance lies not solely in the design of the vision-language connector, but rather in factors such as image resolution and the number of image tokens. By sharing its research openly, Apple aims to offer design lessons to the broader AI community, potentially influencing the architectural and pretraining-data choices of other MLLM developers.
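To make the interleaved image-text, few-shot setup concrete, the sketch below assembles such a prompt as a plain data structure. The schema (dicts with "type" and "content" fields), the function name, and the file names are all hypothetical illustrations, not Apple's actual API or the paper's format.

```python
# Hypothetical sketch of an interleaved image-text few-shot prompt,
# the kind of input an MLLM like MM1 consumes. The schema is invented
# purely for illustration.

def build_fewshot_prompt(examples, query_image, instruction):
    """Interleave (image, caption) few-shot examples, then append the query."""
    prompt = []
    for image, caption in examples:
        prompt.append({"type": "image", "content": image})
        prompt.append({"type": "text", "content": caption})
    # The final image is followed by the instruction the model should answer.
    prompt.append({"type": "image", "content": query_image})
    prompt.append({"type": "text", "content": instruction})
    return prompt

# Two few-shot examples followed by a query image and an instruction.
prompt = build_fewshot_prompt(
    examples=[("img_cat.png", "A cat sitting on a windowsill."),
              ("img_dog.png", "A dog running on a beach.")],
    query_image="img_bird.png",
    instruction="Describe this image in one sentence.",
)
print(len(prompt))  # prints 6: four example segments plus the query pair
```

The few-shot examples double as a formatting template: because the model sees caption examples in context, it can mimic their style when answering the final instruction, which is the in-context adaptation the article describes.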

As MM1's capabilities are unveiled, speculation has arisen about its integration into Apple's products, particularly as a way to make the virtual assistant Siri more intelligent. While the specifics of MM1's deployment remain undisclosed, the showcased examples hint at a future in which Siri becomes a significantly more capable assistant, able to perceive and understand visual information. Apple's strides in MLLM technology promise to reshape the landscape of AI integration, setting a precedent for innovation and collaboration within the field.