Google has announced its next-generation AI model, Gemini 1.5. According to the company, the multimodal large language model (MLLM) shows "dramatic improvements" across several areas. Google said the new model achieves quality comparable to Gemini Ultra 1.0, currently the company's most advanced AI model, while using less computation.
The first Gemini 1.5 model the company is releasing for early testing is the Pro model. Gemini 1.5 Pro, a mid-size multimodal model, will be available to select developers and enterprise customers in a private preview through AI Studio and Vertex AI.
Google CEO Sundar Pichai, in a blog post, stated that the Gemini 1.5 Pro model can process more information compared to the previous generation. "We've been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet," wrote Pichai.
Gemini 1.5: What is new
The Gemini 1.5 model is based on a Mixture-of-Experts (MoE) architecture. Unlike the traditional Transformer architecture, which works as one large neural network, MoE-based models divide the network into smaller "expert" sub-networks, each specialised for a specific kind of task.
Depending on the type of input provided, these models selectively activate only the most relevant experts to carry out the task. This technique improves the efficiency of the model as well as the quality of its output. The MoE architecture also allows the model to be trained on more complex tasks.
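The routing idea described above can be sketched in a few lines of Python. This is a deliberately tiny toy (single linear-layer "experts", top-1 routing, random weights), not Google's implementation; all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

DIM, N_EXPERTS = 8, 4

# Each "expert" is a small specialised sub-network (here, one linear layer).
experts = [rng.standard_normal((DIM, DIM)) for _ in range(N_EXPERTS)]
# The router scores how relevant each expert is for a given input.
router = rng.standard_normal((DIM, N_EXPERTS))

def moe_forward(x):
    """Route the input to the single most relevant expert (top-1 routing)."""
    scores = x @ router                 # one relevance score per expert
    chosen = int(np.argmax(scores))     # activate only the best-scoring expert
    return experts[chosen] @ x, chosen  # the remaining experts stay idle

x = rng.standard_normal(DIM)
y, used = moe_forward(x)
print(f"expert {used} handled the input; output shape {y.shape}")
```

Because only one expert runs per input, the compute cost stays close to that of a single small network even as more experts (and thus more total parameters) are added, which is the efficiency gain the article describes.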
Google said that the Gemini 1.5 model has a bigger "context window". The context window is measured in tokens, which can represent words, images, video or code. The bigger the context window, the more information a model can take as input.
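The effect of a larger context window can be illustrated with a toy truncation function (a simplification: real tokenisers split text into sub-word units rather than whole words, and models handle overflow in various ways):

```python
def truncate_to_context(tokens, context_window):
    """Keep only as many tokens as the model's context window allows."""
    return tokens[:context_window]

document = ["token"] * 50_000  # a long input, represented as a token list

small_window = truncate_to_context(document, 32_000)
large_window = truncate_to_context(document, 1_000_000)

print(len(small_window))  # 32000 -> part of the document is cut off
print(len(large_window))  # 50000 -> the whole document fits
```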
With Gemini 1.5 Pro, which is currently under testing, Google has increased the context window to 1 million tokens, up from 32,000 in the Gemini 1.0 model. Google said the new model is capable of processing one hour of video, 11 hours of audio or over 700,000 words in one go.
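A quick back-of-envelope check of these figures (the tokens-per-word ratio below is inferred from the numbers in the article, not a spec Google has published):

```python
# Figures from the article.
old_window = 32_000       # Gemini 1.0 context window, in tokens
new_window = 1_000_000    # Gemini 1.5 Pro context window, in tokens
words_claimed = 700_000   # text Google says fits in one go

# The jump in capacity, and the implied tokens-per-word ratio.
growth = new_window / old_window
tokens_per_word = new_window / words_claimed

print(f"{growth:.2f}x more tokens")              # 31.25x
print(f"~{tokens_per_word:.2f} tokens per word")  # ~1.43
```

The roughly 1.4 tokens per word implied here is consistent with the common rule of thumb that English text averages a bit more than one token per word.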