Gemini 1.5: Google announces next-gen AI model with new architecture design

Google is releasing Gemini 1.5 Pro, a mid-size multimodal model, for early testing. It can process more information than Gemini 1.0.

Google Gemini 1.5

Harsh Shivam | New Delhi


Google has announced its next-generation AI model, Gemini 1.5. According to the company, this multimodal large language model (MLLM) shows "dramatic improvements" across several areas. Google said the new model achieves quality comparable to that of Gemini 1.0 Ultra, currently its most advanced AI model, while using less compute.

The first Gemini 1.5 model the company is releasing for early testing is the Pro model. Gemini 1.5 Pro, a mid-size multimodal model, will be available to select developers and enterprise customers in a private preview through AI Studio and Vertex AI.

Google CEO Sundar Pichai, in a blog post, stated that the Gemini 1.5 Pro model can process more information compared to the previous generation. "We've been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet," wrote Pichai.

READ: Google to replace Assistant by Gemini on wearable audio accessories: Report
Gemini 1.5: What is new

The Google Gemini 1.5 model is based on a Mixture-of-Experts (MoE) architecture. Unlike a traditional Transformer architecture, which works as one large neural network, MoE-based models divide the network into smaller "expert" networks, each specialised in a particular kind of task.

Depending on the type of input, these models selectively activate only the most relevant expert to carry out the task. This routing makes the model more efficient and can also improve the quality of its output. The MoE architecture also makes it easier to train the model for more complex tasks.
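To make the routing idea concrete, here is a minimal, hypothetical sketch of top-1 MoE routing in Python. It is illustrative only: the toy dimensions, the gating function and the top-1 choice are assumptions for the example, not details of Gemini's implementation. A small gating network scores every expert for a given input, and only the highest-scoring expert actually runs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for illustration; real MoE experts are full feed-forward blocks.
D_MODEL, N_EXPERTS = 16, 4

experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
gate_w = rng.normal(size=(D_MODEL, N_EXPERTS))  # gating-network weights

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Top-1 MoE routing: only the expert the gate scores highest runs."""
    scores = x @ gate_w                       # one score per expert
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                      # softmax over experts
    top = int(np.argmax(probs))               # pick the single best expert
    return probs[top] * (x @ experts[top])    # the other experts never compute

x = rng.normal(size=D_MODEL)
print(moe_layer(x).shape)  # (16,): same output shape, but only 1 of 4 experts ran
```

Because only the selected expert's weights are used for a given input, the compute per input stays roughly constant as experts are added, which is where the efficiency gain described above comes from.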

Google said that the Gemini 1.5 model has a bigger "context window". The context window is measured in tokens, which can represent words, images, videos or code. The bigger the context window, the more information a model can take as input.

READ: Keyframer: Apple's new AI-editor for generating animations with text input

With Gemini 1.5 Pro, which is currently under testing, Google has increased the context window capacity to 1 million tokens from 32,000 on the Gemini 1.0 model. Google said that the new model is capable of processing one hour of video, 11 hours of audio and over 700,000 words in one go.
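As a rough, back-of-the-envelope illustration (the window sizes are the figures quoted in this article; the characters-per-token ratio is a common rule of thumb and an assumption here, not an official Gemini tokenizer figure), the new window is about 31 times the old one:

```python
# Window sizes quoted in this article.
OLD_WINDOW = 32_000       # Gemini 1.0 context window, in tokens
NEW_WINDOW = 1_000_000    # Gemini 1.5 Pro context window, in tokens

print(f"Scale-up: {NEW_WINDOW / OLD_WINDOW:.0f}x")  # ~31x

def fits_in_window(text: str, window: int = NEW_WINDOW,
                   chars_per_token: float = 4.0) -> bool:
    """Crude capacity check assuming ~4 characters per token (a common
    rule of thumb for English text, not a Gemini specification)."""
    est_tokens = len(text) / chars_per_token
    return est_tokens <= window

# 700,000 five-character words ("word " incl. the space) is ~3.5M characters,
# i.e. roughly 875,000 estimated tokens, which fits the 1M-token window.
print(fits_in_window("word " * 700_000))  # True
```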


First Published: Feb 16 2024 | 11:27 AM IST
