Business Standard

OpenAI announces GPT-4o, ChatGPT macOS app, conversational AI in Voice Mode

OpenAI said GPT-4o is its most advanced model and is trained end-to-end across text, vision and audio, meaning that all inputs and outputs are processed by the same neural network.


Harsh Shivam | New Delhi

OpenAI has announced GPT-4o, its first artificial intelligence model with native support for reasoning across audio, vision and text. OpenAI said the “o” in GPT-4o stands for “Omni”, as the model is much better than its predecessor at understanding and interpreting text, images and audio. The company also announced a ChatGPT app for Apple’s macOS desktops and previewed conversational AI in Voice Mode. Below are the details:

GPT-4o

OpenAI calls GPT-4o “a step towards much more natural human-computer interaction”. The new version of the company’s GPT-4 model can accept any combination of text, audio and image as input and generate any combination of text, audio and image as output. GPT-4o can respond to audio inputs in as little as 232 milliseconds (320 milliseconds on average), which the company said is similar to a human’s response time in a conversation.

Compared with the existing GPT-4 Turbo, another iteration of the company’s GPT-4 model, GPT-4o matches its performance on English text and coding while significantly outperforming it in audio understanding. GPT-4o also improves significantly on text in non-English languages.

OpenAI said GPT-4o is also significantly better at understanding images. For example, with ChatGPT running on GPT-4o, users can share an image of a menu in a foreign language and ask the chatbot to translate it, explain the history of the dishes and recommend what to order.

Voice Mode with GPT-4o

ChatGPT already offers a talk-back feature in Voice Mode across both free and paid tiers. However, OpenAI said the new GPT-4o model improves it significantly. According to the company, GPT-4o is its most advanced model and is trained end-to-end across text, vision and audio, meaning that all inputs and outputs are processed by the same neural network. Because all the information passes through a single network, latency drops enough for a natural conversational experience, and results improve.

Before GPT-4o, OpenAI said, users could talk to ChatGPT in Voice Mode with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). This latency came from a pipeline of three separate models: one simple model transcribed audio to text, GPT-3.5 or GPT-4 took in text and produced text, and a third simple model converted that text back to audio. According to OpenAI, this process meant the main source of intelligence, GPT-4, lost a lot of information: it could not directly observe tone, multiple speakers or background noise, and it could not output laughter or singing, or express emotion.

ChatGPT app for macOS

Expanding its app ecosystem, OpenAI launched a ChatGPT app for Apple’s macOS desktops, with deeper integration into the platform. A keyboard shortcut (Option + Space) takes users straight to a ChatGPT conversation so they can prompt the chatbot with a query.

OpenAI confirmed that it is working on a Windows version of the app, which will launch “later this year”.

The ChatGPT app for macOS is rolling out to Plus subscribers now and will be available to free-tier users in the coming weeks.

More capabilities for free users

GPT-4o is available to free-tier ChatGPT users, but with a limit on the number of messages. The limit depends on usage and demand at the time, and ChatGPT automatically switches to GPT-3.5 once it is reached. While using ChatGPT with GPT-4o, however, free-tier users get access to some advanced features that were previously limited to paid subscribers.

With GPT-4o, free-tier users can upload files and pictures for summarising, analysing and more. They can also use the “Memory” feature to ask ChatGPT to remember information for future conversations. Additionally, free-tier users get access to the GPT Store to browse and use custom bots. The GPT Store launched earlier this year for paid subscribers, letting users create their own chatbots, called GPTs, and share them on the store with other users. While free-tier users can access the GPT Store and use custom GPTs, they cannot create or share their own.

What remains exclusive to paid-tier users

While free-tier users are getting features that were previously limited to the paid tier, the new Voice Mode with GPT-4o will remain exclusive to paid subscribers. It will roll out to ChatGPT Plus subscribers in the coming weeks and will soon be available to Team and Enterprise users. OpenAI is also rolling out GPT-4o to paid subscribers with “fewer limitations”.

The new model is rolling out to ChatGPT Plus and Team users and will be available to Enterprise users in the coming days. The company said Plus users will have a message limit up to five times higher than free users, and Team and Enterprise users will have even higher limits.

First Published: May 14 2024 | 11:32 AM IST

Explore News