OpenAI will begin rolling out “Voice Mode” for the GPT-4o model in ChatGPT for Plus members starting next week. Responding to a question on X (formerly Twitter) about the feature’s availability, OpenAI CEO Sam Altman said that Voice Mode for GPT-4o will arrive as a limited “alpha” release for ChatGPT Plus subscribers.
alpha rollout starts to plus subscribers next week!
— Sam Altman (@sama) July 25, 2024
When OpenAI released its new flagship AI model, GPT-4o, in May, it announced significant improvements to ChatGPT’s talkback feature. While Voice Mode already exists in ChatGPT across both the free and paid tiers, its capabilities are quite limited.
Voice Mode for GPT-4o
In the current version, Voice Mode conversations with ChatGPT run with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). This latency is the result of a processing pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes text in and puts text out, and a third simple model converts that text back to audio. According to OpenAI, this process means the main source of intelligence, GPT-4, loses a lot of information along the way.
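For illustration, that legacy pipeline is roughly equivalent to chaining three separate model calls. The sketch below uses the openai Python SDK to mimic the three stages; the model names, file handling, and structure here are assumptions for illustration, not OpenAI’s actual internal implementation.

```python
# Illustrative sketch of the three-model Voice Mode pipeline (not OpenAI's
# internal code): speech-to-text, then a text-only LLM, then text-to-speech.
# Assumes the `openai` Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def legacy_voice_mode(audio_path: str) -> bytes:
    # Stage 1: a simple model transcribes the user's audio to text.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=f
        )

    # Stage 2: the text model takes text in and puts text out. Tone,
    # multiple speakers, and background noise are already lost here.
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply_text = completion.choices[0].message.content

    # Stage 3: a third simple model converts the reply text back to audio.
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy", input=reply_text
    )
    return speech.content  # raw audio bytes of the spoken reply
```

Each hop in the chain adds latency, and anything the text model never sees (intonation, overlapping speakers, ambient sound) cannot influence its reply.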
With the GPT-4o model, which the company says is trained end-to-end across text, vision, and audio, all inputs and outputs are processed by the same neural network. This lowers latency enough for a natural conversational experience and improves results, since no information is lost between separate models. Additionally, OpenAI said that GPT-4o is better at handling interruptions, manages group conversations effectively, filters out background noise, and adapts to tone.
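Conceptually, the end-to-end design collapses those three stages into a single model call. The sketch below is purely illustrative: `speech_to_speech_model` is a hypothetical function standing in for one multimodal network, not a real OpenAI API.

```python
# Conceptual contrast with the pipeline above (hypothetical, not a real API):
# one end-to-end model consumes audio and emits audio directly, so tone,
# interruptions, and background noise reach the same network that generates
# the reply, and there are no hand-off delays between separate models.
def end_to_end_voice_mode(audio_in: bytes) -> bytes:
    # Single pass through one multimodal network: no transcription step
    # discarding speaker information, no separate text-to-speech step.
    return speech_to_speech_model(audio_in)  # hypothetical GPT-4o-style call
```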