OpenAI will begin rolling out “Voice Mode” for the GPT-4o model in ChatGPT for Plus members starting next week. Responding to a question on X (formerly Twitter) about the feature’s availability, OpenAI CEO Sam Altman said that Voice Mode for GPT-4o will arrive as a limited “alpha” release for ChatGPT Plus subscribers.
alpha rollout starts to plus subscribers next week!
— Sam Altman (@sama) July 25, 2024
When OpenAI released its new flagship AI model, GPT-4o, in May, it announced significant improvements to ChatGPT’s talkback feature. While Voice Mode already exists in ChatGPT across both the free and paid tiers, its capabilities are quite limited.
Voice Mode for GPT-4o
In the current version, Voice Mode conversations with ChatGPT run with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). This latency is the result of a processing pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes text in and puts text out, and a third simple model converts that text back to audio. According to OpenAI, this process means the main source of intelligence, GPT-4, loses a lot of information along the way.
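For illustration, that legacy pipeline is roughly equivalent to chaining three separate model calls. The sketch below uses the openai Python SDK to mimic the three stages; the model names, file handling, and structure here are assumptions for illustration, not OpenAI’s actual internal implementation.

```python
# Illustrative sketch of the three-model Voice Mode pipeline (not OpenAI's
# internal code): speech-to-text, then a text-only LLM, then text-to-speech.
# Assumes the `openai` Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def legacy_voice_mode(audio_path: str) -> bytes:
    # Stage 1: a simple model transcribes the user's audio to text.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=f
        )

    # Stage 2: the text model takes text in and puts text out. Tone,
    # multiple speakers, and background noise are already lost here.
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply_text = completion.choices[0].message.content

    # Stage 3: a third simple model converts the reply text back to audio.
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy", input=reply_text
    )
    return speech.content  # raw audio bytes of the spoken reply
```

Each hop in the chain adds latency, and anything the text model never sees (intonation, overlapping speakers, ambient sound) cannot influence its reply.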
With the GPT-4o model, which the company says is trained end-to-end across text, vision, and audio, all inputs and outputs are processed by the same neural network. This lowers latency enough for a natural conversational experience and improves results, since no information is lost between separate models. Additionally, OpenAI said that GPT-4o is better at handling interruptions, manages group conversations effectively, filters out background noise, and adapts to tone.
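Conceptually, the end-to-end design collapses those three stages into a single model call. The sketch below is purely illustrative: `speech_to_speech_model` is a hypothetical function standing in for one multimodal network, not a real OpenAI API.

```python
# Conceptual contrast with the pipeline above (hypothetical, not a real API):
# one end-to-end model consumes audio and emits audio directly, so tone,
# interruptions, and background noise reach the same network that generates
# the reply, and there are no hand-off delays between separate models.
def end_to_end_voice_mode(audio_in: bytes) -> bytes:
    # Single pass through one multimodal network: no transcription step
    # discarding speaker information, no separate text-to-speech step.
    return speech_to_speech_model(audio_in)  # hypothetical GPT-4o-style call
```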