ChatGPT rolls out voice and image capabilities

Parents can now outsource bedtime stories.
By Cecily Mauran
ChatGPT kind of has eyes and ears now. Credit: Getty Images

Everyone's favorite chatbot can now see, hear, and speak. On Monday, OpenAI announced new multimodal capabilities for ChatGPT. Users can now have voice conversations or share images with ChatGPT in real time.

Audio and multimodal features have become the next phase of the fierce competition in generative AI. Meta recently launched AudioCraft for generating music with AI, and Google Bard and Microsoft Bing have both deployed multimodal features for their chat experiences. Just last week, Amazon previewed a revamped version of Alexa that will be powered by its own LLM (large language model), and even Apple is experimenting with AI-generated voice through its Personal Voice feature.

Voice capabilities will be available on iOS and Android. As with Alexa or Siri, you can tap to speak to ChatGPT, and it will speak back to you in one of five voices of your choosing. Unlike current voice assistants, however, ChatGPT is powered by more advanced LLMs, so what you'll hear is the same type of conversational and creative response that OpenAI's GPT-4 and GPT-3.5 can produce in text. The example OpenAI shared in the announcement is generating a bedtime story from a voice prompt. So exhausted parents at the end of a long day can outsource their creativity to ChatGPT.


Multimodal recognition has been anticipated for a while, and it is now launching in a user-friendly form in ChatGPT. When GPT-4 was released last March, OpenAI showcased its ability to understand and interpret images and handwritten text. Now that ability will be part of everyday ChatGPT use. Users can upload an image and ask ChatGPT about it: identifying a cloud, for instance, or making a meal plan based on a photo of the contents of their fridge. Image sharing will be available on all platforms.

As with any generative AI advancement, there are serious ethics and privacy issues to consider. To mitigate the risk of audio deepfakes, OpenAI says it is using its new voice technology only for the specific "voice chat" use case. The voices were also created with voice actors OpenAI has "directly worked with." That said, the announcement doesn't mention whether users' voices can be used to train the model when they opt in to voice chat. For ChatGPT's multimodal capabilities, OpenAI says it has "taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy." But the real test of how these features can be misused won't come until they're released into the wild.

Voice chat and images will roll out to ChatGPT Plus and Enterprise users in the next two weeks, and to all users "soon after."

Cecily Mauran

Cecily is a tech reporter at Mashable who covers AI, Apple, and emerging tech trends. Before getting her master's degree at Columbia Journalism School, she spent several years working with startups and social impact businesses for Unreasonable Group and B Lab. Before that, she co-founded a startup consulting business for emerging entrepreneurial hubs in South America, Europe, and Asia. You can find her on Twitter at @cecily_mauran.
