OpenAI’s ChatGPT is more powerful now. The large language model chatbot now accepts image and voice inputs, enabling users to interact with it in a more natural and intuitive way.
“We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about,” OpenAI announced.
Now, before you get all excited, you should know that these features aren’t available to free users. The company is rolling out voice and images in ChatGPT to Plus and Enterprise users only. The rollout will happen over the next two weeks. Voice is coming to iOS and Android (opt in via your settings), and images will be available on all platforms. Developers are also set to get these features soon.
“You can now use voice to engage in a back-and-forth conversation with your assistant. Speak with it on the go, request a bedtime story for your family, or settle a dinner table debate,” the company added.
To use the new image feature, simply tap the photo button to capture or choose an image. If you’re on iOS or Android, tap the plus button first. You can also discuss multiple images or use the bot’s drawing tool to guide your assistant.
The image understanding is powered by multimodal GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images, such as photographs, screenshots, and documents containing both text and images.
To get started with voice, head to Settings > Features on the mobile app and opt into voice conversations. Then tap the headphone button in the top-right corner of the home screen and choose your preferred voice from the five available.
The new voice capability is powered by a new text-to-speech model capable of generating human-like audio from text and a short sample of speech. OpenAI collaborated with professional voice actors to create each of the voices. The feature also uses Whisper, OpenAI’s open-source speech recognition system, to transcribe spoken words into text.
