OpenAI has officially launched real-time video analysis for ChatGPT, enhancing its Advanced Voice Mode with vision. First showcased seven months ago, this groundbreaking feature lets users point their device’s camera at objects and receive instant responses about what it sees.
The update is available to ChatGPT Plus, Team, and Pro subscribers, further expanding the AI’s multimodal capabilities.
Revolutionizing Interaction: What the Feature Offers
1. Real-Time Object Recognition
Point your phone at an object, and ChatGPT can identify and discuss it in near real time.
2. Screen Analysis
Through screen sharing, the AI can interpret and explain what is on a device’s screen, such as settings menus or complex math problems.
3. Ease of Use
- Activate the feature by tapping the voice icon next to the chat bar in the ChatGPT app.
- Start the video mode by selecting the video icon on the bottom left.
- To share your screen, access the three-dot menu and choose Share Screen.
Rollout Details
OpenAI has begun rolling out the feature, aiming for completion within a week. However, access will initially exclude ChatGPT Enterprise and Edu subscribers, who will have to wait until January. Users in the European Union, Switzerland, Iceland, Norway, and Liechtenstein are also excluded for now, with no release timeline specified.
In Action: A Peek Behind the Curtain
The potential of Advanced Voice Mode with vision was highlighted during a recent demo on 60 Minutes. OpenAI President Greg Brockman put the feature to the test by quizzing journalist Anderson Cooper on anatomy. ChatGPT not only identified hand-drawn body parts but also provided constructive feedback.
“The location is spot on,” ChatGPT responded when Cooper drew the brain. “As for the shape, it’s a good start. The brain is more of an oval.”
However, the demo wasn’t without flaws—ChatGPT made an error on a geometry problem, underscoring the system’s occasional tendency to hallucinate.
The Road to Launch
OpenAI faced significant delays in rolling out the visual component of Advanced Voice Mode. Initially promised in April, the feature underwent months of refinement to meet production standards. In the interim, OpenAI expanded voice-only capabilities to more platforms, particularly in the EU.
Competitive Landscape
Rival tech giants are racing to develop similar technology. This week, Google began testing Project Astra, its video-interpreting AI, with select Android users. Meta is also exploring related advancements, intensifying competition in the conversational AI space.
A Festive Addition
As part of the update, OpenAI introduced Santa Mode, which allows users to switch ChatGPT’s voice to Santa Claus. Accessible via the snowflake icon in the app, this playful feature adds a touch of holiday cheer to the ChatGPT experience.
A Step Toward the Future
The release of real-time video analysis marks a major milestone for OpenAI, solidifying its position as a leader in multimodal AI innovation. While challenges remain, the integration of vision into ChatGPT opens new possibilities for seamless interaction between humans and machines.