In summary
- OpenAI launched streaming video capabilities for ChatGPT, available only to Plus, Team, and Pro subscribers.
- The feature lets ChatGPT analyze objects, solve problems, and provide visual feedback in real time with low latency.
- Competitors such as Google and Meta are also developing AI assistants with vision, highlighting advances in audiovisual interactions and augmented reality.
OpenAI on Thursday unveiled ChatGPT’s long-promised video capabilities, allowing users to point their phones at objects for real-time AI analysis—a feature that had been on the shelf since its first demo in May.
Previously, you could interact with GPT via text, images, voice, or still photos. The new feature, released Thursday night, lets GPT observe you in real time and respond conversationally. In my tests, this mode solved math problems, offered food recipes, told stories, and even became my daughter’s new best friend, interacting with her while she made pancakes, giving suggestions, and encouraging her learning through different games.
The launch comes just a day after Google showed off its own camera-enabled AI assistant, powered by the newly announced Gemini 2.0. Meta has also been playing in this space, with its own AI that can see and chat through phone cameras.
However, ChatGPT’s new features are not for everyone. Only Plus, Team, and Pro subscribers can access what OpenAI calls “Advanced Voice Mode with vision.” The Plus subscription costs $20 per month, and the Pro level costs $200.
“We’re excited to announce that we’re bringing video to Advanced voice mode so you can include live video and also live screen sharing in your conversations with ChatGPT,” said Kevin Weil, Chief Product Officer at OpenAI, in a video on Thursday.
The stream was part of the company’s “12 Days of OpenAI” campaign, which features 12 announcements over as many consecutive days. So far, OpenAI has released its o1 model to all users, introduced the $200-per-month ChatGPT Pro plan, unveiled reinforcement fine-tuning for custom models, launched its generative video app Sora, updated its Canvas feature, and brought ChatGPT to Apple devices through the Apple Intelligence feature.
The company gave a glimpse of what the feature can do during Thursday’s livestream. Users can activate video mode from the same interface as advanced voice and start interacting with the chatbot in real time. The chatbot shows strong visual understanding and provides relevant feedback with low latency, making the conversation feel natural.
Getting here wasn’t exactly a smooth road. OpenAI first promised these features “in a few weeks” in late April, but the launch was postponed amid controversy over advanced voice mode’s resemblance to actress Scarlett Johansson’s voice, used without her permission. Since video mode depends on advanced voice mode, that apparently slowed down the launch.
And rival Google is not idle. Project Astra just landed in the hands of “trusted testers” on Android this week, promising a similar feature: an AI that speaks multiple languages, leverages Google Search and Maps, and remembers conversations for up to 10 minutes.
However, this feature is not yet widely available, with a wider release expected early next year. Google also has more ambitious plans for its AI models, giving them the ability to execute tasks in real time, displaying agentic behavior beyond audiovisual interactions.
Meta is also fighting for a place in the next era of AI interactions. Its assistant, Meta AI, was introduced in September. It displays similar capabilities to new assistants from OpenAI and Google, providing low-latency responses and real-time video understanding.
But Meta is betting on using augmented reality to power its AI offering, with “discreet” smart glasses capable enough to power those interactions, using a small camera built into their frames. Meta calls it Project Orion.
Current ChatGPT Plus users can try the new video features by tapping the voice icon next to the chat bar, then pressing the video button. Screen sharing takes an extra tap via the three-dot menu.
For ChatGPT Enterprise and Edu users eager to try out the new video features, January is the magic month. As for EU subscribers? They’ll have to watch from afar for now.
Edited by Andrew Hayward