In summary
- Gemini 2.0 debuted with multimodal capabilities and a focus on AI agents, seeking to lead in autonomous and personalized interaction.
- Google will integrate Gemini 2.0 into its products starting in January, rivaling OpenAI and Anthropic in premium AI services.
This week, Google launched Gemini 2.0, its latest AI model, featuring autonomous capabilities and multimodal features.
What’s immediately notable in this release is that Google sees AI chatbots evolving into AI Agents—personalized software that uses generative AI to interact with users and understand and execute tasks in real time.
“With new advances in multimodality—such as native image and audio output—and native use of tools, it will allow us to build new AI agents that bring us closer to our vision of a universal assistant,” said Google CEO Sundar Pichai.
The model builds on the multimodal foundations of Gemini 1.5 with new native image generation and text-to-speech capabilities, along with improved reasoning abilities.
According to Google, the 2.0 Flash variant outperforms the previous 1.5 Pro model in key benchmarks while running at twice the speed.
This model is currently available to users who pay for Gemini Advanced—the paid subscription designed to compete against Claude and ChatGPT Plus.
Those willing to dig deeper can enjoy a more complete experience by accessing the model through Google AI Studio.
From there, users can submit prompts with up to 1 million tokens of context—nearly 10 times the capacity of ChatGPT. The interface also supports audiovisual input, link fact-checking, code execution, and configurable settings such as “temperature” (randomness of responses) and “Top P” (lexical variation), giving users control over how creative or factual the model’s output is.
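To make the “temperature” and “Top P” settings concrete, here is a minimal, self-contained sketch of how these two sampling parameters are typically applied to a model’s raw output scores (logits). This is an illustration of the general technique, not Google’s implementation:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_p=1.0):
    """Pick one token index from raw logits using temperature and top-p (nucleus) sampling."""
    # Temperature rescales the logits: values < 1 sharpen the distribution
    # (more factual/deterministic), values > 1 flatten it (more creative).
    scaled = [l / temperature for l in logits]
    # Softmax to turn scores into probabilities (subtract max for numerical stability).
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p: keep only the smallest set of highest-probability tokens
    # whose cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in ranked:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the kept tokens and sample one.
    mass = sum(probs[i] for i in kept)
    return random.choices(kept, weights=[probs[i] / mass for i in kept])[0]

# With a very low temperature and low top_p, sampling becomes effectively greedy:
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(logits, temperature=0.1, top_p=0.1))  # always 0, the highest-scoring token
```

Dialing both settings down narrows the model to its most likely continuation; raising them widens the pool of candidate tokens, which is what the AI Studio sliders expose.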
Note that this interface is more complex than the simple, friendly chat interface Gemini provides.
It is also more powerful, but much slower. In our tests, we asked it to parse a 74,000-token document, and it took almost 10 minutes to produce a response.
However, the output was accurate, with no hallucinations. Longer documents of around 200K tokens (almost 150,000 words) will take considerably longer to parse, but the model can get the job done if you have enough patience.
Google also implemented a “Deep Research” feature, available now in Gemini Advanced, which leverages the model’s enhanced long-context and reasoning capabilities to explore complex topics and compile reports.
This lets users explore topics in more depth than they could with a regular model designed to provide direct answers. However, it is based on Gemini 1.5, and there is no timeline for a version built on Gemini 2.0.
This new feature puts Gemini in direct competition with services like Perplexity, You.com Research Assistant, and even the lesser-known BeaGo, all offering a similar experience. However, Google’s service does something different: before providing information, it first works out the best approach to the task.
It then presents this plan to the user, who can edit it to include or exclude information, add more research materials, or extract fragments of information. Once the methodology has been established, users can instruct the chatbot to begin its research. Until now, no AI service has offered researchers this level of control and customization.
In our tests, a simple prompt like “Investigate the impact of AI on human relationships” triggered an investigation of more than a dozen trusted scientific or official sites, with the model producing a 3-page document based on 8 properly cited sources. The result, by the way, wasn’t bad at all.
Project Astra: Gemini Multimodal AI Assistant
Google also shared a video showing off Project Astra, its experimental AI assistant powered by Gemini 2.0. Astra is Google’s answer to Meta AI: an AI assistant that interacts with people in real time, using the smartphone’s camera and microphone as inputs and responding in voice mode.
Google has given Project Astra expanded capabilities, including multilingual conversations with improved accent recognition, integration with Google Search, Lens and Maps, an extended memory that retains 10 minutes of conversation context, long-term memory, and low conversation latency through new streaming capabilities.
Despite a somewhat cool reception on social media—the Google video has garnered only 90K views since it was posted—the new model family appears to be gaining decent traction among users, with a significant increase in web searches, especially considering it was announced during a major ChatGPT Plus outage.
Google’s announcement this week makes it clear that it is trying to compete against OpenAI to be the industry leader in generative AI.
In fact, its announcement falls in the middle of OpenAI’s “12 Days of Christmas” campaign, in which the company reveals a new product daily.
So far, OpenAI has revealed a new reasoning model (o1), a video generation tool (Sora), and a $200 monthly “Pro” subscription.
Google also revealed its new AI-powered Chrome extension, Project Mariner, which uses agents to navigate websites and complete tasks. In tests against the WebVoyager benchmark for real-world web tasks, Mariner achieved an 83.5% success rate working as a single agent, Google said.
“Over the last year, we have been investing in developing more agentic (or smarter) models, meaning they can understand more about the world around you, think several steps ahead, and take actions on your behalf, with your supervision,” Pichai wrote in the announcement.
The company plans to roll out Gemini 2.0 integration across its entire product line, starting with experimental access to the Gemini app. A broader rollout will follow in January, including integration into Google Search’s AI features, which currently reach more than one billion users.
But don’t forget Claude
The Gemini 2.0 launch comes as Anthropic quietly revealed its latest update. Claude 3.5 Haiku, the fastest model in its family, claims superior performance in coding tasks, scoring 40.6% on the SWE-bench Verified benchmark.
Anthropic is still training its most powerful model, the Claude 3.5 Opus, which is scheduled to launch later in 2025 after a series of delays.
Both Google and Anthropic’s premium services are priced at $20 per month, matching OpenAI’s base ChatGPT Plus tier.
Anthropic’s Claude 3.5 Haiku proved to be much faster, cheaper and more powerful than Claude 3 Sonnet (Anthropic’s mid-size model from the previous generation), scoring 88.1% on HumanEval coding tasks and 85.6% on multilingual math problems.
The model shows particular strength in data processing, with companies like Replit and Apollo reporting significant improvements in code refinement and content generation.
Claude 3.5 Haiku is pretty cheap at $0.80 per million input tokens.
The company claims users can achieve up to 90% cost savings through prompt caching and an additional 50% reduction using the Message Batches API, positioning the model as a cost-effective option for companies looking to scale their AI operations—and a very interesting alternative to OpenAI’s o1-mini, which costs $3 per million input tokens.
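To put those list prices in perspective, here is a small back-of-the-envelope calculation. It assumes, for illustration, that the caching and batching discounts stack multiplicatively on the input-token price (the article does not specify how they combine):

```python
def cost_usd(input_tokens, price_per_million, cache_discount=0.0, batch_discount=0.0):
    """Input-token cost in dollars, applying each stated percentage discount in turn."""
    cost = input_tokens / 1_000_000 * price_per_million
    cost *= (1 - cache_discount)   # e.g. 0.90 for the claimed prompt-caching savings
    cost *= (1 - batch_discount)   # e.g. 0.50 for the claimed batch-API savings
    return cost

# Processing 10 million input tokens at the quoted prices:
haiku = cost_usd(10_000_000, 0.80)                        # $8.00 at list price
haiku_best = cost_usd(10_000_000, 0.80, 0.90, 0.50)       # $0.40 with both discounts
o1_mini = cost_usd(10_000_000, 3.00)                      # $30.00 at list price
print(haiku, haiku_best, o1_mini)
```

Even at list price, Haiku comes in at roughly a quarter of o1-mini’s input cost; with both claimed discounts applied, the gap widens to nearly two orders of magnitude.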
Edited by Sebastian Sinclair and Josh Quittner