Introducing Gemini 2.0: What’s New for Google Gemini?

Gemini 2.0: The Future of Agentic AI?

Artificial Intelligence | Generative AI | Insights

Published: December 25, 2024


Rebekah Carter

In December 2024, Google took the next step in its “Gemini AI” journey, launching the first model in the Gemini 2.0 series: Gemini 2.0 Flash. The Gemini portfolio has come a long way since Google started rolling out its initial models in December 2023.

Back then, Gemini 1.0 was one of the first AI models designed from the ground up to be genuinely “multimodal.” It captured the attention of millions of developers and paved the way for a new era across many of Google’s products. Not only did Google create its own Gemini bot to compete with the likes of OpenAI’s ChatGPT, but it integrated Gemini into Google Workspace, added new APIs to its Vertex AI platform, and even used AI to enhance Google Search.

The Gemini 2.0 collection, starting with the Flash model, will introduce further improvements to all of Google’s AI applications and new capabilities for developing Agentic AI.

Here’s everything you need to know about Gemini 2.0 and what it can do.

What is Gemini 2.0?

Gemini 2.0 is the next generation of models in the “Gemini” ecosystem, created by Google to compete with the likes of Microsoft (Copilot), Anthropic, Amazon, and OpenAI. The first model in the series, Gemini 2.0 Flash, was introduced in December 2024.

Google released an article explaining what Gemini 2.0 can do and how it’s making the new technology accessible to users. In addition to introducing Gemini 2.0 Flash in experimental mode to developers on platforms like Vertex AI, Google said it would begin implementing it into its products immediately, starting with Google Search and the Gemini app.

In the same article, Google announced the launch of a new feature for Gemini models: “Deep Research.” It uses long-context capabilities and advanced reasoning to act as a “research assistant” for users, and it’s already available to Gemini Advanced (the premium Gemini plan) subscribers.

According to Google, the advances in Gemini 2.0 are underpinned by a decade of investment in the company’s AI ecosystem. The models are built on custom hardware, such as Trillium, Google’s sixth-generation TPU.

The AI giant further explained that if “Gemini 1.0” was all about helping users organize and understand information, Gemini 2.0 is about making that information more helpful.

Gemini 2.0 Flash: The First 2.0 Model

When Google introduced its Gemini collection, it simultaneously announced a series of models. With Gemini 2.0, the company is taking a slightly more phased approach, starting with Gemini 2.0 Flash. This model builds on the success of the most popular previous model for developers, Gemini 1.5 Flash. Google shared some performance benchmark results for this new model on its website.

For instance, it revealed that 2.0 Flash runs at twice the speed of Gemini 1.5 Pro. It also achieved several impressive scores in benchmarks such as:

Benchmark                          Gemini 1.5 Flash    Gemini 2.0 Flash
MMLU-Pro                           67.3%               76.4%
Natural2Code                       79.8%               92.9%
FACTS Grounding                    82.9%               83.6%
MATH                               77.9%               89.7%
GPQA (diamond)                     51.0%               62.1%
MMMU (image understanding)         62.3%               70.7%
EgoSchema (video understanding)    66.8%               71.5%

The New Features of Gemini 2.0 Flash for Developers

Gemini 2.0 Flash is currently available to developers as an experimental model through Gemini APIs in Vertex AI and Google AI Studio. Multimodal input and text output are available to all developers, although only early-access partners can access text-to-speech and image generation.
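To make that concrete, here is a minimal sketch of calling the experimental model from Google AI Studio, assuming the google-genai Python SDK. The API key placeholder, model ID (“gemini-2.0-flash-exp”), and prompt are illustrative assumptions; check the current SDK documentation for exact names.

```python
# Minimal sketch: plain text generation with the experimental Gemini 2.0 Flash model.
# Assumes the google-genai Python SDK (pip install google-genai) and an API key
# created in Google AI Studio; the model ID and parameter names may change over time.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key, not a real credential

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental Gemini 2.0 Flash model ID
    contents="Summarize the new capabilities of Gemini 2.0 Flash in three bullet points.",
)

print(response.text)  # text output; image and audio outputs remain early-access only
```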

Alongside various improvements in benchmark performance, Gemini 2.0 Flash introduces new features and capabilities, such as:

  • Multimodal Live API: Developers can now create real-time multimodal AI applications with streaming audio and video inputs from screens or cameras. The API supports conversational patterns such as voice activity detection and interruption handling (a rough code sketch follows this list). To jumpstart building, Google has also introduced “starter app” experiences in AI Studio, with open-source code for video analysis, spatial analysis, and Google Maps.
  • Native Tool Use: Gemini 2.0 has been trained to use tools natively for agentic AI experiences. It can call tools like Google Search and code execution, as well as custom third-party functions (see the tool-use sketch after this list). The model can even run multiple searches in parallel to improve information retrieval. Plus, developers can combine multimodal understanding, coding capabilities, and complex instruction following to build agentic experiences.
  • Enhanced performance: Gemini 2.0 is far more powerful than 1.5 Pro, and it still delivers the speed and efficiency developers want from the Flash environment. It supports improved spatial understanding and enables more accurate generation of elements like bounding boxes.
  • New output modalities: Gemini 2.0 Flash features new output modalities, allowing bots to respond to prompts with text, images, and audio through a single API call. These output modalities are currently available to early testers, with broader rollouts coming in 2025. SynthID invisible watermarks will also be embedded in all audio and image outputs to help reduce the risk of deepfakes.
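To illustrate the Multimodal Live API bullet above, here is a rough sketch of a streaming session using the google-genai SDK's asynchronous live interface. The session methods (send, receive) and the response_modalities config key are assumptions based on early SDK previews and may not match the shipping API exactly.

```python
# Rough sketch: a real-time session via the Multimodal Live API (text in, text out).
# Assumes the google-genai SDK's async live.connect interface; method and config
# names (send, receive, response_modalities) are assumptions and may differ.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

async def main():
    # Audio (and, for early testers, image) output can be requested via the same config.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(
            input="Walk me through what you can do in real time.", end_of_turn=True
        )
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```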
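Similarly, for the native tool use described above, a minimal sketch of grounding a response with Google Search might look like the following, again assuming the google-genai SDK; the Tool and GoogleSearch configuration types are assumptions drawn from that SDK's config objects.

```python
# Minimal sketch: letting Gemini 2.0 Flash call Google Search as a built-in tool.
# Assumes the google-genai SDK's GenerateContentConfig / Tool / GoogleSearch types;
# exact type names may differ, and code execution can be enabled in the same way.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="What did Google announce alongside Gemini 2.0 Flash?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # native search grounding
    ),
)

print(response.text)  # response grounded in live search results
```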

All the New Experiences Enabled by Gemini 2.0

Google is still in the early stages of introducing everything “enhanced” with Gemini 2.0. On a broad scale, the company is starting with updates to its development ecosystem (as mentioned above) and will introduce new Gemini 2.0 capabilities to the Gemini app.

Gemini users can access a chat-optimized version of 2.0 Flash Experimental by selecting it from the model drop-down menu on desktop and mobile web. On top of that, Google introduced:

New AI Code Assistance with Gemini 2.0

For developers and programmers, Google is introducing new Gemini 2.0 coding agents that can execute coding tasks on behalf of users. Google has used 2.0 Flash with code execution tools to achieve a score of 51.8% on SWE-bench Verified, which measures agent performance on real-world software engineering tasks.

Additionally, Google introduced the new “Jules” assistant, specially designed for coding needs. Google says developers will be able to delegate JavaScript and Python coding tasks to Jules, which will use Gemini 2.0 to complete them autonomously.

The solution can integrate with GitHub workflows and handle various time-consuming tasks and bug fixes while developers concentrate on building new solutions. This autonomous agent uses multi-step, comprehensive plans to modify files efficiently, prepare pull requests, and more.

Though this feature is in its early stages, Google says its internal teams have become more productive thanks to Jules and are more effective at tracking progress throughout tasks. Google notes that users maintain full control over the experience. They can review the plans created by Jules, provide feedback, and make adjustments to ensure they get the right results. You can sign up for an early experience with Jules here.

Colab Data Science Agent

During the I/O 2024 event, Google launched a new experimental “Data Science” agent that enables users to upload datasets and get actionable insights in minutes, grounded in a Colab notebook. The firm received a lot of positive feedback from the developer community almost immediately.

On a broader scale, Colab will now begin to integrate the same agentic capabilities using Gemini 2.0. Users need only describe their analysis goals in plain language, and the agent automatically builds a notebook that helps them conduct analysis rapidly and organize their data.
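Purely as an illustration of what such an auto-generated notebook automates, a first exploratory cell might resemble the pandas sketch below. The file name, columns, and analysis goal are hypothetical; the real agent plans its own steps based on the uploaded dataset and the stated goal.

```python
# Hypothetical illustration only: the kind of exploratory cell a data science agent
# might generate for a plain-language goal like "show which products drive revenue".
# The dataset name and column names are made up for this sketch.
import pandas as pd

df = pd.read_csv("sales.csv")  # stands in for the user's uploaded dataset

top_products = (
    df.groupby("product")["revenue"]
      .sum()
      .sort_values(ascending=False)
      .head(10)
)
print(top_products)  # top ten products by total revenue
```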

Gemini 2.0 In Project Astra

Project Astra, Google’s vision for the ultimate AI assistant, is still in early development, although it has generated a lot of attention in the extended reality landscape. A handful of users have begun testing the system on Android phones and sharing feedback with Google.

As a result, the company has introduced new improvements to Astra with Gemini 2.0, which will start rolling out to beta testers, such as:

  • Improved dialogue: The Astra assistant can now communicate with users in multiple and mixed languages, and it understands uncommon words and accents better.
  • New tool capabilities: Through Gemini 2.0, Project Astra can respond to queries using Google Lens, Google Search, and maps.
  • Enhanced memory: Project Astra can now remember more details of conversations with a ten-minute in-session memory. It can also recall data from previous conversations based on your settings.
  • Reduced latency: With new native audio understanding and streaming capabilities, Astra can understand language at roughly the latency of human conversation.

These capabilities are currently rolling out to the Astra experience on Android phones and within the Gemini app. However, Google is also working on bringing the features to new devices, like smart glasses, in the years ahead.

Project Mariner: A New Gemini 2.0 Prototype

Built with Gemini 2.0, the new research prototype Project Mariner is designed to explore the future of human-agent interaction, starting in the browser. The prototype can understand and reason across information on a browser screen, including pixels and web elements like text, code, images, and forms.

Google tested the prototype with the WebVoyager benchmark, achieving an impressive result of 83.5%. This project is still in its very early stages, but Google says it’s evidence of the growing potential of AI tools for browser experiences.

To continue building on this project safely, Google is conducting active research into potential risks and challenges it must overcome. It’s also committed to keeping humans in the loop throughout the development and usage.

For instance, the Mariner solution can only scroll, click, or type within an active tab on a browser. It will always ask for final confirmation from a user before purchasing a product. Trusted testers can currently access Mariner through a Chrome extension.

Google’s Continued Focus on Responsible AI

With new innovations in generative and Agentic AI come increased concerns about ethics, security, governance, and transparency. Like many AI leaders, Google regularly focuses on responsible AI development. In the Gemini 2.0 announcement post, the company said it’s taking a gradual and “exploratory” approach to development to minimize risks.

The firm is conducting extensive research on multiple prototypes and is gradually implementing new safety training strategies, working with external experts and trusted testers. For instance:

  • As part of a comprehensive safety strategy, Google works with a Responsibility and Safety Committee established internally to review models and identify potential risks.
  • The reasoning capabilities built into Gemini 2.0 have allowed Google to advance its AI-assisted red teaming approach. Now, the company can not only identify risks but also automatically generate evaluations and training data to mitigate them.
  • Since Gemini 2.0’s multimodal nature increases the complexity of outputs, Google plans to constantly evaluate and train the model across different output modalities.
  • With the Project Astra initiative, Google is experimenting with preventing users from accidentally sharing sensitive information with their agents. The firm is already leveraging privacy controls that allow users to delete sessions and adjust the agent’s memory.
  • With Project Mariner, Google is working on ensuring that the model learns to prioritize user instructions over third-party prompt injection attempts. This will allow the model to identify potentially dangerous intrusions from third-party sources used for fraud and phishing attempts while users browse the web.

The Future of AI Development with Google

The release of the new Gemini 2.0 ecosystem, starting with Gemini 2.0 Flash, marks an important moment in the artificial intelligence landscape. Countless other AI companies also made big announcements towards the end of the year. For instance, Microsoft introduced a range of autonomous AI agents built with Copilot and new agentic AI capabilities in Copilot Studio.

OpenAI has introduced the world to its Sora video generation model and the o1 models for advanced reasoning. The OpenAI team even introduced a new “voice mode with vision” capability for its AI bots.

Amazon (AWS) also rolled out its Nova collection of foundation models towards the end of the year, promising new ways for people to create and build unique AI experiences.

Obviously, with all of these new technologies, challenges will remain. Google and its competitors will still need to take extensive steps to ensure their models are safe, secure, and aligned with ethical and governance standards. They’ll also need to ensure these systems are truly accessible.

Currently, Google’s Gemini 2.0 solutions are only available in limited form, and many of the competing models have their own limitations. Still, in the year ahead, we can expect to see many exciting upgrades, particularly focused on agentic AI and multimodal technologies.
