Google has launched Gemini, its most advanced AI model to date and the successor to PaLM and PaLM 2, the foundational language models behind Bard. What sets Gemini apart is that it was trained across multiple domains at once – text, programming code, images, audio, and video. This simultaneous training allows Gemini to handle multimedia input more efficiently than separate, stitched-together models for each type of input.
Gemini comes in three distinct versions, each designed for a different level of computing power.
The first version, Gemini Nano, is designed to run on mobile phones. It comes in two variants, each built for a different amount of available memory. This version will bring new features to Google’s Pixel 8 Pro, such as the ability to summarize conversations in its Recorder app and suggest message replies in WhatsApp via Google’s Gboard.
The second version, Gemini Pro, is optimized for quick responses and operates in Google’s data centers. This version will power a new iteration of Bard, starting Wednesday.
The third and most advanced version, Gemini Ultra, is currently limited to a test group. It will be available in a new Bard Advanced chatbot, set to launch in early 2024. While Google has not disclosed pricing details, it is expected that this top-tier capability will come at a premium.
According to Google’s chief scientist, Jeff Dean, Gemini Ultra has reached a benchmark milestone no model has before. “Gemini Ultra is the first model to achieve human-expert performance on MMLU across 57 subjects with a score above 90%,” Dean said.
Gemini was designed to be multimodal from the outset, integrating text, vision, and audio encoders. This is a departure from the traditional approach of starting with a text-only model and bolting on vision and audio encoders afterward.
In addition to its language understanding capabilities, Gemini boasts advanced programming skills. It underpins AlphaCode 2, an advanced code-generation system that can solve complex programming problems and collaborate with developers.
AI expert Rowan Cheung compared Gemini Pro with GPT-3.5, noting that Gemini Pro outperformed GPT-3.5 in six of eight benchmarks – making it, in his assessment, the most powerful free chatbot available today.
The launch of Gemini represents the biggest upgrade to Bard since its inception. The company plans to make Gemini available in English in more than 170 countries and territories, and to expand support to new languages and locations in the near future.
Gemini is also being integrated into Google’s flagship phone: the Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which powers new features like Summarize in the Recorder app and Smart Reply in Gboard, starting with WhatsApp and expanding to more messaging apps next year.
Google plans to deploy Gemini across more of its products and services, including Search, Ads, and Chrome, in the coming months. The tech giant has also begun experimenting with using Gemini to power its dominant web search engine, aiming to make searching a generative experience.
The release of Gemini underscores the rapid progress in generative AI, where chatbots compose their own responses to prompts written in plain language rather than requiring complex programming instructions. In the year since OpenAI launched ChatGPT, Google has already made its third major AI model revision and plans to integrate the technology into widely used products like Search, Chrome, Google Docs, and Gmail.
Eli Collins, a product vice president at Google’s DeepMind division, shared the company’s vision for the future of AI models. “For a long time, we wanted to build a new generation of AI models inspired by the way people understand and interact with the world. We envision an AI that feels more like a helpful collaborator and less like a smart piece of software. AI model Gemini brings us a step closer to that vision,” said Collins.
Gemini’s capabilities, as outlined in a Google research paper, are quite diverse. For instance, when presented with a series of shapes – a triangle, square, and pentagon – Gemini can accurately predict that the next shape in the series would be a hexagon.
When shown photos of the moon and a hand holding a golf ball and asked to find the connection, Gemini correctly identifies that Apollo astronauts hit two golf balls on the moon in 1971. It can also convert four bar charts depicting waste disposal techniques by country into a labeled table and identify an outlier – the US disposes of significantly more plastic than other regions.
Google has demonstrated Gemini’s ability to process a handwritten physics problem, identify a student’s mistake, and correct it. A more detailed demonstration video showed Gemini recognizing a blue duck, hand puppets, and sleight-of-hand tricks across a series of video clips. However, these demonstrations were not live, and it remains unclear how often Gemini might struggle with such challenges.
Gemini Ultra, the most advanced version, is currently undergoing further testing before its expected launch next year. Google has initiated “red teaming” for Gemini Ultra, a process where individuals are enlisted to identify security vulnerabilities and other issues. This process becomes more complex with multimedia input data. For example, a text message and a photo might each seem harmless individually, but when combined, they could convey a significantly different meaning.