
Google unveils Gemini

A photo illustration of Gemini represented as connected tiles showing applications including a camera and photo roll (Google)

I.

Google this morning announced the rollout of Gemini, its largest and most capable large language model to date. Starting today, the company’s Bard chatbot will be powered by a version of Gemini, and the upgraded Bard will be available in English in more than 170 countries and territories. Developers and enterprise customers will get access to Gemini via API next week, with a more advanced version set to become available next year.

How good is Gemini? Google says the performance of its most capable model “exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in LLM research and development.” Gemini also scored 90.0% on a test known as “Massive Multitask Language Understanding,” or MMLU, which assesses capabilities across 57 subjects including math, physics, history and medicine. It is the first LLM to perform better than human experts on the test, Google said.

Gemini also appears to be a very good software engineer. Last year, using an older language model, DeepMind introduced an AI system named AlphaCode that outperformed 54 percent of human coders in coding competitions. Using Gemini, Google built a next-generation version named AlphaCode 2. The sequel outperformed an estimated 85 percent of humans, the company said.

Competitive coding is meaningfully different from day-to-day software engineering in some important ways: it can be both more and less difficult than what the typical engineer is asked to do. But still, the rate of progress here is striking.

Gemini is natively multimodal, meaning that it can analyze the contents of a picture and answer questions about it, or create an image out of a text prompt. During a briefing on Tuesday, a Google executive uploaded a photo of some math homework in which the student had shown their calculations leading up to the final answer. Gemini was able to identify at which step in the student’s process they had gone awry, and explained their mistake and how to answer the question correctly.

“Multimodal” can read like awkward jargon, but the term comes up constantly in conversation with Google executives. The ability of AI systems to take different kinds of data (text, images, video, audio), analyze them using a single tool, and translate them in and out of various formats is the kind of foundational innovation that makes lots of other progress possible. (All of which is a long way of saying: sorry for the number

...
Read full article on Platformer →