This Google video of Gemini was revealed to have been heavily edited, raising questions about the real capabilities of the new AI model. (Google Video)

In the past year, generative AI has taken the world by storm. The pace of advancement has been little short of astounding, with OpenAI — and its partnership with Microsoft — taking much of the attention after ChatGPT’s launch late last year.

Google last week raised the stakes with the unveiling of its latest generative AI model, Gemini, a joint effort of Google Brain and DeepMind and a direct competitor to OpenAI’s GPT-4.

Google is promoting Gemini as natively multimodal. That means its components aren’t stitched together after the fact from separate types of content and data. Instead, it was built from the ground up across different modalities, combining billions of parameters representing text, images, video, audio, and programming code.

With the release of Gemini, some people are asking whether we’ve finally created artificial general intelligence, or AGI: the point at which our technology can match or exceed human intelligence across virtually any task.

“It’s impossible to judge if they [Google] achieved AGI or not based on heavily edited PR videos without public API access,” said Yejin Choi, the Wissner-Slivka Professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, a MacArthur Fellow, and a senior director at the Allen Institute for AI (AI2).

AI2, a nonprofit research institute based in Seattle, has been at the forefront of exploring and advancing AI since its founding in 2014. Its mission is to contribute to humanity through high-impact AI research and engineering.

“The technological progress in GPT and its ilk has been nothing short of breathtaking, but we are not yet anywhere near human-level intelligence,” said Oren Etzioni, a UW professor emeritus and the former CEO of AI2. “For instance, we are still struggling to field self-driving cars.”

Large language models (LLMs), and other forms of generative AI, have made it possible for artificial intelligence to perform a growing number of tasks with capabilities that appear to parallel, and sometimes even exceed, those of human cognition. From instantly preparing business templates to composing poetry to rapidly exploring new approaches to problem-solving, these recent advances are ushering us into a truly different relationship with our technology.

With Gemini, Google appears not only to have caught up with OpenAI’s GPT-4-based ChatGPT, but to have surpassed it. Test results published by Google show Gemini outscoring ChatGPT and, in its grasp of world knowledge and problem-solving, even outperforming many human scores.

According to Google, Gemini Ultra is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), with a score of 90%. MMLU tests both world knowledge and problem-solving ability across 57 subjects, from math and physics to history, law, and medicine. If these test results hold up, Gemini stands to advance and accelerate human knowledge as no AI has before.

But despite this, we shouldn’t make the mistake of thinking this new AI is actually as smart as a person. In truth, Gemini and other large models appear to still have many challenges ahead of them.

In announcing the release of Gemini, Google produced a set of videos to market the new AI, including a demonstration showing Gemini responding rapidly and effortlessly to questions from an off-screen human user. But while the YouTube description stated that the video had been edited for latency, the edits were soon revealed to be far more extensive than that disclaimer suggested.

Behind the scenes, the demo reportedly relied on still image frames and written prompts far more extensive and detailed than those portrayed in the video. The subsequent media response has been less than generous. While we’ve yet to see exactly how capable Gemini is in the wild, this is not the type of launch anyone wants for their product. In hindsight, Google probably would have been better off with a more realistic portrayal of Gemini’s current abilities.

So, as exciting as all these advances are, AGI probably remains a distant aspiration. This will no doubt be a question that’s routinely asked with every major AI advance for decades to come.

This doesn’t diminish concerns about AI safety in the meantime. Even if these AIs fall short of AGI, how will we make sure these powerful new systems are suitable for business and public use?

AI training and safeguards

In training and testing Gemini, Google reportedly used AI2’s Real Toxicity Prompts to ensure its output is appropriate for these purposes. In early 2021, Choi’s team at AI2 developed and released this set of 100,000 prompts, as I reported in GeekWire nearly three years ago.

Because toxicity in language is complex and extensive, particularly when drawn from content on the web, it’s not possible to simply filter for vulgar or hateful words. Real Toxicity Prompts provides a way to train systems to identify and filter for more nuanced forms of toxic language and meaning.
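
To make the dataset concrete: here is a minimal sketch, in Python, of how a team might pull the AI2 prompts and select the most challenging ones to probe a model’s guardrails. It assumes the Hugging Face datasets library and the dataset’s public ID (allenai/real-toxicity-prompts); it is an illustration, not Google’s actual testing pipeline, which hasn’t been published.

    # Minimal sketch (not Google's pipeline): load AI2's RealToxicityPrompts
    # and keep the prompts with the highest toxicity scores for probing.
    from datasets import load_dataset

    ds = load_dataset("allenai/real-toxicity-prompts", split="train")

    # Each record pairs a text prompt with Perspective API toxicity scores;
    # some scores are missing, hence the None check.
    challenging = [
        record["prompt"]["text"]
        for record in ds
        if record["prompt"]["toxicity"] is not None
        and record["prompt"]["toxicity"] > 0.8
    ]
    print(f"{len(challenging)} high-toxicity prompts to test a model against")

Feeding prompts like these to a model and scoring its completions is a standard way to measure how often a system continues an innocuous-looking sentence with toxic text.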

This is extremely important because as these large models have become increasingly complex, more effort has been needed to create safeguards around their output. This will probably be even more crucial as developers come to rely on multimodal approaches.

Gemini “was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video,” Demis Hassabis, CEO and co-founder of DeepMind, wrote in a recent Google blog post.

By drawing on this large multimodal model approach, Gemini delivers capabilities that would have been impossible only a few years ago. Multimodal approaches have come to be seen as a way to bring major new capabilities to generative AI because of the extra context that added layers of information can provide.

Unlike the original large language models that constructed their output exclusively from vast collections of text, multimodal models derive more meaning from the many different forms of their underlying data. In some ways, this parallels how we ourselves build much deeper understanding of situations by drawing on our multiple senses – sight, sound, etc. This multimodal approach allows these systems to generate vastly more capable, nuanced and useful output.
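
Gemini’s API was not publicly available at the time of writing, so any code here is purely illustrative. Still, a short sketch shows what multimodality buys a developer in practice: one request can mix media, rather than running a vision model and a language model separately and stitching the results together. The shape below follows Google’s google-generativeai Python SDK; the model name and image file are placeholder assumptions.

    # Illustrative sketch only: a single prompt mixing an image and text,
    # in the shape of Google's google-generativeai Python SDK.
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key

    model = genai.GenerativeModel("gemini-pro-vision")  # assumed model name
    image = Image.open("chart.png")  # placeholder local image

    # Image and text travel together; the model attends across both.
    response = model.generate_content(
        [image, "Summarize what this chart shows in two sentences."]
    )
    print(response.text)

The specific API matters less than the interface it implies: the caller hands over mixed media in a single call and gets back one unified answer.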

Gemini is being released in three levels.

  • Gemini Pro is already being incorporated into Google Bard.
  • Gemini Ultra is currently being refined and safety-tested, with plans to roll it out to developers and enterprise customers in early 2024.
  • Gemini Nano, a more compact version designed for mobile devices, is now part of the Pixel 8 Pro and will be added to a growing number of products in the coming months.

Remaining hurdles for AI

Following OpenAI’s meteoric release of ChatGPT, it seemed to many people that Google was playing catch-up. But the company appears to have chosen a slower approach to building and releasing its new model, taking concerns about AI safety seriously and striving to build important safeguards into Gemini, such as reducing its potential for toxic language.

There are many other considerations when it comes to AI safety and ethics. As we saw with ChatGPT, there have been lots of unanticipated use cases, many of them illegal or otherwise harmful.

Google CEO Sundar Pichai at Google I/O in Mountain View, Calif., in May. (GeekWire Photo / Todd Bishop)

While it’s reassuring that Google reportedly took its time and applied a lot of effort to create safeguards around its new technology, it remains to be seen if this will be enough. Given the complexity of the system and the opaqueness of its underlying data, odds are that we will now face a whole new batch of challenges.

On top of all of this, it’s probably safe to say we’re in for another round of enthusiastic PR blitzes and breathless hype within the media as all of us wrap our heads around these latest advances and come to terms with what these new systems actually can and can’t do. Will they destroy jobs or simply transform how we work? Will these models help us better manage the vast amounts of information our world is creating? Or will they lead to an explosion of misinformation and the subsequent distrust this brings?

For all the issues new technology brings, it’s worth remembering that we build these AIs to be our tools. They’re still a very long way off from being self-aware or conscious or having anything like the human motivations that drive our own choices and actions.

The statistical means by which generative AI reasons and generates output is entirely different from the workings of human cognition and will likely remain so for a very long time to come. That’s probably fortunate. In many respects, it’s this difference that makes AI so useful to us as the tools we’ll need for the next stages of progress. What role Gemini will play in all of this, only time will tell.
