Google Gemini | Issue 20

The Unfolding:ai weekly newsletter about AI for Business Professionals

Paul Bratcher
06 Dec

A large number of updates and releases this week, as everyone, it seems, is looking to hit their ‘shipped by end of year’ promises.

This week’s content for everyone

Stop Press - Google Gemini
Image Generation - Updates

This week’s content for free subscribers

Transcription
Have you been naughty or nice

This week’s content for premium subscribers

Code for ‘naughty or nice’
AI In Education

As always, back issues are available: just click here and log on (little hamburger menu, top right!)

Google Gemini Release

Google has unveiled Gemini, its most advanced foundation model to date. Foundation models are large models trained on massive datasets (such as text and code) so that they can perform a wide range of tasks. Gemini is designed to be more flexible and efficient than previous models. Its release into Bard and Google Workspace (for enterprise) brings the first significant, real competition to GPT-4.

Just note that the version in Bard at the moment is Gemini Pro, which beats GPT-3.5 but not GPT-4. The upgrade to Ultra should happen early next year.

According to Google’s reported results, Gemini Ultra beats or equals GPT-4 on most benchmarks. It is the first model to claim this since GPT-4 launched.

This suggests that Gemini is a more capable model that can be used for a wider range of applications.

One of the key features of Gemini is its multimodality. This means that it can process and understand information in multiple forms, such as text, code, images, audio and video. This makes Gemini well-suited for tasks that require a deep understanding of context, such as summarizing a news article or writing a creative piece.

Gemini has also been shown to be more efficient than previous models. This means that it can run on less powerful hardware, and it can be deployed more easily to mobile devices. There will be three variations: Ultra, the largest, for the most demanding applications; Pro, for typical use; and Nano, for local on-device (phone) execution.

Gemini is still under development, but it has already been used to create a number of new products and services. For example, Gemini is being used to improve the quality of Google Search results, and it is also being used to develop new chatbots and virtual assistants.

New developer features and application-building tools have also been released. It will be interesting to see how quickly a ‘GPTs’ or ‘Agent’ solution is produced. The ‘build with Google Gemini’ launch date is 13th Dec.

Bard - Chat Based AI Tool from Google

Discover more about Bard, a collaborative AI tool developed by Google to help bring your ideas to life.

bard.google.com

Introducing Gemini: our largest and most capable AI model

Gemini is our most capable and general model, built to be multimodal and optimized for three different sizes: Ultra, Pro and Nano.

blog.google/technology/ai/google-gemini-ai

How it’s Made: Interacting with Gemini through multimodal prompting

Explore the capabilities of our AI model Gemini with this hands-on guide to multimodal prompting.

developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html

Image Generation - Updates

Image generation (text to image) was probably one of the most innovative technologies released this year. Considering where the output quality was in January versus the latest releases, it’s a remarkable advance.

There are still limitations, but they are being improved on with every release. I expect some of them will cease to be problems next year.

These are the things that are currently very hard to do:

Consistent images, e.g. the same car in different places, or the same (or a specific) person in different outfits
Removal of bias / stereotypes in images
Overall resolution (size)
Multiple people, animals, crowds
Consistent lighting
Text and numbers

There are plenty of uses where these limitations are not a problem, from colouring books to ideation for product design.

Latest Updates

Leonardo.ai is our go-to image generator for more complex images and controls. It now has multiple models, style effects, resolutions, prompt suggestions, image as input and multiple ‘control nets’ to govern the output. As a user experience it has a steep learning curve, but the effort is worth it if you want output that truly does ‘not look like it was made in DALL-E 3 using ChatGPT’.

Leonardo.Ai

Generate production quality assets for your creative projects with AI-driven speed and style-consistency.

leonardo.ai

Microsoft has released a new tool, ‘Designer’. It is very approachable, and for social media users and solopreneurs it’s definitely worth exploring. The output quality is similar to Midjourney / DALL-E 3. The interface is very low friction.

Microsoft Designer - Stunning designs in a flash

A graphic design app that helps you create professional quality social media posts, invitations, digital postcards, graphics, and more. Start with your idea and create something unique for you.

designer.microsoft.com

For video, Runway ML has had updates providing motion brush, advanced camera movements and higher resolution. It is probably the market-leading text-to-video solution, though at a duration of 12 seconds per clip, and with fairly basic editing, it is still more of a niche solution.

Pika Labs (via a waitlist) is releasing its next generation of technology, providing more robust camera movements and image generation. The show reels look very promising, though I haven’t had hands-on time to qualify that yet. Always beware of AI show reels: they are #LivingTheBestInstagramLife, and reality may not be the same!
