Google Gemini | Issue 20
The Unfolding:ai weekly newsletter about AI for Business Professionals
A large number of updates and releases this week; it seems everyone is looking to hit their ‘shipped by end of year’ promises.
This week's content for everyone
Stop Press - Google Gemini
Image Generation - Updates
This week's content for free subscribers
Transcription
Have you been naughty or nice?
This week's content for premium subscribers
Code for ‘naughty or nice’
AI In Education
As always, back issues are available: just click here and log on (little hamburger menu, top right!)
Google Gemini Release
Google has unveiled Gemini, its most advanced foundation model to date. Foundation models are large models trained on massive datasets (here including text and code) that can then perform a wide range of tasks; large language models are one type. Gemini is designed to be more flexible and efficient than previous models. Its release into Bard and Google Workspace (for enterprise) brings the first significant, real competition to GPT-4.
Just note that the version in Bard at the moment is Gemini Pro, which beats GPT-3.5 but not GPT-4. The upgrade to Ultra should happen early next year.
Gemini Ultra beats or equals GPT-4 on most benchmarks, and it is the first model to do so in over a year. This suggests that Gemini Ultra is a more capable model that can be used for a wider range of applications.
One of the key features of Gemini is its multimodality. This means that it can process and understand multiple types of input, such as text, code, and images. This makes Gemini well-suited for tasks that require a deep understanding of context, such as summarizing a news article or writing a creative piece.
Gemini has also been shown to be more efficient than previous models. This means that it can run on less powerful hardware and can be deployed more easily to mobile devices. There will be three variants: Ultra, the largest, for the most demanding applications; Pro for typical use; and Nano for local, on-device (phone) execution.
Gemini is still under development, but it has already been used to create a number of new products and services. For example, Gemini is being used to improve the quality of Google Search results, and it is also being used to develop new chatbots and virtual assistants.
New developer features and application-building tools have also been released. It will be interesting to see how quickly a ‘GPTs’ or ‘Agent’ style solution is produced. The launch date for building with Google Gemini is 13th Dec.
Image Generation - Updates
Image generation (text to image) was probably one of the most innovative technologies released this year. Compare the output quality in January with the latest releases and the advance is remarkable.
There are still limitations, but every release improves on them, and I expect some of them will cease to be problems next year.
These are the things that are currently very hard to do:
Consistent images, e.g. the same car in different places, or the same specific person in different outfits
Removal of bias / stereotypes in images
Overall resolution (size)
Multiple people, animals, crowds
Consistent lighting
Text and numbers
There are plenty of uses where these limitations are not a problem, from colouring books to ideation for product design.
Latest Updates
Leonardo.ai is our go-to image generator for more complex images and controls. It now has multiple models, style effects, resolutions, prompt suggestions, image as input and multiple ‘control nets’ to govern the output. As a user experience it has a steep learning curve, but the effort is worth it if you want your images to truly ‘not look like they were made in DALL-E 3 using ChatGPT’.
Microsoft has released a new tool, ‘Designer’. It is very approachable, and for social media and solopreneurs it’s definitely worth exploring. The output quality is similar to Midjourney / DALL-E 3. The interface is very low friction.
For video, updates to Runway ML provide motion brush, advanced camera movements, and higher resolution. It is probably the market-leading text-to-video solution, though at a duration of 12 seconds per clip and with fairly basic editing, it is still more of a niche solution.
Pika Labs (via a waitlist) is releasing its next generation of technology, providing more robust camera movements and image generation. The show reels look very promising, though I haven’t had hands-on time to qualify that yet. Always beware of AI show reels: they are #LivingTheBestInstagramLife, and reality may not be the same!