Can you hear me now | Issue 18

The Unfolding:ai weekly newsletter about AI for Business Professionals

Following on from last week's OpenAI announcements, GPTs have been released. In the first few days, interesting, amazing, and dreadfully bad examples have already been created.

This is fueled by the ‘app store for GPTs’, which was discussed as part of the keynote. This could be an amazing opportunity for both new business ideas and an overwhelming amount of really bad software!

This week's content for everyone

An introduction to AI audio and voice

This week's content for free subscribers

Synthetic voices
It’s not just voice

This week's content for premium subscribers

Leveraging AI Voice and Music in business.
Giving this a try
Watermarking

As always, back issues are available: just click here and log on (little hamburger menu, top right!)

An Introduction to AI audio and voice

One of the most rapidly advancing areas of AI is text to audio or voice.

Understanding the Basics

At its core, AI in audio and voice revolves around the creation and manipulation of sound by artificial intelligence systems. This technology ranges from text-to-speech (TTS) conversions, where written words are turned into spoken ones, to more advanced applications like voice deep-fakes, which can mimic a person's voice with startling accuracy.

These applications typically fall into the following uses:

Voice to text (transcription), including automatic subtitles (closed captions), which can be very useful for increasing accessibility
Language translation (text to text)
Text to synthetic voice: the ‘script’ is voiced by an entirely synthesised voice. This has progressed significantly from the robotic voices of last year to ones with subtle intonations and colour
Text to ‘deep fake’, or ‘human synthesised’, voice: this is similar to text to synthetic voice, but a human’s voice has been recorded and analysed to make a voice ‘footprint’, which can then be used to create new speech from provided text input

Why It Matters

For business leaders and professionals, the implications are profound. Imagine delivering your business reports not as dry text, but as engaging audio narratives in multiple languages. Or consider customer service bots that don’t just respond with canned text but engage in natural, human-like conversation.

The Ethical Dimension

However, with great power comes great responsibility. The advent of voice deep-fakes, for instance, raises ethical questions around consent and misuse. It’s a balancing act between embracing innovation and safeguarding against abuse.

Can you hear me now?

As we edge closer to a world where AI can speak and listen, a much more diverse and inclusive set of solutions and ideas opens up, covering everything from training to advertising. One of the brand questions you should be asking is, quite literally, ‘what is your company’s tone of voice?’
