How you can change or clone your voice and convert text to audio using ElevenLabs

In this post, I will introduce you to an easy-to-use product that can create lifelike audio using your own samples or by generating a brand-new voice.

How to Change your voice using a Deep Learning AI tool

Eleven Labs AI Voice Cloning Demo

⚠️
Please note that the following post is intended solely for educational and informative purposes. I do not endorse or encourage any inappropriate behaviour that may result from the information provided in this post.

The video above was created by Eleven Labs, a start-up founded in 2022 that specializes in voice synthesis technology. Another extremely impressive service is play.ht, and just like Eleven Labs, its results sound so real that they are almost indistinguishable from a human voice. These tools are used to generate voice recordings for a lot of content, including some of your favorite reels, podcasts, and videos.

We're going to explore how you can use the Eleven Labs website to clone your voice or create entirely new voices. I chose to review only Eleven Labs because it is the most straightforward service I've tested and, more importantly, because I was able to generate shockingly good voices with it.

Synthesizer Demo

Once we navigate to their website, we can immediately see the synthesizer right there on the home page. We can test it out for free without creating an account.

As shown below, go ahead and type in your prompt, then choose a language and a voice from the prebuilt dropdown. To play the generated audio, click on the play button.

Eleven Labs Demo, No Sign-Up Needed
⚠️
You can only generate a limited number of short samples without creating an account.

Cool, right? After you listen to the generated audio, you can save it by downloading it to your device.

Cost of Service

Let's head over to the pricing page where we can see the plans they offer. The paid plans start at $5/mo and go up to $330/mo, but they also offer a free option. The free plan is designed for hobbyists and has limited features, including a 10,000-character limit per month. If you need more than that, or if you need to use your audio commercially, you'll need to upgrade. The paid plans are designed for content creators, professionals, and businesses.
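To get a feel for how far the free plan's 10,000 characters go, here is a minimal Python sketch that adds up the length of a few scripts and reports what would be left. It assumes that every character you submit, including spaces and punctuation, counts against the quota; that counting rule is my assumption, so check the pricing page for the exact details. The function and constant names are made up for illustration.

```python
# Free-plan limit quoted in this post; paid plans would use a larger number.
FREE_PLAN_MONTHLY_CHARS = 10_000


def remaining_quota(scripts, monthly_limit=FREE_PLAN_MONTHLY_CHARS):
    """Return the characters left after synthesizing every script once.

    Assumption: each character (spaces and punctuation included) counts
    against the quota. A negative result means the scripts do not fit.
    """
    used = sum(len(text) for text in scripts)
    return monthly_limit - used


if __name__ == "__main__":
    drafts = [
        "Hello, and welcome to my channel!",
        "Thanks for watching, see you next time.",
    ]
    print(f"Characters left this month: {remaining_quota(drafts)}")
```

Running a check like this before you paste a long script into the synthesizer is a cheap way to avoid burning through the month's allowance on a single draft.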

If you're like me and you want to try the service before you commit to a monthly payment, go ahead and sign up to get started for free.

Available Features

After you confirm your email and sign in, you will see the following tabs:

  1. Speech Synthesis: This is where the model essentially converts our text to speech. If we expand the settings tab, we can select a predefined voice, or even create our own in the VoiceLab.
  2. VoiceLab: This section contains two main modules: Voice Design and Instant Voice Cloning. Voice Design allows us to generate a new voice for our audio by adjusting predefined parameters, such as the age and accent of the speaker. Alternatively, the Instant Voice Cloning tool enables us to clone any voice; it only requires a clean one-minute recording of that voice. Unfortunately, this feature is not available in the free plan, so you'll need to upgrade to give it a try.
  3. History: If you have created one or more audio clips using the Speech Synthesis tool, you can go to the History tab to view, listen to, and download them. Keep in mind that generating audio is a costly process: even if you have a paid plan, you'll still want to keep a close eye on your monthly character limit and usage. It's always a good idea to download any generated audio files so that you can reuse them instead of re-generating them and using up your remaining quota.
  4. Resources: I am not going to go into the details of this section, but essentially it provides documentation for developers and general information about the service and Eleven Labs. It's worth taking a look at each section if you'd like to learn more.
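If you end up scripting your own workflow, the "download it once, reuse it later" advice from the History tab can be sketched as a small local cache. This is only an illustration under stated assumptions: `synthesize` is a hypothetical function you would supply yourself (it takes the text and a voice name and returns the audio bytes), and the `.mp3` extension and the `audio_cache` folder name are my own choices, not anything defined by Eleven Labs.

```python
import hashlib
from pathlib import Path


def cached_audio(text, voice, synthesize, cache_dir="audio_cache"):
    """Return the path to the audio for (voice, text), generating it at most once.

    `synthesize` is a hypothetical callable: (text, voice) -> bytes.
    It is only called when no saved file exists for this exact text and voice.
    """
    folder = Path(cache_dir)
    folder.mkdir(parents=True, exist_ok=True)

    # Identical text + voice always maps to the same file name.
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    path = folder / f"{key}.mp3"  # file extension is an assumption

    if not path.exists():
        # This is the only step that would spend characters from the quota.
        path.write_bytes(synthesize(text, voice))
    return path
```

Calling `cached_audio` twice with the same text and voice triggers only one generation; the second call simply returns the file that is already on disk.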

That's it! Keep in mind that these tools are a work in progress and the models are improving at an ever-increasing rate. So by the time you read this article, the models might have more capabilities and will sound better.

In another post, I will cover the technical side of neural speech synthesis and how similar models are capable of generating and cloning human speech with such high accuracy. I will go over topics such as text-to-speech (TTS), Hidden Markov Models (HMMs), deep learning, and more. So stay tuned for more updates.

Thanks for reading!