Cloud Text-to-Speech Beta

Text-to-speech conversion powered by machine learning

High-Fidelity Speech Synthesis

Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with more than 30 voices, available in multiple languages and variants. It applies DeepMind’s groundbreaking research in WaveNet and Google’s powerful neural networks to deliver the highest fidelity possible. With this easy-to-use API, you can create lifelike interactions with your users across many applications and devices.

Powered by Google’s Machine Learning

Apply the most advanced deep learning neural network algorithms to synthesize speech from text in a variety of voices and languages. Our neural networks are built on Google’s speech synthesis expertise.

Includes Exclusive Access to WaveNet Voices from DeepMind

DeepMind has conducted groundbreaking research into machine learning models that generate speech mimicking human voices and sounding more natural, reducing the gap with human performance by over 50%. Cloud Text-to-Speech offers exclusive access to multiple WaveNet voices and will continue to add more over time.

Select from 30+ Voices

Google Cloud Text-to-Speech offers a selection of 30+ voices in multiple languages and variants.

Easily Integrates with Existing Applications and Devices

Cloud Text-to-Speech supports any application or device that can send a REST or gRPC request, including phones, PCs, tablets, and IoT devices (e.g., cars, TVs, speakers).
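As a rough illustration of what a REST client sends, the Python sketch below builds a synthesis request using only the standard library. The endpoint path (the version segment may differ from the Beta release you target), the voice name `en-US-Wavenet-D`, and the bearer-token handling are assumptions for illustration; check the API reference for the exact path and the voices available to your project. The request is constructed but not sent.

```python
import base64
import json
import urllib.request

# Assumed endpoint; the version segment depends on the API release you target.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text: str, access_token: str,
                             language_code: str = "en-US",
                             voice_name: str = "en-US-Wavenet-D") -> urllib.request.Request:
    """Return an unsent POST request asking the API to synthesize `text` as MP3."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # OAuth 2.0 access token, e.g. from a service account (assumption).
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def decode_audio(response_json: str) -> bytes:
    """Extract the audio bytes, which the JSON response carries base64-encoded in `audioContent`."""
    return base64.b64decode(json.loads(response_json)["audioContent"])
```

Passing the returned object to `urllib.request.urlopen` would send it; a gRPC client would carry the same input, voice, and audio settings in protocol-buffer form instead of JSON.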

Supports Many Common Use Cases

As an easy-to-use API, Google Cloud Text-to-Speech is a flexible solution for creating natural experiences across a variety of use cases. Common use cases include call center automation, interactive responses from IoT devices, and converting written text into audio that users can listen to.

Cloud Text-to-Speech Features

Supports 32 voices in 12 languages and variants, with more to come soon
WaveNet Voices
Exclusive access to DeepMind WaveNet voices that provide the most natural-sounding speech
Text and SSML support
Customize your speech with SSML tags that allow you to add pauses, numbers, date and time formatting, and other pronunciation instructions
Speaking Rate Tuning
Customize your speaking rate to be up to 4x faster or slower than the normal rate
Pitch Tuning
Customize the pitch of your selected voice by up to 20 semitones higher or lower than the default output
Volume Gain Control
Increase the volume of the output by up to 16 dB or decrease it by up to 96 dB
Audio Format Flexibility
Choose from a number of audio formats, including MP3, LINEAR16, and Ogg Opus
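The tuning options above correspond to fields of a request's `audioConfig` object. The Python sketch below assembles that object and rejects out-of-range values, assuming ranges of 0.25 to 4.0 for `speakingRate` (1.0 is normal), -20.0 to 20.0 semitones for `pitch`, and -96.0 to 16.0 dB for `volumeGainDb`, and the encoding names `MP3`, `LINEAR16`, and `OGG_OPUS`. It also passes SSML through the `ssml` input field instead of `text`. The helper names `make_audio_config` and `make_ssml_request` are hypothetical.

```python
from typing import Any, Dict

# Assumed ranges, matching the feature list above.
RANGES = {
    "speakingRate": (0.25, 4.0),    # 1.0 is the normal rate
    "pitch": (-20.0, 20.0),         # semitones relative to the voice's default
    "volumeGainDb": (-96.0, 16.0),  # gain in dB relative to normal volume
}
ENCODINGS = {"MP3", "LINEAR16", "OGG_OPUS"}


def make_audio_config(encoding: str = "MP3", speaking_rate: float = 1.0,
                      pitch: float = 0.0, volume_gain_db: float = 0.0) -> Dict[str, Any]:
    """Build an audioConfig dict, raising ValueError for unsupported values."""
    if encoding not in ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    config = {
        "audioEncoding": encoding,
        "speakingRate": speaking_rate,
        "pitch": pitch,
        "volumeGainDb": volume_gain_db,
    }
    for field, (low, high) in RANGES.items():
        if not low <= config[field] <= high:
            raise ValueError(f"{field}={config[field]} outside [{low}, {high}]")
    return config


def make_ssml_request(ssml: str, **audio_options: Any) -> Dict[str, Any]:
    """Request body whose input is SSML; the `ssml` field replaces `text`."""
    return {
        "input": {"ssml": ssml},
        "voice": {"languageCode": "en-US"},
        "audioConfig": make_audio_config(**audio_options),
    }


# Illustrative SSML: a number read as a cardinal, then a half-second pause.
EXAMPLE_SSML = (
    '<speak>You have <say-as interpret-as="cardinal">12</say-as> new messages.'
    '<break time="500ms"/>Goodbye.</speak>'
)
```

Validating on the client side surfaces a bad value before a request is spent; the service performs its own validation as well.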

Cloud Text-to-Speech Pricing

Cloud Text-to-Speech is priced per 1 million characters of text processed, after a monthly free tier of 4 million characters for standard voices and 1 million characters for WaveNet voices. For details, please see our pricing guide.

Voice type | Monthly free tier | Paid usage (beyond free tier)
Standard (non-WaveNet) voices | 0 to 4 million characters | $4.00 USD / 1 million characters
WaveNet voices | 0 to 1 million characters | $16.00 USD / 1 million characters
If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.
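To estimate a monthly bill from the table above, subtract the free tier from the characters processed and multiply the remainder by the per-million rate. The Python sketch below does that arithmetic in USD, assuming the two free tiers are counted separately and that usage beyond the free tier is billed linearly per character rather than rounded up to whole millions; the pricing guide is authoritative on billing granularity. The function name `monthly_cost_usd` is hypothetical.

```python
# Free tiers and rates from the pricing table (USD per 1 million characters).
PRICING = {
    "standard": {"free_chars": 4_000_000, "usd_per_million": 4.00},
    "wavenet": {"free_chars": 1_000_000, "usd_per_million": 16.00},
}


def monthly_cost_usd(voice_type: str, characters: int) -> float:
    """Estimate one month's charge for `characters` processed with `voice_type`.

    Assumes per-character billing beyond the free tier (no rounding up to
    whole millions) and a separate free tier for each voice type.
    """
    tier = PRICING[voice_type]
    billable = max(0, characters - tier["free_chars"])
    return billable / 1_000_000 * tier["usd_per_million"]
```

Under these assumptions, 5 million characters of standard voices in a month leaves 1 million billable characters, about $4.00, while 3 million WaveNet characters leaves 2 million billable, about $32.00.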
Beta: This is a Beta release of Cloud Text-to-Speech. This feature is not covered by any SLA or deprecation policy and may be subject to backward-incompatible changes.