Create Speech AI Applications in Multiple Languages and Customize Text-to-Speech with Riva

This month, NVIDIA released world-class speech-to-text models for Spanish, German, and Russian in Riva, enabling enterprises to deploy speech AI applications globally. In addition, enterprises can now create expressive speech interfaces using Riva’s customizable text-to-speech pipeline.

NVIDIA Riva is a GPU-accelerated speech AI SDK for developing real-time applications like live captioning, adding voice to text-based chatbots, and generating real-time transcription in call centers. For easy implementation, Riva offers highly accurate pretrained models in the NGC catalog.

With the TAO Toolkit, these models can be customized for any industry including telecommunications, finance, unified communications as a service, and healthcare. Developers can use Riva to deploy these models out-of-the-box. They are optimized to run in real time in less than 300 ms in the cloud, data center, and at the edge.
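As a rough illustration of what the 300 ms figure means in practice, the following sketch times a single call and compares it against that budget. It is a minimal example, not part of Riva: `fake_transcribe` is a hypothetical stand-in for whatever client call a deployment actually makes, and `timed_call` and `within_budget` are helper names invented here.

```python
import time

# Latency budget quoted in the text above, in milliseconds.
LATENCY_BUDGET_MS = 300.0


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock time in ms)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def within_budget(elapsed_ms, budget_ms=LATENCY_BUDGET_MS):
    """True if the measured latency is strictly under the budget."""
    return elapsed_ms < budget_ms


def fake_transcribe(audio_chunk):
    # Hypothetical placeholder for a real speech recognition request.
    time.sleep(0.01)
    return "hello world"


text, ms = timed_call(fake_transcribe, b"\x00" * 3200)
print(f"{text!r} took {ms:.1f} ms; within budget: {within_budget(ms)}")
```

A single measurement like this says little on its own. In a real evaluation you would likely repeat the call many times and look at the distribution of latencies rather than one sample.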

Riva release highlights include:

- World-class speech recognition skills in Spanish, German, and Russian.
- Customizable text-to-speech pipeline for expressive interactions.
- Low-code fine-tuning workflow with TAO Toolkit.

Automatic speech recognition in multiple languages

Every conversational AI application, from call centers to virtual assistants, relies heavily on automatic speech recognition. Enterprises can extend these apps globally with Riva automatic speech recognition in English, Spanish, German, and Russian.

This demo shows NVIDIA Riva world-class automatic speech recognition, now available in multiple languages.
*Figure 1: NVIDIA Riva world-class automatic speech recognition is available in English, Spanish, German, and Russian.*

The non-English automatic speech recognition models are trained on a variety of open-source datasets, such as Mozilla Common Voice, as well as private datasets. Riva automatic speech recognition models are developed to provide out-of-the-box accuracy and serve as a great starting point for adapting to industry, jargon, dialect, or even noisy surroundings. On popular evaluation datasets, these models deliver world-class accuracy on several industry applications.

Customizable text-to-speech pipelines

For customers to enjoy lifelike dialogues, speech applications must offer human-like expression. Using FastPitch, a new model created by the NVIDIA speech AI research team, Riva helps developers customize the text-to-speech pipeline and create expressive speech interfaces. For example, at inference time, developers can vary voice pitch and speed using SSML tags.
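To make the SSML idea concrete, here is a minimal sketch that builds an SSML string using the standard `<speak>` and `<prosody>` elements, with `pitch` and `rate` attributes controlling pitch and speed. The helper name `build_ssml` and the sample values are assumptions for illustration. Which attribute values a given Riva release accepts should be checked against the Riva documentation, and the call that sends the string to the text-to-speech service is deliberately left out.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, pitch=None, rate=None):
    """Wrap text in <speak>, adding a <prosody> element when pitch or rate is set."""
    attrs = []
    if pitch is not None:
        attrs.append(f"pitch={quoteattr(str(pitch))}")
    if rate is not None:
        attrs.append(f"rate={quoteattr(str(rate))}")
    # Escape &, <, and > so the text cannot break the markup.
    body = escape(text)
    if attrs:
        body = f"<prosody {' '.join(attrs)}>{body}</prosody>"
    return f"<speak>{body}</speak>"


# Illustrative values only: a higher pitch and a faster speaking rate.
ssml = build_ssml("Welcome back. How can I help you today?", pitch="high", rate="120%")
print(ssml)
```

Building the markup through a helper rather than string concatenation at the call site keeps escaping in one place, which matters when the spoken text comes from user input.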

This demo shows NVIDIA Riva customizable text-to-speech capabilities, allowing developers to vary voice pitch and speed using SSML tags.
*Figure 2: NVIDIA Riva provides customizable text-to-speech pipelines for more expressive interactions.*

The latest state-of-the-art models in Riva, such as FastPitch, help text-to-speech pipelines run several times faster than competing options on the market.

Resources

Getting Started with NVIDIA Riva
NVIDIA Riva Starter Kits – includes tutorials, notebooks, and documentation:
- Automatic speech recognition
- Text-to-speech

Subscribe to the NVIDIA Developer Blog to stay up to date on all things Conversational AI/NLP.

Notable Replies

ssaikia says:

March 1, 2022

When will you have speech-to-text in Asian languages like
- Mandarin
- Japanese
- Korean
- Vietnamese

gneskovic says:

March 1, 2022

Hello, we appreciate your interest :)
Though we cannot comment on our roadmap, the Riva team plans to add Asian languages in the coming months.
Stay tuned: we’ll share more as they become available.

ssaikia says:

March 9, 2022

Thank you

raied.debibi says:

April 4, 2022

Hello, I would like to know if there is a possibility of having French for TTS?

gneskovic says:

April 8, 2022

Hi Raied, thank you for your interest! We are planning to add French. Stay tuned :)
