Conversational AI

Create Speech AI Applications in Multiple Languages and Customize Text-to-Speech with Riva

This month, NVIDIA released world-class speech-to-text models for Spanish, German, and Russian in Riva, enabling enterprises to deploy speech AI applications globally. In addition, enterprises can now create expressive speech interfaces using Riva’s customizable text-to-speech pipeline.

NVIDIA Riva is a GPU-accelerated speech AI SDK for developing real-time applications like live captioning, adding voice to text-based chatbots, and generating real-time transcription in call centers. For easy implementation, Riva offers highly accurate pretrained models in the NGC catalog. 

With the TAO Toolkit, these models can be customized for any industry including telecommunications, finance, unified communications as a service, and healthcare. Developers can use Riva to deploy these models out-of-the-box. They are optimized to run in real time in less than 300 ms in the cloud, data center, and at the edge.

Riva release highlights include:

  • World-class speech recognition skills in Spanish, German, and Russian.
  • Customizable text-to-speech pipeline for expressive interactions.
  • Low-code fine-tuning workflow with TAO Toolkit.

Automatic speech recognition in multiple languages

Every conversational AI application, from call centers to virtual assistants, relies heavily on automatic speech recognition. Enterprises can extend these apps globally with Riva automatic speech recognition in English, Spanish, German, and Russian.

This demo shows NVIDIA Riva world-class automatic speech recognition, now available in multiple languages.
Figure 1: NVIDIA Riva world-class automatic speech recognition is available in English, Spanish, German, and Russian.

The non-English automatic speech recognition models are trained on a variety of open-source datasets, such as Mozilla Common Voice, as well as private datasets. Riva automatic speech recognition models are developed to provide out-of-the-box accuracy and to serve as a starting point for adapting to a specific industry, jargon, dialect, or noisy environment. On popular evaluation datasets, these models deliver world-class accuracy across several industry applications.

Customizable text-to-speech pipelines

For customers to enjoy lifelike dialogues, speech applications must offer human-like expression. Using FastPitch, a new model created by the NVIDIA speech AI research team, Riva helps developers customize the text-to-speech pipeline and create expressive speech interfaces. For example, at inference time, developers can vary voice pitch and speed using SSML tags.
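As a rough illustration of what varying pitch and speed with SSML can look like, the sketch below builds an SSML string with the standard `<prosody>` element from the W3C SSML specification. The helper name `build_ssml` is hypothetical, and the example does not call Riva. Which `pitch` and `rate` values a given Riva release accepts is an assumption to check against the Riva documentation.

```python
import xml.etree.ElementTree as ET


def build_ssml(text: str, pitch: str = "default", rate: str = "default") -> str:
    """Wrap text in <speak><prosody ...> tags (hypothetical helper, not a Riva API).

    pitch and rate use the symbolic labels defined by the SSML specification,
    for example "low"/"high" for pitch and "slow"/"fast" for rate.
    """
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", {"pitch": pitch, "rate": rate})
    # ElementTree escapes special characters such as "&" when serializing.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


# Build a higher-pitched, slower rendering of a greeting.
ssml = build_ssml("Hello & welcome.", pitch="high", rate="slow")
print(ssml)
```

The resulting string would then be passed as the input text of a text-to-speech request. Building the markup with an XML library rather than string concatenation keeps the request well-formed when the text contains characters such as `&` or `<`.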

This demo shows NVIDIA Riva customizable text-to-speech capabilities, allowing developers to vary voice pitch and speed using SSML tags.
Figure 2: NVIDIA Riva provides customizable text-to-speech pipelines for more expressive interactions.

State-of-the-art models in Riva, such as FastPitch, help text-to-speech pipelines run several times faster than competing options on the market.

Resources

Subscribe to the NVIDIA Developer Blog to stay up to date on all things Conversational AI/NLP.

Notable Replies

  1. When will you have speech-to-text in Asian languages like Mandarin, Japanese, Korean, and Vietnamese?

  2. Hello, we appreciate your interest :)
    Though we cannot comment on our roadmap, the Riva team plans to add Asian languages in the coming months.
    Stay tuned: we’ll share more as they become available.

  3. Thank you

  4. Hello, I would like to know if there is a possibility of having French for TTS?

  5. Hi Raied, thank you for the interest! We are planning to add French. Stay tuned :)

Continue the discussion at forums.developer.nvidia.com
