AI & Machine Learning

Build Voice AI into your apps with our top 3 Speech API codelabs

April 20, 2022

Anu Srivastava

Senior Developer Programs Engineer

With voice-controlled touchpoints increasingly becoming the norm in human-computer interaction, our Speech-to-Text (STT) API is a great option for developers looking to build voice into their applications. The API processes more than 1 billion minutes of spoken audio each month, enough to transcribe every Presidential inauguration speech in U.S. history more than 1 million times. Our customers use STT for everything from auto-generating captions, to generating insights that improve sales calls, to powering robots that help with childhood development.

With Speech-to-Text, you can accurately convert speech into text and tailor the results with features including:

Model Customization - customize for domain-specific terms
Speech Adaptation - provide context to influence results and formatting
Diarization - separate speakers on different channels or automatically detect when speakers change
Profanity Filtering - configure your request to detect profane words and edit them out of the transcript
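To make the feature list above more concrete, here is a rough Python sketch of how several of these options could be switched on in a single request. It assumes the google-cloud-speech library (v1 API) and Application Default Credentials. The field names come from the v1 RecognitionConfig message, but the helper names build_recognition_config, words_by_speaker, and recognize_with_features are hypothetical. Model Customization is not shown.

```python
# Sketch: one Speech-to-Text config that enables speech adaptation, diarization,
# multi-channel recognition, and profanity filtering.
# Assumes google-cloud-speech (v1) and Application Default Credentials.
from typing import Optional


def build_recognition_config(
    language_code: str = "en-US",
    phrases: Optional[list[str]] = None,
    filter_profanity: bool = False,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
    channel_count: Optional[int] = None,
) -> dict:
    """Return a RecognitionConfig as a plain dict, which the client library accepts."""
    config = {"language_code": language_code, "profanity_filter": filter_profanity}
    if phrases:
        # Speech adaptation: phrase hints that bias recognition toward these terms.
        config["speech_contexts"] = [{"phrases": list(phrases)}]
    if min_speakers is not None and max_speakers is not None:
        # Diarization: detect when the speaker changes within one channel.
        config["diarization_config"] = {
            "enable_speaker_diarization": True,
            "min_speaker_count": min_speakers,
            "max_speaker_count": max_speakers,
        }
    if channel_count:
        # Multi-channel audio: recognize each channel (e.g. each caller) separately.
        config["audio_channel_count"] = channel_count
        config["enable_separate_recognition_per_channel"] = True
    return config


def words_by_speaker(response) -> dict[int, str]:
    """Group diarized words by speaker_tag.

    With diarization enabled, the words list of the last result holds every
    word seen so far with its speaker tag, so only that list is read.
    """
    if not response.results or not response.results[-1].alternatives:
        return {}
    grouped = {}
    for info in response.results[-1].alternatives[0].words:
        grouped.setdefault(info.speaker_tag, []).append(info.word)
    return {tag: " ".join(words) for tag, words in grouped.items()}


def recognize_with_features(gcs_uri: str, config: dict):
    """Send a synchronous recognize request (needs the library and credentials)."""
    from google.cloud import speech  # imported lazily; the helpers above need no library

    client = speech.SpeechClient()
    return client.recognize(request={"config": config, "audio": {"uri": gcs_uri}})
```

With diarization turned on, the response from recognize_with_features could be passed to words_by_speaker to see who said what. Keeping the config as a plain dict makes it easy to inspect or log before it is sent.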

Whether you’re using our pre-trained APIs for the first time or you’re a seasoned AI veteran, our codelabs are great resources for practicing and getting even more comfortable with our pre-trained models. Beyond helping you brush up on your skills, the codelabs provide step-by-step instructions for setting up your GCP project and claiming a $300 credit if you need it. They also walk you through everything else you need to get your sample up and running, such as authentication, installing the client libraries, and using tooling like the Cloud Shell Editor.

That’s why we’ve rounded up some of our top Speech codelabs to help you get the most out of our Speech-to-Text API, as well as our Text-to-Speech API:

1. Using the Speech-to-Text API with Python lab and C# lab

Speech-to-Text is easy to get started with: all you need is the client library, an audio file, and a few lines of code to create a transcript.
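As a minimal sketch of how short that code can be, the Python below sends an audio file stored in Cloud Storage to the API and returns the transcripts. It assumes the google-cloud-speech library and Application Default Credentials, as set up in the codelab; the function names are illustrative, and the sample URI in the usage note is assumed to be one of Google's public test files.

```python
# Minimal sketch of a synchronous Speech-to-Text request in Python.
# Assumes google-cloud-speech is installed and Application Default
# Credentials are configured.


def transcripts_from_response(response) -> list[str]:
    """Return the most likely transcript for each result in a recognize() response."""
    return [
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ]


def transcribe_gcs(gcs_uri: str, language_code: str = "en-US") -> list[str]:
    """Transcribe a short audio file (synchronous requests handle up to about one minute)."""
    from google.cloud import speech  # imported lazily; the helper above needs no library

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    # FLAC and WAV files carry their encoding and sample rate in the file header,
    # so only the language code is set here.
    config = speech.RecognitionConfig(language_code=language_code)
    response = client.recognize(config=config, audio=audio)
    return transcripts_from_response(response)
```

For example, transcribe_gcs("gs://cloud-samples-data/speech/brooklyn_bridge.flac") would return a list with one transcript string per recognized segment of the recording.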

This lab also shows you how to transcribe audio in multiple languages: Speech-to-Text supports 137 locales across more than 70 languages!
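Switching languages is mostly a matter of setting a different language_code on each request. The sketch below builds one request per file and sends them in turn. It assumes google-cloud-speech and Application Default Credentials; the French sample URI is an assumption based on Google's public cloud-samples-data bucket, and build_requests and transcribe_all are hypothetical helper names.

```python
# Sketch: transcribing files recorded in different languages. Each request
# carries its own BCP-47 language_code. The sample URIs are assumptions.

SAMPLES = {
    "gs://cloud-samples-data/speech/brooklyn_bridge.flac": "en-US",
    "gs://cloud-samples-data/speech/corbeau_renard.flac": "fr-FR",
}


def build_requests(samples: dict[str, str]) -> list[dict]:
    """Build one recognize() request (as a plain dict) per audio file."""
    return [
        {"config": {"language_code": code}, "audio": {"uri": uri}}
        for uri, code in samples.items()
    ]


def transcribe_all(samples: dict[str, str]) -> dict[str, str]:
    """Return {uri: transcript}, calling the API once per file."""
    from google.cloud import speech  # imported lazily; build_requests needs no library

    client = speech.SpeechClient()
    transcripts = {}
    for request in build_requests(samples):
        response = client.recognize(request=request)
        transcripts[request["audio"]["uri"]] = " ".join(
            result.alternatives[0].transcript
            for result in response.results
            if result.alternatives
        )
    return transcripts
```

Because the client library accepts plain dicts for requests, the language choice stays visible in ordinary Python data and can be tested without calling the API.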

On-prem? No problem: Speech-to-Text is also available on-prem to meet your infrastructure, data residency and compliance requirements.

2. Using the Text-to-Speech API with Python lab and C# lab

On the flip side, if your integration needs the reverse of STT, we have labs to help you get started with Text-to-Speech (TTS) in both Python and C#. With TTS, you can convert text into natural-sounding speech using groundbreaking synthesis AI from Google.
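A basic synthesis request in Python could look like the sketch below. It assumes the google-cloud-texttospeech library and Application Default Credentials, writes MP3 audio, and lets the service choose a default voice for the requested language; synthesize_to_file and synthesis_input_fields are hypothetical helper names.

```python
# Minimal Text-to-Speech sketch in Python. Assumes google-cloud-texttospeech
# and Application Default Credentials; MP3 output and the "en-US" default
# are illustrative choices, not requirements.
from typing import Optional


def synthesis_input_fields(text: Optional[str] = None, ssml: Optional[str] = None) -> dict:
    """Return the SynthesisInput fields; exactly one of text or ssml is allowed."""
    if (text is None) == (ssml is None):
        raise ValueError("provide exactly one of text or ssml")
    return {"text": text} if text is not None else {"ssml": ssml}


def synthesize_to_file(
    path: str,
    text: Optional[str] = None,
    ssml: Optional[str] = None,
    language_code: str = "en-US",
) -> None:
    """Synthesize speech and write the MP3 bytes to path."""
    from google.cloud import texttospeech  # imported lazily; the helper above needs no library

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(**synthesis_input_fields(text, ssml)),
        # Only the language is pinned; the service picks a voice for it.
        voice=texttospeech.VoiceSelectionParams(language_code=language_code),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(path, "wb") as out:
        out.write(response.audio_content)
```

Calling synthesize_to_file("hello.mp3", text="Hello from Text-to-Speech!") would write an MP3 file you can play locally.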

In addition to the 220+ voices across 40+ languages and variants available out of the box, TTS lets you train custom voices. You can further tailor the audio output by including Speech Synthesis Markup Language (SSML) in your TTS request, which lets you add pauses and control how acronyms, dates, times, and abbreviations are read aloud, or mark text that should be censored.
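The following sketch assembles a small SSML document that can add a pause, spell out an acronym, read a date, and censor a word. The speak, break, and say-as elements are standard SSML supported by Cloud Text-to-Speech; the (kind, value) segment format and the build_ssml helper are hypothetical conveniences. User text is XML-escaped so characters like & do not break the markup.

```python
# Sketch: building SSML for Cloud Text-to-Speech from simple segments.
# The (kind, value) segment format is a hypothetical convenience.
from xml.sax.saxutils import escape, quoteattr


def ssml_segment(kind: str, value: str) -> str:
    """Render one segment; user-supplied text is XML-escaped."""
    if kind == "text":
        return escape(value)
    if kind == "pause":  # value such as "500ms" or "1s"
        return f"<break time={quoteattr(value)}/>"
    if kind == "acronym":  # spelled out letter by letter, e.g. "TTS"
        return f'<say-as interpret-as="characters">{escape(value)}</say-as>'
    if kind == "date":  # ISO-style date such as "2022-04-20"
        return f'<say-as interpret-as="date" format="yyyymmdd">{escape(value)}</say-as>'
    if kind == "censor":  # read as a bleep instead of the word
        return f'<say-as interpret-as="expletive">{escape(value)}</say-as>'
    raise ValueError(f"unknown segment kind: {kind!r}")


def build_ssml(segments: list[tuple[str, str]]) -> str:
    """Join rendered segments with spaces and wrap them in a <speak> root element."""
    body = " ".join(ssml_segment(kind, value) for kind, value in segments)
    return "<speak>" + body + "</speak>"
```

The resulting string would be sent in the ssml field of the synthesis input instead of the text field.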

3. Using the Google Docs API Machine Learning (Speech-to-Text) lab

If you're looking for an interesting sample to see how to use our APIs to solve business problems, check out this lab on how to create a transcript of your business meetings using Google Docs.

You'll learn how to set up both APIs, send an audio file to the STT API, and then write the resulting transcript to a Google Doc using Java, so you'll never forget what happened in a meeting again!
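The lab does the final step in Java; for comparison, here is a rough Python sketch of writing transcript lines into an existing Google Doc with the Docs API batchUpdate method. It assumes the google-api-python-client library and OAuth credentials authorized for the Docs API. The heading text and the helper names transcript_requests and write_transcript are hypothetical, and the document ID is whatever ID your own document has.

```python
# Sketch: appending a transcript to an existing Google Doc via batchUpdate.
# Assumes google-api-python-client and credentials authorized for the Docs API.


def transcript_requests(lines: list[str], heading: str = "Meeting transcript") -> list[dict]:
    """Build a batchUpdate request list that appends the transcript to the document body."""
    text = heading + "\n" + "\n".join(lines) + "\n"
    # An endOfSegmentLocation without a segmentId targets the end of the document body.
    return [{"insertText": {"endOfSegmentLocation": {}, "text": text}}]


def write_transcript(credentials, document_id: str, lines: list[str]) -> dict:
    """Send the requests to the Docs API (needs the client library and credentials)."""
    from googleapiclient.discovery import build  # imported lazily; the helper above needs no library

    service = build("docs", "v1", credentials=credentials)
    body = {"requests": transcript_requests(lines)}
    return service.documents().batchUpdate(documentId=document_id, body=body).execute()
```

In a full pipeline, the lines passed to write_transcript would come from the Speech-to-Text results, one entry per recognized segment.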

Try these labs and use your $300 Cloud credit to get started with the Cloud Speech API today. To learn more, see the Google Cloud Speech-to-Text documentation.
