To improve the accuracy of its automated transcription service, Descript switched to Google Cloud Speech for speech-to-text conversion.
Google Cloud Results
- Improves customer experiences with up to 95% word accuracy, matching the human threshold for voice recognition
- Enhances content security and privacy
- Helps enable new product features
- Provides completed transcripts in under 5 minutes
Transcribing interviews and meetings used to be a time-consuming job, and editing them down into podcasts required an experienced audio engineer. Not anymore. Machines are learning to perform these tasks as well as humans, and they are constantly improving. A number of automatic transcription services are taking advantage of these breakthroughs, but each is only as accurate as the speech-to-text API behind it.
“Accurate transcription is foundational to our business,” says Andrew Mason, CEO of Descript, a startup offering the world’s first audio word processor—an app that syncs audio to automated transcripts and provides easy editing capabilities. “We try every speech-to-text API that comes out, and accuracy is the most important consideration by a vast margin.”
Customers use Descript to transcribe audio quickly and easily, and to edit down interviews without needing expertise in audio production software. In the Mac app, anyone can edit audio simply by editing the transcript text, with seamless results. Users get a transcript in less than five minutes and can then make it fit for print with the company's correction and formatting tools.
To provide the highest possible level of word accuracy, Descript switched to Google Cloud Speech, the same toolset that supports the speech recognition features in Google Assistant. Descript uploads customers’ audio files to Google Cloud Storage, and from there the files are fed into Google Cloud Speech.
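The storage-then-recognize flow described above can be sketched as a request body for the public Speech-to-Text v1 REST method `speech:longrunningrecognize`, which reads audio directly from a `gs://` Cloud Storage URI. The long-running method is used because synchronous recognition only accepts about a minute of audio. This is a minimal sketch under stated assumptions: the case study does not describe Descript's actual API calls, so the bucket and object names are hypothetical, and the choice of punctuation and word time offsets (useful for syncing words back to the audio) is an illustrative assumption.

```python
import json


def build_recognize_request(bucket: str, object_name: str,
                            language_code: str = "en-US") -> dict:
    """Build a JSON body for the Speech-to-Text v1 REST method
    speech:longrunningrecognize, pointing at audio already in Cloud Storage.

    Assumption: the audio is FLAC or WAV, so encoding and sample rate can be
    read from the file header and are omitted from the config.
    """
    # Cloud Speech reads audio stored in Cloud Storage through a gs:// URI.
    gcs_uri = f"gs://{bucket}/{object_name}"
    return {
        "config": {
            "languageCode": language_code,
            # Adds punctuation so the transcript is closer to print-ready.
            "enableAutomaticPunctuation": True,
            # Returns start/end times per word, which lets an editor map
            # each transcript word back to its position in the audio.
            "enableWordTimeOffsets": True,
        },
        "audio": {"uri": gcs_uri},
    }


# Hypothetical bucket and object names, for illustration only.
request = build_recognize_request("example-transcription-uploads",
                                  "interviews/ep01.flac")
print(json.dumps(request, indent=2))
```

The body would be POSTed, with OAuth credentials, to `https://speech.googleapis.com/v1/speech:longrunningrecognize`, which returns an operation to poll until the transcript is ready.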
“Google has a major advantage because it has massive audio content repositories like YouTube, giving it access to a huge amount of training audio necessary to build a great speech model,” Andrew explains. “That’s a big part of why we found Google Cloud Speech to be more accurate than the others.”
Up to 95% word accuracy
Google Cloud Speech uses advanced deep learning neural networks for speech recognition, and its accuracy improves over time as Google refines the underlying technology. By 2017, its word accuracy for English had reached 95%, the threshold for human accuracy. That level of accuracy enables Descript to serve content producers, journalists, radio stations, and legal departments that previously relied on human transcriptionists and audio engineers, helping them reduce costs and improve their workflows.
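Figures like "95% word accuracy" are commonly reported as one minus the word error rate (WER): the substitutions, deletions, and insertions needed to turn the machine transcript into a reference transcript, divided by the number of reference words. The case study does not say how Google or Descript measure accuracy, so treating word accuracy as 1 − WER is an assumption; the sketch below computes WER with a word-level edit distance, and the sample sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed so
    # far and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed convention: word accuracy = 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


# Made-up 20-word reference with one misrecognized word ("fox" -> "box").
ref_text = ("the quick brown fox jumps over the lazy dog and then "
            "runs all the way back home to its den")
hyp_text = ref_text.replace("fox", "box")
print(round(word_accuracy(ref_text, hyp_text), 2))  # 0.95
```

The same metric makes comparisons between engines concrete: with illustrative numbers, an engine at 5% WER makes half as many errors as one at 10% WER on the same test audio.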
“Our customers prefer Google Cloud Speech because it provides the one thing that matters most to them: accuracy,” says Andrew. “We can offer incredibly fast and accurate automated transcription thanks to Google technology. Our benchmarking indicates that we make about half as many errors as the second-best option.”
Fast, easy, and more secure
Descript was able to integrate Google Cloud Speech with its app quickly and easily, speeding time to value. “Google Cloud Speech made my life easy when we switched to it,” says Steve Rubin, Ph.D., Software Engineer at Descript. “Other speech-to-text services we evaluated wanted us to jump through more hoops.”
New features on the way
Descript is constantly improving its service, with accuracy gains and new features on its roadmap. Web and multi-platform apps are on the way, and the company plans to use context-aware recognition in Google Cloud Speech to improve results in specific industries by sending a separate glossary of word hints and jargon with each API call.
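Per-call glossaries map naturally onto the `speechContexts` field of the Speech-to-Text v1 `RecognitionConfig`, where each context carries a list of `phrases` that bias recognition toward those terms. The sketch below is an assumption about how such glossaries might be attached, not Descript's implementation: the `GLOSSARIES` table, its terms, and the `build_config` helper are hypothetical, and only the `speechContexts`/`phrases` field names come from the public API.

```python
from typing import Iterable

# Hypothetical industry glossaries; the case study does not list actual terms.
GLOSSARIES = {
    "legal": ["voir dire", "habeas corpus", "subpoena duces tecum"],
    "radio": ["cold open", "station ID", "dead air"],
}


def build_config(industry: str, extra_phrases: Iterable[str] = (),
                 language_code: str = "en-US") -> dict:
    """Build a v1 RecognitionConfig dict with phrase hints for one API call.

    extra_phrases is a hypothetical hook for per-recording terms such as
    guest names; duplicates are dropped while preserving order.
    """
    config = {"languageCode": language_code}
    phrases = list(dict.fromkeys([*GLOSSARIES.get(industry, []), *extra_phrases]))
    if phrases:
        # speechContexts biases recognition toward these words and phrases.
        config["speechContexts"] = [{"phrases": phrases}]
    return config


legal_config = build_config("legal", extra_phrases=["voir dire", "Ms. Alvarez"])
print(legal_config["speechContexts"][0]["phrases"])
```

Keeping the glossary outside the request builder lets each call pick the vocabulary for its industry, and an unknown industry simply produces a config without hints.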
“We’re excited about the new features in Google Cloud Speech coming out that will help us stay ahead of the curve,” says Andrew.
Adds Steve: “We have always had good communication with the Google Cloud Speech team, so we’ve been able to tell them which features would be useful to us. They’ve given us time estimates on when certain features will be released in beta, which has been great.”
Setting a new standard
Over time, Descript will be able to provide even more accurate automatic transcription and audio editing services, changing how people create and interact with voice-driven media.
“We will continue to evaluate every new speech-to-text API because we need to know what the best product on the market is,” says Andrew. “So far, Google has been the winner, and as far as we can tell, the gap between Google and the competition is widening.”
Descript is the world’s first audio word processor, allowing users to view and edit any audio file as text. Descript currently offers a Mac app as well as human-powered transcription services.