Create real-time translation overlays
Contributed by Google employees.
This tutorial demonstrates the real-time speech-to-text transcribing and translation features of the Cloud Media Translation API.
In this tutorial, you learn how to overlay translations as subtitles over a live video feed, using a video mixer and a luma keyer. The translated dialogue can be projected onto surfaces as live subtitles using a projector, in effect creating augmented reality (AR) translations.
This tutorial uses the pygame library to control the HDMI output of a Raspberry Pi computer. The HDMI output is then directed either to a projector or to a video mixer for luma key overlays. The overlay can then be used as live subtitles with, for example, video conference systems.
Objectives
- Create a Python client for the Media Translation API.
- Stream microphone or line-level audio to the service and receive real-time translations.
- Use pygame to output the translations through the Raspberry Pi HDMI port.
- Use a video mixer with a luma keyer to overlay translated subtitles on a video feed.
- Use a projector to display AR subtitles.
Watch the companion video
To see this tutorial in action, you can watch the Google Cloud Level Up episode. You can watch the video first, and then follow the steps below yourself.
Costs
This tutorial uses billable components of Google Cloud, including the Media Translation API.
This tutorial should not generate any usage that would not be covered by the free tier, but you can use the pricing calculator to create a cost estimate based on your projected production usage. For details, see Media Translation pricing and free tier.
Before you begin
This tutorial assumes that you already have a Google Cloud account.
Follow the instructions in the Media Translation documentation to set up your Google Cloud projects to enable the Media Translation API.
Required hardware
- Raspberry Pi (Model 3 B / B+ or 4 B recommended, but other models should also work.)
- Display device:
- Small projector for displaying AR translations
- Video mixer hardware or software, with luma keyer capabilities
- Monitor to test this tutorial, instead of a projector
- Microphone to capture dialogue audio; either of the following:
- USB microphone
- USB sound card and an analog microphone or line level audio source
- USB keyboard for the Raspberry Pi
- 8 GB or larger microSD card for the Raspberry Pi operating system
Raspberry Pi installation and configuration
Install Raspbian Lite OS
Follow the instructions to download the latest Raspbian Lite OS image, and write it to your microSD card.
Insert the microSD card into the Raspberry Pi card slot.
Connect the USB keyboard to the Raspberry Pi.
Connect the Raspberry Pi to a monitor or a projector using an HDMI cable.
(Optional) If you use Ethernet connectivity, connect your Ethernet cable to the Raspberry Pi.
Connect the Raspberry Pi power supply to the wall socket and to the Raspberry Pi, which starts Raspbian OS.
Configure Raspbian Lite OS
Log in to the Raspberry Pi with the initial username (`pi`) and password (`raspberry`).
Start the Raspberry Pi configuration tool as the superuser:
sudo raspi-config
Select 1 Change User Password to change the Raspberry Pi password.
The recommended network connection method is an Ethernet cable that provides a DHCP client IP address to the Raspberry Pi. If you are using Wi-Fi, select 2 Network Options and then N2 Wi-fi to configure your Wi-Fi country settings, SSID, and passphrase.
Select 4 Localisation options and then I1 Change Locale.
Scroll down with the down arrow key, and deselect the default option (`en_GB.UTF-8 UTF-8`) using the space bar. Scroll down to `en_US.UTF-8 UTF-8` and select that option with the space bar. Press Tab and select Ok.
Use the arrow keys in the Configuring locales window to select `en_US.UTF-8` from the list of choices as the default locale. Press Tab and select Ok.
Select 4 Localisation options and then I2 Change Timezone, and use the menu options to find your geographic region and country.
Setting the timezone is important because some Google Cloud services require client devices' system clocks to be within 10 minutes of real time. Raspbian uses global NTP time servers to synchronize the clock.
(Optional) Select 4 Localisation options and then I3 Change Keyboard Layout to match your layout.
If you are not sure, select Generic 105-key (Intl) PC, English (US), the default for the keyboard layout, and No compose key.
Select Interfacing Options, P2 SSH, and then Yes to enable the SSH server on your Raspberry Pi.
Select Finish to exit the `raspi-config` tool, and accept the prompt to restart the Raspberry Pi. If you are not prompted to reboot when you exit `raspi-config`, reboot the Raspberry Pi:

sudo reboot

Log in as the `pi` user with your new custom password.
Verify that you have an IP address on your Raspberry Pi:
ifconfig -a
You should see a client IP address from your local network segment on either the `wlan0` or `eth0` interface.
Verify that your DNS resolution and internet connection work:
ping www.google.com
If you do not have an IP address, or DNS resolution or internet connectivity fail, do not proceed further until you have fixed your Raspberry Pi's network configuration.
If you are using Wi-Fi, check whether your Wi-Fi adapter has power saving enabled:
iwconfig
Power saving might give you sporadic network connectivity issues. Look for `Power Management` under the `wlan0` interface. If you see `Power Management:on`, then we recommend that you disable it for this tutorial.
On Raspberry Pi 3 model B/B+, disable `wlan0` Wi-Fi power management by configuring `/etc/network/interfaces`:

sudo vi /etc/network/interfaces
(This command uses `vi`. You can use `nano` if you prefer.)
Add these lines to the end of the file:
allow-hotplug wlan0
iface wlan0 inet manual
# disable wlan0 power saving
post-up iw dev $IFACE set power_save off
Apply the changes by rebooting the Raspberry Pi:
sudo reboot
Log in again as the `pi` user.
Ensure that Wi-Fi power management is now off:
iwconfig
You should see `Power Management:off` for `wlan0`.
Check your internet connectivity:
ping www.google.com
Upgrade the Raspbian packages to the latest versions:
sudo apt-get update && sudo apt-get upgrade -y
Install the Cloud SDK
Log in to the Raspberry Pi with an SSH connection from your host computer:
ssh pi@[YOUR_RASPBERRY_PI_IP_ADDRESS]
With an SSH connection, you can copy and paste commands from this tutorial and linked pages to the Raspberry Pi.
On the Raspberry Pi, follow all of the steps to install and initialize the Cloud SDK for Debian systems.
Check that the Cloud SDK is installed and initialized:
gcloud info
Ensure that the `Account` and `Project` properties are set correctly.
Install additional operating system packages
Install the required operating system packages:
sudo apt-get update && sudo apt-get install -y git python3-dev python3-pygame python3-venv libatlas-base-dev libasound2-dev python3-pyaudio
Increase console font size
This tutorial uses the HDMI console output of the Raspberry Pi as the main display. If the default console font size is too small, you can use the following steps to increase the font size and set it to Terminus 16x32.
Start the `dpkg-reconfigure` utility:

sudo dpkg-reconfigure console-setup
Using the up and down arrow keys, select `UTF-8`. Using the right arrow key, select OK. Press Enter.
Using the up and down arrow keys, select `Guess optimal character set`. Using the right arrow key, select OK. Press Enter.
Using the up and down arrow keys, select `Terminus`. Using the right arrow key, select OK. Press Enter.
Using the up and down arrow keys, select `16x32`. Using the right arrow key, select OK. Press Enter.
The console is refreshed, and you are returned to the command prompt with the larger console font.
Suppress ALSA errors
On Raspberry Pi, the ALSA sound libraries may give errors when using PyAudio and the pygame library.
To suppress some of the ALSA errors when PyAudio starts, do the following:
Back up the original ALSA configuration file:
sudo cp /usr/share/alsa/alsa.conf /usr/share/alsa/alsa.conf.orig
Edit the ALSA configuration file:

sudo vi /usr/share/alsa/alsa.conf

- Search for the segment `# PCM interface`. Comment out the following lines with `#` as shown here:

#pcm.front cards.pcm.front
#pcm.rear cards.pcm.rear
#pcm.center_lfe cards.pcm.center_lfe
#pcm.side cards.pcm.side
#pcm.surround21 cards.pcm.surround21
#pcm.surround40 cards.pcm.surround40
#pcm.surround41 cards.pcm.surround41
#pcm.surround50 cards.pcm.surround50
#pcm.surround51 cards.pcm.surround51
#pcm.surround71 cards.pcm.surround71
#pcm.iec958 cards.pcm.iec958
#pcm.spdif iec958
#pcm.hdmi cards.pcm.hdmi
#pcm.modem cards.pcm.modem
#pcm.phoneline cards.pcm.phoneline
- Search for the segment
Connect and configure a microphone
This solution uses a microphone connected to the Raspberry Pi to record the dialogue. The Raspberry Pi has no analog microphone or line-level inputs of its own.
There are several options to get spoken dialogue audio into the Raspberry Pi:
- USB microphone
- USB sound card with an analog microphone or any line-level audio feed connected to the sound card’s 3.5mm input
- Bluetooth microphone (This can be more complex to set up, and is out of scope for this tutorial.)
Connect and test a microphone with your Raspberry Pi:
Connect the USB microphone to the Raspberry Pi, or connect a USB sound card to the Raspberry Pi and an analog microphone to the sound card.
List connected USB devices:
lsusb
The command should display something like the following example, which shows that the second line is the connected USB microphone:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 005: ID 17a0:0310 Samson Technologies Corp. Meteor condenser microphone
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Identify the card and device numbers for the sound output options:
aplay -l
In this example, the Raspberry Pi built-in headphone output is card `0`, device `0`:

card 0: Headphones [bcm2835 Headphones], device 0: bcm2835 Headphones [bcm2835 Headphones]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7
card 1: Mic [Samson Meteor Mic], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
Identify the sound input options:
arecord -l
In this example, the USB microphone is card `1`, device `0`:

**** List of CAPTURE Hardware Devices ****
card 1: Mic [Samson Meteor Mic], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
To configure the operating system to use the correct sound playback and microphone devices, edit or create the configuration file `/home/pi/.asoundrc`. (Note the dot in the file name `.asoundrc`; it is a hidden configuration file.) Add the following content to the file, setting the `mic` and `speaker` device numbers to match your `aplay -l` and `arecord -l` output:

pcm.!default {
    type asym
    capture.pcm "mic"
    playback.pcm "speaker"
}
pcm.mic {
    type plug
    slave {
        pcm "hw:1,0"
    }
}
pcm.speaker {
    type plug
    slave {
        pcm "hw:0,0"
    }
}
In the preceding example, the microphone is set to `"hw:1,0"`, meaning card 1, device 0, which maps to the USB microphone. The speaker is set to card 0, device 0, which maps to the Raspberry Pi built-in sound card's 3.5mm audio output.
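If you want to double-check the numbers programmatically, the card and device numbers can be parsed from the `arecord -l` output with a short Python snippet. This helper is hypothetical (not part of the tutorial's code); in practice you can just read the numbers off the output by eye:

```python
# Sketch: extract (card, device) number pairs from `arecord -l` output,
# which map to the "hw:CARD,DEVICE" strings used in .asoundrc.
import re

CARD_RE = re.compile(r"card (\d+): .*?, device (\d+):")

def capture_devices(arecord_output):
    """Return (card, device) pairs found in `arecord -l` output."""
    return [(int(c), int(d)) for c, d in CARD_RE.findall(arecord_output)]

sample = """**** List of CAPTURE Hardware Devices ****
card 1: Mic [Samson Meteor Mic], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0"""

print(capture_devices(sample))  # [(1, 0)]  ->  "hw:1,0" in .asoundrc
```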
Test recording with the microphone
Record a 5-second clip in 16 kHz raw format:
arecord --format=S16_LE --duration=5 --rate=16000 --file-type=raw out.raw
Connect speakers or headphones to the configured sound output device, such as the Raspberry Pi 3.5mm audio output.
Play the audio clip:
aplay --format=S16_LE --rate=16000 out.raw
Clone the example app and install its dependencies
Using the SSH or console connection to the Raspberry Pi, clone the repository associated with the Google Cloud community tutorials:
git clone https://github.com/GoogleCloudPlatform/community.git
Change to the tutorial directory:

cd community/tutorials/ar-subs
Create a Python 3 virtual environment:
python3 -m venv venv
Activate the virtual environment:
source venv/bin/activate
Upgrade `pip` and `setuptools`:

pip3 install -U pip setuptools
Install the required Python modules:
pip3 install -r requirements.txt
Create a service account and JSON key
In this section, you create a service account in your Google Cloud project and grant it sufficient permissions to use the AI services. You also download a JSON key for the service account. The Python utilities use this JSON key to authenticate with the Cloud services.
Create a new service account:
gcloud iam service-accounts create ml-dev --description="ML APIs developer access" --display-name="ML Developer Service Account"
Grant the ML Developer role to the service account:

gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:ml-dev@$PROJECT_ID.iam.gserviceaccount.com --role roles/ml.developer
Grant the Project Viewer role to the service account:
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:ml-dev@$PROJECT_ID.iam.gserviceaccount.com --role roles/viewer
Create a JSON key for the service account:
gcloud iam service-accounts keys create ./credentials.json --iam-account ml-dev@$PROJECT_ID.iam.gserviceaccount.com
The key file is downloaded to the current working directory.
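The Google Cloud client libraries discover credentials through Application Default Credentials, typically via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The exact path below is an assumption based on where this tutorial clones the repository; adjust it to wherever your `credentials.json` actually lives:

```shell
# Point the client libraries at the downloaded service account key.
# The path is an assumption; change it to match your setup.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/community/tutorials/ar-subs/credentials.json"
echo "Using credentials: $GOOGLE_APPLICATION_CREDENTIALS"
```

Add the `export` line to `~/.bashrc` if you want the setting to persist across logins.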
Identify your USB microphone in Python
In this section, you identify the USB microphone by its device number to make it visible to Python. The device numbering for OS sound libraries may not match the device numbering visible to Python apps, which is why you need to find the microphone device again, this time with a Python utility.
Within the Python virtual environment, execute the following command:
python3 mic_identify.py
The command should output something similar to the following, which lists the example USB microphone in the first entry:
(0, 'Samson Meteor Mic: USB Audio (hw:2,0)', 2)
(1, 'dmix', 0)
Record the device number. In the example above, it is `0`.
Test recording with the USB microphone in Python
In this section, you record audio with the identified USB microphone device, using PyAudio.
The following command records a 3-second WAV audio file (mono, 16-bit, 44.1 kHz):
python3 mic_test.py --dev 0
This example uses device `0`, as identified with the previous command.
The command should output something similar to the following:
*** Recording 3 seconds with USB device 0 ***
*** Finished recording. Wrote output: test1.wav ***
You may get ALSA errors, but you can ignore them if the recording was successful.
Listen to the test recording to make sure that it worked and that the dialogue sound quality is OK:
aplay test1.wav
You may need to configure which interface the Raspberry Pi uses for sound playback output. To choose between the HDMI output and the 3.5mm line out/headphone jack, run `sudo raspi-config` and configure the setting under 7 Advanced Options, A4 Audio, Choose the audio output (HDMI | Headphones).
Test the Media Translation API example Python client
Now that your USB microphone works with Python, you can call the Media Translation API. This step tests calling the API in streaming mode, piping microphone audio to the service, and displaying the translated live responses in the command-line shell.
You can specify the target language code. The default target language is `de-DE`, for German. To test with Italian, execute the following command, replacing the device number `0` with your device number if necessary:
python3 translate-microphone.py --lang it-IT --dev 0
Some of the target languages require a Unicode font. By default, the Raspberry Pi console font cannot display Unicode characters. For this reason, use a Latin-based language such as German or Italian in this step to test the Media Translation API with your microphone. The next sections show how to use the `ar-subs.py` app, which uses the pygame library and specific fonts to display output in Hindi and Japanese.
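As a rough rule of thumb, you can check in Python whether a string falls outside the Latin range that the console font can render. The helper below is hypothetical (not part of `ar-subs.py`), and the cutoff is a simplification:

```python
# Sketch: decide whether a translation needs a Unicode-capable font.
# The 0x024F cutoff covers Basic Latin plus Latin Extended-A/B; anything
# beyond it (Devanagari, Japanese, ...) needs one of the bundled fonts.
def needs_unicode_font(text: str) -> bool:
    return any(ord(ch) > 0x024F for ch in text if not ch.isspace())

print(needs_unicode_font("Guten Tag"))      # False: Latin-based, console-safe
print(needs_unicode_font("日本語テキスト"))  # True: needs a font such as Noto Sans JP
```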
Test pygame with an HDMI display
In this section, you make sure that pygame can control the connected HDMI display or projector.
Test this by running an example game included in the pygame package:
If you have been connected with SSH, switch to the Raspberry Pi's HDMI output console—such as a monitor or a projector connected to the Raspberry Pi's HDMI port—and log in there.
Change to the directory containing the example application:
cd community/tutorials/ar-subs
Activate the Python virtual environment:
source venv/bin/activate
Execute the pygame test game:
python3 -m pygame.examples.aliens
You should see a game on the console display. You can control your car with the arrow keys and shoot with the space bar. The game exits when you are hit.
If you can see the game, pygame is working with your display and you can proceed to the next steps.
Run the AR Subtitles application
In this section, you run the AR Subtitles application.
The app has two operating modes:
- Subtitles mode:
- Subtitles are output as a video feed through the Raspberry Pi HDMI output, which a video mixer with a luma keyer can overlay on the primary video feed.
- Text is white, with a dark blue background behind the letters.
- The rest of the screen is black, intended to be keyed out.
- Augmented Reality mode:
- Translations are intended to be projected onto a surface in the real world using, for example, a pico projector.
- Text is white, with a large font size to make it easily legible.
- The rest of the screen is black, which is not projected because the pixel values are (0,0,0).
Subtitles mode with a video mixer and luma keyer (`--position bottom`)
In this mode, the Raspberry Pi's HDMI output must be connected to a video mixer unit with luma keyer capabilities. The app displays a black background, which is then keyed out. The luma keyer needs to be tuned so that it keys out the background and leaves the translation output text as an overlay on top of a video. With this mode, you can overlay translated subtitles on any video feed, such as presentations and online video conference applications.
To make the translated subtitles legible on top of light-colored backgrounds such as slides, the app adds a dark blue (RGB: 0,0,139) background behind the white text; otherwise, white text over a white background would be invisible. The dark blue background can remain visible after the luma keyer is applied.
To make the translations work, connect the video source’s audio to the Raspberry Pi’s audio input. This requires a USB sound card that has a microphone or line-in connector. Then connect the Raspberry Pi HDMI output as a camera source to the video mixer. With this setup, the video mixer has two video sources:
- the original video feed
- the Raspberry Pi translated subtitles output
Using the luma keyer in the mixer, key out the black background, and overlay the remaining translated subtitles on the original video feed.
View the command-line options:
python3 ar-subs.py --help
To start the app in subtitles mode, run the following, replacing the values with your desired options:
python3 ar-subs.py --dev 0 --lang hi-IN --maxchars 85 --fontsize 46 --position bottom
After the app starts, you are presented with keys that you can press. The key presses are registered and handled by the pygame library. While the Media Translation API client is streaming an ongoing sentence, execution is blocked; key presses are acted on after the current sentence finishes. To finish a sentence, simply stop talking.
To start translating, press any key. The screen turns black. As you speak, the translations should start being displayed. You can enable or disable interim results by pressing the `i` key.
To quit, press `q` and speak a bit more to register the key press.
Now that you have live translations displayed through the Raspberry Pi's HDMI port, you can use your video mixer's luma keyer to key out the black background.
The luma keyer settings are specific to each video mixer, but the general principle is the same: set the keyer's input to the Raspberry Pi HDMI output, and set the keyer's luminance threshold so that the black background is keyed out (removed) while the text with the blue background remains as a transparent overlay. On the example Blackmagic ATEM Mini Pro video mixer, this is the Downstream Luma Keyer set to On Air.
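Conceptually, a luma keyer computes a luminance value per pixel and makes pixels below a threshold transparent. A minimal sketch of that math (illustrative only; real mixers do this in hardware, and the threshold depends on your mixer):

```python
# Sketch of the luma keying principle on individual RGB pixels.
def luma(rgb):
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b  # Rec. 709 luminance weights

def key_out(pixel, threshold=8):
    """Return None for pixels dark enough to be keyed out (transparent)."""
    return None if luma(pixel) < threshold else pixel

print(key_out((0, 0, 0)))        # None: the black background is removed
print(key_out((0, 0, 139)))      # (0, 0, 139): the dark blue text box survives
print(key_out((255, 255, 255)))  # (255, 255, 255): the white text survives
```

This also shows why the threshold needs tuning: set it too high and the dark blue text background is keyed out along with the black.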
Now you can switch the mixer to the primary video feed, and have real-time translated subtitles. You can then use the video mixer output as a webcam and, for example, join a video conference with subtitles.
Augmented reality mode with a projector (`--position top`)
In this mode, the translated text is intended to be projected onto real-world physical surfaces, using a projector, in effect creating an augmented reality display of the translation output. To make the text easily legible, this mode uses very large font sizes.
Note! Projectors use very bright lamps that can damage your eyes. Never look directly into a projector lens, and never point the projector at a person's head. If you test this solution with a projector, point it away from people, at a surface such as a wall.
View the command-line options:
python3 ar-subs.py --help
To start the app in AR mode, run the following, replacing the values with your desired options:
python3 ar-subs.py --dev 0 --lang de-DE --maxchars 42 --fontsize 120 --position top
After the app starts, you are presented with keys that you can press. The key presses are registered and handled by the pygame library. While the Media Translation API client is streaming an ongoing sentence, execution is blocked; key presses are acted on after the current sentence finishes. To finish a sentence, simply stop talking.
To start translating, press any key. The screen turns black. As you speak, the translations should start being displayed. You can enable or disable interim results by pressing the `i` key.
To quit, press `q` and speak a bit more to register the key press.
After the app starts, point the projector at a surface where you want to display the subtitles.
Experiment with different font sizes. Larger may be better, depending on where you project the text.
Test mode, translating lines in a text file (`--testfile`)
The app has a testing mode, enabled with the `--testfile` command-line switch. In this mode, the app reads the input text file and displays each line with the configured font and display mode, line by line. You can use this mode to test different font sizes and to simulate the app offline. In this mode, the app displays each line in the file after a key press.
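Conceptually, the test mode boils down to iterating over the non-empty lines of a UTF-8 text file, one per key press. A sketch under the assumption that the file is UTF-8 encoded (the function name is hypothetical, not taken from `ar-subs.py`):

```python
# Sketch: hand out one non-empty line of a UTF-8 text file at a time.
def iter_test_lines(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line

# Usage: write a small test file, then pull lines one at a time,
# as the app would on each key press.
import tempfile, os

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("日本語テキストテスト\nZweite Zeile\n")

lines = iter_test_lines(tmp.name)
print(next(lines))  # 日本語テキストテスト
print(next(lines))  # Zweite Zeile
os.unlink(tmp.name)
```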
Prepare an input text file. To display non-Latin characters, store the text in the file in Unicode format. A handy approach is to use Google Translate to create translated text in the target language, and then copy and paste the translated output into a text file.
Assume you have the following line in a file `test.txt`:

日本語テキストテスト
Start the app in test mode:
python3 ar-subs.py --lang ja-JP --maxchars 40 --fontsize 46 --position bottom --testfile test.txt
After the app starts, press any key to display the next line in the input file. The app quits after it has displayed the last line, or if you press `q`.
Example font sizes and line lengths
| | Top: AR projector mode | Bottom: subtitles overlay mode |
|---|---|---|
| Latin languages (`--lang de-DE`) | `--maxchars 42 --fontsize 120 --position top` | `--maxchars 74 --fontsize 72 --position bottom` |
| Japanese (`--lang ja-JP`) | `--maxchars 20 --fontsize 92 --position top` | `--maxchars 40 --fontsize 46 --position bottom` |
| Hindi (`--lang hi-IN`) | `--maxchars 42 --fontsize 92 --position top` | `--maxchars 85 --fontsize 46 --position bottom` |
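The `--maxchars` values above cap the number of characters per rendered line. A minimal sketch of such wrapping using Python's standard-library `textwrap` module (an approximation; the app's actual wrapping logic may differ):

```python
# Sketch: wrap a translated sentence to a maximum line length before
# rendering, as a --maxchars style limit would.
import textwrap

def wrap_subtitle(text, maxchars):
    """Split text into lines of at most maxchars characters."""
    return textwrap.wrap(text, width=maxchars)

sentence = ("Dies ist ein Beispielsatz, der auf mehrere "
            "Untertitelzeilen umgebrochen wird.")
for line in wrap_subtitle(sentence, 42):
    print(line)
```

Note that `textwrap.wrap` breaks on whitespace, which suits Latin languages; Japanese text without spaces would need character-based wrapping instead.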
Fonts used in the example
- Hindi: Noto Sans Devanagari
- Japanese: Noto Sans JP
Fonts published by Google. Licensed under the Apache License 2.0. Available at fonts.google.com.
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, you can delete the project.
Deleting a project has the following consequences:
- If you used an existing project, you'll also delete any other work that you've done in the project.
- You can't reuse the project ID of a deleted project. If you created a custom project ID that you plan to use in the future, delete the resources inside the project instead. This ensures that URLs that use the project ID, such as an appspot.com URL, remain available.
To delete a project, do the following:
- In the Cloud Console, go to the Projects page.
- In the project list, select the project you want to delete and click Delete project.
In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Watch this tutorial's Google Cloud Level Up episode on YouTube.
- Learn more about AI on Google Cloud.
- Learn more about Cloud developer tools.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies. Java is a registered trademark of Oracle and/or its affiliates.