Best practices for translating websites with Translation API
Looking to translate your website? Google Cloud can help!
Google Cloud Translation API is a service that dynamically translates between languages with Google’s state-of-the-art Machine Learning models. It is a highly scalable API that supports over one hundred languages, with built-in language detection. In this blog post, we will share some best practices for optimizing cost, increasing performance, and hardening the security posture while using the Translation API with your websites.
Optimize Architecture for Performance, Cost and Security
A common way to translate websites is to have site visitors select their language of choice, and then display the website in that language. However, on-demand translation can be costly because the same content is sent to the API again every time it is requested.
One of the best practices to reduce cost, increase performance, and enhance security posture is to use a caching pattern. Caching translated content not only reduces calls to the Google Cloud Translation API but also decreases load and compute usage on your backend web servers and databases. This in turn improves the performance of your application and reduces the cost of delivery.
There are many ways to set up caching in your application architecture at different layers of the application. For example, you can:
- Use a Content Delivery Network (CDN) for global distribution: Translated content is stored at globally distributed edge locations, which reduces network latency by serving content closer to your users. While using a CDN, keep an eye on the cache hit ratio (the percentage of requests for an object that are served from the cache) to confirm that content is being cached properly. If you choose Google Cloud CDN, you also benefit from default protection against common security threats such as DDoS attacks and bot activity. You can add another layer of security with a Web Application Firewall (WAF) such as Google Cloud Armor to protect against application-layer attacks such as SQL injection (SQLi), Cross-Site Scripting (XSS), or Remote Code Execution (RCE).
- Cache at the proxy layer: Proxy caching is a simple and effective option if you are already using servers such as NGINX or HAProxy. A proxy cache requires minimal to no refactoring of the backend application code. Caching translated content at the proxy reduces both cost and load on the backend servers, which in turn improves performance for your users.
- Cache at the application level: Caching can be configured at the application layer, either in memory within the web servers or in an out-of-process memory cache service such as Google Cloud Memorystore for Redis and Memcached. At this layer, it may be best to store the raw translated content without HTML markup, unlike a CDN or proxy cache, which stores the rendered response. This is because HTML rendering typically happens after the application retrieves the cached content. If you are using a Content Management System (CMS) such as WordPress or Drupal, you can use available translation plugins or even custom plugins to store and retrieve cached content in Memorystore.
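As a rough illustration of the application-level option, here is a minimal cache-aside sketch in Python. It is not tied to any particular client library: `TranslationCache` and its methods are hypothetical names, the in-memory dictionary stands in for a service such as Memorystore, and `translate_fn` is assumed to be whatever function in your application actually calls the Translation API.

```python
import hashlib
from typing import Callable, Dict


class TranslationCache:
    """Cache-aside wrapper around any translate(text, target_language) callable.

    Hypothetical helper: the dict below stands in for Redis or Memcached.
    """

    def __init__(self, translate_fn: Callable[[str, str], str]) -> None:
        self._translate_fn = translate_fn
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str, target_language: str) -> str:
        # Hash the source text so keys stay short and uniform in length.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"translation:{target_language}:{digest}"

    def translate(self, text: str, target_language: str) -> str:
        key = self._key(text, target_language)
        cached = self._store.get(key)
        if cached is not None:
            self.hits += 1
            return cached
        # Cache miss: call the real translator once, then remember the result.
        self.misses += 1
        translated = self._translate_fn(text, target_language)
        self._store[key] = translated
        return translated

    def hit_ratio(self) -> float:
        """Fraction of lookups served from the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Because the key includes both the target language and a hash of the source text, editing a page naturally produces a new key, so stale translations are not served for changed content. A real deployment would probably also set an expiry on each entry.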
Secure access with the principle of least privilege
When accessing the Google Cloud Translation API, we recommend using a Google Cloud service account rather than API keys. A service account is a special type of account that represents a non-human user and can be authorized to access data in Google APIs such as the Google Cloud Translation API. Service accounts are not assigned passwords and can't be used for browser-based sign-in, which removes that threat vector. Following the principle of least privilege, grant the service account a role that carries only the permissions required to access the Translation API. If you are interested in more information, we have documented best practices for using and managing service accounts, securing service accounts, and managing service account keys.
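A minimal sketch of what this can look like in Python, assuming the `google-cloud-translate` client library (Basic edition client) is installed and the code runs somewhere Application Default Credentials resolve to a service account, for example a service account attached to the workload. The helper names `make_client` and `translate_text` are hypothetical, and no API key appears anywhere in the code.

```python
def make_client():
    """Build a Translation client from Application Default Credentials.

    Assumption: the environment supplies service account credentials,
    so nothing secret is embedded in the application.
    """
    # Imported lazily so this module still loads where the library is absent.
    from google.cloud import translate_v2 as translate

    return translate.Client()


def translate_text(client, text: str, target_language: str) -> str:
    """Translate one string and return only the translated text."""
    result = client.translate(text, target_language=target_language)
    return result["translatedText"]
```

Passing the client in as an argument keeps credential handling in one place and lets you substitute a stub client in tests.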
Customize translations and provide transparency
If your content includes domain-specific or context-specific terms or phrases, Google Cloud Translation API Advanced supports custom terminology through glossaries. You can also build and use custom translation models through Google AutoML Translation.
Note that if you are not reviewing and editing translations before publishing, Google recommends letting your customers know that the content has been machine translated with Google, and provides specific guidelines for attribution. This transparency ensures customers understand that the content has been automatically translated and may contain errors.
Budget: Set it and don’t forget it
The costs associated with Google Cloud Translation API are based primarily on the number of characters sent to the API. For example, given the list price of $10 per million characters, if your web page consists of 20 million characters, and you're translating into 10 languages, then the content is sent once per target language and your cost will be $10 x 20 x 10 = $2,000. You can learn more about pricing here and create an estimate via our pricing calculator.
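The arithmetic can be sketched as a small Python function. Characters are billed each time they are sent, so translating the same content into several languages multiplies the character count by the number of target languages. The function name and the default price are illustrative only; the default simply reuses the example figure of $10 per million characters and should be replaced with the current list price.

```python
def estimate_translation_cost(
    characters: int,
    target_languages: int,
    price_per_million_usd: float = 10.0,  # example figure, not a quoted price
) -> float:
    """Estimate the cost of translating content into several languages."""
    millions_of_characters = characters / 1_000_000
    return millions_of_characters * price_per_million_usd * target_languages


# 20 million characters into 10 languages at $10 per million characters.
print(estimate_translation_cost(20_000_000, 10))  # 2000.0
```

This ignores any free tier or repeat requests for the same content, which is where caching changes the picture: each cache hit is a request that is never billed.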
As you move into development, control costs and avoid surprises by setting up budget monitoring and alerts in your environment. This can help you proactively address issues if costs rise unexpectedly. We recommend setting a smaller quota limit during testing phases to ensure you stay within budget. Just remember to adjust the quota when you go into production so you don’t have any disruptions to your service due to hitting limits.
Translation is critical in a digital age where people with diverse backgrounds and languages are collaborating more than ever. With the best practices and guidelines in this blog, you are well equipped to create a competitive edge and reach new audiences.