Best Practices for Microservice Performance

Region ID

The REGION_ID is an abbreviated code that Google assigns based on the region you select when you create your app. The code does not correspond to a country or province, even though some region IDs may appear similar to commonly used country and province codes. For apps created after February 2020, REGION_ID.r is included in App Engine URLs. For existing apps created before this date, the region ID is optional in the URL.

Learn more about region IDs.

Software development is all about tradeoffs, and microservices are no exception. What you gain in independent code deployment and operation, you pay for in performance overhead. This section provides recommendations for steps you can take to minimize that impact.

Turn CRUD operations into microservices

Microservices are particularly well-suited to entities that are accessed with the create, retrieve, update, delete (CRUD) pattern. When you work with such entities, you typically use only one entity at a time, such as a user, and you typically perform only one of the CRUD actions at a time, so a single microservice call handles the operation. Look for entities that have CRUD operations plus a set of business methods that could be used in many parts of your application; these entities make good candidates for microservices.
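For example, here is a minimal sketch of what such a single-call endpoint might look like, assuming a hypothetical User model and webapp2 handler; the service path and field names are illustrative, not part of any SDK:

import json

import webapp2
from google.appengine.ext import ndb

class User(ndb.Model):
  # hypothetical user entity; a real model would carry more fields
  first_name = ndb.StringProperty()

class UserHandler(webapp2.RequestHandler):
  """Each CRUD action on a single user entity is a single microservice call."""

  def get(self):
    user = ndb.Key(User, self.request.get('userId')).get()
    if user is None:
      self.abort(404)
    self.response.content_type = 'application/json'
    self.response.write(json.dumps(
        {'userId': user.key.id(), 'firstName': user.first_name}))

  def delete(self):
    ndb.Key(User, self.request.get('userId')).delete()

app = webapp2.WSGIApplication([('/user-service/v1/', UserHandler)])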

Provide batch APIs

In addition to single-entity CRUD APIs, you can still provide good microservice performance for groups of entities by providing batch APIs. For example, rather than exposing only a GET API method that retrieves a single user, provide an API that takes a set of user IDs and returns a dictionary of the corresponding users:

Request:

/user-service/v1/?userId=ABC123&userId=DEF456&userId=GHI789

Response:

{
  "ABC123": {
    "userId": "ABC123",
    "firstName": "Jake",
    … },
  "DEF456": {
    "userId": "DEF456",
    "firstName": "Sue",
    … },
  "GHI789": {
    "userId": "GHI789",
    "firstName": "Ted",
    … }
}

The App Engine SDK supports many batch APIs, such as the ability to fetch many entities from Cloud Datastore through a single RPC, so servicing these types of batch APIs can be very efficient.
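As a rough sketch, a handler for the batch request above might use ndb.get_multi so that all of the requested users come back from Cloud Datastore in one batched RPC; the User model and field names are assumptions carried over from the earlier sketch:

import json

import webapp2
from google.appengine.ext import ndb

class User(ndb.Model):
  first_name = ndb.StringProperty()

class UserBatchHandler(webapp2.RequestHandler):
  def get(self):
    user_ids = self.request.get_all('userId')
    # one batched Datastore RPC fetches every requested entity
    users = ndb.get_multi([ndb.Key(User, user_id) for user_id in user_ids])
    result = {}
    for user in users:
      if user is not None:  # skip IDs that do not resolve to an entity
        result[user.key.id()] = {'userId': user.key.id(),
                                 'firstName': user.first_name}
    self.response.content_type = 'application/json'
    self.response.write(json.dumps(result))

app = webapp2.WSGIApplication([('/user-service/v1/', UserBatchHandler)])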

Use asynchronous requests

Often, you will need to interact with many microservices to compose a response. For example, you might need to fetch the logged-in user's preferences as well as their company details. Frequently, these pieces of information are not dependent on one another and you could fetch them in parallel. The Urlfetch library in the App Engine SDK supports asynchronous requests, allowing you to call microservices in parallel.

The following Python example code uses RPCs directly to employ asynchronous requests:

import json

from google.appengine.api import urlfetch

preferences_rpc = urlfetch.create_rpc()
urlfetch.make_fetch_call(preferences_rpc,
                         'https://preferences-service-dot-my-app.uc.r.appspot.com/preferences-service/v1/?userId=ABC123')

company_rpc = urlfetch.create_rpc()
urlfetch.make_fetch_call(company_rpc,
                         'https://company-service-dot-my-app.uc.r.appspot.com/company-service/v3/?companyId=ACME')

# microservice requests are now occurring in parallel

try:
  preferences_response = preferences_rpc.get_result()  # blocks until response
  if preferences_response.status_code == 200:
    preferences = json.loads(preferences_response.content)  # deserialize the JSON payload
  else:
    pass  # handle the error response
except urlfetch.DownloadError:
  pass  # handle a timeout or other transient error

try:
  company_response = company_rpc.get_result()  # blocks until response
  if company_response.status_code == 200:
    company = json.loads(company_response.content)  # deserialize the JSON payload
  else:
    pass  # handle the error response
except urlfetch.DownloadError:
  pass  # handle a timeout or other transient error

Doing work in parallel often runs counter to good code structure because, in a real-world scenario, you often use one class to encapsulate preferences methods and another class to encapsulate company methods. It's difficult to leverage asynchronous Urlfetch calls without breaking this encapsulation. A good solution exists in the App Engine Python SDK's NDB package: tasklets. Tasklets enable you to keep good encapsulation in your code while still offering a mechanism to achieve parallel microservice calls. Note that tasklets use futures instead of RPCs, but the idea is similar.
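As a rough sketch of the tasklet approach, each encapsulating class could expose an asynchronous method built on NDB's context-based Urlfetch wrapper; the helper names and service URLs below are illustrative assumptions, not part of the SDK:

import json

from google.appengine.ext import ndb

@ndb.tasklet
def get_preferences_async(user_id):
  # NDB's context wraps Urlfetch in a future, so this yield does not block
  # the other tasklet
  result = yield ndb.get_context().urlfetch(
      'https://preferences-service-dot-my-app.uc.r.appspot.com'
      '/preferences-service/v1/?userId=%s' % user_id)
  raise ndb.Return(json.loads(result.content))

@ndb.tasklet
def get_company_async(company_id):
  result = yield ndb.get_context().urlfetch(
      'https://company-service-dot-my-app.uc.r.appspot.com'
      '/company-service/v3/?companyId=%s' % company_id)
  raise ndb.Return(json.loads(result.content))

# both requests run in parallel; blocking happens only when the results are read
preferences_future = get_preferences_async('ABC123')
company_future = get_company_async('ACME')
preferences = preferences_future.get_result()
company = company_future.get_result()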

Use the shortest route

Depending on how you invoke Urlfetch, different infrastructure and routes can be used. To use the best-performing route, consider the following recommendations:

Use REGION_ID.r.appspot.com, not a custom domain
A custom domain causes a different route to be used when routing through the Google infrastructure. Because your microservice calls are internal, using https://PROJECT_ID.REGION_ID.r.appspot.com is straightforward and performs better.
Set follow_redirects to False
Explicitly set follow_redirects=False when calling Urlfetch, because it avoids a heavier-weight service designed to follow redirects. Your API endpoints should not need to redirect their clients, because they are your own microservices, and endpoints should only return HTTP 200-, 400-, and 500-series responses. A sketch of such a call appears after this list.
Prefer services within a project over multiple projects
There are good reasons to use multiple projects when building a microservices-based application, but if performance is your primary goal, use services within a single project. Services of a project are hosted in the same datacenter, and even though throughput on Google's inter-datacenter network is excellent, local calls are faster.
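For example, here is a minimal sketch of an internal call that follows the first two recommendations, using the appspot.com host rather than a custom domain and disabling redirects; the service URL is illustrative:

from google.appengine.api import urlfetch

# internal call between services in the same project: use the
# REGION_ID.r.appspot.com route and skip the redirect-following service
result = urlfetch.fetch(
    'https://user-service-dot-my-app.uc.r.appspot.com/user-service/v1/?userId=ABC123',
    follow_redirects=False)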

Avoid chatter during security enforcement

Security mechanisms that involve lots of back-and-forth communication to authenticate the calling service are bad for performance. For example, if your microservice needs to validate a ticket from your application by calling back to the application, you've incurred a number of roundtrips to get your data.

An OAuth2 implementation can amortize this cost over time by using refresh tokens and caching an access token between Urlfetch invocations. However, if the cached access token is stored in memcache, you will need to incur memcache overhead to fetch it. To avoid this overhead, you might cache the access token in instance memory, but you will still experience the OAuth2 activity frequently, as each new instance negotiates an access token; remember that App Engine instances spin up and down frequently. Some hybrid of memcache and instance cache will help mitigate this issue, but your solution starts to become more complex.

Another approach that performs well is to share a secret token between microservices, for example, transmitted as a custom HTTP header. In this approach, each microservice could have a unique token for each caller. Typically, shared secrets are a questionable choice for security implementations, but since all the microservices are in the same application, it becomes less of an issue, given the performance gains. With a shared secret, the microservice only needs to perform a string comparison of the incoming secret against a presumably in-memory dictionary, and the security enforcement is very light.
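Here is a minimal sketch of that check, assuming the token arrives in a hypothetical X-Service-Token header and the per-caller secrets are loaded into an in-memory dictionary at startup:

import webapp2

# hypothetical per-caller tokens, e.g. populated from configuration at startup
SHARED_SECRETS = {
    'dashboard-service': 'REPLACE_WITH_DASHBOARD_TOKEN',
}

class PreferencesHandler(webapp2.RequestHandler):
  def get(self):
    caller = self.request.headers.get('X-Calling-Service', '')
    token = self.request.headers.get('X-Service-Token', '')
    # a single in-memory string comparison keeps enforcement very light
    if not token or SHARED_SECRETS.get(caller) != token:
      self.abort(403)
    # ... handle the request as usual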

If all of your microservices are on App Engine, you can also inspect the incoming X-Appengine-Inbound-Appid header. This header is added by the Urlfetch infrastructure when making a request to another App Engine project and cannot be set by an external party. Depending on your security requirement, your microservices could inspect this incoming header to enforce your security policy.
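A rough sketch of such a check, assuming the calling projects are kept in an allowlist inside the handler:

import webapp2

# App Engine project IDs that are allowed to call this microservice
ALLOWED_APP_IDS = frozenset(['my-app', 'my-other-app'])

class CompanyHandler(webapp2.RequestHandler):
  def get(self):
    incoming_app_id = self.request.headers.get('X-Appengine-Inbound-Appid', '')
    if incoming_app_id not in ALLOWED_APP_IDS:
      self.abort(403)
    # ... handle the request as usual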

Trace microservice requests

As you build your microservices-based application, you begin to accumulate overhead from successive Urlfetch calls. When this happens, you can use Cloud Trace to understand what calls are being made and where the overhead is. Importantly, Cloud Trace can also help identify where independent microservices are being serially invoked, so you can refactor your code to perform these fetches in parallel.

A helpful feature of Cloud Trace kicks in when you use multiple services within a single project. As calls are made between the services in your project, Cloud Trace collapses all of the calls into a single call graph, which lets you visualize the entire end-to-end request as a single trace.

Google Cloud Trace screenshot

Note that in the above example, the calls to the pref-service and the user-service are performed in parallel by using an asynchronous Urlfetch, so the RPCs appear scrambled in the visualization. However, this is still a valuable tool for diagnosing latency.

What's next