The Cloud CDN content delivery network works with HTTP(S) load balancing to deliver content to your users. The HTTP(S) load balancer provides the frontend IP addresses and ports that receive requests and the backends that respond to the requests.
Cloud CDN content can be served from various types of backends, including VM instances and Cloud Storage backend buckets.
In Cloud CDN, these backends are also called origin servers. The following figure illustrates how responses from origin servers running on VM instances flow through an HTTP(S) load balancer before being delivered by Cloud CDN.
How Cloud CDN works
When a user requests content from an HTTP(S) load balancer, the request arrives at a Google Front End (GFE), which is located at the edge of Google's network as close as possible to the user. If the load balancer's URL map routes traffic to a backend that has Cloud CDN configured, the GFE uses Cloud CDN in the following way:
The GFE first looks in the Cloud CDN cache for a response to the user's request. If the GFE finds a cached response, the GFE sends the cached response to the user. This is called a cache hit.
Otherwise, if the GFE can't find a cached response for the request, the GFE makes a request to the appropriate backend (the origin server). If the response to this request is cacheable, the GFE stores the response in the Cloud CDN cache so that the cache can be used for subsequent requests.
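The two steps above can be sketched in a few lines of Python. This is only an illustrative model, not how the GFE is implemented: the dictionary standing in for the cache, the `handle_request` function, and the `origin` callback with its cacheability rule are all hypothetical names introduced for this example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical in-memory stand-in for one Cloud CDN cache location.
Cache = Dict[str, bytes]


def handle_request(
    cache: Cache,
    cache_key: str,
    fetch_from_origin: Callable[[str], Tuple[bytes, bool]],
) -> Tuple[bytes, str]:
    """Return (body, "hit" or "miss") for a single request."""
    # Step 1: look in the cache first.
    if cache_key in cache:
        return cache[cache_key], "hit"
    # Step 2: on a miss, ask the origin server.
    body, cacheable = fetch_from_origin(cache_key)
    # Store the response only if the origin marked it as cacheable.
    if cacheable:
        cache[cache_key] = body
    return body, "miss"


def origin(key: str) -> Tuple[bytes, bool]:
    # Illustrative origin: everything except /private paths is cacheable.
    return f"content for {key}".encode(), not key.startswith("/private")


cache: Cache = {}
first = handle_request(cache, "/logo.png", origin)   # fills the cache
second = handle_request(cache, "/logo.png", origin)  # served from the cache
```

In this sketch the first request for `/logo.png` is a miss that fills the cache, and the second is a hit. A request for a path under `/private` would be a miss every time, because the response is never stored.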
To use Cloud CDN, your HTTP(S) load balancer must use the Premium tier of Network Service Tiers. When Cloud CDN provides a cached response, the response is served by the load balancer's IP address. Cloud CDN doesn't perform any URL redirection; the Cloud CDN cache is located at the GFE. This means that the URL a client requests is the same whether or not Cloud CDN is enabled, and the URL is the same whether or not there's a cache hit.
To remove an item from the cache, you can invalidate cached content.
To bypass Cloud CDN, you can request an object directly from a Cloud Storage bucket or a Compute Engine VM. For example, a URL for a Cloud Storage bucket object looks like this:
Cache hits, misses, fill, and egress
The first time a piece of content is requested, the GFE determines that it can't fulfill the request from the cache. This is called a cache miss. The GFE might attempt to get the content from a nearby cache. If the nearby cache has the content, the nearby cache sends the content to the first cache via cache-to-cache fill. Otherwise, the GFE forwards the request to the HTTP(S) load balancer. The load balancer in turn forwards the request to one of your backends. This backend is the origin server for the content.
When the cache receives the content, the GFE forwards the content to the user. If the content is cacheable, the cache can store it for future requests. The cache might decline to store new content if inserting the new content would require evicting other content that is more popular or if the cache has insufficient information about the popularity of the new content. For example, a cache might decline to insert large content on its first access.
When your users request content that is already stored in a cache, the GFE looks up the content by its cache key and responds directly to the user. This shortens the round-trip time and saves the origin server from having to process the request.
Data transfer from a cache to a client is called cache egress. Data transfer to a cache is called cache fill. As illustrated in the following figure, cache fill can originate from another Cloud CDN cache or from the origin server.
On cache hits, you pay for cache egress bandwidth. On cache misses—including misses that resulted in cache-to-cache fill—you additionally pay for cache fill bandwidth. That means, all else being equal, cache hits cost less than cache misses. For detailed pricing information, refer to the pricing page.
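As a rough illustration of why hits cost less than misses, the sketch below charges egress on every request and adds fill only on a miss. The function name and both rate parameters are hypothetical; real rates and any other charges are on the pricing page and are not modeled here.

```python
def request_cost(
    bytes_served: int,
    cache_hit: bool,
    egress_rate: float,
    fill_rate: float,
) -> float:
    """Bandwidth cost of one request; rates are per byte and illustrative."""
    # Every response leaving a cache incurs cache egress.
    cost = bytes_served * egress_rate
    # A miss (including one satisfied by cache-to-cache fill) also incurs fill.
    if not cache_hit:
        cost += bytes_served * fill_rate
    return cost


# Made-up rates, chosen only to make the comparison easy to read.
hit_cost = request_cost(10, cache_hit=True, egress_rate=2.0, fill_rate=1.0)
miss_cost = request_cost(10, cache_hit=False, egress_rate=2.0, fill_rate=1.0)
```

With these made-up rates, the miss costs more than the hit by exactly the fill component.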
Cache hit ratio
The cache hit ratio is the percentage of times that a requested object is served from the cache. If the cache hit ratio is 60%, it means that the requested object is served from the cache 60% of the time and must be retrieved from the origin 40% of the time.
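The calculation itself is simple. The helper below is a minimal sketch whose name and zero-traffic behavior are assumptions made for this example; it is not part of any Cloud CDN API.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Percentage of requests served from cache."""
    total = hits + misses
    if total == 0:
        # No traffic yet; report 0 rather than dividing by zero.
        return 0.0
    return 100.0 * hits / total


ratio = cache_hit_ratio(hits=60, misses=40)  # 60.0
```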
In the Google Cloud Console, the cache hit ratio is reported for each origin in the Cache hit ratio column on the Cloud CDN page. The percentage shown on this page represents a ratio calculated for a small time frame (the last few minutes). To view the cache hit ratio for a time period from 1 hour to 30 days, click the origin name and then click the Monitoring tab.
Inserting content into cache
Caching is reactive in that an object is stored in a particular cache if a request goes through that cache and if the response is cacheable. An object stored in one cache does not automatically replicate into other caches; cache fill happens only in response to a client-initiated request. You cannot pre-load caches except by causing the individual caches to respond to requests.
When the origin server supports byte range requests, Cloud CDN can initiate multiple cache fill requests in reaction to a single client request. The section Requests initiated by Cloud CDN has more information about these requests.
Content served from cache
After you enable Cloud CDN, caching happens automatically for all cacheable content. Your origin server uses HTTP headers to indicate which responses should be cached and which should not. When you use a backend bucket, the origin server is Cloud Storage. When you use VM instances, the origin server is the web server software you run on those instances. Caching Details has additional information about what Cloud CDN caches and for how long.
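When the origin is web server software that you run yourself, the header logic might look something like the sketch below. It assumes the standard HTTP `Cache-Control` header is the signal and uses an invented policy (paths under `/static/` are cacheable for one hour, everything else is not). The handler, the `cache_control_for` helper, and the port are hypothetical; Caching Details remains the reference for what Cloud CDN actually treats as cacheable.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value for a request path (illustrative policy)."""
    if path.startswith("/static/"):
        # Shared caches may store this response for up to one hour.
        return "public, max-age=3600"
    # Everything else must not be stored by any cache.
    return "private, no-store"


class OriginHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = b"hello from the origin\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Cache-Control", cache_control_for(self.path))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the example origin until interrupted."""
    HTTPServer(("", port), OriginHandler).serve_forever()
```

Calling `serve()` starts the example origin locally. The policy is kept in a separate function so that it can be checked without starting a server.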
Cloud CDN uses caches in numerous locations around the world. Because of the nature of caches, it is impossible to predict whether a particular request will be served out of cache. You can, however, expect that popular requests for cacheable content will be served from cache a high percentage of the time, yielding significantly reduced latencies, reduced cost, and reduced load on your origin servers.
Eviction and expiration
For content to be served from a cache, it must have been inserted into the cache, it must not be evicted, and it must not be expired.
Eviction and expiration are two different concepts. They both affect what gets served, but they don't directly affect each other.
Eviction: Every cache has a limit on how much it can hold. However, Cloud CDN adds content to caches even after they're full. To insert content into a full cache, the cache first removes something else to make room. This is called eviction. Caches are usually full, so they are constantly evicting content. They generally evict content that hasn't recently been accessed, regardless of the content's expiration time. The evicted content might be expired, and it might not be. Setting an expiration time doesn't affect eviction.
Content that hasn't been accessed in a while is considered unpopular. "A while" and "unpopular" are both relative to the bulk of other items in the cache. As caches receive more traffic, they also evict more cached content.
As with all large-scale caches, content can be evicted unpredictably, so no particular request is guaranteed to be served from the cache.
Expiration: Content in HTTP(S) caches can have a configurable expiration time. The expiration time informs the cache not to serve old content, even if the content hasn't been evicted.
For example, consider a picture-of-the-hour URL. Its responses should probably be set to expire in under one hour. Otherwise, the served content might be an old picture from a cache.
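The difference between eviction and expiration can be made concrete with a toy cache. The class below is a sketch under simplifying assumptions: a fixed entry count stands in for the size limit, eviction is strictly least recently accessed, and the caller passes the current time explicitly. `TinyCache` and its methods are hypothetical and say nothing about how Cloud CDN caches are implemented.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class TinyCache:
    """Toy cache with a capacity limit (eviction) and per-entry TTL (expiration)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Maps key -> (body, expires_at); order tracks recency of access.
        self._items: "OrderedDict[str, Tuple[bytes, float]]" = OrderedDict()

    def put(self, key: str, body: bytes, now: float, ttl: float) -> None:
        if key in self._items:
            del self._items[key]
        elif len(self._items) >= self.capacity:
            # Eviction: drop the least recently accessed entry,
            # whether or not it has expired.
            self._items.popitem(last=False)
        self._items[key] = (body, now + ttl)

    def get(self, key: str, now: float) -> Optional[bytes]:
        entry = self._items.get(key)
        if entry is None:
            return None  # never inserted, or already evicted
        body, expires_at = entry
        if now >= expires_at:
            return None  # expired: still stored, but must not be served
        self._items.move_to_end(key)  # mark as recently accessed
        return body


c = TinyCache(capacity=2)
c.put("a", b"A", now=0, ttl=100)
c.put("b", b"B", now=0, ttl=10)
c.get("a", now=1)                 # "a" becomes the most recently accessed
c.put("c", b"C", now=2, ttl=100)  # cache is full, so "b" is evicted
```

Here "b" is evicted at time 2 even though it would not have expired until time 10, and "c" stops being served after time 102 even though nothing evicted it.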
Requests initiated by Cloud CDN
When your origin server supports byte range requests, Cloud CDN can send multiple requests to the origin server in reaction to a single client request. Cloud CDN can initiate two types of requests: validation requests and byte range requests. For more information, see Support for byte range requests.
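To show how one client request can turn into several origin requests, the sketch below splits a resource into consecutive HTTP `Range` header values. The chunk size is a hypothetical parameter. This section does not state how Cloud CDN sizes or schedules its byte range requests, so this is only an illustration of the general idea.

```python
from typing import List


def range_headers(content_length: int, chunk_size: int) -> List[str]:
    """Split content_length bytes into consecutive HTTP Range header values."""
    if content_length < 0 or chunk_size <= 0:
        raise ValueError("content_length must be >= 0 and chunk_size > 0")
    headers = []
    for start in range(0, content_length, chunk_size):
        # Range end positions are inclusive, hence the -1.
        end = min(start + chunk_size, content_length) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


parts = range_headers(content_length=10, chunk_size=4)
# parts == ["bytes=0-3", "bytes=4-7", "bytes=8-9"]
```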
Data location settings of other Cloud Platform Services
Note that using Cloud CDN means data may be stored at serving locations outside of the region or zone of your origin server. This is normal and how HTTP caching works on the Internet. Under the Service Specific Terms of the Google Cloud Platform Terms of Service, the Data Location Setting that is available for certain Cloud Platform Services will not apply to Core Customer Data for the respective Cloud Platform Service when used with other Google products and services (in this case the Cloud CDN service). If you do not want this outcome, please do not use the Cloud CDN service.
Support for Google-managed SSL certificates
You can use Google-managed certificates when Cloud CDN is enabled.
Cloud Armor restriction
Google Cloud Armor is not supported for Cloud CDN. If you try to associate a Google Cloud Armor security policy with a backend service that has Cloud CDN enabled, the configuration is rejected. Similarly, if you attempt to enable Cloud CDN on a backend service that has an associated Google Cloud Armor security policy, the configuration process fails.