Overview

Google Cloud CDN and HTTP(S) load balancing

The Cloud CDN content delivery network works with HTTP(S) load balancing to deliver content to your users. The HTTP(S) load balancing configuration specifies the frontend IP addresses and ports on which Cloud CDN receives requests and the backends that originate responses to those requests.

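Cloud CDN content can originate from two types of backends: VM instances, where the origin server is the web server software you run on those instances, and backend buckets, where the origin server is Cloud Storage.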

When a user requests content from your site, the request arrives at a location at the edge of Google's network, which is usually far closer to the user than your backends. Cloud CDN uses caches at these locations to store responses generated by your backends. The backend that generates the response is referred to as the origin server.

The following figure illustrates how responses from origin servers running on VM instances flow through an HTTP(S) load balancer before being delivered by Cloud CDN.

Figure: Cloud CDN response flow. Responses flow from origin servers through Cloud CDN to clients.

Cache hits, misses, fill, and egress

The first time a piece of content is requested, the cache sees that it can't fulfill the request. This is called a cache miss. The cache might attempt to get the content from a nearby cache. If the nearby cache has the content, it sends it to the first cache via cache-to-cache fill. Otherwise, the request is forwarded to the HTTP(S) load balancer. The load balancer in turn forwards the request to one of your backends. This backend is the origin server for the content.

When the cache receives the content, the cache forwards the content to the user. If the content is cacheable, the cache can store it for future requests. The cache might decline to store new content if inserting it into cache would require evicting content that is more popular or if the cache has insufficient information about the popularity of the new content. For example, a cache might decline to insert large content on its first access.

When your users request content that is already stored in a cache, the cache looks up the content by its cache key and responds directly to the user, shortening the round trip time and saving the origin server from having to process the request. This is called a cache hit.

Figure: Cache miss and cache hit. The initial response is served by the origin server, while subsequent responses are served from cache.
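
The lookup order described above (local cache, then a nearby cache, then the origin server) can be sketched as a toy model. This is only an illustration of the flow, not Cloud CDN's implementation: the `EdgeCache` class, the `origin_server` function, and the source labels are all hypothetical names, and the sketch assumes every response is cacheable and accepted by the cache.

```python
from typing import Callable, Dict, Optional, Tuple


class EdgeCache:
    """Toy model of one cache location; not Cloud CDN's real implementation."""

    def __init__(self, name: str, neighbor: Optional["EdgeCache"] = None):
        self.name = name
        self.neighbor = neighbor            # hypothetical nearby cache
        self.store: Dict[str, bytes] = {}   # cache key -> stored response

    def get(self, key: str, origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
        """Return (content, source), where source says how the request was served."""
        if key in self.store:
            # Cache hit: respond directly without contacting the origin.
            return self.store[key], "hit"
        # Cache miss: try a nearby cache first (cache-to-cache fill).
        if self.neighbor is not None and key in self.neighbor.store:
            content = self.neighbor.store[key]
            source = "miss: filled from nearby cache"
        else:
            # Otherwise the request goes on to the origin server.
            content = origin(key)
            source = "miss: filled from origin"
        # Assumption: the response is cacheable and the cache accepts it.
        self.store[key] = content
        return content, source


def origin_server(key: str) -> bytes:
    """Stand-in for a backend that generates the response."""
    return f"content for {key}".encode()


nearby = EdgeCache("nearby")
edge = EdgeCache("edge", neighbor=nearby)

first = edge.get("/logo.png", origin_server)    # served via the origin
second = edge.get("/logo.png", origin_server)   # served from the edge cache
```

In this model the first request for `/logo.png` is a miss filled from the origin, and the second is a hit. A real cache might also decline to store the first response, as noted above.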

Data transfer from a cache to a client is called cache egress. Data transfer to a cache is called cache fill. As illustrated in the following figure, cache fill can originate from another Cloud CDN cache or from the origin server.

Figure: Cache fill and cache egress. Cache fill is data transfer from an origin server to a cache or from a cache to another cache. Cache egress is data transfer from a cache to a client.

On cache hits, you pay for cache egress bandwidth. On cache misses—including misses that resulted in cache-to-cache fill—you additionally pay for cache fill bandwidth. That means, all else being equal, cache hits cost less than cache misses. For detailed pricing information, refer to the pricing page.
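
The billing rule above can be made concrete with a small calculation. The two per-GB rates below are hypothetical placeholders chosen only to show the arithmetic; they are not Cloud CDN prices, and the pricing page remains the source for real figures.

```python
# Hypothetical per-GB rates, chosen only to make the arithmetic visible.
# They are NOT Cloud CDN prices; see the pricing page for real figures.
EGRESS_RATE_PER_GB = 0.08
FILL_RATE_PER_GB = 0.04


def request_cost(size_gb: float, cache_hit: bool) -> float:
    """Cost of serving one response of size_gb under the rule in the text."""
    cost = size_gb * EGRESS_RATE_PER_GB       # egress is paid on hits and misses
    if not cache_hit:
        cost += size_gb * FILL_RATE_PER_GB    # misses additionally pay for fill
    return cost


hit_cost = request_cost(1.0, cache_hit=True)
miss_cost = request_cost(1.0, cache_hit=False)
```

Whatever the actual rates are, the structure is the same: a miss pays everything a hit pays plus the fill charge, so raising the cache hit ratio lowers cost.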

Inserting content into cache

Caching is reactive in that an object is stored in a particular cache if a request goes through that cache and if the response is cacheable. An object stored in one cache does not automatically replicate into other caches; cache fill happens only in response to a client-initiated request. You cannot pre-load caches except by causing the individual caches to respond to requests.

When the origin server supports byte range requests, Cloud CDN can initiate multiple cache fill requests in reaction to a single client request. The section Requests initiated by Cloud CDN has more information about these requests.

Content served from cache

After you enable Cloud CDN, caching happens automatically for all cacheable content. Your origin server uses HTTP headers to indicate which responses should be cached and which should not. When you use a backend bucket, the origin server is Cloud Storage. When you use VM instances, the origin server is the web server software you run on those instances. Caching Details has additional information about what Cloud CDN caches and for how long.
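
As an illustration of an origin server using HTTP headers to mark responses as cacheable or not, here is a minimal sketch built on Python's standard `http.server` module. `Cache-Control` is the standard HTTP header for this purpose; the path-based policy, the `cache_control_for` helper, and the `OriginHandler` class are assumptions made for the example, and Caching Details remains the reference for which directives Cloud CDN honors.

```python
from http.server import BaseHTTPRequestHandler


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value for a path (illustrative policy only)."""
    if path.startswith("/static/"):
        # Shared caches may store this response for up to one hour.
        return "public, max-age=3600"
    # Per-user or dynamic responses: tell caches not to store them.
    return "private, no-store"


class OriginHandler(BaseHTTPRequestHandler):
    """Hypothetical origin web server that labels every response."""

    def do_GET(self):
        body = b"hello from the origin\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Cache-Control", cache_control_for(self.path))
        self.end_headers()
        self.wfile.write(body)


# To try it locally (address and port are arbitrary choices):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8080), OriginHandler).serve_forever()
```

Keeping the policy in one small function makes it easy to review which URL paths are allowed into shared caches.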

Cloud CDN uses caches in numerous locations around the world. Because of the nature of caches, it is impossible to predict whether a particular request will be served out of cache. You can, however, expect that popular requests for cacheable content will be served from cache a high percentage of the time, yielding significantly reduced latencies, reduced cost, and reduced load on your origin servers.

You can view logs to see what Cloud CDN is serving from cache. If you need to remove content from caches, you can initiate a cache invalidation operation.

Eviction and expiration

For content to be served from a cache, it must have been inserted into the cache, it must not have been evicted, and it must not have expired.

Eviction and expiration are two different concepts. They both affect what gets served, but they don't directly affect each other.

  • Eviction: Every cache has a limit on how much it can hold. However, Cloud CDN adds content to caches even after they're full. To insert content into a full cache, the cache first removes something else to make room. This is called eviction. Caches are usually full, so they are constantly evicting content. They generally evict content that hasn't recently been accessed, regardless of the content's expiration time. The evicted content might be expired, and it might not be. Setting an expiration time doesn't affect eviction.

    Unpopular content means content that hasn't been accessed in a while. "A while" and "unpopular" are both relative to the bulk of other items in the cache. As caches receive more traffic, they also evict more cached content.

    As with all large-scale caches, content can be evicted unpredictably, so no particular request is guaranteed to be served from the cache.

  • Expiration: Content in HTTP(S) caches can have a configurable expiration time. The expiration time informs the cache not to serve old content, even if the content hasn't been evicted.

    For example, consider a picture-of-the-hour URL. Its responses should probably be set to expire in under one hour. Otherwise, the served content might be an old picture from a cache.
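
The independence of eviction and expiration can be shown with a toy cache. This sketch assumes a strict least-recently-used eviction policy and a fixed entry count as the capacity limit; both are simplifications for illustration, and `TinyCache` is a hypothetical class, not how Cloud CDN's caches are built.

```python
from collections import OrderedDict
from typing import Optional


class TinyCache:
    """Toy cache with a capacity limit (eviction) and per-entry expiry (expiration)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> (value, expires_at), oldest access first

    def put(self, key: str, value: str, expires_at: float) -> None:
        if key in self.entries:
            del self.entries[key]
        elif len(self.entries) >= self.capacity:
            # Eviction: drop the least recently accessed entry,
            # without looking at its expiration time.
            self.entries.popitem(last=False)
        self.entries[key] = (value, expires_at)

    def get(self, key: str, now: float) -> Optional[str]:
        if key not in self.entries:
            return None                   # never inserted, or evicted
        value, expires_at = self.entries[key]
        if now >= expires_at:
            return None                   # still stored, but expired: not served
        self.entries.move_to_end(key)     # mark as recently accessed
        return value


cache = TinyCache(capacity=2)
cache.put("a", "A", expires_at=100.0)   # long expiration time
cache.put("b", "B", expires_at=5.0)     # short expiration time

served_b = cache.get("b", now=1.0)      # fresh, so it is served; "b" is now most recent
cache.put("c", "C", expires_at=100.0)   # cache is full: "a" is evicted despite its late expiry
evicted_a = cache.get("a", now=2.0)     # None: evicted while still unexpired
expired_b = cache.get("b", now=10.0)    # None: still stored, but past its expiration time
```

Entry `a` disappears even though it had the longest expiration time, while entry `b` stays in storage after it expires but is no longer served.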

Requests initiated by Cloud CDN

When your origin server supports byte range requests, Cloud CDN can send multiple requests to the origin server in reaction to a single client request. Cloud CDN can initiate two types of requests: validation requests and byte range requests. Support for byte range requests describes both in more detail.
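
To illustrate how a single client request for a large object could turn into several byte range requests, the following sketch splits an object into standard HTTP `Range` header values. The `range_headers` helper and the chunk size used here are hypothetical; the sizes and alignment that Cloud CDN actually uses are covered in Support for byte range requests.

```python
from typing import List


def range_headers(total_size: int, chunk_size: int) -> List[str]:
    """Split an object of total_size bytes into HTTP Range header values."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    headers = []
    for start in range(0, total_size, chunk_size):
        # Range end positions are inclusive, hence the -1.
        end = min(start + chunk_size, total_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


# Hypothetical sizes, picked only to keep the output short.
ranges = range_headers(total_size=2500, chunk_size=1000)
```

Each value would be sent to the origin server in a `Range` request header, and the origin would answer with the corresponding part of the object.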

Data location settings of other Cloud Platform Services

Note that using Cloud CDN means data may be stored at serving locations outside of the region or zone of your origin server. This is normal and how HTTP caching works on the Internet. Under the Service Specific Terms of the Google Cloud Platform Terms of Service, the Data Location Setting that is available for certain Cloud Platform Services will not apply to Core Customer Data for the respective Cloud Platform Service when used with other Google products and services (in this case the Cloud CDN service). If you do not want this outcome, please do not use the Cloud CDN service.

