Google Cloud CDN and HTTP(S) load balancing
The Cloud CDN content delivery network works with HTTP(S) load balancing to deliver content to your users. The HTTP(S) load balancing configuration specifies the frontend IP addresses and ports on which Cloud CDN receives requests and the backends that originate responses to those requests.
Cloud CDN content can originate from two types of backends:
When a user requests content from your site, the request arrives at a location at the edge of Google's network, which is usually far closer to the user than your backends. Cloud CDN uses caches at these locations to store responses generated by your backends. The backend that generates the response is referred to as the origin server.
The following figure illustrates how responses from origin servers running on VM instances flow through an HTTP(S) load balancer before being delivered by Cloud CDN.
Cache hits, misses, fill, and egress
The first time a piece of content is requested, the cache sees that it can't fulfill the request. This is called a cache miss. The cache might attempt to get the content from a nearby cache. If the nearby cache has the content, it sends it to the first cache via cache-to-cache fill. Otherwise, the request is forwarded to the HTTP(S) load balancer. The load balancer in turn forwards the request to one of your backends. This backend is the origin server for the content.
When the cache receives the content, the cache forwards the content to the user. If the content is cacheable, the cache can store it for future requests. The cache might decline to store new content if inserting it into cache would require evicting content that is more popular or if the cache has insufficient information about the popularity of the new content. For example, a cache might decline to insert large content on its first access.
When your users request content that is already stored in a cache, the cache looks up the content by its cache key and responds directly to the user, shortening the round trip time and saving the origin server from having to process the request. This is called a cache hit.
Data transfer from a cache to a client is called cache egress. Data transfer to a cache is called cache fill. As illustrated in the following figure, cache fill can originate from another Cloud CDN cache or from the origin server.
On cache hits, you pay for cache egress bandwidth. On cache misses—including misses that resulted in cache-to-cache fill—you additionally pay for cache fill bandwidth. That means, all else being equal, cache hits cost less than cache misses. For detailed pricing information, refer to the pricing page.
Inserting content into cache
Caching is reactive in that an object is stored in a particular cache if a request goes through that cache and if the response is cacheable. An object stored in one cache does not automatically replicate into other caches; cache fill happens only in response to a client-initiated request. You cannot pre-load caches except by causing the individual caches to respond to requests.
When the origin server supports byte range requests, Cloud CDN can initiate multiple cache fill requests in reaction to a single client request. The section Requests initiated by Cloud CDN has more information about these requests.
Content served from cache
After you enable Cloud CDN, caching happens automatically for all cacheable content. Your origin server uses HTTP headers to indicate which responses should be cached and which should not. When you use a backend bucket, the origin server is Cloud Storage. When you use VM instances, the origin server is the web server software you run on those instances. Caching Details has additional information about what Cloud CDN caches and for how long.
Cloud CDN uses caches in numerous locations around the world. Because of the nature of caches, it is impossible to predict whether a particular request will be served out of cache. You can, however, expect that popular requests for cacheable content will be served from cache a high percentage of the time, yielding significantly reduced latencies, reduced cost, and reduced load on your origin servers.
Requests initiated by Cloud CDN
When your origin server supports byte range requests, Cloud CDN can send multiple requests to the origin server in reaction to a single client request. As described in Support for byte range requests, Cloud CDN can initiate two types of requests: validation requests and byte range requests. For more information about byte range requests, see that documentation.
Data location settings of other Cloud Platform Services
Note that using Cloud CDN means data may be stored at serving locations outside of the region or zone of your origin server. This is normal and how HTTP caching works on the Internet. Under the Service Specific Terms of the Google Cloud Platform Terms of Service, the Data Location Setting that is available for certain Cloud Platform Services will not apply to Core Customer Data for the respective Cloud Platform Service when used with other Google products and services (in this case the Cloud CDN service). If you do not want this outcome, please do not use the Cloud CDN service.