How to do serverless pixel tracking

This tutorial explains how to set up a pixel tracking infrastructure. This tutorial isn't about setting up a website. Rather, it shows you how to make a pixel available through a URL that can then be embedded in a publisher's property, such as a web page.

The example infrastructure offers three main features:

  • Serving: This requires scalable, fast object storage that can sometimes serve several hundred thousand requests per second. This tutorial shows how to use Cloud Storage.
  • Collecting: This part captures the HTTP requests made for the pixel. This tutorial uses the Compute Engine HTTP load balancer together with Stackdriver Logging.
  • Analyzing: Pixels are loaded every time a user visits a page, which can create hundreds of thousands of events per second. This tutorial uses BigQuery to help make sense of all this data, including the parameters.

Objectives

In this tutorial, you build:

  • A pixel served at http://[YOUR_IP_ADDRESS]/pixel.png?[PARAMETERS].
  • A way to record the requests made for the pixel URL.
  • A testing environment that acts as if the pixel were called by a page. This environment calls URLs with various combinations of key-value pairs in the parameters string, including:

    • User ID
    • Page displayed
    • Products in shopping cart
  • A logs-export configuration that ensures pixel requests are recorded in BigQuery as they happen.

  • A few example queries that help you understand:

    • Top users on the website.
    • Pages that were visited.
    • Top products in the shopping cart.
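As a sketch of how such a parameter string can be assembled, the following Python snippet builds a pixel URL from the key-value pairs listed above (the IP address and all values are placeholders):

```python
from urllib.parse import urlencode

# Hypothetical values; uid, pn, and pr mirror the parameters used in this tutorial.
params = {
    "uid": "19679",       # user ID
    "pn": "checkout",     # page displayed
    "pr": "prod1;prod2",  # products in the shopping cart
}
# Keep ";" unescaped so the product list stays readable in the logs.
pixel_url = "http://203.0.113.10/pixel.png?" + urlencode(params, safe=";")
print(pixel_url)
# → http://203.0.113.10/pixel.png?uid=19679&pn=checkout&pr=prod1;prod2
```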

Costs

Such an architecture incurs costs that depend on how you export the data. While this example uses BigQuery, you can choose from various export options, each with different costs, depending on your needs:

  • Export to Cloud Storage: the cheapest way to export pixel logs. Data is exported hourly, and you pay only for storage.
  • Export to BigQuery: makes data available in real time to ad-hoc analysis tools. Related costs include using:

    • BigQuery storage
    • BigQuery streaming API
    • BigQuery querying
  • Export to Cloud Pub/Sub: This solution, mentioned towards the end of this tutorial, enables real-time aggregation of the data by using Cloud Dataflow. This approach can limit the BigQuery streaming API cost. Costs incurred with this solution include:

    • Cloud Pub/Sub
    • Cloud Dataflow

Use the Pricing Calculator to generate a cost estimate based on your projected usage.

Before you begin

  1. In the GCP Console, go to the Manage resources page.

    Go to the Manage resources page

  2. Select a project, or click Create and create a new GCP project.

Setting up the pixel serving

Create a bucket

You can create a bucket through the Google Cloud Platform Console or with the command-line tool, gsutil. This tutorial assumes you use a bucket name similar to:

    [YOUR_PREFIX]-gcs-pixel-tracking

Where you replace [YOUR_PREFIX] with a unique string.

To create a bucket:

GCP Console

  1. In the GCP Console, go to the Cloud Storage Browser page.

    Go to the Cloud Storage Browser page

  2. Click Create bucket.
  3. In the Create bucket dialog, specify the bucket attributes, including the bucket name ([YOUR_PREFIX]-gcs-pixel-tracking), storage class, and location.
  4. Click Create.

Command line

    Create a new Cloud Storage bucket. Replace [YOUR_BUCKET] with a bucket name that meets the bucket name requirements:
    gsutil mb gs://[YOUR_BUCKET]

Update bucket permissions

After you create the bucket, the bucket is listed in the GCP Console. Next, you make the bucket publicly accessible.

  1. In the GCP Console, click the More button (more_vert) to the right of your bucket row, and select Edit bucket permissions.

    Setting permissions on the bucket

  2. In the Add users field, enter allUsers.

  3. In the Select a role list, select Reader.

    Permissions set on the bucket

  4. Click Add.

Upload the pixel

You can copy a pixel directly from Google's public Cloud Storage bucket:

gsutil cp gs://solutions-public-assets/pixel-tracking/pixel.png gs://[YOUR_PREFIX]-gcs-pixel-tracking

Alternatively, you can create a pixel locally and then upload it in the GCP Console.

  1. In the list of buckets, click [YOUR_PREFIX]-gcs-pixel-tracking.
  2. Click Upload Files, select the files you want to upload in the dialog that appears, and then click Open.
  3. In the list of objects in the bucket, select the Public link checkbox next to the pixel you uploaded.

    Making the pixel public
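If you prefer to generate the 1x1 transparent pixel yourself, the following Python sketch writes a minimal PNG by using only the standard library. Any valid 1x1 transparent PNG works equally well; this is just one way to produce one locally.

```python
import struct
import zlib

def chunk(ctype, data):
    """Build one PNG chunk: length, type, data, CRC32."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data) & 0xFFFFFFFF))

def transparent_pixel_png():
    # IHDR: 1x1 image, 8 bits per channel, color type 6 (RGBA).
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)
    # One scanline: filter byte 0, then 4 zero bytes (fully transparent black).
    idat = zlib.compress(b"\x00\x00\x00\x00\x00")
    return (b"\x89PNG\r\n\x1a\n"
            + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", idat)
            + chunk(b"IEND", b""))

with open("pixel.png", "wb") as f:
    f.write(transparent_pixel_png())
```

You can then upload the resulting pixel.png to your bucket as described above.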

Set up the load balancer

You now have a Cloud Storage bucket with a single, invisible, publicly accessible pixel. Next, you set up a way to log all the requests made to the bucket by creating an HTTP load balancer in front of that bucket.

  1. In the GCP Console, go to the Load balancing page:

    OPEN LOAD BALANCING PAGE

  2. Click Create Load Balancer.

  3. In the HTTP(S) Load Balancing section, click Start configuration.

  4. Set up a backend for your Cloud Storage bucket. The results are similar to the following image. For more information about creating a backend bucket, see the backend bucket documentation.

    Backend configuration

  5. Set up host and path as needed or leave as-is for a basic configuration.

  6. Click Create. You should have an environment similar to the following:

    Bucket configuration

Collecting logs

You collect logs by using Stackdriver Logging export. In this case, you want to export the data directly to BigQuery. To set up exporting, you need to create a dataset to receive the logging data, and then set up the export rules.

Create a receiving dataset in BigQuery

  1. Go to BigQuery:

    OPEN BigQuery

  2. Click the arrow next to your project name.

  3. Click Create new dataset.

  4. For the name, enter gcs_pixel_tracking_analytics.

Set up the export

  1. In the GCP Console, go to the Exports page:

    OPEN THE EXPORTS PAGE

  2. Add a filter for your load balancer. Replace [YOUR_LB_NAME] with the name of your load balancer.

    resource.type = http_load_balancer AND resource.labels.url_map_name = "[YOUR_LB_NAME]"
    
  3. Click Create Export.

  4. Enter a Sink Name.

  5. From the Sink Service list, select BigQuery.

  6. From the Sink Destination list, select the dataset that you created previously.

  7. Click Create Sink.

    Creating the sink

Creating sample data

You now have a pixel ready to be served, but nothing is better than seeing it in action. Of course, your pixel isn't yet on a site that's getting thousands of pageviews. To analyze some traffic with BigQuery and show how this works at scale, you can create some sample data by using custom-made parameters added to the pixel URL.

To do so, you can use Vegeta. A tutorial for setting up a load-testing environment is available on GitHub. The load test sends requests for the pixel URL, adding random values to the URL parameters, as follows:

GET http://[YOUR_IP_ADDRESS]/pixel.png?[YOUR_PARAMETERS]

The parameters might look like the following example:

uid=19679&pn=checkout&purl=http%3A%2F%2Fexample.com%2Fpage&e=pl&pr=prod1;prod2

In the preceding example:

  • uid is the user ID of the visiting customer.
  • pn is the name of the page being viewed.
  • purl is the URL of the page that they are visiting.
  • e is the event.
  • pr is a list of products that they have in their shopping cart at that time.
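You can sanity-check how such a parameter string decodes with a small Python sketch (the values mirror the example above):

```python
from urllib.parse import unquote

query = "uid=19679&pn=checkout&purl=http%3A%2F%2Fexample.com%2Fpage&e=pl&pr=prod1;prod2"

# Split into key-value pairs and percent-decode the values.
params = {k: unquote(v) for k, v in
          (pair.split("=", 1) for pair in query.split("&"))}
products = params["pr"].split(";")

print(params["purl"])  # → http://example.com/page
print(products)        # → ['prod1', 'prod2']
```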

Analyzing logs

There are various ways to analyze data in BigQuery. This tutorial analyzes the logs through the BigQuery web UI.

The following sections show commonly used queries for the pixel-tracking scenario.

Top 5 returning identified customers

The following query lists user IDs (as uid) and the count of requests made to the URL that hosts the pixel (as c_uid), for each ID. The query limits the results to the 5 highest counts, in descending order.

SELECT
  COUNT(REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+uid=([0-9]*)")) AS c_uid,
  REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+uid=([0-9]*)") AS uid
FROM
  `YOUR_PROJECT.YOUR_DATASET.request_*`
GROUP BY uid
ORDER BY c_uid DESC
LIMIT 5

The results are as follows:

Query results
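The REGEXP_EXTRACT pattern used in the query can be checked locally before running it over your logs; for example, in Python (the sample URL is hypothetical):

```python
import re

# The same pattern the BigQuery query uses to extract the uid parameter.
url = "http://203.0.113.10/pixel.png?uid=19679&pn=checkout&e=pl"
match = re.search(r"^.+uid=([0-9]*)", url)
print(match.group(1))  # → 19679
```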

Top 5 products

In this example, the parameter string contains pr=product1;product2;product3. The following query leverages BigQuery arrays to count how often each product appears across all publishers, so you can see which products attracted visitors' interest.

SELECT
  DATE(timestamp) AS day,
  product,
  COUNT(product) AS c_prod
FROM
  `[PROJECT_ID].gcs_pixel_tracking_analytics.[TABLE_ID]`
CROSS JOIN UNNEST(SPLIT(REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+pr=(.*)"), ";")) AS product
GROUP BY product, day
ORDER BY c_prod DESC

Replace [PROJECT_ID] and [TABLE_ID] with appropriate values.
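The aggregation that the query performs (extracting the pr list, splitting it on ";", and counting) can be mirrored in Python to check the logic on a few sample URLs (the URLs here are hypothetical):

```python
import re
from collections import Counter

# Hypothetical request URLs, as they would appear in httpRequest.requestUrl.
urls = [
    "http://203.0.113.10/pixel.png?uid=1&e=pl&pr=prod1;prod2",
    "http://203.0.113.10/pixel.png?uid=2&e=pl&pr=prod2",
    "http://203.0.113.10/pixel.png?uid=3&e=pl&pr=prod2;prod3",
]

counts = Counter()
for url in urls:
    match = re.search(r"^.+pr=(.*)", url)
    if match:
        counts.update(match.group(1).split(";"))

print(counts["prod2"])  # → 3
```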

You can perform additional analytics by transforming the data, saving it to another BigQuery table, and creating dashboards by using Google Data Studio, for example.

Load testing

If you are interested in load testing your setup, there is a GitHub repository that contains a list of custom-made URLs. The test reaches 100,000 queries per second (QPS), and you can scale it higher if you need more.
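As an illustration, randomized target lines in the one-request-per-line format that Vegeta consumes could be generated like this (the page names, product names, and IP address are hypothetical; the actual repository may use different values):

```python
import random

random.seed(42)  # deterministic output for the example

PAGES = ["home", "product", "checkout"]
PRODUCTS = ["prod1", "prod2", "prod3"]
BASE_URL = "http://203.0.113.10/pixel.png"  # replace with your load balancer IP

def random_target():
    uid = random.randint(1, 100000)
    cart = ";".join(random.sample(PRODUCTS, random.randint(1, len(PRODUCTS))))
    return f"GET {BASE_URL}?uid={uid}&pn={random.choice(PAGES)}&e=pl&pr={cart}"

# Write a Vegeta targets file, one request per line.
with open("targets.txt", "w") as f:
    for _ in range(1000):
        f.write(random_target() + "\n")
```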

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

Delete the project

  1. In the GCP Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the project that you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete individual resources

Follow these steps to delete individual resources, instead of deleting the whole project.

Delete the Cloud Storage bucket

  1. In the GCP Console, go to the Cloud Storage Browser page.

    Go to the Cloud Storage Browser page

  2. Click the checkbox for the bucket you want to delete.
  3. Click Delete to delete the bucket.

Delete the BigQuery datasets

  1. Open the BigQuery web UI.

    OPEN BigQuery

  2. Select the BigQuery datasets that you created during the tutorial, and then delete them.

What's next

  • Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.