How to Do Serverless Pixel Tracking

This tutorial explains how to set up a pixel tracking infrastructure. This tutorial is not about setting up a website. Rather, it shows you how to make a pixel available through a URL that can then be embedded in a publisher's property, such as a web page.

The example infrastructure offers three main features:

  • Serving: This requires scalable, fast object storage that can sometimes serve several hundred thousand requests per second. This tutorial shows how to use Google Cloud Storage.
  • Collecting: This part must capture every HTTP request made for the pixel. This tutorial uses the Google Compute Engine HTTP load balancer together with Stackdriver Logging.
  • Analyzing: Pixels are loaded every time a user visits a page. These requests can create hundreds of thousands of events per second. This tutorial uses Google BigQuery to help make sense of all this data, including the parameters.

Objectives

By the end of this tutorial, you will have built:

  • A pixel being served at http://[YOUR_IP_ADDRESS]/pixel.png?[PARAMETERS].
  • A way to record the requests made for the pixel URL.
  • A testing environment that acts as if a page called the pixel. It calls URLs with various combinations of key-value pairs in the parameter string, including:
    • User ID.
    • Page displayed.
    • Products in shopping cart.
  • A Stackdriver Logging export to BigQuery, which makes sure that pixel requests are recorded in BigQuery as they happen.
  • A few example queries that help you understand:
    • Top users on the website.
    • Pages that were visited.
    • Top products in the shopping cart.

Costs

Running such an architecture incurs costs. While this example uses BigQuery, you can choose from several export options with different costs, depending on your needs:

  • Export to Google Cloud Storage: The cheapest way to export pixel logs. Data is exported hourly and you pay only for storage.
  • Export to BigQuery: Makes data available in real time for ad hoc analysis. Related costs include using:
    • BigQuery Storage
    • BigQuery streaming API
    • BigQuery querying
  • Export to Google Cloud Pub/Sub: This solution, mentioned towards the end of this tutorial, would enable real-time aggregation of the data by using Google Cloud Dataflow. This approach could limit the BigQuery streaming API cost. Costs incurred with this solution include:
    • Cloud Pub/Sub
    • Cloud Dataflow

Use the Pricing Calculator to generate a cost estimate based on your projected usage.

Before you begin

  1. In the Cloud Platform Console, go to the Projects page.

    Go to the Projects page

  2. Select a project, or click Create Project to create a new Cloud Platform project.
  3. In the dialog, name your project. Make a note of your generated project ID.
  4. Click Create to create a new project.

Setting up the pixel serving

Creating a bucket

You can create a bucket either through the Cloud Platform Console or with the gsutil command-line tool. This tutorial assumes you use a bucket name similar to:

    [YOUR_PREFIX]-gcs-pixel-tracking

Replace [YOUR_PREFIX] with a unique string.

To create a bucket:

Cloud Platform Console

  1. In the Cloud Platform Console, go to the Cloud Storage browser.

    Go to the Cloud Storage browser

  2. Click Create bucket.
  3. In the Create bucket dialog, specify the following attributes:
  4. Click Create.

Command line

    Create a new Cloud Storage bucket. Replace [YOUR_BUCKET] with a bucket name that meets the bucket name requirements:
    gsutil mb gs://[YOUR_BUCKET]
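
Before you run the command, it can help to check the bucket name locally. The following Python sketch builds the name pattern used in this tutorial and applies a simplified check. The rules it encodes (3 to 63 characters, lowercase letters, digits, dashes, and underscores, starting and ending with a letter or digit, no "goog" prefix, no "google" substring) are assumptions that cover only part of the bucket name requirements, and both function names are hypothetical.

```python
import re

# Partial check only: these rules are assumptions for illustration and do not
# cover every documented requirement (for example, names that contain dots).
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def pixel_bucket_name(prefix: str) -> str:
    """Build the bucket name pattern used in this tutorial."""
    return f"{prefix}-gcs-pixel-tracking"


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the simplified check described above."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    # Assumed reserved patterns: the "goog" prefix and the word "google".
    return not name.startswith("goog") and "google" not in name
```

A name that passes this check can still be rejected, for example because bucket names must be globally unique.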

After the bucket is created, you should see the bucket in your list. Next, make sure it is publicly accessible. In the Cloud Platform Console:

  1. Click the three dots on the right-hand side of your bucket row.
  2. Select Edit bucket permissions.
  3. Add "allUsers" as "Reader".

[Screenshot: Setting permissions on the bucket]

[Screenshot: Permissions set on the bucket]

Uploading the pixel

You can copy a pixel directly from Google's public Cloud Storage bucket:

gsutil cp gs://solutions-public-assets/pixel-tracking/pixel.png gs://[YOUR_PREFIX]-gcs-pixel-tracking

Remember to replace [YOUR_PREFIX]-gcs-pixel-tracking with the name of your bucket.

Alternatively, you can create a pixel locally and then upload it by using the Upload Files button in your bucket.

Make sure to select the Public link checkbox.

[Screenshot: Making the pixel public]
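
If you choose to create the pixel locally, one possible approach is to write the PNG bytes yourself. The following Python sketch uses only the standard library to produce a 1x1 fully transparent RGBA image. It illustrates the file layout and is not necessarily how the pixel in Google's public bucket was produced. The function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def transparent_pixel_png() -> bytes:
    """Return the bytes of a 1x1 fully transparent RGBA PNG."""
    # IHDR: width=1, height=1, bit depth=8, color type=6 (RGBA),
    # compression=0, filter=0, interlace=0.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)
    # One scanline: filter type 0 followed by a single all-zero RGBA pixel.
    scanline = b"\x00" + b"\x00\x00\x00\x00"
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(scanline))
        + _chunk(b"IEND", b"")
    )


def write_pixel(path: str) -> None:
    """Write the pixel to disk so that it can be uploaded to the bucket."""
    with open(path, "wb") as f:
        f.write(transparent_pixel_png())
```

Calling write_pixel("pixel.png") produces a file that you can then upload to your bucket.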

Setting up the load balancer

You should now have a Cloud Storage bucket with a single, invisible, publicly accessible pixel. Next, you need a way to log all the requests made to it. To do so, create an HTTP load balancer in front of that bucket.

  1. Go to Networking > Load balancing:

    OPEN LOAD BALANCING

  2. Click Create Load Balancer.

  3. Click Start configuration in the HTTP(S) Load Balancing tile.
  4. Set up a backend for your Cloud Storage bucket as shown here:

    [Screenshot: Backend configuration]

  5. Set up host and path as needed or leave as-is for a basic configuration.

  6. Click Create.

When done, you should have an environment similar to the following screenshot:

[Screenshot: Bucket configuration]

Collecting logs

You collect logs by using Stackdriver Logging export. In this case, you want to export the data directly to BigQuery. To set up exporting, you need to create a dataset to receive the logging data, and then set up the export rules.

Creating a receiving dataset in BigQuery

You can create the dataset through the BigQuery web UI, as follows:

  1. Go to the BigQuery web UI:

    OPEN BIGQUERY

  2. Click the arrow beside your project name.

  3. Click Create new dataset.
  4. For the name, enter "gcs_pixel_tracking_analytics".

Setting up the export

  1. Go to the Stackdriver Logging page in the Cloud Platform Console:

    OPEN STACKDRIVER

  2. Add a filter for your load balancer. Replace [YOUR_LB_NAME] with the name of your load balancer:

    resource.type = http_load_balancer AND resource.labels.url_map_name = "[YOUR_LB_NAME]"
    
  3. Click Create Export at the top of the page.

  4. Enter a sink name.
  5. Choose BigQuery as a Sink Service.
  6. Choose the dataset that you created previously as a Sink Destination.
  7. Click the Create Sink button.

[Screenshot: Creating the sink]

Creating fake data

You now have a pixel ready to be served, but nothing is better than seeing it in action. Of course, your pixel isn't yet on a site that's getting thousands of page views. To analyze some traffic with BigQuery and show how this works at scale, you can create some fake data by adding custom parameters to the pixel URL.

To do so, you can use Vegeta, an HTTP load-testing tool. A tutorial on setting up a load-testing environment is available on GitHub. The load test requests the pixel URL with random values in the URL parameters, as follows:

GET http://[YOUR_IP_ADDRESS]/pixel.png?[YOUR_PARAMETERS]

The parameters might look like the following example:

uid=19679&pn=checkout&purl=http%3A%2F%2Fexample.com%2Fpage&e=pl&pr=prod1;prod2

In the preceding example:

  • uid is the user ID of the visiting customer.
  • pn is the name of the page that they are visiting.
  • purl is the page URL that they are visiting.
  • e is the event.
  • pr is a semicolon-separated list of products that they have in their shopping cart at that time.
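
To illustrate how such URLs could be generated, here is a minimal Python sketch. It is not the code from the GitHub repository. The page names, product names, value ranges, and function names are hypothetical. Each generated line follows the "GET <url>" form shown above, which is also the plain-text format that Vegeta reads for its targets.

```python
import random
from urllib.parse import urlencode

# Hypothetical values for illustration only; replace them with your own.
BASE_URL = "http://[YOUR_IP_ADDRESS]/pixel.png"
PAGES = ["home", "product", "checkout"]
PRODUCTS = ["prod1", "prod2", "prod3", "prod4"]


def pixel_url(base_url, uid, pn, purl, e, products):
    """Build a pixel URL whose query string follows the example parameters."""
    params = {
        "uid": uid,
        "pn": pn,
        "purl": purl,
        "e": e,
        "pr": ";".join(products),
    }
    # safe=";" leaves the product separator unescaped, as in the example.
    return f"{base_url}?{urlencode(params, safe=';')}"


def vegeta_targets(count, seed=0):
    """Return `count` target lines with randomized parameter values."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    lines = []
    for _ in range(count):
        page = rng.choice(PAGES)
        cart = rng.sample(PRODUCTS, k=rng.randint(1, 3))
        url = pixel_url(
            BASE_URL,
            uid=rng.randint(1, 99999),
            pn=page,
            purl=f"http://example.com/{page}",
            e="pl",
            products=cart,
        )
        lines.append(f"GET {url}")
    return lines
```

You could write the returned lines to a text file, one per line, and use that file as the list of targets for the load test.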

Analyzing logs

There are various ways to analyze data in BigQuery. This tutorial analyzes the logs through the BigQuery web UI.

The following sections show commonly used queries for the pixel-tracking scenario.

Top 5 returning identified customers

The following query lists user IDs (as uid) and the count of requests made to the URL that hosts the pixel (as c_uid), for each ID. The query limits the results to the 5 highest counts, in descending order.

SELECT
  COUNT(REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+uid=([0-9]*)")) AS c_uid,
  REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+uid=([0-9]*)") AS uid
FROM
  `YOUR_PROJECT.YOUR_DATASET.requests_*`
GROUP BY uid
ORDER BY c_uid DESC
LIMIT 5

Results

[Screenshot: query results]
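
To see what the extraction in this query does, the following Python sketch applies the same regular expression to a list of request URLs and counts the matches per user ID. It is a local illustration only, not a replacement for the query, and the function name is hypothetical. One difference is that this sketch skips URLs without a uid parameter, whereas the query groups them under a NULL uid with a count of 0.

```python
import re
from collections import Counter

# Same pattern as in the query: capture the digits that follow "uid=".
UID_PATTERN = re.compile(r"^.+uid=([0-9]*)")


def top_users(request_urls, limit=5):
    """Count requests per extracted uid, highest counts first."""
    counts = Counter()
    for url in request_urls:
        match = UID_PATTERN.search(url)
        if match:  # URLs without a uid parameter are skipped
            counts[match.group(1)] += 1
    return counts.most_common(limit)
```

For example, passing two URLs with uid=19679 and one with uid=42 returns the pair for 19679 first, with a count of 2.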

Top 5 products

In this example, the parameter string contains pr=product1;product2;product3, and so on. It can be useful to know which products attracted the interest of visitors. The following query uses BigQuery arrays to count, for each day, how often each product appears across all publishers.

SELECT
  DATE(timestamp) AS day,
  product,
  COUNT(product) AS c_prod
FROM
  `[PROJECT_ID].gcs_pixel_tracking_analytics.[TABLE_ID]`
CROSS JOIN UNNEST(SPLIT(REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+pr=(.*)"), ";")) AS product
GROUP BY product, day
ORDER BY c_prod DESC

Replace [PROJECT_ID] and [TABLE_ID] with appropriate values.
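
The split-and-count step of this query can be sketched locally in Python. The sketch below uses the same regular expression, which captures everything after pr= and therefore assumes that pr is the last parameter in the URL. Unlike the query, it does not group by day. The function name is hypothetical.

```python
import re
from collections import Counter

# Same pattern as in the query: capture everything that follows "pr=".
PRODUCTS_PATTERN = re.compile(r"^.+pr=(.*)")


def product_counts(request_urls):
    """Count how often each product appears across all request URLs."""
    counts = Counter()
    for url in request_urls:
        match = PRODUCTS_PATTERN.search(url)
        if match:  # URLs without a pr parameter contribute nothing
            counts.update(match.group(1).split(";"))
    return counts.most_common()
```

Splitting on the semicolon plays the role of SPLIT and UNNEST in the query: each product in a cart becomes its own item before counting.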

You can perform additional analytics by transforming the data, saving it to another BigQuery table, and creating dashboards by using Google Data Studio, for example.

Load testing

If you are interested in load testing your setup, we have provided a GitHub repository that contains a list of custom-made URLs. The test reaches 100,000 QPS, and you can scale it up if you need to test higher demand.

The code is available in the GitHub repo.

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

Deleting the project

The easiest way to delete all resources is simply to delete the project you created for this tutorial. If you don't want to delete the project, follow the instructions in the following section.

  1. In the Cloud Platform Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the checkbox next to the name of the project that you want to delete, and then click Delete project.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting individual resources

Follow these steps to delete individual resources, instead of deleting the whole project.

Deleting the storage bucket

  1. In the Cloud Platform Console, go to the Cloud Storage browser.

    Go to the Cloud Storage browser

  2. Click the checkbox next to the bucket you want to delete.
  3. Click the Delete button at the top of the page to delete the bucket.

Deleting the BigQuery datasets

  1. Open the BigQuery web UI.

    OPEN BIGQUERY

  2. Select the BigQuery dataset(s) you created during the tutorial, and then delete them.

What's next

  • Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.
