This tutorial explains how to set up a pixel tracking infrastructure. This tutorial is not about setting up a website. Rather, it shows you how to make a pixel available through a URL that can then be embedded in a publisher's property, such as a web page.
The example infrastructure offers three main features:
- Serving: This requires scalable, fast object storage that can handle peaks of several hundred thousand requests per second. This tutorial shows how to use Google Cloud Storage.
- Collecting: This part should be able to catch every HTTP request made for the pixel. This tutorial uses the Google Compute Engine HTTP load balancer, leveraging Stackdriver Logging.
- Analysing: Pixels are loaded every time a user visits a page. These requests can create hundreds of thousands of events per second. This tutorial uses Google BigQuery to help make sense of all this data, including the parameters.
By the end of this tutorial, you will have built:
- A pixel being served at
- A way to record the requests made for the pixel URL.
- A testing environment to act as if the pixel were called by a page. This will call URLs with various combinations of key-value pairs in the parameters:
  - User ID.
  - Page displayed.
  - Products in shopping cart.
- A Stackdriver Logging export that records pixel requests in BigQuery as they happen.
- A few example queries that help you understand:
  - Top users on the website.
  - Pages that were visited.
  - Top products in the shopping cart.
Such an architecture can incur costs as follows:
- Load balancer: Charged for one rule and per GB.
- Stackdriver Logging. See the pricing page for details.
While this example works by using BigQuery, there are various options incurring different costs that you can choose from depending on your needs:
- Export to Google Cloud Storage: the cheapest way to export pixel logs. Data is exported hourly and you pay only for storage pricing.
- Export to BigQuery: makes data available in real time to ad-hoc analysis tools. Related costs include:
  - BigQuery Storage
  - BigQuery streaming API
  - BigQuery querying
- Export to Google Cloud Pub/Sub: This solution, mentioned towards the end of this tutorial, would enable real-time aggregation of the data by using Google Cloud Dataflow. This approach could limit the BigQuery streaming API cost. Costs for this solution include:
  - Cloud Pub/Sub
  - Cloud Dataflow
Use the Pricing Calculator to generate a cost estimate based on your projected usage.
Before you begin
In the GCP Console, go to the Manage resources page.
Select a project, or click Create Project to create a new GCP project.
In the dialog, name your project. Make a note of your generated project ID.
Click Create to create a new project.
Setting up the pixel serving
Creating a bucket
You can create a bucket either through the UI or with the gsutil command-line tool. This tutorial assumes you use a bucket name similar to [YOUR_PREFIX]-gcs-pixel-tracking, where you replace [YOUR_PREFIX] with some unique string.
To create a bucket:
- In the GCP Console, go to the Cloud Storage browser.
- Click Create bucket.
- In the Create bucket dialog, specify the following attributes:
- Click Create.
Create a new Cloud Storage bucket. Replace [YOUR_BUCKET] with a bucket name that meets the bucket name requirements:
gsutil mb gs://[YOUR_BUCKET]
After the bucket is created, you should see the bucket in your list. Next, make sure it is publicly accessible. In the Cloud Platform Console:
- Click the three dots on the right-hand side of your bucket row.
- Select Edit bucket permissions.
- Add "allUsers" as "Reader".
Uploading the pixel
You can copy a pixel directly from Google's public Cloud Storage bucket:
gsutil cp gs://solutions-public-assets/pixel-tracking/pixel.png gs://[YOUR_PREFIX]-gcs-pixel-tracking
Remember to replace [YOUR_PREFIX]-gcs-pixel-tracking with the name of your bucket.
Alternatively, you can create a pixel locally and then upload it by using the Upload Files button in your bucket.
Make sure to select the Public link checkbox.
Setting up the load balancer
You should now have a Cloud Storage bucket with a single, invisible, publicly accessible pixel. Next, you need a way to log all the requests made to it. To do so, create an HTTP load balancer in front of that bucket.
Go to Networking > Load balancing:
Click Create Load Balancer.
- Click Start configuration in the HTTP(S) Load Balancing tile.
Set up a backend for your Cloud Storage bucket as shown here:
Set up host and path as needed or leave as-is for a basic configuration.
- Click Create.
When done, you should have an environment similar to the following screenshot:
You collect logs by using Stackdriver Logging export. In this case, you want to export the data directly to BigQuery. To set up exporting, you need to create a dataset to receive the logging data, and then set up the export rules.
Creating a receiving dataset in BigQuery
This can easily be done through the BigQuery web UI, as follows:
Go to the BigQuery web UI:
Click the arrow beside your project name.
- Click Create new dataset.
- For the name, enter "gcs_pixel_tracking_analytics".
Setting up the export
Go to the Stackdriver Logging page in the Cloud Platform Console:
Add a filter for your load balancer. Replace [YOUR_LB_NAME] with the name of your load balancer:
resource.type = http_load_balancer AND resource.labels.url_map_name = "[YOUR_LB_NAME]"
Click Create Export, at the top of the page.
- Enter a sink name.
- Choose BigQuery as a Sink Service.
- Choose the dataset that you created previously as a Sink Destination.
- Click the Create Sink button.
Creating fake data
You now have a pixel ready to be served, but nothing is better than seeing it in action. Of course, your pixel isn't yet on a site that's getting thousands of pageviews. To analyse some traffic with BigQuery and show how this works at scale, you can create some fake data by using custom-made parameters added to the pixel URL.
To do so, you can leverage Vegeta. The tutorial to set up a load testing environment is in GitHub. The load testing sends requests for the pixel URL by adding random values to the URL parameters, as follows:
The parameters might look like the following example:
In the preceding example:
- uid is the user ID of the visiting customer.
- purl is the page URL that they are visiting.
- e is the event.
- pr is a list of products that they have in their shopping cart at that time.
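As a rough illustration of how such test URLs could be generated, the following Python sketch builds pixel URLs with random values for the uid, purl, e, and pr parameters. It is not the tutorial's load-testing code. The base URL, page list, event names, and product IDs are hypothetical placeholders, and the semicolon separator in pr is assumed from the product query later in this tutorial.

```python
import random
from urllib.parse import urlencode

# Hypothetical placeholders for illustration only; replace with your own values.
BASE_URL = "http://example.com/pixel.png"
PAGES = ["http://example.com/home", "http://example.com/cart"]
EVENTS = ["pl", "ac"]
PRODUCTS = ["prod1", "prod2", "prod3", "prod4"]


def make_pixel_url(rng: random.Random) -> str:
    """Build one pixel URL with random uid, purl, e, and pr parameters."""
    params = {
        "uid": str(rng.randint(1, 1000)),  # numeric user ID
        "purl": rng.choice(PAGES),         # page being visited
        "e": rng.choice(EVENTS),           # event
        # Cart contents, separated by semicolons. pr is kept last so that a
        # pattern such as "pr=(.*)" captures only the product list.
        "pr": ";".join(rng.sample(PRODUCTS, rng.randint(1, 3))),
    }
    # safe=";" leaves the semicolons unescaped so they can be split on later.
    return f"{BASE_URL}?{urlencode(params, safe=';')}"


rng = random.Random(42)  # fixed seed so that runs are repeatable
urls = [make_pixel_url(rng) for _ in range(5)]
for url in urls:
    print(url)
```

Each printed URL can be passed to a load-testing tool as one request target.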
There are various ways to analyze data in BigQuery. This tutorial analyzes the logs through the BigQuery web UI.
The following sections show commonly used queries for the pixel-tracking scenario.
Top 5 returning identified customers
The following query lists user IDs (as uid) and the count of requests made to the URL that hosts the pixel (as c_uid), for each ID. The query limits the results to the 5 highest counts, in descending order.
SELECT
  COUNT(REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+uid=([0-9]*)")) AS c_uid,
  REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+uid=([0-9]*)") AS uid
FROM
  `YOUR_PROJECT.YOUR_DATASET.request_*`
GROUP BY uid
ORDER BY c_uid DESC
LIMIT 5
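To see what this query computes without running it in BigQuery, the following Python sketch applies the same regular expression to a handful of request URLs and counts requests per user ID. The sample URLs are hypothetical stand-ins for httpRequest.requestUrl values. Unlike the SQL query, this sketch skips URLs that have no uid parameter.

```python
import re
from collections import Counter

# Same pattern as the query: greedy match up to "uid=", then capture the digits.
UID_PATTERN = re.compile(r"^.+uid=([0-9]*)")

# Hypothetical request URLs standing in for httpRequest.requestUrl values.
request_urls = [
    "http://example.com/pixel.png?uid=7&purl=home&e=pl&pr=prod1;prod2",
    "http://example.com/pixel.png?uid=3&purl=cart&e=pl&pr=prod2",
    "http://example.com/pixel.png?uid=7&purl=cart&e=pl&pr=prod1",
    "http://example.com/pixel.png?uid=9&purl=home&e=pl&pr=prod3",
    "http://example.com/pixel.png?uid=7&purl=home&e=pl&pr=prod2",
    "http://example.com/pixel.png?uid=3&purl=home&e=pl&pr=prod1",
]


def top_users(urls, limit=5):
    """Return (uid, request_count) pairs, highest count first."""
    counts = Counter()
    for url in urls:
        match = UID_PATTERN.match(url)
        if match:  # URLs without a uid parameter are skipped in this sketch
            counts[match.group(1)] += 1
    return counts.most_common(limit)


print(top_users(request_urls))
```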
Top 5 products
In this example, the pr parameter carries the list of products in the visitor's shopping cart, separated by semicolons. It might be interesting to know which products attracted the interest of the visitor. The following query uses SPLIT and UNNEST in BigQuery to count the appearances of each product across all publishers.
SELECT
  DATE(timestamp) day,
  product,
  COUNT(product) c_prod
FROM
  `[PROJECT_ID].gcs_pixel_tracking_analytics.[TABLE_ID]`
CROSS JOIN
  UNNEST(SPLIT(REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+pr=(.*)"), ";")) AS product
GROUP BY product, day
ORDER BY c_prod DESC
Replace [PROJECT_ID] and [TABLE_ID] with appropriate values.
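The split-and-unnest step can be pictured with a small Python sketch that mirrors the product query: it extracts everything after pr=, splits it on semicolons, and counts each product once per appearance. The sample URLs are hypothetical, and the sketch omits the grouping by day for brevity.

```python
import re
from collections import Counter

# Same pattern as the query: capture everything after the last "pr=".
PR_PATTERN = re.compile(r"^.+pr=(.*)")

# Hypothetical request URLs standing in for httpRequest.requestUrl values.
sample_urls = [
    "http://example.com/pixel.png?uid=7&purl=home&e=pl&pr=prod1;prod2",
    "http://example.com/pixel.png?uid=3&purl=cart&e=pl&pr=prod2",
    "http://example.com/pixel.png?uid=9&purl=cart&e=pl&pr=prod2;prod3;prod1",
]


def product_counts(urls):
    """Return (product, appearances) pairs, highest count first."""
    counts = Counter()
    for url in urls:
        match = PR_PATTERN.match(url)
        if not match:
            continue  # no pr parameter in this URL
        # Splitting on ";" and looping over the parts plays the role of
        # SPLIT followed by UNNEST: each product is counted separately.
        for product in match.group(1).split(";"):
            counts[product] += 1
    return counts.most_common()


print(product_counts(sample_urls))
```

Because the pattern captures everything after pr=, this approach assumes that pr is the last parameter in the URL.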
You can perform additional analytics by transforming the data, saving it to another BigQuery table, and creating dashboards by using Google Data Studio, for example.
If you are interested in load testing your setup, we have provided a GitHub repository that contains a list of custom-made URLs. The test reaches 100,000 QPS, and you can run it at higher rates if you need more.
The code is available in the GitHub repo.
To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:
Deleting the project
The easiest way to delete all resources is simply to delete the project you created for this tutorial. If you don't want to delete the project, follow the instructions in the following section.
- In the GCP Console, go to the Projects page.
- In the project list, select the project you want to delete and click Delete project.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Deleting individual resources
Follow these steps to delete individual resources, instead of deleting the whole project.
Deleting the storage bucket
- In the GCP Console, go to the Cloud Storage browser.
- Click the checkbox next to the bucket you want to delete.
- Click the Delete button at the top of the page to delete the bucket.
Deleting the BigQuery datasets
Open the BigQuery web UI.
Select the BigQuery dataset(s) you created during the tutorial.
- Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.