This tutorial explains how to set up a pixel tracking infrastructure. This tutorial isn't about setting up a website. Rather, it shows you how to make a pixel available through a URL that can then be embedded in a publisher's property, such as a web page.
The example infrastructure offers three main features:
- Serving: This requires scalable, fast object storage that can sometimes serve several hundred thousand requests per second. This tutorial shows how to use Cloud Storage.
- Collecting: This part catches HTTP requests made for the pixel. This tutorial uses the Compute Engine HTTP load balancer, leveraging Cloud Logging.
- Analyzing: Pixels are loaded every time a user visits a page. These requests can create hundreds of thousands of events per second. This tutorial uses BigQuery to help make sense of all this data, including the parameters.
In this tutorial, you build:
- A pixel served at a publicly accessible URL.
- A way to record the requests made for the pixel URL.
- A testing environment that acts as if the pixel were called by a page. This environment calls URLs with various combinations of key-value pairs in the parameter string, including:
- User ID
- Page displayed
- Products in shopping cart
- A logs export that records pixel requests in BigQuery as they happen.
- A few example queries that help you understand:
- Top users on the website.
- Pages that were visited.
- Top products in the shopping cart.
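As a concrete illustration of what the testing environment produces, the following Python sketch builds a pixel request URL carrying the tracking parameters described above. The base URL and parameter values are hypothetical placeholders; the parameter names (`uid`, `purl`, `e`, `pr`, with `pr` holding a semicolon-separated product list) follow the conventions used later in this tutorial's queries.

```python
from urllib.parse import urlencode

def pixel_url(base, uid, purl, event, products):
    """Build a pixel request URL with tracking parameters in the query string."""
    params = {
        "uid": uid,                 # user ID of the visiting customer
        "purl": purl,               # page URL being visited
        "e": event,                 # event name
        "pr": ";".join(products),   # products in the shopping cart
    }
    return base + "?" + urlencode(params)

# Hypothetical example request, as a page embedding the pixel might issue it:
url = pixel_url("https://example.com/pixel.png",
                "12345", "https://shop.example/cart", "view",
                ["sku-1", "sku-2"])
```

Because `urlencode` percent-encodes the values, the semicolon in the product list arrives as `%3B` and is decoded again at analysis time.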
Such an architecture can incur costs as follows:
- Cloud Load Balancing: Charged for one rule and per GB.
- Cloud Logging: See the pricing page for details.
While this example uses BigQuery, there are various export options, each with different costs, that you can choose from depending on your needs:
- Export to Cloud Storage: the cheapest way to export pixel logs. Data is exported hourly and you pay only for storage pricing.
- Export to BigQuery: Makes data available in near real time for ad hoc analysis tools. Related costs include using:
- BigQuery storage
- BigQuery streaming API
- BigQuery querying
- Export to Pub/Sub: This solution, mentioned towards the end of this tutorial, enables real-time aggregation of the data by leveraging Dataflow, which could limit the BigQuery streaming API cost. Costs for this solution include Pub/Sub and Dataflow usage.
Use the Pricing Calculator to generate a cost estimate based on your projected usage.
Before you begin
In the Google Cloud Console, go to the project selector page.
Select or create a Google Cloud project.
Setting up the pixel serving
Create a bucket
You can create a bucket through the Google Cloud Console or with the gsutil command-line tool.
This tutorial assumes you use a bucket name similar to [YOUR_PREFIX]-gcs-pixel-tracking, where you replace [YOUR_PREFIX] with a unique string.
To create a bucket:
- In the Cloud Console, go to the Cloud Storage Browser page.
- Click Create bucket.
- On the Create a bucket page, enter your bucket information. To go to the next step, click Continue.
- For Name your bucket, enter a name that meets the bucket naming requirements.
- For Choose where to store your data, do the following:
- Select a Location type option.
- Select a Location option.
- For Choose a default storage class for your data, select a storage class.
- For Choose how to control access to objects, select an Access control option.
- For Advanced settings (optional), specify an encryption method, a retention policy, or bucket labels.
- Click Create.
Create a Cloud Storage bucket:
gsutil mb gs://BUCKET_NAME
Replace BUCKET_NAME with a bucket name that meets the bucket naming requirements.
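If you generate bucket names programmatically (for example, from [YOUR_PREFIX]), it can help to sanity-check them before calling gsutil. The following Python function is a rough sketch that checks a subset of the Cloud Storage naming rules; the full rules have additional conditions (for example, around dots, IP-address-like names, and the reserved goog prefix), so treat this as a first-pass filter only.

```python
import re

def looks_like_valid_bucket_name(name):
    """First-pass check against a subset of the Cloud Storage bucket naming
    rules: 3-63 characters; lowercase letters, digits, dashes, underscores,
    and dots; starting and ending with a letter or digit."""
    return re.fullmatch(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]", name) is not None
```

For example, a name like myprefix-gcs-pixel-tracking passes this check, while a name containing uppercase letters does not.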
Update bucket permissions
After you create the bucket, the bucket is listed in the Cloud Console. Next, you make the bucket publicly accessible.
In the Cloud Console, click the more options menu to the right of your bucket row, and select Edit bucket permissions.
In the Add users field, enter allUsers.
In the Select a role list, select Reader.
Upload the pixel
You can copy a pixel directly from Google's public Cloud Storage bucket:
gsutil cp gs://solutions-public-assets/pixel-tracking/pixel.png gs://[YOUR_PREFIX]-gcs-pixel-tracking
Alternatively, you can create a pixel locally and then upload it in the Cloud Console.
- In the list of buckets, click the name of the bucket that you created.
- Click Upload Files, select the files you want to upload in the dialog that appears, and then click Open.
In the list of objects in the bucket, select the Public link checkbox next to the pixel you uploaded.
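If you prefer to create the pixel locally instead of copying it from the public bucket, the following Python sketch writes a minimal 1x1 transparent PNG from scratch using only the standard library. The file name is an assumption; any name works as long as the URL you embed matches it.

```python
import struct
import zlib

def _chunk(ctype, data):
    """Assemble one PNG chunk: length, type, data, CRC over type+data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def make_pixel():
    """Return the bytes of a 1x1 fully transparent RGBA PNG."""
    # IHDR: width=1, height=1, bit depth=8, color type=6 (RGBA)
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)
    # IDAT: one scanline = filter byte 0 + one transparent RGBA pixel
    idat = zlib.compress(b"\x00" + b"\x00\x00\x00\x00")
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", idat)
            + _chunk(b"IEND", b""))

data = make_pixel()
```

You could then write `data` to a local pixel.png and upload it with gsutil cp, as shown above for the public asset.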
Set up the load balancer
You now have a Cloud Storage bucket with a single, invisible, publicly accessible pixel. Next, you set up a way to log all the requests made to the bucket by creating an HTTP load balancer in front of that bucket.
In the Cloud Console, go to the Load balancing page:
Click Create Load Balancer.
In the HTTP(S) Load Balancing section, click Start configuration.
Set up a backend for your Cloud Storage bucket. The results are similar to the following image. For more information about creating a backend bucket, see the backend bucket documentation.
Set up host and path as needed or leave as-is for a basic configuration.
Click Create. You should have an environment similar to the following:
You collect logs by using Cloud Logging export. In this case, you want to export the data directly to BigQuery. To set up exporting, you need to create a dataset to receive the logging data, and then set up the export rules.
Create a receiving dataset in BigQuery
Go to BigQuery:
Click the arrow next to your project name.
Click Create new dataset.
For the name, enter gcs_pixel_tracking_analytics.
Set up the export
In the Cloud Console, go to the Exports page:
Add a filter for your load balancer, replacing [YOUR_LB_NAME] with the name of your load balancer:
resource.type = http_load_balancer AND resource.labels.url_map_name = "[YOUR_LB_NAME]"
Click Create Export.
Enter a Sink Name.
From the Sink Service list, select BigQuery.
From the Sink Destination list, select the dataset that you created previously.
Click Create Sink.
Creating sample data
You now have a pixel ready to be served, but nothing is better than seeing it in action. Of course, your pixel isn't yet on a site that's getting thousands of pageviews. To analyze some traffic with BigQuery and show how this works at scale, you can create some sample data by using custom-made parameters added to the pixel URL.
To do so, you can leverage Vegeta. The tutorial to set up a load-testing environment is in GitHub. The load test sends requests for the pixel URL, adding random values to the URL parameters. The parameters might look like the following example:
In the preceding example:
- uid is the user ID of the visiting customer.
- purl is the page URL that they are visiting.
- e is the event.
- pr is a list of products that they have in their shopping cart at that time.
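When you later inspect logged request URLs outside of BigQuery, the same parameters can be recovered with the standard library. This Python sketch assumes the parameter names above and that pr is a semicolon-separated list, matching how the tutorial's queries split it; the example URL is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def parse_pixel_request(request_url):
    """Extract the tracking parameters from a logged pixel request URL."""
    qs = parse_qs(urlsplit(request_url).query)
    pr = qs.get("pr", [""])[0]
    return {
        "uid": qs.get("uid", [None])[0],    # user ID
        "purl": qs.get("purl", [None])[0],  # visited page URL
        "e": qs.get("e", [None])[0],        # event name
        # 'pr' holds a ';'-separated product list
        "pr": pr.split(";") if pr else [],
    }

record = parse_pixel_request(
    "https://example.com/pixel.png"
    "?uid=42&purl=https%3A%2F%2Fshop.example%2Fcart&e=view&pr=sku-1%3Bsku-2")
```

`parse_qs` percent-decodes the values, so the encoded `%3B` separator comes back as a plain semicolon before splitting.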
There are various ways to analyze data in BigQuery. This tutorial analyzes the logs through the BigQuery web UI.
The following sections show commonly used queries for the pixel-tracking scenario.
Top 5 returning identified customers
The following query lists user IDs (as uid) and the count of requests made to the URL that hosts the pixel (as c_uid), for each ID. The query limits the results to the 5 highest counts, in descending order.
SELECT
  count(REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+uid=([0-9]*)")) AS c_uid,
  REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+uid=([0-9]*)") AS uid
FROM
  `YOUR_PROJECT.YOUR_DATASET.request_*`
GROUP BY uid
ORDER BY c_uid DESC
LIMIT 5
The results are as follows:
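To see the logic of this query without a populated dataset, you can reproduce the same aggregation locally in Python on a handful of request URLs. The regular expression here is a simplified stand-in for the query's `^.+uid=([0-9]*)` pattern, and the sample URLs are hypothetical.

```python
import re
from collections import Counter

UID_RE = re.compile(r"uid=([0-9]*)")

def top_uids(request_urls, n=5):
    """Count pixel requests per uid and return the n highest counts,
    mirroring the BigQuery REGEXP_EXTRACT / GROUP BY / LIMIT query."""
    counts = Counter()
    for url in request_urls:
        m = UID_RE.search(url)
        if m and m.group(1):
            counts[m.group(1)] += 1
    return counts.most_common(n)
```

For example, `top_uids(["/pixel.png?uid=1", "/pixel.png?uid=1", "/pixel.png?uid=2"])` ranks uid 1 first with two requests.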
Top 5 products
In this example, the parameter string contains the list of products in the pr parameter, separated by semicolons.
The following query leverages UNNEST and SPLIT to count the appearance of each product across all publishers, so you can know which products attracted visitors' interest.
SELECT
  DATE(timestamp) AS day,
  product,
  count(product) AS c_prod
FROM
  `[PROJECT_ID].gcs_pixel_tracking_analytics.[TABLE_ID]`
CROSS JOIN
  UNNEST(SPLIT(REGEXP_EXTRACT(httpRequest.requestUrl, r"^.+pr=(.*)"), ";")) AS product
GROUP BY product, day
ORDER BY c_prod DESC
Replace [PROJECT_ID] and [TABLE_ID] with appropriate values.
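The split-and-count pattern in this query can likewise be sketched locally in Python. One deliberate difference: the query's `pr=(.*)` pattern captures to the end of the URL (it assumes pr is the last parameter), while this sketch stops at the next `&` so it also works when pr is followed by other parameters; the sample URLs are hypothetical.

```python
import re
from collections import Counter

PR_RE = re.compile(r"pr=([^&]*)")

def top_products(request_urls):
    """Split the ';'-separated pr parameter of each request and count
    products, mirroring the query's UNNEST(SPLIT(...)) pattern."""
    counts = Counter()
    for url in request_urls:
        m = PR_RE.search(url)
        if m and m.group(1):
            counts.update(m.group(1).split(";"))
    return counts.most_common()
```

For example, across the requests `pr=sku-1;sku-2` and `pr=sku-1`, product sku-1 is counted twice and sku-2 once.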
You can perform additional analytics by transforming the data, saving it to another BigQuery table, and creating dashboards by using Google Data Studio, for example.
If you are interested in load testing your setup, there is a GitHub repository that contains a list of custom-made URLs. The test reaches 100,000 QPS, but it can be scaled to higher demand if needed.
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete individual resources
Follow these steps to delete individual resources, instead of deleting the whole project.
Delete the Cloud Storage bucket
- In the Cloud Console, go to the Cloud Storage Browser page.
- Click the checkbox for the bucket that you want to delete.
- To delete the bucket, click Delete.
Delete the BigQuery datasets
Open the BigQuery web UI.
Select the BigQuery datasets that you created during the tutorial, and then delete them.
- Explore reference architectures, diagrams, tutorials, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.