Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

How to do cross-platform analytics with Google BigQuery

Thursday, April 13, 2017

By Tahir Fayyaz, Sales Engineer and Arun Venkatesan, Developer Advocate

Learn how to analyze web (in Google Analytics 360) and mobile (in Firebase Analytics) data together using BigQuery.

It's a common requirement for developers to track user behavior and usage patterns across mobile and web platforms. For example, a developer may need to track users who sign up on a mobile app and complete a purchase on the web app. Another may need to simply track the number of web users that also use the mobile app. Yet another one may want to see the number of pageviews for comparable screens on both the web and mobile apps.

In the context of Google Cloud, Google Analytics 360 offers a comprehensive suite of web analytics products for larger businesses and enterprises, while Firebase Analytics, a no-cost tool, has been built ground-up for mobile app analytics and measurement. So, how do you combine them to do cross-platform analytics reporting like the examples we described? Let’s find out!

Google BigQuery, Google’s cloud-native data warehouse, facilitates some of this cross-platform tracking. In order to view the data in BigQuery:

  1. Google Analytics 360 needs to be linked to BigQuery.
  2. Firebase needs to be linked to BigQuery.

Once linked, BigQuery creates a dataset for each Google Analytics 360 view linked and for each Firebase app linked. These datasets contain tables that each contain a day’s worth of data.

In order to determine a unique user across both GA and Firebase in BigQuery, you need to set a custom user id value in both Google Analytics 360 and Firebase. However, user id based joins are only possible when the user logs in on all devices with the same user id (also sometimes known as customer ID or CRM ID) defined by your backend platform / database. For apps or websites that don't require a login and hence cannot set a custom user id in Google Analytics 360 and Firebase, it makes more sense to analyze aggregated metrics such as total screen views for screens on the web and the corresponding views in the mobile version.

User ID-based joins

There are two options for storing a custom user ID in Google Analytics 360:
  1. As a session or user scoped custom dimension, which is stored in the customDimensions.index and customDimensions.value fields in BigQuery.
  2. As a User ID field in GA — if you use this option, you must link your User ID view to BigQuery and therefore will only have authenticated user data in BigQuery as you can only link one view per property to BigQuery.

In our examples, we'll be making use of Option 1, using a custom dimension to store the User ID in Google Analytics 360.

Firebase’s custom user id value can be set using the setUserId API and is stored in the user_dim.user_id field in BigQuery.

Once the user id is set in both Google Analytics 360 and Firebase, a number of queries can be performed to determine user behavior across both Google Analytics 360 (web) and Firebase (mobile).

In this post, we're using Standard SQL for all our queries. If you've previously used Legacy SQL for Google Analytics 360 or Firebase Analytics, please read our guide on migrating from Legacy SQL.

Example 1: Determine the total number of unique users in the past 30 days, across both the mobile apps (firebase IOS + Android) and the web app (Google Analytics)

In this example, we make use of a customDimensions.index and customDimensions.value field in Google Analytics 360 and the user_dim.user_id field in Firebase. In our example, we're using the custom dimension with index 12 in Google Analytics 360. You can get the index number from the admin section of Google Analytics, where you set up and manage custom dimensions. For your queries, replace it with the relevant index where you've stored the user ID.

In the following query, we're combining data from a single Google Analytics 360 property and two Firebase Apps (iOS and Android). We also have a CASE WHEN statement to check if a user ID is set and if not, we make use of the default fullVisitorId (which is a hashed version of the client ID) in Google Analytics 360 or the default app_info.app_instance_id in Firebase analytics.

Query

#standardSQL
SELECT COUNT(DISTINCT user_id) AS users
FROM (
SELECT
CASE 
    WHEN userId IS NOT NULL
      THEN userId
    ELSE fullVisitorId
  END AS user_id
FROM (
SELECT 
  fullVisitorId,
  (SELECT MAX(IF(index=12, value, NULL))
   FROM UNNEST(customDimensions)) AS userId
FROM `project_id:datasetid.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
)

UNION ALL

SELECT 
  CASE 
    WHEN user_dim.user_id IS NOT NULL
      THEN user_dim.user_id
    ELSE user_dim.app_info.app_instance_id
  END AS user_id
FROM 
    `project_id.com_game_example_IOS.app_events_*`
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  
UNION ALL

SELECT 
  CASE 
    WHEN user_dim.user_id IS NOT NULL
      THEN user_dim.user_id
    ELSE user_dim.app_info.app_instance_id
  END AS user_id
FROM 
    `project_id.com_game_example_ANDROID.app_events_*`
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  
)
Result

Example 2: Determine daily unique users in the past 30 days, broken down by date, across both the mobile apps (Firebase IOS + Android) and the web app (Google Analytics)

To break this report down by date, we need to select the dates for Google Analytics 360 and Firebase data. Google Analytics 360 provides a date column for each session. For Firebase, we can extract this from the event_dim.timestamp_micros field or the event_dim.date field, which requires us to flatten the data using the UNNEST function.

Note: the event_dim.date field was added to the schema from November 2016 onwards, so you can use the event_dim.timestamp_micros as we've shown in the examples below, if you query data from before that date.

Standard SQL - Using WITH clause

Writing long queries can get quite complex, and so for Standard SQL queries, we'll use the WITH clause, which allows us to create one or more named subqueries. We can then reference these named queries in our final SELECT statement.

Query

WITH web_users_event_timestamp AS (
SELECT
CASE 
    WHEN userId IS NOT NULL
      THEN userId
    ELSE fullVisitorId
  END AS user_id,
event_date
FROM (
SELECT 
  date as event_date,
  fullVisitorId,
  (SELECT MAX(IF(index=12, value, NULL)) FROM UNNEST(customDimensions)) AS userId
FROM `project_id.datasetid.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
)
),

android_users_event_timestamp AS (
SELECT 
  CASE 
    WHEN user_dim.user_id IS NOT NULL
      THEN user_dim.user_id
    ELSE user_dim.app_info.app_instance_id
  END AS user_id,
  e.timestamp_micros as eventTimestamp
FROM 
    `project_id.com_game_example_ANDROID.app_events_*`,
     UNNEST(event_dim) as e
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
),

ios_users_event_timestamp AS (
SELECT 
  CASE 
    WHEN user_dim.user_id IS NOT NULL
      THEN user_dim.user_id
    ELSE user_dim.app_info.app_instance_id
  END AS user_id,
  e.timestamp_micros as eventTimestamp
FROM 
    `project_id.com_game_example_IOS.app_events_*`,
     UNNEST(event_dim) as e
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
),
  
union_all_users_event_timestamp AS (
SELECT 
user_id,
FORMAT_TIMESTAMP('%Y%m%d',TIMESTAMP_MICROS(eventTimestamp)) as event_date 
FROM (
  SELECT
  user_id,
  eventTimestamp
  FROM ( SELECT * FROM android_users_event_timestamp)
  UNION ALL 
  ( SELECT * FROM ios_users_event_timestamp)
) 
UNION ALL 
(SELECT * FROM web_users_event_timestamp)
)

SELECT 
event_date AS date,
COUNT(DISTINCT user_id) AS users
FROM union_all_users_event_timestamp
GROUP BY date
ORDER BY date ASC
Result

Example 3: Cross-device usage overlap

This query returns the total number of users on a given device group and the ratio of this number to the overall number of users across different device groups. The results are similar to the cross device overlap report in Google Analytics 360.

In this query, we use the user ID to determine which users have logged in on different devices and define a group of the devices on which they've logged in. For example, if a user logged in on the web, on the desktop and on their Android device, they'll fall into the group "web - desktop, app - android".

We can see the total users that fall into each device category group, and what percentage that device category group represents of the total users using count divided by the total SUM as a WINDOW function in standard SQL and the RATIO_TO_REPORT WINDOW function in Legacy SQL.

Query

#standardSQL
WITH android_users AS (
SELECT
  CONCAT('app', ' - ', LOWER(user_dim.app_info.app_platform)) AS device_category,
  CASE
    WHEN user_dim.user_id IS NOT NULL THEN user_dim.user_id
    ELSE user_dim.app_info.app_instance_id
  END AS user_id
FROM
  `project_id.com_game_example_ANDROID.app_events_*`
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
),

ios_users AS (
SELECT
  CONCAT('app', ' - ', LOWER(user_dim.app_info.app_platform)) AS device_category,
  CASE
    WHEN user_dim.user_id IS NOT NULL THEN user_dim.user_id
    ELSE user_dim.app_info.app_instance_id
  END AS user_id
FROM
  `project_id.com_game_example_IOS.app_events_*`
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
),

web_users AS (
SELECT
  CONCAT('web', ' - ', LOWER(deviceCategory)) AS device_category,
  CASE
    WHEN customDimension12 IS NOT NULL THEN customDimension12
    ELSE fullVisitorId
  END AS user_id
FROM (
  SELECT
    fullVisitorId,
    device.deviceCategory AS deviceCategory,
    (
    SELECT
      MAX(IF(index=12, value, NULL))
    FROM
      UNNEST(customDimensions)) AS customDimension12
  FROM
    `project_id.datasetid.ga_sessions_*`
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) )
),

union_users AS (
SELECT
  device_category,
  user_id
FROM (SELECT * FROM web_users)
UNION ALL (SELECT * FROM ios_users)
UNION ALL (SELECT * FROM android_users) 
),

dedupe_duplicate_devices AS (
SELECT
  user_id,
  device_category
FROM ( SELECT * FROM union_users )
GROUP BY
  user_id,
  device_category
),

concat_device_category AS (
SELECT
  user_id,
  STRING_AGG(device_category, ', ') device_category_group
FROM ( SELECT * FROM dedupe_duplicate_devices )
GROUP BY
  user_id 
)

SELECT
  device_category_group,
  COUNT(*) AS users,
  ROUND( (COUNT(*) / SUM(COUNT(*)) OVER ()), 2) AS ratio_of_total_users
FROM ( SELECT * FROM concat_device_category )
GROUP BY
  device_category_group
ORDER BY
  users DESC
Result

Example 4: Cross-device funnel reports

Next, let’s see how you might figure out what’s causing users to spend money in your app. In order to answer this question, we'll build a sequence of events per user that leads up to that user actually spending money in the app. In effect, we'll build a custom event funnel report.

Step 1: Apply LEAD function to create steps for all users

Here's a query that generates a set of sequences of three events. You can modify this query to include any number of events in the sequence. The PARTITION BY clause breaks input rows into separate partitions, by app instance id in this case. The LEAD function returns the next event based on an ordering criteria, which is the event timestamp, thus creating a series of events ordered by timestamp. This is a WINDOW function, which is a powerful way to analyze your data.

Query

#standardSQL
SELECT
    user_dim.app_info.app_instance_id AS user_id,
    event.timestamp_micros AS event_timestamp,
    event.name AS step1,
    LEAD(event.name , 1) OVER (
      PARTITION BY user_dim.app_info.app_instance_id 
      ORDER BY event.timestamp_micros ASC) as step2,
    LEAD(event.name , 2) OVER (
      PARTITION BY user_dim.app_info.app_instance_id 
      ORDER BY event.timestamp_micros ASC) as step3
FROM
    `project_id.com_game_example_IOS.app_events_20170212`,
    UNNEST(event_dim) AS event
ORDER BY 1, 2 ASC
Result
The results for the Step 1 of our query below show a set of three events for users. As noticed in the interim Result 1, the second event in each row appears as the first event in the next row and so on. User Ids have been masked in the results above.

Step 2: Calculate users for journey that ends with spending virtual currency

In the second step of our query, we simply define our end event as the “spend_virtual_currency” event, since we would like to determine what users did before they spent money in the app. The query below calculates the number of users and the frequency with which this funnel has been navigated. Note how we apply results from Part 1 of our query to the second part of the query below.

Query
#standardSQL
SELECT
  step1, step2, step3,
  COUNT(*) AS funnel_count,
  COUNT(DISTINCT user_id) AS users
FROM (
  SELECT
    user_dim.app_info.app_instance_id AS user_id,
    event.timestamp_micros AS event_timestamp,
    event.name AS step1,
    LEAD(event.name , 1) OVER (
      PARTITION BY user_dim.app_info.app_instance_id 
      ORDER BY event.timestamp_micros ASC) as step2,
    LEAD(event.name , 2) OVER (
      PARTITION BY user_dim.app_info.app_instance_id 
      ORDER BY event.timestamp_micros ASC) as step3
  FROM
    `project_id.com_game_example_IOS.app_events_20170212`,
    UNNEST(event_dim) AS event
  ORDER BY 1, 2 ASC  
)
WHERE step3 = 'spend_virtual_currency'
GROUP BY
    step1, step2, step3
ORDER BY
    funnel_count DESC
Result

In the results above, you can see all the different paths that users took before they spent money in your app. For instance, many users spend money after completing a round, which is great. On the other hand, it also shows that you, the developer, could do more in the app to entice users to spend money after "spinning slots" or "levelling up". The ability to construct closed funnels like the above and dynamically set the start and end events for a funnel offers a powerful way to make data-enabled decisions in your app.

Lets now see a more advanced version of the funnel query above and join with it with our Google Analytics 360 data to create a cross-device funnel in the next example.

Step 3: Create three views for web, iOS app and Android app events

Creating views is a easy and useful way to simplify your queries and make SQL share reusable. (Read the tutorial on creating views.) Create a new data set in the same project to save your views in. Let’s call the new data set cross_device_reporting.

To create this cross-device funnel, we require the event timestamp to be in the same format —​​but Google Analytics 360 and Firebase Analytics provide the event timestamp in different formats. In Google Analytics 360, the hit timestamp is calculated by combining the visitStartTime and hits.time values, while Firebase Analytics uses event_dim.timestamp_micros.

To make the timestamps the same format, we'll convert the Google Analytics 360 timestamp into microseconds using the following logic. This gives us the timestamp in UTC.

visitStartTime*1000000 + hits.time*1000 AS event_timestamp

Use the following queries to create a view for the web, iOS app and android app events.

Google Analytics 360 web events query

Save as view named - ga360_events_last_7days

#standardSQL
SELECT
  CASE 
    WHEN user_id IS NOT NULL
      THEN user_id
    ELSE fullVisitorId
  END AS user_id,
  'web' as platform,
  event_timestamp_micros as event_timestamp,
  event
FROM (
  SELECT 
    fullVisitorId,
    (SELECT MAX(IF(index=12, value, NULL)) FROM UNNEST(ga.customDimensions)) AS user_id,
    hits.eventInfo.eventAction AS event,
    visitStartTime*1000000 + hits.time*1000 AS event_timestamp_micros
  FROM `project_id.datasetid.ga_sessions_*` AS ga, UNNEST(ga.hits) AS hits
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
    AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
    AND hits.type = 'EVENT'
    #AND hits.eventInfo.eventAction IN ('Percentage', 'Quickview Click')
)

Firebase Analytics Android app events query

Save as view named - fb_android_events_last_7days

#standardSQL
SELECT 
  CASE 
    WHEN user_dim.user_id IS NOT NULL
      THEN user_dim.user_id
    ELSE user_dim.app_info.app_instance_id
  END AS user_id,
  'ios' as platform,
  e.timestamp_micros as event_timestamp,
  e.name as event
FROM 
    `project_id.com_game_example_ANDROID.app_events_*`,
     UNNEST(event_dim) as e
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
#  AND e.name IN ('unlock_achievement','session_start')

Firebase Analytics iOS app events query

Save as view named - fb_ios_events_last_7days

#standardSQL
SELECT 
  CASE 
    WHEN user_dim.user_id IS NOT NULL
      THEN user_dim.user_id
    ELSE user_dim.app_info.app_instance_id
  END AS user_id,
  'ios' as platform,
  e.timestamp_micros as event_timestamp,
  e.name as event
FROM 
    `project_id.com_game_example_IOS.app_events_*`,
     UNNEST(event_dim) as e
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
#  AND e.name IN ('unlock_achievement','session_start')

Step 4: Union data and use LEAD function to create steps for all users

We can now query the data from our views that we created treating them just like tables. In this query, we union all three views and then just as we did in Step 1, use the LEAD function to get all the steps.
#standardSQL
SELECT 
  user_id,
  event_timestamp,
  CONCAT(platform, ' - ', event) AS step1,
  LEAD(CONCAT(platform, ' - ', event), 1) OVER (PARTITION BY user_id ORDER BY event_timestamp ASC) AS step2,
  LEAD(CONCAT(platform, ' - ', event), 2) OVER (PARTITION BY user_id ORDER BY event_timestamp ASC) AS step3
FROM
  (
SELECT * 
FROM `project_id.cross_device_reporting.ga360_events_last_7days`
UNION ALL ( SELECT * FROM `project_id.cross_device_reporting.fb_android_events_last_7days` )
UNION ALL ( SELECT * FROM `project_id.cross_device_reporting.fb_ios_events_last_7days` )
)

Step 5: Aggregate to get the most popular cross device paths

As we're now making use of the user_id we're able to discover the most popular cross-device paths. In this query, we'll filter to the query to only see people who have signed up to a special offer on our website and have then gone on to do something in the app. We do that by applying the WHERE statement earlier in the query to only use the event "currency special offer" from the web, but we want to include all app events.
SELECT
  step1, step2, step3,
  COUNT(*) AS funnel_count,
  COUNT(DISTINCT user_id) AS users
  FROM (
  SELECT
    user_id,
    event_timestamp,
    CONCAT(platform, ' - ', event) AS step1,
    LEAD(CONCAT(platform, ' - ', event), 1) OVER (PARTITION BY user_id ORDER BY event_timestamp ASC) AS step2,
    LEAD(CONCAT(platform, ' - ', event), 2) OVER (PARTITION BY user_id ORDER BY event_timestamp ASC) AS step3
  FROM (
    SELECT *
    FROM `project_id.cross_device_reporting.ga360_events_last_7days`
    UNION ALL ( SELECT * FROM `project_id.cross_device_reporting.fb_android_events_last_7days` )
    UNION ALL ( SELECT * FROM `project_id.cross_device_reporting.fb_ios_events_last_7days` )
  )
  WHERE (event = 'currency special offer' AND platform = 'web')
  OR platform = 'android'
  OR platform = 'ios'
)
WHERE step1 LIKE '%currency special offer%'
GROUP BY
  step1, step2, step3
ORDER BY
  funnel_count DESC
LIMIT 100
Result

Step 6: Calculate percentage going to each step

In this final query, we can build on Step 5 to calculate how many users go to each step using the SUM WINDOW function and then calculate the ratio from one step to the next.

#standardSQL
WITH cross_device_funnel AS (
SELECT
    step1, step2, step3,
    COUNT(*) AS event_count,
    COUNT(DISTINCT user_id) AS users
    FROM (
    SELECT
        user_id,
        event_timestamp,
        CONCAT(platform, ' - ', event) AS step1,
        LEAD(CONCAT(platform, ' - ', event), 1) OVER (PARTITION BY user_id ORDER BY event_timestamp ASC) AS step2,
        LEAD(CONCAT(platform, ' - ', event), 2) OVER (PARTITION BY user_id ORDER BY event_timestamp ASC) AS step3
      FROM
        (
      SELECT *
      FROM `project_id.cross_device_reporting.ga360_events_last_7days`
      UNION ALL ( SELECT * FROM `project_id.cross_device_reporting.fb_android_events_last_7days` )
      UNION ALL ( SELECT * FROM `project_id.cross_device_reporting.fb_ios_events_last_7days` )
      )
      WHERE (event = 'currency special offer' AND platform = 'web')
      OR platform = 'android'
      OR platform = 'ios'
  )
  WHERE step1 LIKE '%currency special offer%'
  GROUP BY
      step1, step2, step3
)

SELECT 
  step1,
  step1_users,
  step2,
  step2_users,
  ROUND( (step2_users/step1_users) * 100 ,2) AS r_1_to_2,
  step3,
  step3_users,
  ROUND( (step3_users/step2_users) * 100 ,2) AS r_2_to_3
FROM (
  SELECT
    step1,
    SUM(users) OVER (PARTITION BY step1) step1_users,
    step2,
    SUM(users) OVER (PARTITION BY step1, step2) step2_users,
    step3,
    users AS step3_users
  FROM ( SELECT * FROM cross_device_funnel )
)
ORDER BY step1_users DESC, step2_users DESC, step3_users DESC
Result

Step 7: Remove the WHERE statement

If we remove the WHERE statement from Step 6, we no longer define the journeys that we want to analyze but rather can discover what the most popular journeys are, and where there might be issues due to users dropping off.

Give it a try and see which user journeys are the most popular for your app!

Next steps

You can try out some of the queries above on your own data set or on the Android and iOS sample data sets. Furthermore:

  • View the BigQuery cookbook for Google Analytics 360 and run the queries on the sample Google Analytics dataset.
  • Learn more on importing Firebase data into BigQuery and listen to the talk on ideas to analyze Firebase data on BigQuery.
  • Big Data Solutions

  • Product deep dives, technical comparisons, how-to's and tips and tricks for using the latest data processing and machine learning technologies.

  • Learn More

12 Months FREE TRIAL

Try BigQuery, Machine Learning and other cloud products and get $300 free credit to spend over 12 months.

TRY IT FREE