Keep up with the latest announcements from Google Cloud Next '21. Click here.

Data Warehousing

SQL Server SSRS, SSIS packages with Google Cloud BigQuery

#databases

After migrating a Data Warehouse to Google Cloud BigQuery, ETL and Business Intelligence developers are often tasked with upgrading and enhancing data pipelines, reports and dashboards. Data teams who are familiar with SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS) are able to continue to use these tools with BigQuery, allowing them to modernize ETL pipelines and BI platforms after an initial data migration is complete. The following blog details patterns and examples on how Data teams can use SSIS and SSRS with BigQuery. 

Using SQL Server Integration Services (SSIS) with BigQuery

SQL Server Integration Services (SSIS) is used to move data between source and target databases. A common simplified pattern is featured below, where data is extracted from an OLTP system (source) and written to a Data Warehouse (target). Note that there are two intermediate steps, Data Transformations and Data Mappings. These steps contain specific business rules, logic and C# code that at this time makes sense to keep in SSIS rather than to move into BigQuery.

SSIS Data Flow

1 SSIS Data Flow.jpg

In the modified SSIS Data Flow below, a Multicast transformation is added along with a Flat File Output destination for the csv file.  Note that while data continues to load into the existing EDW database, it is also written to a file system as a Flat File (csv format) which is subsequently copied to a GCP bucket and loaded into BigQuery. This pattern supports enhancements to the current production EDW in parallel to BigQuery development workstreams.

SSIS Data Flow with BigQuery destination added

2 SSIS Data Flow with BigQuery destination added.jpg

Once generated, the csv file is then copied into a Google Cloud Storage bucket (GCS) via a PowerShell script, which is run via an Execute Process task. A screenshot of a sample SSIS Control Flow follows. 

SSIS Control Flow

3 SSIS Control Flow.jpg

Output: SSIS Execute Process task

4 SSIS Execute Process task.jpg

Configuration: SSIS Execute Process Task

5 SSIS Execute Process Task.jpg

Sample Syntax from the above screenshot

  C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
-ExecutionPolicy Unrestricted -command  "C:\tmp\copy-files.ps1" 
C:\tmp\stg


## Example Copy-Script Syntax.
## Remember to create a GCS bucket and a folder before copying files.
## ex. gsutil mb gs://000-86527-1005/

## If the GoogleCloud powershell module is not installed, you may need to run the following commands:
## [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
## Install-PackageProvider -Name NuGet
## Install-Module GoogleCloud

#Source file staging area
$src = "C:\tmp\stg\*.csv"

#Target GCP bucket destination
$tgt = "gs://000-86527-1005/csv/"

#Copy to GCS bucket
gsutil -m cp $src $tgt

Once the csv files are copied into a GCS Bucket, external tables are created in BigQuery to reference the files. In this scenario, an external table functions as a staging table and can be queried directly or used as a source to merge data into internal tables optimized for reporting and user access. A sample external table schema follows.

  CREATE OR REPLACE EXTERNAL TABLE edw_staging.etl-demo-1(
 personid INT64,
 name STRING,
 rowguid STRING,
 modifieddate DATETIME
)
 OPTIONS ( 
 format = 'CSV',
 uris = ['gs://000-86527-1005/csv/etl-demo-1.csv'],
 field_delimiter = ',',
 max_bad_records = 5
 );

To support data QA and validation, the GCP professional services Data Validation Tool (DVT) can be used to automate testing and data validation, for example, comparing table row counts and column data distributions between the EDW and BigQuery databases.

The Data Validation Tool (DVT) is an open sourced Python CLI tool based on the Ibis framework that compares heterogeneous data source tables with multi-leveled validation functions.

Data validation is a critical step in a Data Warehouse, Database or Data Lake migration project, where structured or semi-structured data from both the source and the destination tables are compared to ensure they are matched and correct after each migration step (e.g. data and schema migration, SQL script translation, ETL migration, etc.). The Data Validation Tool provides an automated and repeatable solution to perform this task.

Using SQL Server Reporting Services (SSRS) with Google BigQuery

SQL Server Reporting Services (SSRS) is used for reports and dashboards, and can be used to generate reports directly from Google BigQuery using the BigQuery Simba ODBC drivers (32-bit and 64-bit) .

Important developer note: Visual Studio 2019 is a 32-bit IDE and SSRS Report Server (SQL 2019) is a 64-bit application. Develop and test SSRS reports using a 32-bit ODBC DSN, and then editing the report data source to reference the 64-bit ODBC DSN after deploying the reports to the SQL Server Report Server.

Visual Studio - Installation Process

  1. Install Visual Studio 2019 Community Edition including SQL Server Data Tools (SSDT)

  2. Install the Visual Studio SSIS project template from the Visual Studio SSIS project template from the visual studio marketplace or alternatively, using Chocolatey

  3. Install the Visual Studio SSRS project template from the visual studio marketplace

  4. Download and Install the BigQuery Simba ODBC drivers (32 bit and 64 bit) and Configure the ODBC System DSN’s.

A BigQuery Service Account allows the BigQuery Simba ODBC DSN to access BigQuery tables and views using a stored credential. You must create a BigQuery Service Account and download the key file (.json) before configuring the ODBC DSN. Also, make sure you grant the Service Account the required BigQuery User permissions.

Screenshot: Service Account Permissions for a BigQuery Service Account.

6 Service Account Permissions.jpg

Configure the ODBC System DSN’s (32-bit and 64-bit)

7 Configure the ODBC System DSN.jpg

*UI Hint: When configuring the DSN, copy and paste your GCP ProjectID in the Catalog (Project) field, then select the Dataset, then click the Catalog (Project) dropdown to re-select the Catalog (Project).

Configure SQL Server Reporting Services

When we execute a report and specify that the data source does not require a username or password, SSRS requires an execution account that is a domain account.  Note that a test domain user account GONTOSO\ssrs-demo was created for this example. This account is a simple domain user account and has no permissions on the SSRS Report Server.
8 Configure SQL Server Reporting Services.jpg

Add the execution account in the Report Server Configuration Manager below.

9 Report Server Configuration Manager.jpg

SSRS Report Development & Deployment

It’s important to remember that a 64-bit DSN is required for the report server. It is recommended to develop/test with the 32-bit DSN and then modify the Data Source on the SSRS server to reference the 64-bit DSN. Note that in the screenshot below under the Credentials subheading “Without Any Credentials” is checked, which causes SSRS to use the Report Server Execution Account we configured previously.  Also, since the BigQuery Service Account we previously created has Read access to the BigQuery dataset that was created, it’s a good idea to revisit Report Server Roles and Permissions to ensure that reports are secured.
10 SSRS Report Development.jpg

Report Demo

11 Report Demo.jpg

In summary, both developers and analysts who are familiar with SSRS and SSIS can use  familiar tool sets with BigQuery after an initial migration, and then plan to modernize ETL/ELT using Data Flow and Reporting and BI with Looker in subsequent project phases.