SQL Server SSRS, SSIS packages with Google Cloud BigQuery
After migrating a Data Warehouse to Google Cloud BigQuery, ETL and Business Intelligence developers are often tasked with upgrading and enhancing data pipelines, reports and dashboards. Data teams who are familiar with SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS) are able to continue to use these tools with BigQuery, allowing them to modernize ETL pipelines and BI platforms after an initial data migration is complete. The following blog details patterns and examples on how Data teams can use SSIS and SSRS with BigQuery.
Using SQL Server Integration Services (SSIS) with BigQuery
SQL Server Integration Services (SSIS) is used to move data between source and target databases. A common simplified pattern is featured below, where data is extracted from an OLTP system (source) and written to a Data Warehouse (target). Note that there are two intermediate steps, Data Transformations and Data Mappings. These steps contain specific business rules, logic and C# code that at this time makes sense to keep in SSIS rather than to move into BigQuery.
SSIS Data Flow
In the modified SSIS Data Flow below, a Multicast transformation is added along with a Flat File Output destination for the csv file. Note that while data continues to load into the existing EDW database, it is also written to a file system as a Flat File (csv format) which is subsequently copied to a GCP bucket and loaded into BigQuery. This pattern supports enhancements to the current production EDW in parallel to BigQuery development workstreams.
SSIS Data Flow with BigQuery destination added
Once generated, the csv file is then copied into a Google Cloud Storage bucket (GCS) via a PowerShell script, which is run via an Execute Process task. A screenshot of a sample SSIS Control Flow follows.
SSIS Control Flow
Output: SSIS Execute Process task
Configuration: SSIS Execute Process Task
Sample Syntax from the above screenshot
-ExecutionPolicy Unrestricted -command "C:\tmp\copy-files.ps1"
## Example Copy-Script Syntax.
## Remember to create a GCS bucket and a folder before copying files.
## ex. gsutil mb gs://000-86527-1005/
## If the GoogleCloud powershell module is not installed, you may need to run the following commands:
## [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
## Install-PackageProvider -Name NuGet
## Install-Module GoogleCloud
#Source file staging area
$src = "C:\tmp\stg\*.csv"
#Target GCP bucket destination
$tgt = "gs://000-86527-1005/csv/"
#Copy to GCS bucket
gsutil -m cp $src $tgt
Once the csv files are copied into a GCS Bucket, external tables are created in BigQuery to reference the files. In this scenario, an external table functions as a staging table and can be queried directly or used as a source to merge data into internal tables optimized for reporting and user access. A sample external table schema follows.
CREATE OR REPLACE EXTERNAL TABLE edw_staging.etl-demo-1(
format = 'CSV',
uris = ['gs://000-86527-1005/csv/etl-demo-1.csv'],
field_delimiter = ',',
max_bad_records = 5
To support data QA and validation, the GCP professional services Data Validation Tool (DVT) can be used to automate testing and data validation, for example, comparing table row counts and column data distributions between the EDW and BigQuery databases.
The Data Validation Tool (DVT) is an open sourced Python CLI tool based on the Ibis framework that compares heterogeneous data source tables with multi-leveled validation functions.
Data validation is a critical step in a Data Warehouse, Database or Data Lake migration project, where structured or semi-structured data from both the source and the destination tables are compared to ensure they are matched and correct after each migration step (e.g. data and schema migration, SQL script translation, ETL migration, etc.). The Data Validation Tool provides an automated and repeatable solution to perform this task.
Using SQL Server Reporting Services (SSRS) with Google BigQuery
SQL Server Reporting Services (SSRS) is used for reports and dashboards, and can be used to generate reports directly from Google BigQuery using the BigQuery Simba ODBC drivers (32-bit and 64-bit) .
Important developer note: Visual Studio 2019 is a 32-bit IDE and SSRS Report Server (SQL 2019) is a 64-bit application. Develop and test SSRS reports using a 32-bit ODBC DSN, and then editing the report data source to reference the 64-bit ODBC DSN after deploying the reports to the SQL Server Report Server.
Visual Studio - Installation Process
Install Visual Studio 2019 Community Edition including SQL Server Data Tools (SSDT)
Install the Visual Studio SSIS project template from the Visual Studio SSIS project template from the visual studio marketplace or alternatively, using Chocolatey.
A BigQuery Service Account allows the BigQuery Simba ODBC DSN to access BigQuery tables and views using a stored credential. You must create a BigQuery Service Account and download the key file (.json) before configuring the ODBC DSN. Also, make sure you grant the Service Account the required BigQuery User permissions.
Screenshot: Service Account Permissions for a BigQuery Service Account.
Configure the ODBC System DSN’s (32-bit and 64-bit)
*UI Hint: When configuring the DSN, copy and paste your GCP ProjectID in the Catalog (Project) field, then select the Dataset, then click the Catalog (Project) dropdown to re-select the Catalog (Project).
Configure SQL Server Reporting ServicesWhen we execute a report and specify that the data source does not require a username or password, SSRS requires an execution account that is a domain account. Note that a test domain user account GONTOSO\ssrs-demo was created for this example. This account is a simple domain user account and has no permissions on the SSRS Report Server.
Add the execution account in the Report Server Configuration Manager below.
SSRS Report Development & DeploymentIt’s important to remember that a 64-bit DSN is required for the report server. It is recommended to develop/test with the 32-bit DSN and then modify the Data Source on the SSRS server to reference the 64-bit DSN. Note that in the screenshot below under the Credentials subheading “Without Any Credentials” is checked, which causes SSRS to use the Report Server Execution Account we configured previously. Also, since the BigQuery Service Account we previously created has Read access to the BigQuery dataset that was created, it’s a good idea to revisit Report Server Roles and Permissions to ensure that reports are secured.