Redivis makes research data accessible, experiences collaborative with BigQuery
Understanding the data we collect is essential: it allows us to identify trends and uncover answers about our world. However, stories in our data frequently go untold. Large datasets are hard to share between research communities due to their size, security constraints, and complexity. Even if these datasets are accessible to users, the tools needed to query them often require deep technical knowledge. This is why Redivis partnered with Google Cloud to help make research data from higher education institutions easier to analyze and more accessible.
Redivis’s mission is to create a frictionless “data commons”—a place where researchers can discover, request access to, and query large datasets to support their studies. To make this goal possible, Redivis began to rethink the traditional data-distribution process.
Challenges to making data more accessible
When Redivis first started, their team interviewed dozens of researchers to understand their biggest problems. Most researchers expressed how difficult it is to find new datasets, and how many steps it takes to access and work with the data—often before knowing if the information the dataset contains is even useful for their study. Additionally, data administrators want their datasets to be utilized but are often concerned about data security.
Storing large amounts of sensitive data requires the right set of security controls. To help keep their data secure, Redivis developed a transparent, tiered access system for datasets. Researchers can request separate access to a dataset’s documentation, variables, sample, and full data, which allows them to assess the usability of the dataset without filing access applications. Moreover, administrators can set rules for how researchers use and combine different datasets depending on their level of access.
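To make the tiered model concrete, here is a minimal sketch of how such access levels could be represented. It is not Redivis's implementation: the names `AccessLevel`, `Dataset`, and `can_combine` are hypothetical, and it assumes the four tiers are strictly ordered, so that a higher tier implies every lower one, which the description above does not confirm.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Iterable


class AccessLevel(IntEnum):
    """Hypothetical tiers, assumed to be ordered from least to most privileged."""
    NONE = 0
    DOCUMENTATION = 1
    VARIABLES = 2
    SAMPLE = 3
    FULL_DATA = 4


@dataclass
class Dataset:
    name: str
    # Maps a researcher id to the highest tier an administrator has granted.
    grants: Dict[str, AccessLevel] = field(default_factory=dict)

    def grant(self, researcher: str, level: AccessLevel) -> None:
        """Record the tier an administrator approved for this researcher."""
        self.grants[researcher] = level

    def can_view(self, researcher: str, required: AccessLevel) -> bool:
        """True if the researcher's granted tier is at least `required`."""
        return self.grants.get(researcher, AccessLevel.NONE) >= required


def can_combine(
    researcher: str,
    datasets: Iterable[Dataset],
    required: AccessLevel = AccessLevel.FULL_DATA,
) -> bool:
    """Example administrator rule (an assumption): datasets may be combined
    only if the researcher holds the required tier on every one of them."""
    return all(d.can_view(researcher, required) for d in datasets)
```

Under these assumptions, a researcher granted `SAMPLE` access could inspect a dataset's documentation, variables, and sample rows to judge its usefulness, while `can_combine` would still refuse a join that requires `FULL_DATA` on every input.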
Redivis built their platform on top of Google Cloud’s security infrastructure, which allows the company to encrypt data, manage security keys, and help secure datasets with the operational and physical security layers available. Combined with detailed audit logs (supported by Google Cloud Logging) and robust application-level security controls, this gives data owners peace of mind that their data is only being accessed and used as they’ve allowed.
Sharing data to build more compelling stories
When we join multiple sources of data, we can uncover a more complete story, such as in the case of examining environmental conditions. By combining data on historic fires, air quality, and population health outcomes, researchers are able to offer policy guidance to protect the most at-risk populations. If the datasets stayed separate, however, we would likely lose insight into the impact these events have on each other. With the help of cloud solutions like Cloud Storage and BigQuery, Redivis found ways to securely connect public datasets hosted in BigQuery with private datasets, unlocking enriched insights for their researchers.
Using Cloud Storage, Redivis makes it easy for administrators to upload large amounts of data to the platform. These data records are then stored in BigQuery, Google Cloud’s serverless and scalable data warehouse. When researchers explore data with Redivis, they can easily see what steps they need to take to request access to existing records. Once authorized, users can query the data using SQL, without needing to know database languages. The result is a manageable data subset that can be analyzed within the context of their current study. Finally, researchers can integrate a wide array of analytical tools into this data pipeline. With BigQuery’s one-click export to Google’s Data Studio, Redivis can create interactive data visualizations, and Python and R clients let it integrate with notebook environments.
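As a rough illustration of the query step, the sketch below joins two small tables and keeps only a study-specific subset. It makes several assumptions: Python's built-in `sqlite3` module stands in for BigQuery so the example runs locally without credentials, and the table names, column names, and values are all invented for illustration rather than taken from any Redivis dataset.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up air quality and health rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE air_quality (county TEXT, year INTEGER, pm25 REAL);
        CREATE TABLE health_outcomes (county TEXT, year INTEGER, asthma_visits INTEGER);
        INSERT INTO air_quality VALUES
            ('A', 2018, 35.5), ('B', 2018, 8.0), ('A', 2019, 12.0);
        INSERT INTO health_outcomes VALUES
            ('A', 2018, 120), ('B', 2018, 40), ('A', 2019, 60);
        """
    )
    return conn


# Join the two sources on county and year, then keep only rows above a
# pollution threshold, producing a small subset for further analysis.
SUBSET_QUERY = """
    SELECT a.county, a.year, a.pm25, h.asthma_visits
    FROM air_quality AS a
    JOIN health_outcomes AS h
      ON a.county = h.county AND a.year = h.year
    WHERE a.pm25 > ?
    ORDER BY a.county, a.year
"""


def high_exposure_subset(conn: sqlite3.Connection, pm25_threshold: float) -> list:
    """Return joined rows whose PM2.5 reading exceeds the threshold."""
    return conn.execute(SUBSET_QUERY, (pm25_threshold,)).fetchall()
```

Calling `high_exposure_subset(build_demo_db(), 10.0)` returns the two rows for county A and drops county B, whose reading falls below the threshold. On BigQuery the same pattern would run as a standard SQL join across tables the researcher is authorized to read.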
With BigQuery managing infrastructure requirements, Redivis scaled to petabytes of data, 1,000 times larger than the terabytes they had previously, without additional infrastructure workloads straining their company. Most importantly, BigQuery’s compute architecture supports real-time analysis across billions of records from both public and restricted datasets, unlocking new ways to discover insights. “Researchers are regularly coming to me to say that queries that once took hours are executing in seconds,” says Ian Mathews, CEO of Redivis. “One can only imagine how transformative this is in understanding new datasets and exploring novel hypotheses.”
The future of data accessibility
As more academic institutions and researchers join Redivis, the company will continue to identify ways of minimizing friction at every step of the data-driven research process.