Meet the Authors: Data Governance–The Definitive Guide
Today, we sit down virtually with the authors of the newly released O’Reilly book “Data Governance: The Definitive Guide”. Evren Eryurek is the Director of Product Management at Google Cloud, with a portfolio that includes data governance, data cataloging and discovery, and data marketplace. Uri Gilad is spearheading a cross-functional effort to create the controls, management tools, and policy workflows that enable Google Cloud Platform (GCP) customers to apply data governance policies in a unified fashion, wherever their data may be in their GCP deployment. Anita Kibunguchy-Grant is a Product Marketing Lead for Google Cloud and previously led go-to-market (GTM) efforts for data security and governance at Google Cloud. Jessi Ashdown is a User Experience Researcher for Google Cloud who conducts user studies with customers from all over the world and uses their findings and feedback to help shape Google’s data governance products to best serve those users’ needs. Valliappa "Lak" Lakshmanan is the Director of Analytics and AI Solutions, leading a team that puts together data governance solutions across customers’ entire portfolios.
Q: Evren, what motivated you to write this book?
Evren: As industry after industry begins to explore cloud-based solutions, one common question we get asked is “How do you govern data at Google, and are there some best practices you can share with us?” So Uri, Anita, and I thought it would really help the cloud industry, as well as the many potential customers actively seeking solutions, to publish a whitepaper outlining simple steps, some examples, and a few best practices that we believed would come in handy.
An online magazine carried a synopsis of the whitepaper. Jessica Haberman from O’Reilly saw the article and asked me if I’d ever consider turning it into a book. I tried to get out of it fast; having written two theses (for my master’s and Ph.D.), I knew how painstaking and time-consuming this would be. Jessica was not going to give up easily, so I agreed to do it if we could get a team together.
Besides Uri and Anita, who were my co-authors on the whitepaper, it was clear that we needed Lak and Jessi on the team. They were both playing a huge role in driving the data governance strategies and capabilities in Google Cloud and had tremendous knowledge about the space, the challenges at hand, and ways to overcome them. Lak was putting together data governance solutions that span on-premises environments and other clouds (because, to be effective, governance has to), and he was working with many of our customers as they put their data governance strategies in place. Jessi Ashdown was surveying customers about their data estates and governance objectives to inform the design of our data governance user interface. This survey information was a treasure trove of insights and a great signpost of where the data governance space is headed.
This was a dream team, and I am so proud to be co-authoring this book with them. I will cherish this forever, as each of them brought their own unique perspective, experience, and flavor to the process and to the final publication itself.
Q: About the whitepaper and article that launched the book … you must be drawing from some of the lessons learned from your experience at GE Healthcare?
Evren: Indeed. I spent many years in the healthcare industry delivering software solutions from medical devices to EMRs to image processing solutions.
One thing I really appreciated in that industry was how every member of a healthcare provider or healthcare technology vendor was trained, informed, and well aware of how critical it is to handle PHI (protected health information) properly. Organizations trained their employees, provided the right tools, and established the right processes, which helped everyone, no matter what role they played in the industry, be a part of this ecosystem. Everyone’s ultimate goal is serving their patients and delivering healthcare, and because of this mindset, the healthcare industry is ahead of the game when it comes to data governance and dealing with data regulations.
Despite all the heavy regulation, though, the industry has consistently found ways to solve the toughest healthcare problems collectively. Instead of locking down the critical data they each had, organizations were able to share it in a secure, governed way by applying de-identification techniques to unlock the inherent value of the data. This enabled us to drive quality outcomes, establish cohort analyses, deliver population healthcare, innovate in personalized care by collectively working on genomics data, find ways to cure tough cancer cases, and bring the world’s experts in their fields together virtually as members of tumor boards to diagnose and treat the rarest cases one could imagine.
When GDPR took effect in the European Union, many companies reacted in ways that actually hurt them. We have all read stories about how some companies thought deleting everything was the solution and then realized (too late!) that the deleted data was, in some cases, irrecoverable. They realized that they should instead have been establishing processes, adopting tools, and training their people to handle data governance in their world. My hope was that, by working on this book and sharing our experiences, we could help anyone, in any industry, who is trying to deal with these or similar regulations.
Locking up or deleting your data is not the solution; establishing a proper data governance program is. This is what I have seen in the healthcare industry, and I believe it applies to all industries.
Q: Where do you think the data governance space is headed, Uri?
Uri: Security and governance are a mindset, and there are several principles that help you “get security right”. One of these, which we stress in the book, is to minimize “friction” between the end user and the enforcement of policies. Essentially, we want to instill in our readers the notion that “data should work for you” and make sure they don't fall into the trap of thinking that “data is something that should be locked away”. Friction-free data governance, which makes access widely available while (at the same time) keeping data safe and preventing its misuse, is going to be a hallmark of success for any governance program.
I think we are at the early adopter phase. While there are a few companies that do data governance, there are no major cloud providers that include a comprehensive data governance solution in their portfolio. I believe that inevitably, this will change, and we will begin to see a standardized set of capabilities around privacy, security, data quality and data cataloging, as well as sustained investment in broader, end-to-end, and platform-native solutions. Fortunately, we are already seeing early signs of this.
At present, it is unclear whether this standardized set of capabilities will be provided by each cloud provider, with the capabilities of different clouds interoperating, or whether standardization will be driven by vendors that can provide their tools across multiple clouds and on-premises. I suspect that it will be a mix: cloud platforms will provide native capabilities, and these will connect to multi-cloud and hybrid-cloud tools. On Google Cloud, the first approach is exemplified by Data Catalog and the second by partners such as Informatica and Collibra, with the connections between the two still quite nascent and evolving.
Q: Data governance is certainly an evolving space. Jessi, could you share some of the gaps in users’ data governance landscape that you hear time and time again?
Jessi: As someone who spends her time talking to strategic cloud customers, I have seen the priorities, trends, and focus around governance shift over the years. When we initially began researching customer needs in this space, we assumed it was simply a matter of a few key features that would enable compliance with GDPR. What we found, however, was that managing data across the board, not only to ensure compliance but also for day-to-day analytics and security, was a huge pain point.
Throughout my interviews, I heard some gaps mentioned time and time again. Many of these are covered in the book, such as lacking the right tools, the right people, and a successful process that brings it all together. What we attempted in the book is not just a rehash of these main challenges, but an explanation of how and why each of these areas is important, along with practical, doable strategies that can be implemented regardless of company size, budget, etc.
One of the things I consistently saw was that many data governance frameworks assume that every company (regardless of size or budget) needs to implement them “just so” in order to be successful. During my research, it became painfully obvious that many (and in fact, most) companies simply do not have the means to execute these frameworks exactly, and thus feel completely lost about how not only to wrangle and protect their data but also to use it to make better business decisions. Understanding these company dynamics, as well as the people who not only decide a company’s governance strategy but also have to implement it, gave us a unique perspective that helped make the book approachable and the strategies achievable.
Q: Lak, you claim in the book that data governance is primarily about making data trustworthy. Can you explain?
Lak: I believe that data governance does not need to become a burden, a price that you have to pay in order to be able to use data. Instead, data governance is the set of best practices that you implement so that your data becomes trustworthy. Data governance practices contribute positively to data quality. By making your data discoverable across the organization, you can ensure that you create trustworthy single sources of truth. A key benefit of data protection and audit logs is to ensure that any data that you use in analytics and machine learning models is clean and that any malicious changes can be accounted for, even after the fact.
Data quality is absolutely essential when planning a data program. Organizations very often overestimate the quality of the data they have and underestimate the impact of bad data quality. The same program that governs data life cycle, controls, and usage can be leveraged to govern the quality of data (and plan for the impact and response to bad data quality incidents). This is why data governance is a key part of any organization’s data-centric transformation.
Our Google Cloud teams work with hundreds of customers that are embarking on the journey to become more data-centric, and this is a lens that we encourage them to adopt. Of course, we also help them implement this advice, something made easier by our data governance solutions, such as BeyondCorp Zero Trust and Data Catalog; product capabilities, such as the immutable audit logs in BigQuery and lineage tracking in Dataflow and Data Fusion; and pre-built AI tools, such as the Data Loss Prevention API.
Q: How does Google Cloud fit into all this?
Anita: There’s no doubt that data governance is continually evolving as the 3Vs of data (velocity, variety, and volume) increase. Yes, I know that the 3Vs are a cliché, but they capture the nature of the changes relatively well. Not only that, regulations keep evolving as new ones are added and existing ones are updated. This makes the work of data governance complex and never-ending. Organizations need to stay on top of things to remain compliant and to really put their data to work and reap the benefits of governance. So how does Google Cloud fit into all this?
You’ve probably heard us say that at Google, big data is in our DNA. We’ve built products for billions of users and have some experience doing this. We’ve brought the technology behind these products to life for our customers, and it is the backbone of our Google Cloud services. We start by offering a secure-by-design infrastructure across hardware, services, user identity, storage, internet communication, and operations to deliver true defense-in-depth. We’re constantly staying up to date on regulations and undergo independent verification of our security, privacy, and compliance controls to help you meet your regulatory and policy objectives. In addition, we provide services with built-in capabilities, so you can focus on running your business. For example, we encrypt data in transit and at rest by default, so your data is protected at all times. Customers can use Cloud IAM for fine-grained access control and visibility to centrally manage cloud resources. Finally, we work to earn your trust through transparency. We state and adhere to a concrete set of trust principles that govern our approach to security. We know that security and governance are complicated and time-consuming, and these are some of the ways we continue to be your trusted partner on this journey.
Q: All proceeds from this book benefit the Nature Conservancy. Can you share why you picked this organization?
Lak: When you have a large group of authors, and want to donate all the royalties to a non-profit, it is important to pick a non-profit that is not overtly political or religious. Within these constraints, we wanted to pick an organization that matched our interests. We are all outdoors people, and saving the best places on Earth for future generations is something we all care about. The Nature Conservancy has an interesting approach in that they actively promote collaboration between environmentalists and local industries and take very innovative approaches instead of relying on regulation. Close to the topic of data governance and analytics, a new effort by the Nature Conservancy called nature-based solutions tries to quantify the benefits of natural features in providing water infrastructure. So, the Nature Conservancy was our choice because it is inclusive, innovative, and applies data analysis methods to conservation.