We are Legend: How Goldman Sachs' open-source data platform democratizes information
Ephrim Stanley
Technology Fellow, Goldman Sachs
Goldman Sachs released its Legend data management system to the world so others could break down silos and make better financial decisions, too.
Having quality data is a core component of running any business, and this is especially true in financial services.
At Goldman Sachs, everyone relies on data. This includes data for financial products, standardized reference data, market data, employee data, and more. Our data platform, Legend, was developed as a tool to enable anyone, from business users to technologists, to safely and efficiently access the data they need.
Internally, the platform has been instrumental in standardizing conversations about data and improving access to information across teams. In 2020, we open-sourced Legend to enable our clients and other financial services firms to benefit from the platform.
Integrating with Legend opens possibilities
We’re excited to partner with Google Cloud and other organizations to extend the capabilities of Legend to more customers. Legend integrates with a variety of data services from Google Cloud, including BigQuery and BigLake. Here’s a look at where we believe the industry can benefit from access to these tools.
Data modeling and analytics: Legend Data Model + BigQuery/BigLake
With a unified data platform, data producers can store massive datasets, and data consumers can use Legend to ensure everyone sees a standardized version of that information that is ready for business use. Data stored in BigQuery can be queried through a Legend data model, so users get the data management capabilities of BigQuery while still working against a logical data model.
Data sharing: Legend Lambda + Analytics Hub
A Legend Lambda is a concise way of expressing a logical data query using the Legend language. These lambdas can be pushed down to physical data platforms such as BigQuery or shared via data sharing services such as Analytics Hub. This enables data sharing across the Google Cloud ecosystem while using the governance capabilities of a Legend data model.
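To make the idea of "pushing down" a logical query more concrete, here is a minimal sketch in Python. It does not use the Legend language or any Legend or BigQuery API. The `Mapping` and `LogicalQuery` classes, the `to_sql` function, and the table and column names are all hypothetical, chosen only to show how a query written against business-friendly names could be translated into SQL for a physical store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical link between a logical model and one physical table."""
    table: str
    columns: dict  # logical property name -> physical column name


@dataclass(frozen=True)
class LogicalQuery:
    """Hypothetical logical query: properties to select plus simple filters."""
    select: tuple
    where: tuple = ()  # (property, operator, literal) triples

ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def to_sql(query: LogicalQuery, mapping: Mapping) -> str:
    """Translate a logical query into a SQL string for the mapped table."""
    cols = ", ".join(f"{mapping.columns[p]} AS {p}" for p in query.select)
    sql = f"SELECT {cols} FROM {mapping.table}"
    clauses = []
    for prop, op, value in query.where:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        # Naive literal rendering for illustration only; a real system
        # would use parameterized queries instead of string building.
        if isinstance(value, str):
            literal = "'" + value.replace("'", "''") + "'"
        else:
            literal = str(value)
        clauses.append(f"{mapping.columns[prop]} {op} {literal}")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql


# Assumed example: consumers ask for "ticker" and "quantity" without
# knowing that the physical columns are named sym_cd and qty.
trade_mapping = Mapping(
    table="finance.trades_v2",
    columns={"ticker": "sym_cd", "quantity": "qty", "price": "px"},
)
query = LogicalQuery(select=("ticker", "quantity"), where=(("price", ">", 100),))
sql = to_sql(query, trade_mapping)
print(sql)
```

The point of the sketch is the separation of concerns: the consumer writes the query once against logical names, and only the mapping needs to change if the physical table is renamed or moved.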
Data security: Legend Connectors + Identity and Access Management (IAM)
Data has to be secured, and this requires both authentication and authorization. Legend enables Google Cloud customers to natively integrate with IAM services like workload/workforce authentication, OAuth integration, and more. We’ve been able to reduce integration and onboarding times for new tools that interact with our data because Legend manages authentication and authorization as a one-time onboarding step.
This is especially important because maintaining the highest standards of data governance is our top priority when it comes to how we use cloud-based solutions. Effectively managing entitlements so that users can access only the data they are authorized to see is vital to staying compliant.
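As a rough illustration of what managing entitlements means in practice, the sketch below checks a user's entitlements before any query is allowed to run. It is a simplified assumption, not how Legend or Google Cloud IAM is implemented: the `ENTITLEMENTS` table, the user names, and the `authorize` and `run_query` functions are all hypothetical.

```python
# Hypothetical entitlement table: user -> datasets that user may read.
ENTITLEMENTS = {
    "analyst@example.com": {"market_data", "reference_data"},
    "hr-partner@example.com": {"employee_data"},
}


def authorize(user: str, dataset: str) -> bool:
    """Return True only if the user is explicitly entitled to the dataset."""
    return dataset in ENTITLEMENTS.get(user, set())


def run_query(user: str, dataset: str, execute):
    """Run `execute` only after the entitlement check passes (deny by default)."""
    if not authorize(user, dataset):
        raise PermissionError(f"{user} is not entitled to {dataset}")
    return execute()


# Assumed usage: the callable stands in for a real query against the dataset.
rows = run_query("analyst@example.com", "market_data", lambda: ["row-1", "row-2"])
print(rows)
```

Centralizing the check in one place reflects the one-time onboarding idea described above: each new tool calls the same gate rather than reimplementing its own access rules.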
A more powerful open-source data platform
The collaboration between Legend and Google Cloud has helped us build turnkey, ergonomic integrations. This benefits anyone looking for a powerful data platform like Legend while expanding the capabilities of those using open cloud platforms like Google’s.
Opening image created with Midjourney, running on Google Cloud, using the prompt: "a modern office tower glowing with data, drawn in a cartoonish style that looks like the headquarters for a major investment firm."