PayPal's historically large data migration is the foundation for its gen AI innovation

Mani Iyer
SVP & Global Head of Data, AI & ML Technology, PayPal
Vaishali Walia
Sr Director Data Analytics, PayPal
With the dawn of the gen AI era, businesses are facing unprecedented opportunities for transformative products, demanding a strategic shift in their technology infrastructure. A few years ago, PayPal, a digital-native company serving hundreds of millions of customers, faced a significant challenge. After 25 years of success in expanding services and capabilities, we’d created complexity in our data analytics infrastructure. Some 400 petabytes of data was spread across a dozen siloed systems due to limitations of scale and acquisitions of companies like Venmo, Braintree, and others.
Our very success in growth and innovation had created complexity that threatened our next evolution.
To continue leading the next wave of innovation in financial services, we knew we had to modernize our data foundation. Today, we’re proud to share how PayPal successfully completed what’s arguably one of the largest data migrations in history, culminating with the move of our analytics to BigQuery, Google Cloud’s enterprise data warehouse. This effort marks a significant leap in creating the robust data framework we’ll need to expand and advance our business priorities and meet the ever-evolving financial needs of our customers.
This migration was essential, but the scale was daunting. In fact, by some measures, such as the size of our now-retired Teradata system, we believe this was one of the biggest data migrations in history. Given that scale, we want to offer some insights into how we tackled this migration and what others might consider when undertaking a significant migration of their own.
Untapped potential of data
As one of the original digital payment pioneers, PayPal processes billions of transactions, and houses decades of valuable customer insights. We have a mountain of data — really a mountain range — that had developed over decades without being fully leveraged in the service of our customers and merchants.
Each acquisition and new service added valuable capabilities but also introduced new data challenges. For example, a small business owner might use PayPal for online sales and Venmo for local transactions. However, providing a unified view of their business required complex processes that were costly and slow.
The fragmentation of data limited our ability to offer personalized experiences to consumers, thereby reducing the potential to maximize the value of their money and hindering our ability to gain deeper insights from the data.
As the gen AI era dawned, our digital fragmentation was becoming more than just a technical inconvenience. With AI becoming a transformative force in financial services with huge potential ROI, we knew fragmented data would severely limit our ability to create the intelligent experiences customers have come to expect. These could run from further strengthening our industry-leading fraud detection models to providing a best-in-class commerce platform for merchants to help them succeed in the competitive global economy.
To get there, we first had to get our disparate data platforms in order.
Legacy systems, modern ambitions
The scope was massive. We needed to consolidate multiple data platforms, including what’s believed to be the world’s largest Teradata deployment, along with Hadoop clusters, Redshift, Snowflake, and various other systems processing petabytes of transaction data. This migration also had to be executed while maintaining the uninterrupted security and reliability our customers depend on.
As a technology company, PayPal has considerable internal resources, so we first had to decide whether to tackle this challenge ourselves. We weighed the costs and benefits and concluded that unifying and scaling our on-premises infrastructure to meet our future needs would be prohibitive in both cost and time. Plus, innovation in AI was happening at a rapid pace in the cloud. To truly leverage the power of our data, we needed to be where that innovation was happening.
We assessed various data warehousing solutions and chose BigQuery due to its numerous advantages. It is a fully managed, cloud native platform with disaggregated compute and storage that can scale independently. It has powerful capabilities at the scale and performance we needed, and a familiar SQL interface meant a gentler learning curve for our developer community.
Most importantly, BigQuery’s native integrations with AI enable seamless and efficient data analytics.
The journey to unified data
After choosing Google Cloud as our data partner, we embarked on our historic data migration. This may sound hyperbolic, but when you consider the scale of PayPal’s business, the geographies across which we operate, the regulations within each, and the sensitive and quite literally valuable nature of this data, the scope of the challenge starts to become clear.
With the help of partners and experts from Google Cloud Consulting, we migrated more than 300 petabytes of data and streamlined operations, decommissioning around 25% of workloads. And we managed this all while maintaining zero downtime of our business operations and with no impact to customers. Here are some key factors that contributed to our success.
Alignment: The first hurdle in achieving transformations at scale is aligning stakeholders on a shared goal. So, we made it an enterprise-wide priority.
Discovery and analysis: Detailed inventories of data, workloads, and inbound/outbound data streams are crucial for defining scope, estimating effort, and forecasting budget. Establishing lineage allowed us to trace the origins and relationships of various components, thereby providing a clear and comprehensive view of the dependency graphs.
Strategy: It is crucial to establish fundamental principles for the migration process, such as deciding between lift-and-shift versus modernization, defining security principles, setting governance guardrails, and determining how consumption will be tracked.
Execution: We automated every possible task and developed live dashboards to continuously monitor the progress of migrations. FinOps was integrated throughout the migration process, with clear visibility into consumption and performance.
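To make the discovery step more concrete, here is a minimal Python sketch of how a lineage inventory can be turned into an ordered migration plan. It is an illustration under stated assumptions, not PayPal's actual tooling: the dataset names and the `migration_waves` helper are hypothetical, and the lineage is assumed to be available as a simple mapping from each dataset to the upstream datasets it reads from.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def migration_waves(lineage):
    """Group datasets into waves so each one moves after everything it reads from.

    `lineage` maps a dataset name to the set of upstream datasets it depends on.
    """
    sorter = TopologicalSorter(lineage)
    sorter.prepare()  # raises graphlib.CycleError if the lineage is circular
    waves = []
    while sorter.is_active():
        # Everything whose upstream dependencies have already been migrated.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical inventory, for illustration only.
lineage = {
    "raw_transactions": set(),
    "raw_accounts": set(),
    "merchant_daily_sales": {"raw_transactions", "raw_accounts"},
    "fraud_features": {"raw_transactions"},
    "merchant_dashboard": {"merchant_daily_sales"},
}

for number, wave in enumerate(migration_waves(lineage), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Datasets in the same wave have no dependencies on one another, so in principle they could be migrated in parallel, and a circular dependency surfaces as an error during planning rather than partway through a cutover.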
Benefits from BigQuery and beyond
We’ve achieved faster insights. Queries are 2.5x to 10x faster, including complex queries used by data scientists. This unlocks real-time insights, enabling PayPal to personalize product recommendations, offers, and customer support.
We’ve built new AI foundations. Data accessible for model training is 16x fresher. Feature engineering, a crucial step in AI development, is improved by instant access to clean, governed data. This accelerates the development of personalized financial guidance and predictive analytics for both consumers and businesses.
We’ve optimized operations. By migrating to BigQuery, we reduced our data infrastructure vendors from four to one, streamlining operations and reducing complexity. Data duplication between platforms was entirely eliminated.
Our new unified data platform in BigQuery has become the source for PayPal's next wave of innovation, enabling us to create more intuitive, personalized experiences across our entire ecosystem and to leverage the power of gen AI.
AI-powered innovation unleashed
Looking ahead, we're exploring how this unified data platform will enable us to deliver AI-powered experiences that weren't possible before, including:
- Predictive fraud prevention that spots potential issues before they affect our customers.
- Personalized financial insights that help merchants optimize their businesses.
- Seamless payment experiences that adapt to each customer's preferences and patterns.
- More intelligent risk assessment that could help expand financial access to underserved communities.
- Agentic commerce and future possibilities we are now able to imagine.
Lessons for the AI era
While our migration may be extraordinary in its scale, we are not alone in our needs or ambitions. There are ample considerations for companies within and well beyond financial services that may be pondering their own data foundations at this time.
First off, do not underestimate how under-utilized your data may be, and how unorganized.
Making sure your data is centralized, accurate, and consistent paves the way for AI experimentation and deployment. Organizations that spend time cleaning up their data fabric will be able to bring machine learning and generative AI applications to market more quickly, and do so at scale.
Second, ensuring data is accessible to everyone within your organization, with the proper controls, unlocks so much potential. Data orchestration and enterprise search, coupled with generative AI, have the potential to break down longstanding organizational silos and speed up decision-making across your organization. It’s one of the most promising applications of AI.
The financial world will continue to evolve, driven by new technologies and changing customer expectations. PayPal’s data transformation shows how even established companies can reinvent themselves to stay ahead of this change — provided they're willing to tackle the fundamental challenges that stand in their way.
In doing so, we've not only preserved our position as a digital payments pioneer but set ourselves up to continue leading the next wave of innovation in digital commerce.



