
How our commitment to open source unlocks AI and ML innovation

October 6, 2022


Sachin Gupta

Vice President & GM, Infrastructure, Google Cloud

Danu Mbanga

Group Product Manager, Engineering, Core ML Product

At Google, we believe anyone should be able to quickly and easily turn their artificial intelligence (AI) idea into reality. Open source software (OSS) has become increasingly important to this goal, heavily influencing the pace of innovation in AI and machine learning (ML) ecosystems. Over the last two decades, ML has transformed Google services including Search, YouTube, Assistant, and Maps, and the basis for this transformation has always been our "open first" approach through investments in projects and ecosystems like TensorFlow, JAX, and PyTorch.

These OSS efforts are important because many AI technologies rely on closed or exclusive approaches. This walled-garden approach creates high barriers to entry for developers; limits efforts to make AI explainable, ethical, and equitable; and stunts innovation. We’re committed to open ecosystems, as we firmly believe no one company should own AI/ML innovation. In this blog post, we’ll explore some of Google’s most significant OSS AI and ML contributions from recent years, as well as how our commitment to open technologies can help organizations innovate faster and more flexibly.

Openness is the way to operate as an ecosystem, not a single project

Google’s OSS initiatives extend and enable AI initiatives according to three pillars:

Access: OSS allows developers, researchers, and organizations of all sizes to leverage the latest ML technology. It is a key part of democratizing innovation in ML, fostering software diversity and choice for customers, and lowering operating cost while accelerating scale for everyone.
Transparency: Open source datasets, ML algorithms, training models, frameworks, and compilers allow the larger community to perform due diligence and validation. This is paramount when it comes to ML, as it bolsters reproducibility and interpretability, promotes equity, and boosts security.
Innovation: With more access and transparency, innovation follows naturally. Our customers and partners take advantage of open source ML toolsets and frameworks to drive further progress in the field by contributing their own OSS.

Google’s ongoing commitment to open source AI

Google’s commitment to open standards spans over two decades of OSS contributions like TensorFlow, JAX, TFX, MLIR, Kubeflow, and Kubernetes, as well as sponsorship for critical OSS data science initiatives like Project Jupyter and NumFOCUS. Initiatives like these have helped Google become the leading Cloud Native Computing Foundation (CNCF) contributor, and by building on these efforts, Google Cloud seeks to be the best platform for the OSS AI community and ecosystem.

The perils of closed technologies can emerge at many points across ML pipelines, which is why Google’s OSS strategy encompasses the entire “idea-to-production” lifecycle, from acquiring data, to training models, to managing infrastructure, to facilitating experimentation and model refinement:

Data acquisition: starting the journey from idea to production-ready ML model

The journey from an idea to a production ML model starts with data. TensorFlow Datasets not only helps users acquire ready-to-use, customizable, and highly optimized datasets (including image, audio, and text), but also provides a set of helpful APIs that make it easy for users to organize their own datasets, regardless of whether they build with TensorFlow, JAX, or other ML frameworks.

Model development and training: shortening the path from data to useful ML

OSS libraries help developers and researchers design, implement, train, test, and debug ML algorithms. Our contributions on this front include:

The TensorFlow core framework, which offers APIs to help data scientists and developers build and train production-grade ML models on distributed and accelerated infrastructure powered by GPUs or TPUs;
Google’s founding membership of the PyTorch Foundation, which positions us to increase adoption of ML by building an ecosystem of open-source projects with PyTorch;
Keras, a simple and powerful ML framework, well integrated with TensorFlow, that makes it easy for developers to quickly build and train ML models, or to leverage pre-trained AI applications;
Model Garden, which provides implementations of many state-of-the-art computer vision and natural language processing models, maintained by Google and accessible to all, alongside APIs to accelerate training and experiments;
JAX, a lean, intuitive, and composable system that brings together automatic differentiation (Autograd) and the Accelerated Linear Algebra (XLA) optimizing compiler to offer high-performance ML for fast research and production;
TensorFlow Hub, a repository of trained ML models ready for fine-tuning and deployment; and,
MediaPipe, an open source, cross-platform framework that lets users leverage customizable ML solutions for live and streaming media, including text and video.
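
To make the JAX item above a little more concrete, here is a minimal sketch of how automatic differentiation and XLA compilation compose. The toy model, the `loss` function, and the sample data are assumptions made up for illustration; only `jax.grad`, `jax.jit`, and `jax.numpy` come from JAX itself.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Mean squared error of a hypothetical one-parameter model y ~ w * x."""
    return jnp.mean((w * x - y) ** 2)


# jax.grad builds a new function that returns d(loss)/dw
# (by default it differentiates with respect to the first argument).
grad_loss = jax.grad(loss)

# jax.jit traces that function and compiles it with XLA,
# so repeated calls reuse the compiled version.
fast_grad_loss = jax.jit(grad_loss)

# Illustrative data generated by y = 2 * x.
x = jnp.array([1.0, 2.0, 3.0])
y = jnp.array([2.0, 4.0, 6.0])

# At w = 1.0 the model underestimates y, so the gradient is negative.
print(fast_grad_loss(1.0, x, y))

# At w = 2.0 the model fits the data exactly, so the gradient is zero.
print(fast_grad_loss(2.0, x, y))
```

Because `grad` and `jit` are ordinary function transformations, they can be stacked in either order or combined with others, which is what the "composable" description refers to.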

ML infrastructure management: scaling valuable models with powerful backends

Accessing and managing infrastructure for ML, especially at scale, can be a blocker for many organizations, which is why Google has invested in initiatives including:

The TFX (or TensorFlow Extended) platform, which offers software frameworks and tooling for full MLOps deployments, helping developers with data automation, model tracking, performance monitoring, and model retraining;
Kubeflow, which makes deployments of ML workflows on Kubernetes simple, portable and scalable; and,
TRC (TPU Research Cloud), which gives access to a cluster of more than 1,000 Cloud TPU devices at no charge to selected researchers who publish peer-reviewed papers and/or open source code.

Experimentation and model optimization: encouraging discovery and iteration

Data, tools for model training, and infrastructure can achieve only so much without strong processes for experimentation and optimization, which is why we’ve contributed to projects like XManager, which enables anyone to run and keep track of ML experiments locally or on Vertex AI, and TensorBoard, which simplifies tracking and visualizing model performance metrics.

These areas of focus will help not only our customers but the open-source AI community as a whole, and we’re excited to share more OSS news in the coming days and months. To start exploring why many organizations choose Google Cloud for their open-source AI needs, visit our “open cloud” page and be sure to register for Google Cloud Next ’22 for all our latest news.

Thanks to all the contributors to this blog post: Matt Vasey, George Elissaios, Warren Barkley, Manvinder Singh, James Rubin, Abhishek Ratna, Thea Lamkin, Amin Vahdat, Andrew Moore, Max Sapozhnikov, Gandhi, Vikram Kasivajhula
