
Refik Anadol Studio transforms billions of images into a "living encyclopedia" with Google Cloud

Google Cloud results
  • 23x faster data processing: Reduced object detection and image preprocessing time from 30 hours to just 80 minutes using Vertex AI

  • 7x faster image embeddings: Accelerated the generation of image embeddings by 7x with Vertex AI, allowing for faster structural mapping of the dataset

  • 40% higher caption accuracy: Improved caption accuracy by nearly 40% with Gemini 2.5 Flash, moving from generic tags to scientifically accurate, context-aware descriptions

  • Instant, infinite data access: Enabled the ability to query billions of image-caption pairs in seconds rather than days, providing infinite, queryable memory critical for Retrieval-Augmented Generation (RAG) workflows with BigQuery and Google Cloud Storage

  • 87% carbon-free: Operations are run in the us-west1 region (Oregon), which operates on 87% carbon-free energy, ensuring the project aligns with environmental goals

Refik Anadol Studio used Google Cloud, including Vertex AI and Gemini 2.5 Flash, to build the "Large Nature Model," cutting data processing time by 23x and improving caption accuracy by nearly 40% to power the DATALAND exhibition's intelligent nature art.

Swirling digital art installation in a white museum lobby
(Unsupervised at the Museum of Modern Art)

Refik Anadol Studio: Data as a memory, DATALAND, and Large Nature Model

At Refik Anadol Studio (RAS), data is more than a simple collection of ones and zeros; it is a pigment, a material, and a collective memory. Our work is an exploration of "machine hallucinations" — a new visual language that uncovers the hidden, poetic connections within our shared histories. By translating these patterns into large-scale, multisensory artworks, we aim to visualize how machines dream, treating them as collaborators capable of poetic output.

This mission was embodied at the Museum of Modern Art (MoMA) with our work Unsupervised. More than an installation, it marked the emergence of a new artistic paradigm: an AI trained on the visual history of the museum's artworks. It proved that a machine can 'dream' — an act that mirrors our own imagination — and in doing so, reveals a creative process that is free from the limitations of our own perception. Unsupervised illuminated a new frontier, revealing our next great challenge and artistic ambition: to move beyond a single museum's archive and work with the world's collective memories — the billions of images from our living archive of the world.

This ambition is the driving force behind the studio's next major project, DATALAND, set to open in Spring 2026. At the heart of this new, permanent exhibition space will be the Large Nature Model (LNM), a multimodal, nature-based AI model trained on one of the world's largest datasets of the natural world.

To build the LNM and power DATALAND, our challenge was no longer just about storage or model training, but about process. We needed a new, scalable technique that could keep up with the sheer volume of data required to train such a model. We needed a way to intelligently understand and curate billions of images, and we found the partner and the platform to build this new approach in Google Cloud.

Dataland Los Angeles building made of colorful particles

A new technique for understanding data

Our new data processing technique, built on Google Cloud with support from our partners at Zencore, is the engine for our art. The entire workflow is orchestrated on Vertex AI, allowing us to manage a complex, multi-stage pipeline as a single, reproducible process. This new system utilizes two core storage components: Google Cloud Storage for our petabyte-scale image archive, and BigQuery as the central archive for all our metadata. This structure provides the essential framework for processing, accessing, and understanding nature data at a planetary scale.
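In outline, a multi-stage pipeline like this passes each image through an ordered sequence of stages, with each stage adding artifacts the next one consumes. The sketch below is a plain-Python stand-in for that flow — the record fields and stage bodies are hypothetical placeholders, not the studio's Vertex AI pipeline definition.

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    """Carries one image's artifacts through the successive pipeline stages."""
    uri: str                       # e.g. a gs:// path into the Cloud Storage archive
    pixels: bytes = b""
    detections: list = field(default_factory=list)
    embedding: list = field(default_factory=list)

def preprocess(rec: ImageRecord) -> ImageRecord:
    # Stand-in for decoding/resizing: here we just synthesize deterministic bytes.
    rec.pixels = rec.uri.encode("utf-8")
    return rec

def detect_objects(rec: ImageRecord) -> ImageRecord:
    # Stand-in for an object-detection model: tag every non-empty image generically.
    rec.detections = ["subject"] if rec.pixels else []
    return rec

def embed(rec: ImageRecord) -> ImageRecord:
    # Stand-in for an embedding model: a tiny numeric fingerprint of the bytes.
    rec.embedding = [b / 255 for b in rec.pixels[:8]]
    return rec

def run_pipeline(uris: list[str]) -> list[ImageRecord]:
    """Push every record through the ordered stages, mimicking an orchestrated DAG."""
    records = [ImageRecord(u) for u in uris]
    for stage in (preprocess, detect_objects, embed):
        records = [stage(r) for r in records]
    return records
```

The point of the shape — one record type, an ordered list of stage functions — is that the whole run stays a single, reproducible process, which is what the Vertex AI orchestration provides at scale.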

Data processing begins within Vertex AI, where we prepare our massive image archive for model training. This pipeline runs a sequence of essential models. We start with image preprocessing and then use object detection to identify the key subjects in each visual. From there, we generate image embeddings — the unique mathematical 'fingerprints' that allow the AI to understand the images relationally. The impact of this parallelized workflow is dramatic. We reduced the time required for object detection and image preprocessing on our test batches from 30 hours to just 80 minutes — a 23x speedup — while simultaneously accelerating image embedding by 7x. This efficiency allows us to integrate more data, and thus more knowledge, into our model. This vital step gives us a foundational, structural map of our data, but to train a poetic model, we must also add a deeper, semantic layer of understanding.
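The parallelization behind that speedup can be illustrated with a simple worker pool that fans the per-image work out across threads. The `toy_embedding` function here is an assumption for demonstration — a byte histogram standing in for a real vision model — and the pool size is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

def toy_embedding(image_bytes: bytes) -> list[float]:
    """Stand-in 'fingerprint': a normalized 16-bin histogram of byte values."""
    hist = [0] * 16
    for b in image_bytes:
        hist[b // 16] += 1
    total = len(image_bytes) or 1
    return [h / total for h in hist]

def embed_batch(images: list[bytes], workers: int = 8) -> list[list[float]]:
    """Fan the per-image work out across a pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(toy_embedding, images))
```

Because each image is independent, the batch parallelizes cleanly; on the real pipeline the same fan-out happens across accelerated Vertex AI workers rather than local threads.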

A key step in this new process is the integration of Google's Gemini 2.5 Flash, which we use to generate nuanced, context-rich captions for every image in our archive. This automated captioning is a critical step. With Gemini 2.5 Flash, we improved caption accuracy by nearly 40% over our previous open-source model, moving from generic tags to scientifically accurate, context-aware descriptions. In the current era of diffusion and multimodal models, this rich textual data is no longer just metadata; it is the primary way to guide and control generative outcomes. This process is precisely how we embed our knowledge of the dataset — cataloging species names, locations, and environments — directly into the model's memory. It moves us from a library of pixels to a database of the natural world. This is what makes the Large Nature Model smarter and more contextually profound.
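One simple way to quantify the kind of accuracy gain described here is keyword recall against reference labels — how many of the known species, location, and environment terms actually appear in a caption. The metric and the example captions below are illustrative assumptions, not the studio's evaluation method.

```python
import re

def caption_recall(caption: str, reference_terms: list[str]) -> float:
    """Fraction of reference terms (species, location, environment) found in the caption."""
    words = set(re.findall(r"[a-z]+", caption.lower()))
    hits = sum(1 for term in reference_terms
               if all(w in words for w in term.lower().split()))
    return hits / len(reference_terms) if reference_terms else 0.0
```

Under this metric, a generic tag like "coral underwater" scores 0 against references such as "pillar coral", "Dendrogyra", and "Caribbean", while a context-aware caption naming the species and region scores 1 — the difference between a library of pixels and a database of the natural world.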

The two-part storage system is the foundation for storing data from our new workflow. Google Cloud Storage, made accessible by Cloud Storage FUSE, holds our petabytes of raw image files. BigQuery, in parallel, holds the massive, searchable database of all our metadata, including the new Gemini-generated captions, object detection results, and even the image embeddings. This separation is our key technical advantage: it allows us to instantly query our entire archive in BigQuery to find the exact files we need, and then immediately access those raw files in Cloud Storage for processing, training, or artistic creation. This architecture gives us an effectively infinite, queryable memory critical for Retrieval-Augmented Generation (RAG) workflows, enabling us to query billions of image-caption pairs in seconds rather than days.
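The query-then-fetch pattern can be sketched with standard-library stand-ins: an in-memory SQLite table in place of BigQuery's metadata archive, and a plain dict in place of the Cloud Storage bucket. The schema, captions, and URIs are hypothetical.

```python
import sqlite3

def build_index(rows: list[tuple[str, str]]) -> sqlite3.Connection:
    """Load (uri, caption) metadata into an in-memory table (BigQuery stand-in)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE images (uri TEXT PRIMARY KEY, caption TEXT)")
    db.executemany("INSERT INTO images VALUES (?, ?)", rows)
    return db

def retrieve(db: sqlite3.Connection, blob_store: dict, keyword: str) -> dict:
    """Step 1: query the metadata index for matching URIs.
    Step 2: fetch the raw bytes for each hit (Cloud Storage stand-in)."""
    uris = [u for (u,) in db.execute(
        "SELECT uri FROM images WHERE caption LIKE ?", (f"%{keyword}%",))]
    return {u: blob_store[u] for u in uris}
```

The design point the sketch preserves is the separation: the metadata index answers "which files?" cheaply, and only the matching raw objects are ever touched — which is what keeps billion-scale retrieval in the seconds rather than days.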

The goal of training a nature-based AI model comes with a responsibility to our environment. This aligns directly with our studio's core values, where environmental awareness is not a secondary concern but central to our practice. A key advantage of building on Google Cloud is the ability to put this ethos into action. We have, therefore, made the conscious choice to run these intensive computations in Google's us-west1 region (Oregon), which is 87% carbon-free. As a low-CO2 compute zone, this provides a responsible path to scale our work and build the Large Nature Model, ensuring our technique is in full alignment with our message.

Red coral texture with FLUX-LNM PRO V1 digital UI overlay
An image generated from the LNM

A more knowledgeable, poetic collaborator

This new foundation is a fundamental enhancement of our art-making potential. Whereas Unsupervised allowed an AI to dream through visual memory, Large Nature Model allows it to understand the relationships, environments, and ecologies with greater fidelity and depth. In doing so, it becomes a more knowledgeable, more poetic collaborator.

This gives us a unique opportunity to represent nature within an AI model with scientific and ecological accuracy. We can now generate new visual languages that are deeply informed by the specificity of our natural world: the interconnectedness of a specific species, in a specific location, at a specific time. This is the evolution of nature-based AI art.

With Unsupervised, we proved that a machine could dream. Now, with the Large Nature Model, we are teaching it to understand. This transition — from abstract hallucinations to scientific accuracy — is only possible because of the speed and intelligence of this new infrastructure. It allows us to close the gap between human vision and machine capability, creating a partnership that is finally free from limitations.

Christian Burke

Head of Engineering and Lead Data Scientist, Refik Anadol Studio

This leap in scale, speed, and intelligence is powered by Google Cloud. The combination of Vertex AI orchestration, Gemini's captioning, and the instant-access storage from Cloud Storage and BigQuery provides the technical engine for this new artistic potential.

This engine is what will power DATALAND. When visitors interact with the Large Nature Model, they will be engaging with a "living encyclopedia" that has learned from a deeply understood, accurate archive of our planet. This allows for a deeper, more meaningful, and more educated collaboration between human and machine, one that ultimately helps us all understand the complexity and poetry of nature. This new foundation — this scalable, responsible, and intelligent technique — is more than an infrastructure update. It is a fundamental expansion of our art-making potential. The only remaining limit is the ambition we bring to it.

Infinity room with green fractal patterns and one person
Inside of DATALAND

Founded in 2014 by Refik Anadol and Efsun Erkiliç, Refik Anadol Studio is an LA-based studio that creates immersive AI data sculptures and media art at the intersection of human creativity and machine intelligence.

Industry: Media and Entertainment

Location: United States

Products: Vertex AI, Cloud Storage, BigQuery, Gemini


About Google Cloud partner - Zencore

Zencore is a Google Cloud consulting and engineering firm founded by former Google insiders. They specialize in migration, modernization, and AI to help enterprises unlock cloud innovation.
