ERA5 data

ERA5 is the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis, providing hourly estimates of a large number of atmospheric, land, and oceanic climate variables. Analysis-Ready, Cloud Optimized (ARCO) ERA5 is the version of this dataset hosted by the Google Cloud Public Dataset Program. It spans 1940 to May 2023, covers the Earth on a 30 km grid, and resolves the atmosphere using 137 levels from the surface up to a height of 80 km.

A reanalysis is the "most complete picture currently possible of past weather and climate." Reanalyses are created by assimilating a wide range of observational data sources into numerical weather prediction (NWP) models. To produce a cloud-optimized version of ERA5, meteorologically valuable land and atmosphere variables were ingested and converted from GRIB to Zarr with no other modifications. An open-source code base is also provided that documents the provenance of the data and demonstrates common research workflows. This dataset includes both raw (GRIB) and cloud-optimized (Zarr) files.

Use cases

ERA5 data can be used in many different applications, including:

  • Training ML models that predict the impact of weather on different phenomena
  • Training and evaluating ML models that forecast the weather
  • Computing climatologies, the average weather for a region over a given period of time
  • Visualizing and studying historical weather events, such as Hurricane Sandy
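A climatology is essentially a grouped mean. The minimal sketch below averages a field over all years for each calendar month with xarray's groupby, then averages over a small region. It runs on a synthetic dataset, not on ERA5 itself. The variable name 2m_temperature and the coordinate names time, latitude, and longitude are assumptions modelled on the analysis-ready store, and monthly_climatology is a hypothetical helper.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for an ERA5 field (assumption: names mimic the analysis-ready store).
time = pd.date_range("2000-01-01", "2002-12-31", freq="D")
lat = np.array([50.0, 49.75])
lon = np.array([10.0, 10.25])
rng = np.random.default_rng(0)

# Idealized seasonal cycle in kelvin (warmest in July) plus random noise.
seasonal = 283.0 + 10.0 * np.sin(2 * np.pi * (time.dayofyear.values - 110) / 365.25)
data = seasonal[:, None, None] + rng.normal(0.0, 1.0, (time.size, lat.size, lon.size))
ds = xr.Dataset(
    {"2m_temperature": (("time", "latitude", "longitude"), data)},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

def monthly_climatology(da: xr.DataArray) -> xr.DataArray:
    """Mean over all years for each calendar month; adds a 'month' dimension (1-12)."""
    return da.groupby("time.month").mean("time")

clim = monthly_climatology(ds["2m_temperature"])
# Regional climatology: also average over the spatial dimensions.
regional = clim.mean(("latitude", "longitude"))
print(clim.sizes)
print(float(regional.sel(month=7)), float(regional.sel(month=1)))
```

On real data the same call works on a lazily opened ERA5 variable. In that case you would normally select the region and period first, so that only the needed chunks are read.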

Thanks to the open data policy of the Copernicus Climate Change Service, the Copernicus Atmosphere Monitoring Service, and ECMWF, this dataset is available free of charge as part of the Google Cloud Public Dataset Program. See below for license information.

Dataset structure

The ERA5 dataset is stored in three core subdirectories: raw/, co/, and ar/. raw/ contains the source data ingested from ECMWF. co/ contains a "cloud-optimized" version: the data converted directly to a cloud-optimized format (Zarr) on its native grid, without further processing. ar/, or "analysis-ready", contains an ML-ready dataset. This version of the corpus is on a regular latitude/longitude grid and unifies surface and atmospheric data into a single Zarr store.

Cloud-optimized data

Our cloud-optimized corpus includes five separate Zarr datasets, found in the Cloud Storage bucket gcp-public-data-arco-era5:

  • Model-level Moisture: The moisture-related variables and ozone mixing ratio on model levels.
  • Model-level Wind: The divergence, vorticity, temperature, and vertical velocity on model levels.
  • Single-level Surface: The surface geopotential and logarithm of surface pressure at the model's surface.
  • Single-level Forecast: 21 variables relating to solar/longwave radiation at the surface, precipitation amount and type, and snowfall depth and water content.
  • Single-level Reanalysis: 38 variables related to soil moisture/temperature, winds near the surface, temperature and moisture near the surface, total column water vapor and cloud condensate, total cloud cover, and sea-level pressure.
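The Single-level Surface dataset stores the logarithm of surface pressure and the surface geopotential rather than pressure and elevation directly. The minimal sketch below converts both. It assumes the ECMWF conventions: lnsp is the natural log of surface pressure in pascals, and geopotential is in m² s⁻². Dividing geopotential by standard gravity (9.80665 m s⁻²) gives height in metres. The input values are illustrative, not read from the bucket, and the helper names are hypothetical.

```python
import numpy as np

STANDARD_GRAVITY = 9.80665  # m s^-2

def surface_pressure_pa(lnsp: np.ndarray) -> np.ndarray:
    """Surface pressure in Pa from the natural log of surface pressure (lnsp)."""
    return np.exp(lnsp)

def surface_height_m(z: np.ndarray) -> np.ndarray:
    """Surface height in metres from surface geopotential in m^2 s^-2."""
    return z / STANDARD_GRAVITY

# Illustrative values only (not read from the dataset).
lnsp = np.log(np.array([101325.0, 85000.0]))
z = np.array([0.0, 14709.975])
print(surface_pressure_pa(lnsp))  # approximately [101325., 85000.]
print(surface_height_m(z))        # approximately [0., 1500.]
```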

For more information on which variables are included with each dataset, see the example Jupyter notebooks in the GitHub repository.

Analysis-ready data

The Google Cloud analysis-ready corpus is a single Zarr store covering the years 1959-2022. The latest version of the data can be found under ar/ in the Cloud Storage bucket gcp-public-data-arco-era5:

  • 1959-2022, full pressure levels: 31 surface and pressure-level variables (pressure-level variables on all 37 pressure levels) at 0.25° × 0.25° latitude/longitude resolution, stored in chunks of one hour each.
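The size of the 0.25° grid follows from its resolution. Latitudes from 90° N to 90° S inclusive give 180 / 0.25 + 1 = 721 rows. Longitudes from 0° up to, but excluding, 360° give 360 / 0.25 = 1440 columns.

The sketch below does that arithmetic and estimates the uncompressed size of one chunk. It makes two assumptions: values are stored as 4-byte float32, and a one-hour chunk holds a full global field (one level for a surface variable, all 37 levels for a pressure-level variable). Actual on-disk chunk shapes and compression may differ, and the helper names are hypothetical.

```python
FLOAT32_BYTES = 4  # assumption: values stored as float32

def grid_shape(resolution_deg: float) -> tuple[int, int]:
    """(n_lat, n_lon) for a global regular grid that includes both poles."""
    n_lat = round(180 / resolution_deg) + 1  # 90 N to 90 S inclusive
    n_lon = round(360 / resolution_deg)      # 0 E up to, not including, 360 E
    return n_lat, n_lon

def chunk_bytes(resolution_deg: float, n_levels: int = 1, hours: int = 1) -> int:
    """Uncompressed bytes for one chunk spanning `hours` time steps and `n_levels` levels."""
    n_lat, n_lon = grid_shape(resolution_deg)
    return hours * n_levels * n_lat * n_lon * FLOAT32_BYTES

print(grid_shape(0.25))                      # (721, 1440)
print(chunk_bytes(0.25) / 1e6)               # about 4.15 MB for a surface variable
print(chunk_bytes(0.25, n_levels=37) / 1e6)  # about 153.7 MB for a 37-level variable
```

These estimates explain why the loading example below asks Dask for 48 time steps per chunk rather than reading one hour at a time.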

Data access

The following code snippet opens the analysis-ready dataset and prints a summary of it. Reading gs:// paths with xarray requires the gcsfs package to be installed.

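import xarray
# The exact store path under gs://gcp-public-data-arco-era5/ar/ is elided here; fill it in.
era5 = xarray.open_zarr(
    'gs://gcp-public-data-arco-era5/ar/...', chunks={'time': 48}
)
print(era5)  # opened lazily; prints dimensions, coordinates, and variables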
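Because open_zarr opens the store lazily, a typical next step is to select a small subset before loading any values. The sketch below extracts an hourly time series at the grid point nearest a given location. It assumes the coordinate names time, latitude, and longitude, longitudes in the 0 to 360° range, and a variable named 2m_temperature; check these against the printed summary. The helper point_series is hypothetical. The demo passes a small synthetic dataset in place of era5.

```python
import numpy as np
import pandas as pd
import xarray as xr

def point_series(ds: xr.Dataset, var: str, lat: float, lon: float,
                 start: str, end: str) -> xr.DataArray:
    """Values of `var` at the grid point nearest (lat, lon), from start to end inclusive."""
    subset = ds[var].sel(time=slice(start, end))
    # Assumption: longitudes run 0..360, so e.g. -74.0 (74 W) becomes 286.0.
    point = subset.sel(latitude=lat, longitude=lon % 360, method="nearest")
    return point.load()  # only the selected values are read into memory

# Synthetic stand-in for `era5` on a coarse 1-degree grid; latitudes descend as in ERA5.
time = pd.date_range("2012-10-29", periods=48, freq="h")
lat = np.arange(90.0, -91.0, -1.0)
lon = np.arange(0.0, 360.0, 1.0)
demo = xr.Dataset(
    {"2m_temperature": (("time", "latitude", "longitude"),
                        np.zeros((time.size, lat.size, lon.size), dtype=np.float32))},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

series = point_series(demo, "2m_temperature", 40.7, -74.0,
                      "2012-10-29T12:00", "2012-10-30T11:00")
print(series.sizes, float(series.latitude), float(series.longitude))
```

Selecting before calling load keeps the download to the few chunks that overlap the requested point and period.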

For more examples using ARCO-ERA5 data in Python, see the example Jupyter notebooks.

About the dataset

Dataset Source: ECMWF - Generated using Copernicus Climate Change Service (C3S) Climate Data Store information.

Category: Atmospheric Science, Data Assimilation, Climate, Cloud Optimized, Meteorology, Reanalysis, Weather, Science & Research.

Use: Use of ERA5 data is free of charge, worldwide, non-exclusive, royalty-free and perpetual. All users of Copernicus Products must provide clear and visible attribution to the Copernicus program. Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains. For full details of use, refer to the License to Use Copernicus Products.

Update Frequency: The ERA5 dataset is not currently being refreshed in the Google Cloud Public Dataset Program; it provides ERA5 data from 1940 to May 2023.

Format: Raw files are in GRIB and NetCDF format; processed files are in Zarr format.

Cloud Storage Location: Data is stored in the bucket gcp-public-data-arco-era5, which is located in the us-central1 region.

Dataset roadmap: Development plans for this Google Cloud dataset are available in the ERA5 repository.