ERA5 data

Analysis-Ready, Cloud Optimized (ARCO) ERA5 is a cloud-optimized version of ERA5, the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) Atmospheric Reanalysis, which provides hourly estimates of a large number of atmospheric, land, and oceanic climate variables. The Google Cloud Public Dataset Program hosts ERA5 data spanning 1979 to August 2021. The data covers the Earth on a 30 km grid and resolves the atmosphere on 137 levels, from the surface up to a height of 80 km.
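To give a feel for the scale these figures imply, the following sketch counts the hourly time steps in the hosted period. The exact first and last timestamps are assumptions, since this page only states "1979 to August 2021"; treat the result as a rough estimate rather than the dataset's actual length.

```python
from datetime import datetime

# Assumed bounds: the page only says "1979 to August 2021", so the exact
# first and last timestamps used here are illustrative, not confirmed.
start = datetime(1979, 1, 1, 0)
end_exclusive = datetime(2021, 9, 1, 0)

# Number of hourly steps in the half-open interval [start, end_exclusive).
hours = int((end_exclusive - start).total_seconds() // 3600)

# 137 model levels, as stated above.
model_levels = 137

print(f"hourly time steps: {hours:,}")
print(f"time steps x model levels: {hours * model_levels:,}")
```

Every model-level variable holds one global field per time step and level, which is why chunked, lazily loaded access matters for this dataset.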

A reanalysis is the "most complete picture currently possible of past weather and climate." Reanalyses are created by assimilating a wide range of data sources via numerical weather prediction (NWP) models. To produce a cloud-optimized version of ERA5, meteorologically valuable variables for land and atmosphere were ingested and converted from GRIB data to Zarr, with no other modifications. In addition, an open-source code base is provided to show the provenance of the data and to demonstrate common research workflows. This dataset includes both raw (GRIB) and cloud-optimized (Zarr) files.

Use cases

ERA5 data can be used in many different applications, including:

  • Training ML models that predict the impact of weather on different phenomena
  • Training and evaluating ML models that forecast the weather
  • Computing climatologies, the average weather for a region over a given period of time
  • Visualizing and studying historical weather events, such as Hurricane Sandy
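As an illustration of the climatology use case, here is a minimal sketch that computes a monthly climatology with xarray. It runs on a synthetic hourly temperature series rather than on ERA5 itself; the variable name "t2m" and the values are made up for the example.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for three years of hourly near-surface temperature at
# one grid point. The name "t2m" and the values are invented for this sketch.
time = np.arange("2000-01-01", "2003-01-01", dtype="datetime64[h]")
time = time.astype("datetime64[ns]")

hours_since_start = np.arange(time.size)
seasonal_cycle = 10.0 * np.sin(2 * np.pi * hours_since_start / (365.25 * 24))

temperature = xr.DataArray(
    15.0 + seasonal_cycle,
    coords={"time": time},
    dims="time",
    name="t2m",
)

# Group every hourly value by calendar month and average within each group.
# The result has one value per month (1..12): the monthly climatology.
monthly_climatology = temperature.groupby("time.month").mean("time")

print(monthly_climatology.to_series().round(2))
```

The same groupby pattern should carry over to a variable loaded from one of the Zarr stores, with additional spatial dimensions preserved in the result.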

Thanks to the open data policy of the Copernicus Climate Change and Atmosphere Monitoring Services and ECMWF, this dataset is available free of charge as part of the Google Cloud Public Dataset Program. See the Use entry below for license information.

Dataset structure

The ARCO-ERA5 dataset is stored in five separate Zarr datasets, found in the Cloud Storage bucket gcp-public-data-arco-era5:

  • Model-level Moisture: The moisture-related variables and ozone mixing ratio on model levels.
  • Model-level Wind: The divergence, vorticity, temperature, and vertical velocity on model levels.
  • Single-level Surface: The surface geopotential and logarithm of surface pressure at the model's surface.
  • Single-level Forecast: 21 variables relating to solar/longwave radiation at the surface, precipitation amount and type, and snowfall depth and water content.
  • Single-level Reanalysis: 38 variables related to soil moisture/temperature, winds near the surface, temperature and moisture near the surface, total column water vapor and cloud condensate, total cloud cover, and sea-level pressure.

For more information on which variables are included with each dataset, see the example Jupyter notebooks in the GitHub repository.

Data access

The following code snippet loads the single-level reanalysis dataset and displays a summary of the dataset:

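import xarray
single_reanalysis = xarray.open_zarr(
    'gs://gcp-public-data-arco-era5/...',  # store path missing in the source; see the GitHub repository
    chunks={'time': 48},
)
print(single_reanalysis)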

For more examples using ARCO-ERA5 data in Python, see the example Jupyter notebooks.
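Once a dataset is open, a common next step is to select a time window and a location. The sketch below shows label-based selection on a small synthetic dataset so that it runs without network access. The variable name "mean_sea_level_pressure", the coordinate names, and the regular latitude/longitude layout are assumptions made for illustration; inspect the real dataset's summary before relying on them.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for an opened store. Variable name, coordinate names,
# and grid layout are assumptions, and the values are random.
time = np.arange("2012-10-20", "2012-11-05", dtype="datetime64[h]")
time = time.astype("datetime64[ns]")
latitude = np.arange(50.0, 19.0, -1.0)    # 50, 49, ..., 20 degrees north
longitude = np.arange(270.0, 301.0, 1.0)  # 270, 271, ..., 300 degrees east

rng = np.random.default_rng(0)
pressure = 101325.0 + rng.normal(
    0.0, 500.0, (time.size, latitude.size, longitude.size)
)

ds = xr.Dataset(
    {"mean_sea_level_pressure": (("time", "latitude", "longitude"), pressure)},
    coords={"time": time, "latitude": latitude, "longitude": longitude},
)

# Select three days in late October 2012, around the time of Hurricane Sandy.
# String bounds in a time slice are inclusive of the whole end day.
window = ds.sel(time=slice("2012-10-28", "2012-10-30"))

# Pick the grid point nearest an illustrative location near the New Jersey
# coast (longitude converted from degrees west to degrees east).
point = window["mean_sea_level_pressure"].sel(
    latitude=39.4, longitude=360.0 - 74.4, method="nearest"
)

print(point)
```

With a dataset opened lazily from Zarr, selections like these only read the chunks they touch, so narrowing the time range first keeps the amount of data transferred small.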

About the dataset

Dataset Source: ECMWF - Generated using Copernicus Climate Change Service (C3S) Climate Data Store information.

Category: Atmospheric Science, Data Assimilation, Climate, Cloud Optimized, Meteorology, Reanalysis, Weather, Science & Research.

Use: Use of ERA5 data is free of charge, worldwide, non-exclusive, royalty-free and perpetual. All users of Copernicus Products must provide clear and visible attribution to the Copernicus program. Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains. For full details of use, refer to the License to Use Copernicus Products.

Update Frequency: The ERA5 dataset is not currently refreshed in the Google Cloud Public Dataset Program. The program provides ERA5 data spanning 1979 to August 2021.

Format: Raw files are in .grib format; re-gridded files are in .zarr format.

Cloud Storage Location: Data is stored in the bucket gcp-public-data-arco-era5, which is located in the us-central1 region.