Case studies

This page lists completed datasets, together with the reproducible code that generated them, links to the references created and, where available, notebook or benchmark examples. This page is a work in progress. All datasets listed here will also appear in the repo's Intake catalogue.


This page needs to be cleaned up and the cases standardized.

Solar Dynamics Observatory

Native data format: FITS.

Effective in-memory size: 400GB

Notes: each wavelength filter is presented as a separate variable. The DATE-OBS of the nearest preceding 94A image is used for the other filters, so that all variables share a single time axis.
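The time alignment described in the notes can be sketched as follows. This is only an illustration, not the code used to build the dataset: the function name `nearest_preceding` and the timestamps are hypothetical, and it assumes that an image taken at exactly the same time as a 94A image counts as "preceding".

```python
from bisect import bisect_right
from datetime import datetime


def nearest_preceding(reference_times, t):
    """Return the latest reference time that is <= t, or None if there is none.

    reference_times must be sorted in ascending order.
    """
    i = bisect_right(reference_times, t)
    return reference_times[i - 1] if i else None


# Hypothetical DATE-OBS values for the 94A reference channel (sorted).
ref_94 = [
    datetime(2020, 1, 1, 0, 0, 0),
    datetime(2020, 1, 1, 0, 0, 12),
    datetime(2020, 1, 1, 0, 0, 24),
]

# Hypothetical DATE-OBS values for images taken with another filter.
other = [
    datetime(2020, 1, 1, 0, 0, 5),
    datetime(2020, 1, 1, 0, 0, 13),
    datetime(2020, 1, 1, 0, 0, 24),
]

# Each image is assigned the time of the nearest preceding 94A image,
# so every variable ends up on the same time axis.
aligned = [nearest_preceding(ref_94, t) for t in other]
```

Images taken before the first 94A image have no preceding reference time; in this sketch they map to `None` and would need separate handling.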


National Water Model

Native data format: NetCDF4/HDF5.

Effective in-memory size: 80TB

Notes: there are so many files that dask and a tree reduction were required to aggregate the metadata.
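A tree reduction combines items in small groups, level by level, so that no single step has to hold the metadata of every file at once. The sketch below shows the idea in plain Python. It is not the generator code for this dataset: `merge`, `tree_reduce`, `fan_in` and the sample dictionaries are hypothetical, and the merge step is a simple dictionary update standing in for the real metadata aggregation.

```python
def merge(batch):
    """Hypothetical combine step: merge several reference dicts into one."""
    out = {}
    for refs in batch:
        out.update(refs)
    return out


def tree_reduce(items, combine, fan_in=2):
    """Combine items in groups of fan_in, level by level, until one remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(items)
    if not level:
        raise ValueError("nothing to reduce")
    while len(level) > 1:
        # Every group in a level is independent of the others, which is
        # what allows a scheduler such as dask to run them in parallel.
        level = [
            combine(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


# Hypothetical per-file metadata, one small dict per input file.
per_file = [{f"var/{i}": f"file{i}.nc"} for i in range(5)]
combined = tree_reduce(per_file, merge, fan_in=2)
```

With dask, each call to `combine` would typically become a separate task, so the groups within a level can be processed concurrently and only the intermediate results are passed on.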

Generator notebook:



Native data format: NetCDF4/HDF5.

Effective in-memory size: 66TB

On-disk size: 16TB



Notes: Global sea surface temperature data. The notebook includes benchmarks. See the notebook for how to establish NASA Earthdata credentials necessary for data access.


Native data format: GRIB2.

Effective in-memory size: 1.5GB (11-file subset)


Notebook (generation and use):

Notes: High-Resolution Rapid Refresh, a real-time, 3-km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model from NOAA. The notebook extracts only the sections matching the filter “heightAboveGround=2”.
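The kind of selection the filter performs can be illustrated in plain Python. This is a sketch under assumptions, not the notebook's code: it assumes the filter is expressed as the GRIB keys `typeOfLevel` and `level`, and the `matches` helper and the sample message metadata are hypothetical.

```python
def matches(message_keys, wanted):
    """Return True if every wanted key has an accepted value in the message."""
    for key, value in wanted.items():
        # A wanted value may be a single value or a collection of accepted values.
        allowed = value if isinstance(value, (list, tuple, set)) else [value]
        if message_keys.get(key) not in allowed:
            return False
    return True


# Hypothetical metadata for three sections (messages) of one GRIB2 file.
messages = [
    {"shortName": "2t", "typeOfLevel": "heightAboveGround", "level": 2},
    {"shortName": "10u", "typeOfLevel": "heightAboveGround", "level": 10},
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "level": 500},
]

# Assumed encoding of the filter "heightAboveGround=2".
wanted = {"typeOfLevel": "heightAboveGround", "level": 2}

selected = [m["shortName"] for m in messages if matches(m, wanted)]
```

Only sections whose metadata agree with every key in the filter are kept; all others are skipped, which keeps the resulting reference set small.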