This page lists completed datasets together with the reproducible code that generated them, links to the resulting references and, where available, example or benchmark notebooks. This page is a work in progress. All datasets available here will also be listed in the repo Intake catalogue.
This page needs to be cleaned up and the cases standardized.
Sentinel Global coherence
Native data format: GeoTIFF.
Effective in-memory size: 400TB.
Solar Dynamics Observatory
Native data format: FITS.
Effective in-memory size: 400GB.
Notes: each wavelength filter is presented as a separate variable. To maintain a single time axis for all variables, each image from the other filters is assigned the DATE-OBS of the nearest preceding 94 Å image.
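The time-axis alignment described in the note can be illustrated with a small sketch. It is not the code used to build the dataset: the function name `nearest_preceding` and all timestamps below are hypothetical, and it assumes the reference-channel times are already sorted.

```python
from bisect import bisect_right
from datetime import datetime


def nearest_preceding(reference_times, t):
    """Return the latest reference time that is <= t, or None if none precedes t.

    `reference_times` must be sorted in ascending order.
    """
    i = bisect_right(reference_times, t)
    return reference_times[i - 1] if i > 0 else None


# Hypothetical DATE-OBS values for the reference (94 Å) channel, sorted.
ref_94 = [
    datetime(2012, 1, 1, 0, 0, 2),
    datetime(2012, 1, 1, 0, 0, 14),
    datetime(2012, 1, 1, 0, 0, 26),
]

# Hypothetical DATE-OBS values for another filter, taken a few seconds later.
other = [
    datetime(2012, 1, 1, 0, 0, 9),
    datetime(2012, 1, 1, 0, 0, 21),
    datetime(2012, 1, 1, 0, 0, 33),
]

# Each image from the other filter is placed on the reference time axis.
aligned = [nearest_preceding(ref_94, t) for t in other]
```

With these made-up values every image from the second filter lands on one of the three reference timestamps, so both variables can share one time coordinate.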
National Water Model
Native data format: NetCDF4/HDF5.
Effective in-memory size: 80TB
Notes: there are so many files that dask and a tree reduction were required to aggregate the metadata.
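For readers unfamiliar with the term, a tree reduction combines items in small groups, then combines the group results, and so on until one result remains. The sketch below shows the idea in plain Python. It is an illustration under stated assumptions, not the generator code: `merge_refs` is a hypothetical stand-in for the real metadata-combining step, the per-file entries are made up, and no dask calls are shown (with dask, the groups at each level could be combined in parallel).

```python
def merge_refs(a, b):
    """Hypothetical combine step: merge two metadata mappings into a new one."""
    merged = dict(a)
    merged.update(b)
    return merged


def tree_reduce(items, combine, fan_in=4):
    """Combine `items` in groups of `fan_in`, level by level, until one remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(items)
    if not level:
        raise ValueError("nothing to reduce")
    while len(level) > 1:
        next_level = []
        for start in range(0, len(level), fan_in):
            group = level[start:start + fan_in]
            acc = group[0]
            for item in group[1:]:
                acc = combine(acc, item)
            next_level.append(acc)
        level = next_level
    return level[0]


# Made-up per-file metadata: one entry per file, [filename, offset, length].
per_file = [{f"var/{i}": [f"file_{i:03d}.nc", 0, 100]} for i in range(10)]

combined = tree_reduce(per_file, merge_refs, fan_in=4)
```

Compared with folding all files into one accumulator sequentially, each intermediate result here depends on only a few inputs, which keeps individual tasks small when the number of files is very large.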
Global sea surface temperature
Generator notebook: https://nbviewer.org/gist/rsignell-usgs/ef435a53ac530a2843ce7e1d59f96e22
Native data format: NetCDF4/HDF5.
Effective in-memory size: 66TB.
On-disk size: 16TB.
Notes: Global sea surface temperature data. The notebook includes benchmarks. See the notebook for how to set up the NASA Earthdata credentials required for data access.
High-Resolution Rapid Refresh (HRRR)
Native data format: GRIB2.
Effective in-memory size: 1.5GB (11-file subset)
Notebook (generation and use): https://nbviewer.org/gist/peterm790/92eb1df3d58ba41d3411f8a840be2452
Notes: High-Resolution Rapid Refresh (HRRR) is NOAA's real-time, 3-km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model. The notebook extracts only the sections matching the filter “heightAboveGround=2”.
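As a rough illustration of what such a filter does, the sketch below selects entries from a list of message descriptions. It does not reproduce the notebook's actual filtering call. It assumes that “heightAboveGround=2” means a level type of `heightAboveGround` at level 2, the helper `matches` is hypothetical, and the message list is made up.

```python
def matches(message, wanted):
    """Return True if every key in `wanted` matches the message metadata.

    A list or tuple value in `wanted` means "any of these values".
    """
    for key, value in wanted.items():
        allowed = value if isinstance(value, (list, tuple)) else [value]
        if message.get(key) not in allowed:
            return False
    return True


# Made-up metadata for four messages in a single file.
messages = [
    {"shortName": "2t", "typeOfLevel": "heightAboveGround", "level": 2},
    {"shortName": "10u", "typeOfLevel": "heightAboveGround", "level": 10},
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "level": 500},
    {"shortName": "2d", "typeOfLevel": "heightAboveGround", "level": 2},
]

# Assumed interpretation of the filter "heightAboveGround=2".
wanted = {"typeOfLevel": "heightAboveGround", "level": 2}

selected = [m["shortName"] for m in messages if matches(m, wanted)]
```

Filtering before building references keeps only variables that share a vertical coordinate, so they fit together in one consistent dataset.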