This notebook demonstrates how to use the egi_datahub_zarr toolkit to read and write Zarr v3 stores directly from/to EGI DataHub.
Setup
Get an access token from https://datahub.egi.eu (Tokens → Create new access token).
Save the EGI DataHub token in a file named “egi-datahub-token” in your home directory.
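The token file described above can be read a little more defensively than with a bare `open(...)`. The following is a minimal sketch using only the standard library. The helper name `read_datahub_token` and its fallback to the `DATAHUB_TOKEN` environment variable are illustrative assumptions, not part of the toolkit.

```python
import os
from pathlib import Path


def read_datahub_token(token_file=None, env_var="DATAHUB_TOKEN"):
    """Return the access token from a file, falling back to an environment variable.

    Hypothetical helper for illustration; it is not part of egi_datahub_zarr.
    """
    # Default location matches the setup step: ~/egi-datahub-token
    path = Path(token_file) if token_file else Path.home() / "egi-datahub-token"
    if path.is_file():
        token = path.read_text(encoding="utf-8").strip()
        if token:
            return token
    # Fall back to the environment variable if the file is missing or empty
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"No token found in {path} or in ${env_var}")
    return token
```

Because the token grants access to your DataHub account, it is sensible to make the file readable only by your own user (for example `chmod 600 ~/egi-datahub-token` on Linux or macOS).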
Import the toolkit
# Import the toolkit
from egi_datahub_zarr import DataHubClient, open_zarr, to_zarr
import xarray as xr
import numpy as np
import os
Initialize Client
# Read the token from ~/egi-datahub-token
with open(os.path.join(os.path.expanduser("~"), "egi-datahub-token")) as f:
    token = f.read().strip()
# Expose the token as DATAHUB_TOKEN, the env var the toolkit uses
os.environ["DATAHUB_TOKEN"] = token
# Create client
client = DataHubClient(token)
# Check connection
user = client.get_user_info()
print(f"Connected as: {user.get('name')}")
Connected as: Anne Fouilloux
List Spaces and Browse Files
# List available spaces
spaces = client.list_spaces()
print("Available spaces:")
for name, info in spaces.items():
    print(f" 📁 {name}")
Available spaces:
📁 Pangeo
📁 notebooks-shared
📁 PLAYGROUND
📁 Reliance
📁 open-datasets
# Browse a directory
items = client.list_directory("Reliance/FAIR2Adapt")
print("Contents of Reliance/FAIR2Adapt:")
for item in items:
    icon = "📁" if item['type'] == 'DIR' else "📄"
    print(f" {icon} {item['name']}")
Contents of Reliance/FAIR2Adapt:
📁 CS1
📁 CS2
📁 CS3
📁 CS4
📁 CS5
📁 CS6
📄 README
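The single-level listing above can be extended into a small recursive walk. This is a sketch under stated assumptions: `walk_datahub` and `print_tree` are hypothetical helpers, and they rely only on `client.list_directory(path)` returning dicts with `'name'` and `'type'` keys, as seen in the listing above. Directories whose names end in `.zarr` are reported but not entered, because their contents are Zarr metadata and chunk files rather than user folders.

```python
def walk_datahub(client, path, max_depth=2, _depth=0):
    """Yield (depth, item) for each entry under path, depth-first.

    Hypothetical helper; assumes client.list_directory(path) returns a list
    of dicts with 'name' and 'type' keys, where directories have type 'DIR'.
    max_depth=1 lists only the top level.
    """
    for item in client.list_directory(path):
        yield _depth, item
        is_dir = item["type"] == "DIR"
        is_zarr = item["name"].endswith(".zarr")
        # Descend into ordinary directories only, up to the depth limit
        if is_dir and not is_zarr and _depth + 1 < max_depth:
            child = f"{path}/{item['name']}"
            yield from walk_datahub(client, child, max_depth, _depth + 1)


def print_tree(client, path, max_depth=2):
    """Print an indented tree using the same icons as the listing above."""
    for depth, item in walk_datahub(client, path, max_depth):
        icon = "📁" if item["type"] == "DIR" else "📄"
        print(f"{'  ' * depth}{icon} {item['name']}")
```

With the client created earlier, `print_tree(client, "Reliance/FAIR2Adapt")` would list each case-study folder together with its immediate contents. Every directory entered costs one extra listing request, so keeping `max_depth` small is advisable on large spaces.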
Method 1: Using the client
# Open Zarr store directly from DataHub
ds = client.open_zarr("Reliance/FAIR2Adapt/CS1/sample_climate_data.zarr")
print(ds)
<xarray.Dataset> Size: 378MB
Dimensions: (time: 365, lat: 180, lon: 360)
Coordinates:
* time (time) datetime64[ns] 3kB 2020-01-01 ... 2020-12-30
* lat (lat) float64 1kB -90.0 -88.99 -87.99 ... 87.99 88.99 90.0
* lon (lon) float64 3kB -180.0 -179.0 -178.0 ... 178.0 179.0 180.0
Data variables:
precipitation (time, lat, lon) float64 189MB dask.array<chunksize=(30, 45, 90), meta=np.ndarray>
temperature (time, lat, lon) float64 189MB dask.array<chunksize=(30, 45, 90), meta=np.ndarray>
Attributes:
title: Sample Climate Data for FAIR2Adapt CS1
institution: Science Live
source: Synthetic data for demonstration
Conventions: CF-1.8
# Access data - only fetches the chunks you need!
temp_slice = ds.temperature[0, :10, :10].values
print(f"Temperature slice shape: {temp_slice.shape}")
print(f"Values:\n{temp_slice}")
Temperature slice shape: (10, 10)
Values:
[[ 19.96714153 13.61735699 21.47688538 30.23029856 12.65846625
12.65863043 30.79212816 22.67434729 10.30525614 20.42560044]
[ 20.19346514 30.32738913 13.91239852 19.01711722 21.90143992
10.98779528 17.24092482 15.12592401 15.97676099 7.26990216]
[ 18.07801769 -2.10168393 1.51814578 22.43264094 16.70865438
13.16016664 15.18433933 18.47581705 9.6024032 7.21695275]
[ 11.51347866 11.50742296 11.78364949 35.76747984 18.81935452
19.30041647 25.30283454 17.38789159 12.40957854 13.03650151]
[ 21.62881269 26.73473857 16.81021559 2.03168052 18.99687952
8.48643106 9.71383318 20.86364019 27.38283071 15.21271577]
[ 34.01190686 14.39339186 7.91593233 -0.13714393 -3.03139676
-0.84135943 17.67126651 20.08725023 -0.81190702 23.95038314]
[ 3.88542043 17.46504778 19.98221748 26.40149039 30.80540687
4.84905808 6.89142489 2.42422141 12.65980141 19.66358374]
[ 16.75211419 44.85259003 18.67481665 11.86470314 24.21801502
19.82687887 19.2009449 21.06850593 35.56543565 3.69111565]
[ -7.38231233 -6.20700153 8.93134822 19.57686586 -12.47504843
10.00269824 9.73752143 28.88337778 11.14978188 18.82988985]
[ 15.12499322 43.68403056 -1.68599259 25.5872853 13.27198263
22.71920155 19.41307152 7.66843803 17.28996392 -3.57901451]]
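To make the "only fetches the chunks you need" remark concrete, the arithmetic below counts how many chunks the slice `[0, :10, :10]` overlaps, using the shape `(365, 180, 360)` and chunk size `(30, 45, 90)` reported in the dataset summary. It is a sketch that assumes a regular chunk grid without sharding. The function `chunks_touched` is a hypothetical helper, and the number of HTTP requests the toolkit actually issues is not shown in this notebook.

```python
import math


def chunks_touched(shape, chunks, selection):
    """Count the chunks of a regular grid that a selection overlaps.

    selection has one entry per dimension: an int index or a
    (start, stop) half-open range. Hypothetical helper for illustration.
    """
    touched = 1
    for size, chunk, sel in zip(shape, chunks, selection):
        start, stop = (sel, sel + 1) if isinstance(sel, int) else sel
        if not 0 <= start < stop <= size:
            raise ValueError(f"selection {sel!r} is outside 0..{size}")
        first = start // chunk        # index of the first chunk overlapped
        last = (stop - 1) // chunk    # index of the last chunk overlapped
        touched *= last - first + 1
    return touched


shape = (365, 180, 360)   # (time, lat, lon) from the dataset summary
chunks = (30, 45, 90)     # chunk size from the dataset summary

total = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
needed = chunks_touched(shape, chunks, (0, (0, 10), (0, 10)))
chunk_bytes = math.prod(chunks) * 8  # float64 values, uncompressed

print(f"{needed} of {total} chunks per variable, "
      f"{chunk_bytes / 1e6:.2f} MB uncompressed each")
# prints: 1 of 208 chunks per variable, 0.97 MB uncompressed each
```

Under these assumptions, reading the 10 × 10 slice needs a single chunk of `temperature` (under 1 MB before compression) rather than the full 189 MB variable.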
Method 2: Using convenience function
# One-liner to open Zarr
ds = open_zarr("Reliance/FAIR2Adapt/CS1/sample_climate_data.zarr")
print(ds)
<xarray.Dataset> Size: 378MB
Dimensions: (time: 365, lat: 180, lon: 360)
Coordinates:
* time (time) datetime64[ns] 3kB 2020-01-01 ... 2020-12-30
* lat (lat) float64 1kB -90.0 -88.99 -87.99 ... 87.99 88.99 90.0
* lon (lon) float64 3kB -180.0 -179.0 -178.0 ... 178.0 179.0 180.0
Data variables:
precipitation (time, lat, lon) float64 189MB dask.array<chunksize=(30, 45, 90), meta=np.ndarray>
temperature (time, lat, lon) float64 189MB dask.array<chunksize=(30, 45, 90), meta=np.ndarray>
Attributes:
title: Sample Climate Data for FAIR2Adapt CS1
institution: Science Live
source: Synthetic data for demonstration
Conventions: CF-1.8
Method 3: Using store directly with xarray
# Get the low-level store for more control
store = client.get_zarr_store("Reliance/FAIR2Adapt/CS1/sample_climate_data.zarr")
# Use with xarray directly
ds = xr.open_zarr(store, consolidated=False, zarr_format=3)
print(ds)
<xarray.Dataset> Size: 378MB
Dimensions: (time: 365, lat: 180, lon: 360)
Coordinates:
* time (time) datetime64[ns] 3kB 2020-01-01 ... 2020-12-30
* lat (lat) float64 1kB -90.0 -88.99 -87.99 ... 87.99 88.99 90.0
* lon (lon) float64 3kB -180.0 -179.0 -178.0 ... 178.0 179.0 180.0
Data variables:
precipitation (time, lat, lon) float64 189MB dask.array<chunksize=(30, 45, 90), meta=np.ndarray>
temperature (time, lat, lon) float64 189MB dask.array<chunksize=(30, 45, 90), meta=np.ndarray>
Attributes:
title: Sample Climate Data for FAIR2Adapt CS1
institution: Science Live
source: Synthetic data for demonstration
Conventions: CF-1.8
# Create a sample dataset
times = np.arange('2024-01-01', '2024-01-11', dtype='datetime64[D]')
lats = np.linspace(-90, 90, 36)
lons = np.linspace(-180, 180, 72)
# Generate random data
np.random.seed(42)
temperature = 15 + 10 * np.random.randn(len(times), len(lats), len(lons))
precipitation = np.abs(np.random.randn(len(times), len(lats), len(lons))) * 10
# Create xarray dataset
new_ds = xr.Dataset(
    {
        'temperature': (['time', 'lat', 'lon'], temperature),
        'precipitation': (['time', 'lat', 'lon'], precipitation),
    },
    coords={
        'time': times,
        'lat': lats,
        'lon': lons,
    },
    attrs={
        'title': 'Test dataset from Python toolkit',
        'created_by': 'egi_datahub_zarr toolkit',
        'Conventions': 'CF-1.8',
    }
)
print(new_ds)
<xarray.Dataset> Size: 416kB
Dimensions: (time: 10, lat: 36, lon: 72)
Coordinates:
* time (time) datetime64[s] 80B 2024-01-01 2024-01-02 ... 2024-01-10
* lat (lat) float64 288B -90.0 -84.86 -79.71 ... 79.71 84.86 90.0
* lon (lon) float64 576B -180.0 -174.9 -169.9 ... 169.9 174.9 180.0
Data variables:
temperature (time, lat, lon) float64 207kB 19.97 13.62 ... 0.3737 14.29
precipitation (time, lat, lon) float64 207kB 3.215 13.34 ... 0.04883 19.88
Attributes:
title: Test dataset from Python toolkit
created_by: egi_datahub_zarr toolkit
Conventions: CF-1.8
Write to DataHub
# Write using client
client.to_zarr(new_ds, "Reliance/FAIR2Adapt/CS1/my_test_output.zarr")
✅ Written to: Reliance/FAIR2Adapt/CS1/my_test_output.zarr
# Or use the convenience function
to_zarr(new_ds, "Reliance/FAIR2Adapt/CS1/another_output.zarr")
✅ Written to: Reliance/FAIR2Adapt/CS1/another_output.zarr
Verify the write
# Read back to verify
verified_ds = client.open_zarr("Reliance/FAIR2Adapt/CS1/my_test_output.zarr")
print("Verified dataset:")
print(verified_ds)
print(f"\nTemperature values match: {np.allclose(verified_ds.temperature.values, new_ds.temperature.values)}")
Verified dataset:
<xarray.Dataset> Size: 416kB
Dimensions: (time: 10, lat: 36, lon: 72)
Coordinates:
* time (time) datetime64[ns] 80B 2024-01-01 ... 2024-01-10
* lat (lat) float64 288B -90.0 -84.86 -79.71 ... 79.71 84.86 90.0
* lon (lon) float64 576B -180.0 -174.9 -169.9 ... 169.9 174.9 180.0
Data variables:
precipitation (time, lat, lon) float64 207kB dask.array<chunksize=(10, 36, 72), meta=np.ndarray>
temperature (time, lat, lon) float64 207kB dask.array<chunksize=(10, 36, 72), meta=np.ndarray>
Attributes:
title: Test dataset from Python toolkit
created_by: egi_datahub_zarr toolkit
Conventions: CF-1.8
Temperature values match: True
Advanced: Low-Level Store Access
import os
import zarr
from egi_datahub_zarr import OnedataZarrStore
# Get the store directly for advanced use
TOKEN = os.environ.get("DATAHUB_TOKEN")
# First resolve the path to get the file ID
provider, file_id = client.resolve_path("Reliance/FAIR2Adapt/CS1/sample_climate_data.zarr")
print(f"Provider: {provider}")
print(f"File ID: {file_id[:50]}...")
# Create store directly
store = OnedataZarrStore(
    root_file_id=file_id,
    token=TOKEN,
    provider=provider,
    read_only=True
)
# Use with zarr directly
group = zarr.open_group(store, mode='r', zarr_format=3)
print(f"\nZarr group contents: {list(group.keys())}")
Provider: cesnet-oneprovider-01.datahub.egi.eu
File ID: 000000000052446A6775696423666132326565333666326565...
Zarr group contents: ['lat', 'lon', 'precipitation', 'temperature', 'time']
Cleanup (Optional)
# Delete test files if needed
client.delete_zarr("Reliance/FAIR2Adapt/CS1/my_test_output.zarr")
client.delete_zarr("Reliance/FAIR2Adapt/CS1/another_output.zarr")
✅ Deleted: Reliance/FAIR2Adapt/CS1/my_test_output.zarr
✅ Deleted: Reliance/FAIR2Adapt/CS1/another_output.zarr
Summary
The egi_datahub_zarr toolkit provides:
| Feature | Method |
|---|---|
| List spaces | client.list_spaces() |
| Browse directories | client.list_directory(path) |
| Read Zarr | client.open_zarr(path) or open_zarr(path) |
| Write Zarr | client.to_zarr(ds, path) or to_zarr(ds, path) |
| Delete Zarr | client.delete_zarr(path) |
| Low-level store | client.get_zarr_store(path) |
All paths use the format: SpaceName/folder/subfolder/file.zarr
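The path format above can be illustrated by splitting a path into its space name and the remainder inside that space. This is only a sketch of the convention: `split_datahub_path` is a hypothetical helper, and how the toolkit parses paths internally is not shown in this notebook.

```python
def split_datahub_path(path):
    """Split 'SpaceName/folder/file.zarr' into (space, path inside the space).

    Hypothetical helper for illustration; it is not part of egi_datahub_zarr.
    """
    # Ignore leading/trailing slashes and empty segments such as 'a//b'
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts:
        raise ValueError("path must start with a space name")
    return parts[0], "/".join(parts[1:])


space, inner = split_datahub_path("Reliance/FAIR2Adapt/CS1/sample_climate_data.zarr")
print(space)  # prints: Reliance
print(inner)  # prints: FAIR2Adapt/CS1/sample_climate_data.zarr
```

The first segment is always one of the names returned by `client.list_spaces()`, such as `Reliance` or `Pangeo` in the listing earlier in this notebook.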