Prerequisites¶
Run Create_ROI_from_bbox.ipynb to generate: parent_ids.npz (contains parent_ids and parent_level)
Run Prep_regrid.ipynb to generate a temporary input Zarr (e.g. small.zarr)
Install:
pip install "git+https://github.com/EOPF-DGGS/healpix-resample.git"
What this notebook does¶
Loads the temporary Zarr
Stacks the source grid into a 1D point dimension
Builds a HEALPix operator once from nav_lon_*/nav_lat_*
Applies the transform to selected variables with xarray.apply_ufunc
Writes the HEALPix output to Zarr (e.g. hp.zarr)
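The core pattern — applying a precomputed operator along a stacked point dimension with xarray.apply_ufunc — can be sketched on a toy example. The dense weights matrix below is a hypothetical stand-in for the sparse bilinear operator that the real notebook builds from nav_lon_*/nav_lat_*:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the regridding operator: 4 source points -> 3 cells.
weights = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
])

ds = xr.Dataset({"temp": (("time", "point"), np.arange(8.0).reshape(2, 4))})

def regrid(block):
    # block has shape (..., point); contract the point axis against the weights
    return block @ weights.T

hp = xr.apply_ufunc(
    regrid,
    ds["temp"],
    input_core_dims=[["point"]],     # consumed dimension
    output_core_dims=[["cell"]],     # produced dimension
    dask="parallelized",             # with dask-backed inputs this runs lazily per block
)
print(hp.values)
```

Because the operator is built once and reused inside the ufunc, the cost per variable is just a matrix contraction along the point axis.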
Steps¶
Setup & load inputs
Load the temporary Zarr and the ROI definition (parent_ids.npz), and set parameters (levels, chunk sizes, paths).
Prepare the source grid for apply_ufunc
Stack the native grid into a 1D point dimension and drop invalid points (NaNs).
Build the HEALPix operator and regrid
Build the regridding operator once from nav_lon_* / nav_lat_*, then apply the transform with xarray.apply_ufunc.
Quick verification / visualization
Decode and visualize the HEALPix output for sanity checks.
Align to the ROI and write output Zarr
Expand parent_ids to child_level, reindex to the target cell_ids, chunk, and write the aligned HEALPix dataset to Zarr.
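The parent-to-child expansion in the last step relies on a property of nested HEALPix indexing: each parent cell at level p owns a contiguous block of 4**(child_level - p) child ids. A minimal, library-independent sketch:

```python
import numpy as np

def expand_to_child_level(parent_ids, parent_level, child_level):
    """Expand nested-scheme HEALPix parent cells to their child cell ids."""
    n_children = 4 ** (child_level - parent_level)
    # each parent p maps to the contiguous range [p * n, (p + 1) * n)
    return (parent_ids[:, None] * n_children + np.arange(n_children)).ravel()

# one parent, two levels down -> 16 consecutive child ids (48..63)
ids = expand_to_child_level(np.array([3]), parent_level=5, child_level=7)
print(ids)
```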
Output dataset conventions¶
cell_ids coordinate (nested indexing)
cell_ids attributes: grid name, level, indexing scheme, ellipsoid (WGS84)
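For reference, these conventions can be attached to a toy dataset as follows. The attribute names are assumed from recent xdggs releases and may differ by version — check the xdggs documentation for your install:

```python
import numpy as np
import xarray as xr

# Minimal dataset carrying the cell_ids conventions described above.
ds_hp = xr.Dataset(
    {"temp": ("cells", np.zeros(3))},
    coords={"cell_ids": ("cells", np.array([48, 49, 50], dtype="uint64"))},
)
ds_hp["cell_ids"].attrs = {
    "grid_name": "healpix",       # grid name
    "level": 13,                  # refinement level
    "indexing_scheme": "nested",  # nested indexing
}
# xdggs.decode can then reconstruct the grid from this coordinate
```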
Next step (big scale)¶
After validating on the temporary Zarr, switch the input to the full virtual dataset (from the Kerchunk catalog) and run this notebook as a batch job on HPC. Once verified, publish the resulting Zarr to the FAIR2Adapt endpoint.
Step 1 — Setup & load inputs¶
Goal: define paths and parameters, then load:
the temporary input Zarr (from Prep_regrid.ipynb)
the ROI definition parent_ids.npz (from Create_ROI_from_bbox.ipynb)
In the next cell we:
import dependencies
set zarr_file_path, zarr_hp_file_path, child_level, time_chunk_size
load parent_ids and parent_level from parent_ids.npz
import numpy as np
import xarray as xr
import xdggs
from healpix_regrid import to_healpix
zarr_file_path = "/Users/todaka/data/RIOMAR/small.zarr"
time_chunk_size = 24 # 1 day as a chunk
child_level = 13
zarr_hp_file_path = "/Users/todaka/data/RIOMAR/small_hp.zarr"
with np.load("parent_ids.npz") as data:
parent_ids = data["parent_ids"]
    parent_level = int(data["parent_level"])
Step 2 — Prepare the source grid for apply_ufunc¶
Goal: convert the 2D native grid into a 1D point axis required by the ufunc.
In the next cell we:
open the input Zarr
stack(point=("y_rho","x_rho"))
drop invalid points with dropna(dim="point")
Output: ds with a point dimension (ready for regridding)
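What the stack/dropna step does can be seen on a tiny toy grid (hypothetical values; the real notebook applies the same calls to the full dataset below):

```python
import numpy as np
import xarray as xr

# 2x2 toy grid with one invalid (NaN) point, e.g. a land cell
grid = xr.Dataset(
    {"temp": (("y_rho", "x_rho"), np.array([[1.0, np.nan], [3.0, 4.0]]))}
)
# collapse (y_rho, x_rho) into a single 1D "point" axis, then drop NaNs
pts = grid.stack(point=("y_rho", "x_rho")).dropna(dim="point")
print(pts["temp"].values)  # the NaN point is gone
```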
small_ds = xr.open_zarr(zarr_file_path)
ds = small_ds.stack(point=("y_rho", "x_rho")).dropna(dim="point")
ds
Step 3 — Regrid to HEALPix using healpix_regrid.to_healpix¶
Goal: regrid the stacked dataset to HEALPix in a single call.
The to_healpix function from the healpix_regrid package handles:
Building the bilinear operator from nav_lon_rho/nav_lat_rho
Applying the transform via xr.apply_ufunc with dask="parallelized"
Attaching the cell_ids coordinate and HEALPix metadata
Aligning to the ROI target cell IDs and chunking
Output: ds_hp — a HEALPix-indexed dataset ready for writing
# Regrid using healpix_regrid.to_healpix
# This builds the operator, applies apply_ufunc, and aligns to the ROI cell_ids
ds_hp = to_healpix(
ds,
parent_ids=parent_ids,
parent_level=parent_level,
child_level=child_level,
time_chunk_size=time_chunk_size,
)
ds_hp
Step 4 — Quick verification / visualization¶
Goal: visually verify that the HEALPix output looks correct.
In the next cell we:
compute a small result
decode HEALPix coordinates
explore on a map for sanity checks
# visualise and verify
ds_hp.compute().pipe(xdggs.decode).dggs.assign_latlon_coords().dggs.explore(alpha=0.3)
Step 5 — Write output Zarr¶
The to_healpix function already handles ROI alignment (reindex to target cell_ids and chunking).
We just need to write the result to Zarr.
%%time
# ds_hp is already aligned to the ROI — write directly
ds_hp.to_zarr(zarr_hp_file_path, consolidated=True, mode="w")
# target child ids expanded from the ROI parents
# (nested scheme: each parent owns 4**(child_level - parent_level) children)
n_children = 4 ** (child_level - parent_level)
target_ids = (parent_ids[:, None] * n_children + np.arange(n_children)).ravel()
# should match exactly
np.array_equal(ds_hp["cell_ids"].values, target_ids)
# how many were missing and got filled
missing = ~np.isin(target_ids, ds_hp["cell_ids"].values)
missing.sum()
(
xr.open_dataset(
zarr_hp_file_path,
consolidated=True,
)
.compute()
.pipe(xdggs.decode)
.dggs.assign_latlon_coords()
.dggs.explore(alpha=0.3)
)
new_ds = ds_hp
display(
(new_ds.max() - ds.max()).compute(),
(new_ds.min() - ds.min()).compute(),
(new_ds.mean() - ds.mean()).compute(),
)
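For unattended batch runs on HPC, the same min/max/mean comparison can be turned into an explicit pass/fail check. A sketch with placeholder tolerances to tune for your variables:

```python
import numpy as np
import xarray as xr

def stats_close(a: xr.Dataset, b: xr.Dataset, rtol=1e-3):
    """True if min/max/mean of every shared variable agree within rtol."""
    ok = True
    for name in sorted(set(a.data_vars) & set(b.data_vars)):
        for stat in ("min", "max", "mean"):
            x = float(getattr(a[name], stat)())
            y = float(getattr(b[name], stat)())
            if not np.isclose(x, y, rtol=rtol, atol=1e-8):
                print(f"{name}.{stat}: {x} vs {y}")
                ok = False
    return ok

# toy usage; in the notebook this would be stats_close(new_ds, ds)
d1 = xr.Dataset({"v": ("x", np.array([1.0, 2.0, 3.0]))})
d2 = xr.Dataset({"v": ("x", np.array([1.0, 2.0, 3.0001]))})
print(stats_close(d1, d2))
```

Regridding is an interpolation, so small deviations in the extrema are expected; zero tolerance would be too strict.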