Prerequisites¶
Run Create_ROI_from_bbox.ipynb to generate: parent_ids.npz (contains parent_ids and parent_level)
Run Prep_regrid.ipynb to generate a temporary input Zarr (e.g. small.zarr)
Install:
pip install "git+https://github.com/EOPF-DGGS/healpix-resample.git"
What this notebook does¶
Loads the temporary Zarr
Stacks the source grid into a 1D point dimension
Builds a HEALPix operator once from nav_lon_*/nav_lat_*
Applies the transform to selected variables with xarray.apply_ufunc
Writes the HEALPix output to Zarr (e.g. hp.zarr)
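The core pattern — applying a precomputed operator along a stacked point dimension with xarray.apply_ufunc — can be sketched on a toy example. The dense weights matrix below is a hypothetical stand-in for the sparse bilinear operator that the real notebook builds from nav_lon_*/nav_lat_*:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the regridding operator: 4 source points -> 3 cells.
weights = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
])

ds = xr.Dataset({"temp": (("time", "point"), np.arange(8.0).reshape(2, 4))})

def regrid(block):
    # block has shape (..., point); contract the point axis against the weights
    return block @ weights.T

hp = xr.apply_ufunc(
    regrid,
    ds["temp"],
    input_core_dims=[["point"]],     # consumed dimension
    output_core_dims=[["cell"]],     # produced dimension
    dask="parallelized",             # with dask-backed inputs this runs lazily per block
)
print(hp.values)
```

Because the operator is built once and reused inside the ufunc, the cost per variable is just a matrix contraction along the point axis.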
Steps¶
Setup & load inputs
Load the temporary Zarr and the ROI definition (parent_ids.npz), and set parameters (levels, chunk sizes, paths).
Prepare the source grid for apply_ufunc
Stack the native grid into a 1D point dimension and drop invalid points (NaNs).
Build the HEALPix operator and regrid
Build the regridding operator once from nav_lon_* / nav_lat_*, then apply the transform with xarray.apply_ufunc.
Quick verification / visualization
Decode and visualize the HEALPix output for sanity checks.
Align to the ROI and write output Zarr
Expand parent_ids to child_level, reindex to the target cell_ids, chunk, and write the aligned HEALPix dataset to Zarr.
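The parent-to-child expansion in the last step relies on a property of nested HEALPix indexing: each parent cell at level p owns a contiguous block of 4**(child_level - p) child ids. A minimal, library-independent sketch:

```python
import numpy as np

def expand_to_child_level(parent_ids, parent_level, child_level):
    """Expand nested-scheme HEALPix parent cells to their child cell ids."""
    n_children = 4 ** (child_level - parent_level)
    # each parent p maps to the contiguous range [p * n, (p + 1) * n)
    return (parent_ids[:, None] * n_children + np.arange(n_children)).ravel()

# one parent, two levels down -> 16 consecutive child ids (48..63)
ids = expand_to_child_level(np.array([3]), parent_level=5, child_level=7)
print(ids)
```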
Output dataset conventions¶
cell_ids coordinate (nested indexing)
cell_ids attributes: grid name, level, indexing scheme, ellipsoid (WGS84)
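For reference, these conventions can be attached to a toy dataset as follows. The attribute names are assumed from recent xdggs releases and may differ by version — check the xdggs documentation for your install:

```python
import numpy as np
import xarray as xr

# Minimal dataset carrying the cell_ids conventions described above.
ds_hp = xr.Dataset(
    {"temp": ("cells", np.zeros(3))},
    coords={"cell_ids": ("cells", np.array([48, 49, 50], dtype="uint64"))},
)
ds_hp["cell_ids"].attrs = {
    "grid_name": "healpix",       # grid name
    "level": 13,                  # refinement level
    "indexing_scheme": "nested",  # nested indexing
}
# xdggs.decode can then reconstruct the grid from this coordinate
```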
Next step (big scale)¶
After validating on the temporary Zarr, switch the input to the full virtual dataset (from the Kerchunk catalog) and run this notebook as a batch job on HPC. Once verified, publish the resulting Zarr to the FAIR2Adapt endpoint.
Step 1 — Setup & load inputs¶
Goal: define paths and parameters, then load:
the temporary input Zarr (from Prep_regrid.ipynb)
the ROI definition parent_ids.npz (from Create_ROI_from_bbox.ipynb)
In the next cell we:
import dependencies
set zarr_file_path, zarr_hp_file_path, child_level, time_chunk_size
load parent_ids and parent_level from parent_ids.npz
import numpy as np
import xarray as xr
import xdggs
from healpix_regrid import to_healpix
zarr_file_path = "/Users/todaka/data/RIOMAR/small.zarr"
time_chunk_size = 24 # 1 day as a chunk
child_level = 13
zarr_hp_file_path = "/Users/todaka/data/RIOMAR/small_hp.zarr"
with np.load("parent_ids.npz") as data:
parent_ids = data["parent_ids"]
    parent_level = int(data["parent_level"])
Step 2 — Prepare the source grid for apply_ufunc¶
Goal: convert the 2D native grid into a 1D point axis required by the ufunc.
In the next cell we:
open the input Zarr
stack(point=("y_rho","x_rho"))
drop invalid points with dropna(dim="point")
Output: ds with a point dimension (ready for regridding)
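What the stack/dropna step does can be seen on a tiny toy grid (hypothetical values; the real notebook applies the same calls to the full dataset below):

```python
import numpy as np
import xarray as xr

# 2x2 toy grid with one invalid (NaN) point, e.g. a land cell
grid = xr.Dataset(
    {"temp": (("y_rho", "x_rho"), np.array([[1.0, np.nan], [3.0, 4.0]]))}
)
# collapse (y_rho, x_rho) into a single 1D "point" axis, then drop NaNs
pts = grid.stack(point=("y_rho", "x_rho")).dropna(dim="point")
print(pts["temp"].values)  # the NaN point is gone
```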
small_ds = xr.open_zarr(zarr_file_path)
ds = small_ds.stack(point=("y_rho", "x_rho")).dropna(dim="point")
ds
Step 3 — Regrid to HEALPix using healpix_regrid.to_healpix¶
Goal: regrid the stacked dataset to HEALPix in a single call.
The to_healpix function from the healpix_regrid package handles:
Building the bilinear operator from nav_lon_rho/nav_lat_rho
Applying the transform via xr.apply_ufunc with dask="parallelized"
Attaching the cell_ids coordinate and HEALPix metadata
Aligning to the ROI target cell IDs and chunking
Output: ds_hp — a HEALPix-indexed dataset ready for writing
# Regrid using healpix_regrid.to_healpix
# This builds the operator, applies apply_ufunc, and aligns to the ROI cell_ids
ds_hp = to_healpix(
ds,
parent_ids=parent_ids,
parent_level=parent_level,
child_level=child_level,
time_chunk_size=time_chunk_size,
)
ds_hp
Step 4 — Quick verification / visualization¶
Goal: visually verify that the HEALPix output looks correct.
In the next cell we:
compute a small result
decode HEALPix coordinates
explore on a map for sanity checks
# visualise and verify
ds_hp.compute().pipe(xdggs.decode).dggs.assign_latlon_coords().dggs.explore(alpha=0.3)
Step 5 — Write output Zarr¶
The to_healpix function already handles ROI alignment (reindex to target cell_ids and chunking).
We just need to write the result to Zarr.
%%time
# ds_hp is already aligned to the ROI — write directly
ds_hp.to_zarr(zarr_hp_file_path, consolidated=True, mode="w")
# target child ids expanded from the ROI parents
# (nested scheme: each parent owns 4**(child_level - parent_level) children)
n_children = 4 ** (child_level - parent_level)
target_ids = (parent_ids[:, None] * n_children + np.arange(n_children)).ravel()
# should match exactly
np.array_equal(ds_hp["cell_ids"].values, target_ids)
# how many were missing and got filled
missing = ~np.isin(target_ids, ds_hp["cell_ids"].values)
missing.sum()
(
xr.open_dataset(
zarr_hp_file_path,
consolidated=True,
)
.compute()
.pipe(xdggs.decode)
.dggs.assign_latlon_coords()
.dggs.explore(alpha=0.3)
)
new_ds = ds_hp
display(
(new_ds.max() - ds.max()).compute(),
(new_ds.min() - ds.min()).compute(),
(new_ds.mean() - ds.mean()).compute(),
)
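For unattended batch runs on HPC, the same min/max/mean comparison can be turned into an explicit pass/fail check. A sketch with placeholder tolerances to tune for your variables:

```python
import numpy as np
import xarray as xr

def stats_close(a: xr.Dataset, b: xr.Dataset, rtol=1e-3):
    """True if min/max/mean of every shared variable agree within rtol."""
    ok = True
    for name in sorted(set(a.data_vars) & set(b.data_vars)):
        for stat in ("min", "max", "mean"):
            x = float(getattr(a[name], stat)())
            y = float(getattr(b[name], stat)())
            if not np.isclose(x, y, rtol=rtol, atol=1e-8):
                print(f"{name}.{stat}: {x} vs {y}")
                ok = False
    return ok

# toy usage; in the notebook this would be stats_close(new_ds, ds)
d1 = xr.Dataset({"v": ("x", np.array([1.0, 2.0, 3.0]))})
d2 = xr.Dataset({"v": ("x", np.array([1.0, 2.0, 3.0001]))})
print(stats_close(d1, d2))
```

Regridding is an interpolation, so small deviations in the extrema are expected; zero tolerance would be too strict.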