Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Regrid RiOMar variables to HEALPix using healpix-resample + xarray.apply_ufunc

Prerequisites

  1. Run Create_ROI_from_bbox.ipynb to generate:

    • parent_ids.npz (contains parent_ids and parent_level)

  2. Run Prep_regrid.ipynb to generate a temporary input Zarr (e.g. small.zarr)

  3. Install:

pip install "git+https://github.com/EOPF-DGGS/healpix-resample.git"

What this notebook does

  • Loads the temporary Zarr

  • Stacks the source grid into a 1D point dimension

  • Builds a HEALPix operator once from nav_lon_* / nav_lat_*

  • Applies the transform to selected variables with xarray.apply_ufunc

  • Writes the HEALPix output to Zarr (e.g. hp.zarr)

Steps

  1. Setup & load inputs

Load the temporary Zarr and the ROI definition (parent_ids.npz), and set parameters (levels, chunk sizes, paths).

  1. Prepare the source grid for apply_ufunc

Stack the native grid into a 1D point dimension and drop invalid points (NaNs).

  1. Build the HEALPix operator and regrid

Build the regridding operator once from nav_lon_* / nav_lat_*, then apply the transform with xarray.apply_ufunc.

  1. Quick verification / visualization

Decode and visualize the HEALPix output for sanity checks.

  1. Align to the ROI and write output Zarr

Expand parent_ids to child_level, reindex to the target cell_ids, chunk, and write the aligned HEALPix dataset to Zarr.

Output dataset conventions

  • cell_ids coordinate (nested indexing)

  • cell_ids attributes: grid name, level, indexing scheme, ellipsoid (WGS84)

Next step (big scale)

After validating on the temporary Zarr, switch the input to the full virtual dataset (from the Kerchunk catalog) and run this notebook as a batch job on HPC. Once verified, publish the resulting Zarr to the FAIR2Adapt endpoint.

Step 1 — Setup & load inputs

Goal: define paths and parameters, then load:

  • the temporary input Zarr (from Prep_regrid.ipynb)

  • the ROI definition parent_ids.npz (from Create_ROI_from_bbox.ipynb)

In the next cell we:

  1. import dependencies

  2. set zarr_file_path, zarr_hp_file_path, child_level, time_chunk_size

  3. load parent_ids and parent_level from parent_ids.npz

import numpy as np
import xarray as xr
import xdggs

from healpix_regrid import to_healpix

zarr_file_path = "/Users/todaka/data/RIOMAR/small.zarr"
time_chunk_size = 24  # 1 day as a chunk
child_level = 13
zarr_hp_file_path = "/Users/todaka/data/RIOMAR/small_hp.zarr"

with np.load("parent_ids.npz") as data:
    parent_ids = data["parent_ids"]
    parent_level = int(data["parent_level"])

Step 2 — Prepare the source grid for apply_ufunc

Goal: convert the 2D native grid into a 1D point axis required by the ufunc.

In the next cell we:

  1. open the input Zarr

  2. stack(point=("y_rho","x_rho"))

  3. drop invalid points with dropna(dim="point")

Output: ds with a point dimension (ready for regridding)

small_ds = xr.open_zarr(zarr_file_path)
ds = small_ds.stack(point=("y_rho", "x_rho")).dropna(dim="point")
ds
Loading...

Step 3 — Regrid to HEALPix using healpix_regrid.to_healpix

Goal: regrid the stacked dataset to HEALPix in a single call.

The to_healpix function from the healpix_regrid package handles:

  1. Building the bilinear operator from nav_lon_rho / nav_lat_rho

  2. Applying the transform via xr.apply_ufunc with dask="parallelized"

  3. Attaching cell_ids coordinate and HEALPix metadata

  4. Aligning to the ROI target cell IDs and chunking

Output: ds_hp — a HEALPix-indexed dataset ready for writing

# Regrid using healpix_regrid.to_healpix
# This builds the operator, applies apply_ufunc, and aligns to the ROI cell_ids
ds_hp = to_healpix(
    ds,
    parent_ids=parent_ids,
    parent_level=parent_level,
    child_level=child_level,
    time_chunk_size=time_chunk_size,
)
ds_hp

Step 4 — Quick verification / visualization

Goal: visually verify that the HEALPix output looks correct.

In the next cell we:

  • compute a small result

  • decode HEALPix coordinates

  • explore on a map for sanity checks

# visualise  and verify
ds_hp.compute().pipe(xdggs.decode).dggs.assign_latlon_coords().dggs.explore(alpha=0.3)
Loading...

Step 5 — Write output Zarr

The to_healpix function already handles ROI alignment (reindex to target cell_ids and chunking). We just need to write the result to Zarr.

%%time

# ds_hp is already aligned to the ROI — write directly
ds_hp.to_zarr(zarr_hp_file_path, consolidated=True, mode="w")
# should match exactly
np.array_equal(ds_aligned["cell_ids"].values, target_ids)

# how many were missing and got filled
missing = ~np.isin(target_ids, ds_hp["cell_ids"].values)
missing.sum()
(
    xr.open_dataset(
        zarr_hp_file_path,
        consolidated=True,
    )
    .compute()
    .pipe(xdggs.decode)
    .dggs.assign_latlon_coords()
    .dggs.explore(alpha=0.3)
)
Loading...
new_ds = ds_hp
display(
    (new_ds.max() - ds.max()).compute(),
    (new_ds.min() - ds.min()).compute(),
    (new_ds.mean() - ds.mean()).compute(),
)
Loading...
Loading...
Loading...