This repository provides an end-to-end workflow to regrid ocean model data (RiOMar/GAMAR) from curvilinear grids to HEALPix (DGGS) format using xarray, Dask, and Kerchunk. It runs both locally (HTTPS mode) and on HPC infrastructure.
Define a Region Of Interest (ROI) from a lon/lat bounding box
Prepare a temporary small Zarr dataset (for fast iteration and reproducible testing)
Regrid variables to HEALPix using
healpix_regrid(viaxarray.apply_ufunc)Scale the same workflow to the full dataset on HPC and publish the resulting Zarr
The workflow assumes geographic coordinates in EPSG:4326 and HEALPix (WGS84) nested indexing.
Installation¶
# 1. Create the conda environment
conda env create -f notebook/environment.yml
conda activate riomar
# 2. Install the healpix_regrid package in editable mode
pip install -e ".[test]"Running tests¶
python -m pytest tests/ -vRepository structure¶
healpix_regrid/ Reusable Python package (masking, kerchunk, dask, regridding)
tests/ Pytest test suite for healpix_regrid
notebook/ Jupyter notebooks (interactive workflow & exploration)
bin/ Python scripts for HPC batch runs
singularity_images/ Singularity container definitionsWorkflow (recommended order)¶
A. Create ROI from lon/lat bbox¶
Notebook: Create_ROI_from_bbox.ipynb
Purpose - Convert a lon/lat bounding box (EPSG:4326) into a HEALPix (nested) ROI\
Export parent-level cell IDs for fast indexing and masking\
Export an outer boundary footprint (GeoJSON) for plotting and polygon masking
Input - Bounding box (min_lon, min_lat, max_lon, max_lat) in
EPSG:4326\
child_level\parent_level\edge_level
Output - HEALPix ROI cells at parent level: parent_ids.npz\
Boundary footprint (for masking/subsetting before regridding):
outer_boundary.geojson
Notes - The notebook computes child-level cells covering the bbox, then maps them to the parent level and builds polygons.
B. Prepare a temporary Zarr (fast iteration dataset)¶
Notebook: Prep_regrid.ipynb
Purpose - Open RiOMar data via a Kerchunk catalog (HPC filesystem or HTTPS export)\
Apply a spatial subset or mask using the ROI from step A\
Write a lightweight Zarr locally (or on HPC scratch) to speed up development
Output - A temporary Zarr dataset (e.g. small.zarr) used as input
for the regridding notebook
Tip - Set OUT_ZARR to an existing path on your machine or HPC
scratch.
C. Regrid to HEALPix using apply_ufunc¶
Notebook: regrid_apply_ufunc.ipynb
Purpose - Load the temporary Zarr created in B\
Regrid selected variables to HEALPix using
healpix_regrid.to_healpix\Align output to ROI-derived
cell_ids(drop extra cells, fill missing withnp.nan)\Save a HEALPix-aligned, chunked Zarr
Output - A HEALPix-indexed dataset with a cell_ids coordinate
(nested indexing)
Scaling to HPC¶
On HPC (Datarmor), the bin/ scripts run the same pipeline as the notebooks.
Scripts auto-detect the environment by checking whether the HPC filesystem
(/scale/project/lops-oh-fair2adapt/) exists.
# Submit a PBS job
qsub bin/submit.shSingularity container definitions are in singularity_images/f2a_riomar/
(layered build: hardened Debian base -> conda scientific stack -> JupyterHub).
Conventions / Expected Variables¶
Geographic coordinates:
nav_lon_*,nav_lat_*in degrees\HEALPix:
Nested indexing\
cell_idscoordinate\levelattribute (e.g. 13)\WGS84 ellipsoid
Troubleshooting¶
ROI boundary “edge” artifact
Increaseedge_levelinCreate_ROI_from_bbox.ipynb.Memory spikes
Write intermediate results to Zarr and rechunk. Avoid premature.compute()calls.Mismatch between stacked dims and ufunc core dims
Ensurestack(point=(...))matchesinput_core_dims=[["point"]].
Extra notebooks¶
Define_healpix_Parent_chunk.ipynb— Experiments on parent levels together with bounding boxes.simple_regrid.ipynb— Tests alternative regridding methods available inhealpix-resample.M1_*/M2_*notebooks — Experiments with Kerchunk/Icechunk/VirtualiZarr creation and loading on Datarmor.
Contributing¶
See Contributing to RIOMAR / healpix_regrid for guidelines on how to contribute.
License¶
This project is licensed under the Apache License 2.0.