Create a small sample RiOMar dataset#

Context#

Purpose#

The goal is to create a small RiOMar dataset that can be used to test regridding to HEALPix on Pangeo EOSC.

Description#

In this notebook, we will:

  • Open a RiOMar data file

  • Select a few times to reduce the amount of data

  • Save the transformed data in Zarr

Contributions#

Notebook#

  • Tina Odaka (author), IFREMER (France), @tinaok

Bibliography and other interesting resources#

Warning

This notebook is designed to run on Datarmor, IFREMER's HPC cluster, where the RiOMar data currently resides. Because the dataset is large, processing needs to happen close to the data, so running the notebook directly on Datarmor is the efficient option. The raw data is also openly available online at https://data-fair2adapt.ifremer.fr/riomar/.

For portability, the notebook falls back to these online URLs when the local files are not found. Expect this to be very slow from the cloud, though: the original data is in netCDF format and is read as-is over HTTPS, without the chunk-level access that tools such as Kerchunk can provide.

How to set up the Pangeo environment on Datarmor for the FAIR2Adapt RiOMar use case#

ssh datarmor

micromamba create -n riomar python=3.12 xarray zarr hdf5 ipykernel h5netcdf dask netCDF4 bottleneck scipy cftime numba healpy matplotlib hvplot
micromamba activate riomar
pip install git+https://github.com/IAOCEA/xarray-healpy.git
python -m ipykernel install --user --name=riomar

Then connect to https://datarmor-jupyterhub.ifremer.fr/
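Once the `riomar` kernel is selected in JupyterHub, it can be worth checking that the packages the notebook relies on are importable. The sketch below is not part of the original notebook: `missing_packages` and the `REQUIRED` list are hypothetical names, and the list is an assumption based on the `micromamba create` command above plus `fsspec`, which the notebook imports.

```python
from importlib.util import find_spec

# Assumed list of top-level import names used in this notebook.
REQUIRED = ["xarray", "zarr", "h5netcdf", "netCDF4", "dask", "fsspec"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("missing packages:", missing if missing else "none")
```

An empty list suggests the kernel points at the intended environment; a non-empty one usually means the wrong kernel is selected or the environment was not activated before installing.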

## Import Libraries
import xarray as xr
import fsspec
from pathlib import Path

Open the CROCO grid file#

  • The grid file is read locally when running on Datarmor, and over HTTPS when running elsewhere.

url = "/home/lops-oh-fair2adapt/riomar/misc/croco_grd.nc"
if Path(url).is_file():
    # file exists
    grid = xr.open_dataset(url, engine='netcdf4')
else:
    url = "https://data-fair2adapt.ifremer.fr/riomar/misc/croco_grd.nc"
    fs = fsspec.filesystem('https')
    grid = xr.open_dataset(fs.open(url))

#grid.to_netcdf('/home/lops-oh-fair2adapt/riomar/misc/croco_grd_hdf5.nc',format='NETCDF4')
grid
<xarray.Dataset> Size: 141MB
Dimensions:    (one: 1, eta_rho: 838, xi_rho: 727, bath: 1, eta_u: 838,
                xi_u: 726, eta_v: 837, xi_v: 727, eta_psi: 837, xi_psi: 726)
Dimensions without coordinates: one, eta_rho, xi_rho, bath, eta_u, xi_u, eta_v,
                                xi_v, eta_psi, xi_psi
Data variables: (12/34)
    xl         (one) float64 8B ...
    el         (one) float64 8B ...
    depthmin   (one) float64 8B ...
    depthmax   (one) float64 8B ...
    spherical  (one) |S1 1B ...
    angle      (eta_rho, xi_rho) float64 5MB ...
    ...         ...
    lat_v      (eta_v, xi_v) float64 5MB ...
    lat_psi    (eta_psi, xi_psi) float64 5MB ...
    mask_rho   (eta_rho, xi_rho) float64 5MB ...
    mask_u     (eta_u, xi_u) float64 5MB ...
    mask_v     (eta_v, xi_v) float64 5MB ...
    mask_psi   (eta_psi, xi_psi) float64 5MB ...
Attributes:
    title:    BOB1000 Model
    date:     09-Mar-2023
    type:     CROCO grid file
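The local-or-remote choice made in the cell above is repeated for every file the notebook opens, so it could be factored into a small helper. This is only a sketch: `resolve_source` is a hypothetical function, not something the notebook or xarray provides, and it only decides which location to use; opening the file is left to the caller.

```python
from pathlib import Path


def resolve_source(local_path, remote_url):
    """Return (location, is_local): the local path if the file exists, else the URL."""
    if Path(local_path).is_file():
        return str(local_path), True
    return remote_url, False


# Paths taken from the notebook; on Datarmor the first one should exist.
location, is_local = resolve_source(
    "/home/lops-oh-fair2adapt/riomar/misc/croco_grd.nc",
    "https://data-fair2adapt.ifremer.fr/riomar/misc/croco_grd.nc",
)
print(location, "(local)" if is_local else "(remote)")
```

The caller can then pass `location` straight to `xr.open_dataset` when `is_local` is true, or wrap it with `fsspec` as in the cell above otherwise.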

Open one RiOMar model file#

  • As before, the file is read locally when running on Datarmor, and over HTTPS when running elsewhere.

url = "/home/lops-oh-fair2adapt/riomar/GAMAR/GAMAR_1h_inst_Y2006M01.nc"

if Path(url).is_file():
    # file exists
    ds = xr.open_dataset(url, engine='h5netcdf')[["temp"]]
else:
    url = "https://data-fair2adapt.ifremer.fr/riomar/GAMAR/GAMAR_1h_inst_Y2006M01.nc"
    fs = fsspec.filesystem('https')
    ds = xr.open_dataset(fs.open(url), engine='h5netcdf')[["temp"]]

ds
<xarray.Dataset> Size: 73GB
Dimensions:       (time_counter: 744, s_rho: 40, y_rho: 838, x_rho: 727)
Coordinates:
  * s_rho         (s_rho) float32 160B -0.9875 -0.9625 ... -0.0375 -0.0125
    nav_lat_rho   (y_rho, x_rho) float32 2MB ...
    nav_lon_rho   (y_rho, x_rho) float32 2MB ...
    time_instant  (time_counter) datetime64[ns] 6kB ...
  * time_counter  (time_counter) datetime64[ns] 6kB 2006-01-01T00:57:45 ... 2...
Dimensions without coordinates: y_rho, x_rho
Data variables:
    temp          (time_counter, s_rho, y_rho, x_rho) float32 73GB ...
Attributes: (12/45)
    name:           GAMAR_GLORYS_1h_inst
    description:    Created by xios
    Conventions:    CF-1.6
    timeStamp:      2024-Apr-02 09:15:02 GMT
    uuid:           1563e80a-8c72-4739-a6b5-424221c7cf2b
    title:          GAMAR_GLORYS
    ...             ...
    gamma2_expl:    Slipperiness parameter
    x_sponge:       0.0
    v_sponge:       0.0
    sponge_expl:    Sponge parameters : extent (m) & viscosity (m2.s-1)
    SRCS:           main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
    CPP-options:    REGIONAL GAMAR MPI TIDES OBC_WEST OBC_NORTH XIOS USE_CALE...
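
# Overwrite the model file's coordinates with longitude/latitude from the grid file.
# Multiplying by 0 keeps the dimensions of the existing coordinate variables.
ds["nav_lon_rho"] = ds["nav_lon_rho"] * 0 + grid.lon_rho.data
ds["nav_lat_rho"] = ds["nav_lat_rho"] * 0 + grid.lat_rho.data
# Ocean points: valid temperature at the first time step and first vertical level.
ds["ocean_mask"] = ds.temp.isel(time_counter=0, s_rho=0).notnull()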
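
The same two steps (taking coordinates from a grid and deriving an ocean mask from missing values) can be tried on a tiny synthetic dataset without access to the RiOMar files. Everything below is made up for illustration: the array sizes, the longitude and latitude ranges, and the choice of which row is land. It uses `assign_coords`, a more explicit alternative to the `* 0 +` idiom in the cell above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the model output: 2 times, 3 levels, 4 x 5 grid.
rng = np.random.default_rng(0)
temp = rng.random((2, 3, 4, 5)).astype("float32")
temp[:, :, 0, :] = np.nan  # pretend the first row is land

ds_demo = xr.Dataset(
    {"temp": (("time_counter", "s_rho", "y_rho", "x_rho"), temp)},
    coords={
        "nav_lon_rho": (("y_rho", "x_rho"), np.zeros((4, 5), dtype="float32")),
        "nav_lat_rho": (("y_rho", "x_rho"), np.zeros((4, 5), dtype="float32")),
    },
)

# Hypothetical grid coordinates, playing the role of grid.lon_rho / grid.lat_rho.
lon2d, lat2d = np.meshgrid(np.linspace(-5.0, -1.0, 5), np.linspace(43.0, 46.0, 4))

# Replace the placeholder coordinates with the grid values.
ds_demo = ds_demo.assign_coords(
    nav_lon_rho=(("y_rho", "x_rho"), lon2d),
    nav_lat_rho=(("y_rho", "x_rho"), lat2d),
)

# True where the first time step and first level hold a valid temperature.
ds_demo["ocean_mask"] = ds_demo["temp"].isel(time_counter=0, s_rho=0).notnull()
print(ds_demo["ocean_mask"].values)
```

The mask is two-dimensional because the time and vertical dimensions are dropped by the `isel` call, which matches the `(y_rho, x_rho)` shape of `ocean_mask` in the notebook.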

Save to Zarr#

smallpath = "/home/lops-oh-fair2adapt/riomar/small.zarr"

# Fall back to the current directory when the Datarmor directory is not available.
if not Path(smallpath).parent.exists():
    smallpath = "./small.zarr"


# Keep the first five time steps, stored as one time step per chunk.
small = ds[['temp', 'ocean_mask']].isel(time_counter=slice(0, 5)).chunk({'time_counter': 1})
small.to_zarr(smallpath, mode='w')
<xarray.backends.zarr.ZarrStore at 0x74922df79740>

Open the newly created Zarr to check it#

ds = xr.open_dataset(smallpath, engine="zarr")
ds
<xarray.Dataset> Size: 498MB
Dimensions:       (y_rho: 838, x_rho: 727, s_rho: 40, time_counter: 5)
Coordinates:
    nav_lat_rho   (y_rho, x_rho) float64 5MB ...
    nav_lon_rho   (y_rho, x_rho) float64 5MB ...
  * s_rho         (s_rho) float32 160B -0.9875 -0.9625 ... -0.0375 -0.0125
  * time_counter  (time_counter) datetime64[ns] 40B 2006-01-01T00:57:45 ... 2...
    time_instant  (time_counter) datetime64[ns] 40B ...
Dimensions without coordinates: y_rho, x_rho
Data variables:
    ocean_mask    (y_rho, x_rho) bool 609kB ...
    temp          (time_counter, s_rho, y_rho, x_rho) float32 487MB ...
Attributes: (12/45)
    CPP-options:    REGIONAL GAMAR MPI TIDES OBC_WEST OBC_NORTH XIOS USE_CALE...
    Conventions:    CF-1.6
    Cs_r:           have a look at variable Cs_r in this file
    Cs_w:           have a look at variable Cs_w in this file
    SRCS:           main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
    Tcline:         15.0
    ...             ...
    title:          GAMAR_GLORYS
    tnu4_expl:      biharmonic mixing coefficient for tracers
    units:          meter4 second-1
    uuid:           1563e80a-8c72-4739-a6b5-424221c7cf2b
    v_sponge:       0.0
    x_sponge:       0.0
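
The sizes reported in the two dataset summaries can be cross-checked with simple arithmetic on the dimensions and data types shown there. The sketch below only uses numbers that appear in the outputs above; `nbytes` is a hypothetical helper name.

```python
from math import prod


def nbytes(shape, itemsize):
    """Uncompressed size in bytes of an array with the given shape and item size."""
    return prod(shape) * itemsize


# temp is float32 (4 bytes) on (time_counter, s_rho, y_rho, x_rho).
full_temp = nbytes((744, 40, 838, 727), 4)   # one month of hourly output
small_temp = nbytes((5, 40, 838, 727), 4)    # the five time steps kept
one_chunk = nbytes((1, 40, 838, 727), 4)     # one time step per chunk
mask = nbytes((838, 727), 1)                 # boolean ocean_mask

print(f"full temp:  {full_temp / 1e9:.1f} GB")
print(f"small temp: {small_temp / 1e6:.1f} MB")
print(f"one chunk:  {one_chunk / 1e6:.1f} MB")
print(f"ocean_mask: {mask / 1e3:.1f} kB")
print(f"reduction:  {full_temp / small_temp:.1f}x")
```

These are uncompressed in-memory sizes; the Zarr store on disk is typically smaller because chunks are compressed by default.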