Create a small sample RiOMar dataset#
Context#
Purpose#
The goal is to create a smaller RiOMar dataset for testing regridding to HEALPix on Pangeo EOSC.
Description#
In this notebook, we will:
Open a RiOMar data file
Select a few times to reduce the amount of data
Save the transformed data in Zarr
Contributions#
Notebook#
Tina Odaka (author), IFREMER (France), @tinaok
Bibliography and other interesting resources#
Warning
This notebook is designed to run on Datarmor, IFREMER's HPC cluster, where the RiOMar data currently resides. Running the notebook directly on Datarmor is necessary because the dataset is large, and processing needs to occur close to the data for efficiency. However, the raw data is openly available online at https://data-fair2adapt.ifremer.fr/riomar/.
To enhance portability, the notebook also includes the online URLs for the data. However, running it in the cloud would be very slow, because the original data is in netCDF format and is read as-is over HTTPS, without taking advantage of chunked access (for example, through tools like Kerchunk).
How to set up the Pangeo environment on Datarmor for the FAIR2Adapt RiOMar use case:#
ssh datarmor
micromamba create -n riomar python=3.12 xarray zarr hdf5 ipykernel h5netcdf dask netCDF4 bottleneck scipy cftime numba healpy matplotlib hvplot
micromamba activate riomar
pip install git+https://github.com/IAOCEA/xarray-healpy.git
python -m ipykernel install --user --name=riomar
Then connect to https://datarmor-jupyterhub.ifremer.fr/
Import libraries#
import xarray as xr
import fsspec
from pathlib import Path
Open Croco grid file#
The grid file is read locally if you are running on Datarmor, or accessed via HTTPS if you are running elsewhere.
url = "/home/lops-oh-fair2adapt/riomar/misc/croco_grd.nc"
if Path(url).is_file():
    # Local file exists (running on Datarmor)
    grid = xr.open_dataset(url, engine='netcdf4')
else:
    # Otherwise, read the copy served over HTTPS
    url = "https://data-fair2adapt.ifremer.fr/riomar//misc/croco_grd.nc"
    fs = fsspec.filesystem('https')
    grid = xr.open_dataset(fs.open(url))
#grid.to_netcdf('/home/lops-oh-fair2adapt/riomar/misc/croco_grd_hdf5.nc',format='NETCDF4')
grid
<xarray.Dataset> Size: 141MB
Dimensions: (one: 1, eta_rho: 838, xi_rho: 727, bath: 1, eta_u: 838,
xi_u: 726, eta_v: 837, xi_v: 727, eta_psi: 837, xi_psi: 726)
Dimensions without coordinates: one, eta_rho, xi_rho, bath, eta_u, xi_u, eta_v,
xi_v, eta_psi, xi_psi
Data variables: (12/34)
xl (one) float64 8B ...
el (one) float64 8B ...
depthmin (one) float64 8B ...
depthmax (one) float64 8B ...
spherical (one) |S1 1B ...
angle (eta_rho, xi_rho) float64 5MB ...
... ...
lat_v (eta_v, xi_v) float64 5MB ...
lat_psi (eta_psi, xi_psi) float64 5MB ...
mask_rho (eta_rho, xi_rho) float64 5MB ...
mask_u (eta_u, xi_u) float64 5MB ...
mask_v (eta_v, xi_v) float64 5MB ...
mask_psi (eta_psi, xi_psi) float64 5MB ...
Attributes:
title: BOB1000 Model
date: 09-Mar-2023
type: CROCO grid file
Open one RiOMar model file#
Again, the file is read locally if you are running on Datarmor, or accessed via HTTPS if you are running elsewhere.
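Since the same local-or-remote choice is made for both the grid file and the model file, it could be factored into a small helper. The sketch below is only an illustration: `resolve_source` is a hypothetical name that does not appear in the notebook, and it only decides which location to use, leaving the actual `xr.open_dataset` call to the caller.

```python
from pathlib import Path


def resolve_source(local_path: str, remote_url: str) -> tuple[str, bool]:
    """Return (location, is_local).

    The local path is preferred when the file exists (for example on Datarmor);
    otherwise the remote URL is returned. Hypothetical helper, not part of the
    original notebook.
    """
    if Path(local_path).is_file():
        return local_path, True
    return remote_url, False


# Paths and URL taken from the notebook cell for the model file.
location, is_local = resolve_source(
    "/home/lops-oh-fair2adapt/riomar/GAMAR/GAMAR_1h_inst_Y2006M01.nc",
    "https://data-fair2adapt.ifremer.fr/riomar/GAMAR/GAMAR_1h_inst_Y2006M01.nc",
)
print(location, is_local)
```

With `is_local` in hand, the caller can pass the path directly to `xr.open_dataset`, or wrap the URL with `fsspec.filesystem('https').open(...)` as the notebook cells do.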
url = "/home/lops-oh-fair2adapt/riomar/GAMAR/GAMAR_1h_inst_Y2006M01.nc"
if Path(url).is_file():
    # Local file exists (running on Datarmor); keep only the temperature variable
    ds = xr.open_dataset(url, engine='h5netcdf')[["temp"]]
else:
    # Otherwise, read the copy served over HTTPS
    url = "https://data-fair2adapt.ifremer.fr/riomar/GAMAR/GAMAR_1h_inst_Y2006M01.nc"
    fs = fsspec.filesystem('https')
    ds = xr.open_dataset(fs.open(url), engine='h5netcdf')[["temp"]]
ds
<xarray.Dataset> Size: 73GB
Dimensions: (time_counter: 744, s_rho: 40, y_rho: 838, x_rho: 727)
Coordinates:
* s_rho (s_rho) float32 160B -0.9875 -0.9625 ... -0.0375 -0.0125
nav_lat_rho (y_rho, x_rho) float32 2MB ...
nav_lon_rho (y_rho, x_rho) float32 2MB ...
time_instant (time_counter) datetime64[ns] 6kB ...
* time_counter (time_counter) datetime64[ns] 6kB 2006-01-01T00:57:45 ... 2...
Dimensions without coordinates: y_rho, x_rho
Data variables:
temp (time_counter, s_rho, y_rho, x_rho) float32 73GB ...
Attributes: (12/45)
name: GAMAR_GLORYS_1h_inst
description: Created by xios
Conventions: CF-1.6
timeStamp: 2024-Apr-02 09:15:02 GMT
uuid: 1563e80a-8c72-4739-a6b5-424221c7cf2b
title: GAMAR_GLORYS
... ...
gamma2_expl: Slipperiness parameter
x_sponge: 0.0
v_sponge: 0.0
sponge_expl: Sponge parameters : extent (m) & viscosity (m2.s-1)
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
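CPP-options: REGIONAL GAMAR MPI TIDES OBC_WEST OBC_NORTH XIOS USE_CALE...
# Replace the coordinate values with the longitude/latitude from the grid file,
# keeping the (y_rho, x_rho) dimensions of the model file
ds["nav_lon_rho"]=ds["nav_lon_rho"] * 0 + grid.lon_rho.data
ds["nav_lat_rho"]=ds["nav_lat_rho"] * 0 + grid.lat_rho.data
# Ocean mask: True where temperature is defined at the first time step and first s_rho level
ds["ocean_mask"]=ds.temp.isel(time_counter=0,s_rho=0).notnull()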
Save to Zarr#
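smallpath = "/home/lops-oh-fair2adapt/riomar/small.zarr"
if not Path(smallpath).exists():
    # Nothing found at the Datarmor path: write to the current directory instead
    smallpath = "./small.zarr"
# Keep the first 5 time steps, with one time step per chunk
small = ds[['temp','ocean_mask']].chunk({'time_counter':1}).isel(time_counter=slice(0,5))
small.to_zarr(smallpath, mode='w')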
<xarray.backends.zarr.ZarrStore at 0x74922df79740>
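As a rough check on how much the time selection shrinks the data, the uncompressed size of `temp` can be estimated from the dimensions reported for the model file (744 time steps, 40 levels, 838 x 727 grid points, float32). The sketch below uses only those numbers; the helper name `temp_bytes` is illustrative, and the sizes are in-memory estimates, not the compressed size of the Zarr store on disk.

```python
from math import prod

# Dimensions as reported in the dataset summary for the model file.
dims = {"time_counter": 744, "s_rho": 40, "y_rho": 838, "x_rho": 727}
BYTES_PER_FLOAT32 = 4


def temp_bytes(n_times: int) -> int:
    """Uncompressed size in bytes of `temp` for a given number of time steps."""
    shape = (n_times, dims["s_rho"], dims["y_rho"], dims["x_rho"])
    return prod(shape) * BYTES_PER_FLOAT32


full_bytes = temp_bytes(dims["time_counter"])  # whole month of hourly output
small_bytes = temp_bytes(5)                    # the 5 time steps kept in small.zarr
chunk_bytes = temp_bytes(1)                    # one chunk when chunking by time step

print(f"full:  {full_bytes / 1e9:.1f} GB")
print(f"small: {small_bytes / 1e6:.1f} MB")
print(f"chunk: {chunk_bytes / 1e6:.1f} MB")
```

These figures work out to about 72.5 GB for the full month, about 487 MB for five time steps, and about 97 MB per single-time-step chunk, which is consistent with the sizes xarray reports for `temp` before and after the selection.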
Open the newly created Zarr to check it#
ds = xr.open_dataset(smallpath)
ds
/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/plugins.py:159: RuntimeWarning: 'netcdf4' fails while guessing
warnings.warn(f"{engine!r} fails while guessing", RuntimeWarning)
/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/plugins.py:159: RuntimeWarning: 'h5netcdf' fails while guessing
warnings.warn(f"{engine!r} fails while guessing", RuntimeWarning)
/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/plugins.py:159: RuntimeWarning: 'scipy' fails while guessing
warnings.warn(f"{engine!r} fails while guessing", RuntimeWarning)
<xarray.Dataset> Size: 498MB
Dimensions: (y_rho: 838, x_rho: 727, s_rho: 40, time_counter: 5)
Coordinates:
nav_lat_rho (y_rho, x_rho) float64 5MB ...
nav_lon_rho (y_rho, x_rho) float64 5MB ...
* s_rho (s_rho) float32 160B -0.9875 -0.9625 ... -0.0375 -0.0125
* time_counter (time_counter) datetime64[ns] 40B 2006-01-01T00:57:45 ... 2...
time_instant (time_counter) datetime64[ns] 40B ...
Dimensions without coordinates: y_rho, x_rho
Data variables:
ocean_mask (y_rho, x_rho) bool 609kB ...
temp (time_counter, s_rho, y_rho, x_rho) float32 487MB ...
Attributes: (12/45)
CPP-options: REGIONAL GAMAR MPI TIDES OBC_WEST OBC_NORTH XIOS USE_CALE...
Conventions: CF-1.6
Cs_r: have a look at variable Cs_r in this file
Cs_w: have a look at variable Cs_w in this file
SRCS: main.F step.F read_inp.F timers_roms.F init_scalars.F ini...
Tcline: 15.0
... ...
title: GAMAR_GLORYS
tnu4_expl: biharmonic mixing coefficient for tracers
units: meter4 second-1
uuid: 1563e80a-8c72-4739-a6b5-424221c7cf2b
v_sponge: 0.0
x_sponge: 0.0