# Create a small sample RiOMar dataset

## Context

### Purpose

The goal is to create a smaller RiOMar dataset for testing regridding to HEALPix on Pangeo EOSC.

### Description

In this notebook, we will:
- Open a RiOMar data file
- Select a few times to reduce the amount of data
- Save the transformed data in Zarr

## Contributions

### Notebook


- Tina Odaka (author), IFREMER (France), @tinaok

## Bibliography and other interesting resources

- [RiOMar](https://coast.ifremer.fr/Laboratoires-Environnement-Ressources/LER-Pertuis-Charentais-La-Tremblade/Projets/RIOMAR-2024-2030)


```{warning}
This notebook is designed to run on Datarmor, IFREMER's HPC cluster, where the RiOMar data currently resides. Because the dataset is large, processing should happen close to the data. The raw data is nevertheless openly available online at `https://data-fair2adapt.ifremer.fr/riomar/`.

For portability, the notebook falls back to these online URLs when the local files are not found. **However, running it on the cloud would be very slow: the original data is in netCDF format and is read directly, without chunk-level access through tools such as Kerchunk.**

```

## How to set up the Pangeo environment on Datarmor for the FAIR2Adapt RiOMar use case

```bash
ssh datarmor

micromamba create -n riomar python=3.12 xarray zarr hdf5 ipykernel h5netcdf dask netCDF4 bottleneck scipy cftime numba healpy matplotlib hvplot
# Activate the new environment so that pip and ipykernel install into it
micromamba activate riomar
pip install git+https://github.com/IAOCEA/xarray-healpy.git
python -m ipykernel install --user --name=riomar
```

Then connect to `https://datarmor-jupyterhub.ifremer.fr/` and select the `riomar` kernel.


## Import Libraries

In [6]:
import xarray as xr
import fsspec
from pathlib import Path

## Open the CROCO grid file
- The grid file is read locally when running on Datarmor, or over `https` when running elsewhere.

In [7]:
# Local path on Datarmor
url = "/home/lops-oh-fair2adapt/riomar/misc/croco_grd.nc"
if Path(url).is_file():
    # Local file exists: read it directly
    grid = xr.open_dataset(url, engine='netcdf4')
else:
    # Otherwise fall back to the public HTTPS copy
    url = "https://data-fair2adapt.ifremer.fr/riomar/misc/croco_grd.nc"
    fs = fsspec.filesystem('https')
    grid = xr.open_dataset(fs.open(url))

#grid.to_netcdf('/home/lops-oh-fair2adapt/riomar/misc/croco_grd_hdf5.nc',format='NETCDF4')
grid

## Open one RiOMar model file
- As before, the file is read locally when running on Datarmor, or over `https` elsewhere.

In [10]:
url = "/home/lops-oh-fair2adapt/riomar/GAMAR/GAMAR_1h_inst_Y2006M01.nc"

if Path(url).is_file():
    # file exists
    ds = xr.open_dataset(url, engine='h5netcdf')[["temp"]]
else:
    url = "https://data-fair2adapt.ifremer.fr/riomar/GAMAR/GAMAR_1h_inst_Y2006M01.nc"
    fs = fsspec.filesystem('https')
    ds = xr.open_dataset(fs.open(url), engine='h5netcdf')[["temp"]]

ds
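
The two cells above repeat the same local-or-remote fallback. As a minimal sketch, that choice could be factored into a small helper. The name `pick_source` is hypothetical and not part of the notebook; it only decides where to read from and leaves the actual `xr.open_dataset` call to the caller.

```python
from pathlib import Path


def pick_source(local_path: str, remote_url: str) -> tuple[str, bool]:
    """Return (location, is_local): the local path if the file exists, else the URL."""
    if Path(local_path).is_file():
        return local_path, True
    return remote_url, False


# On Datarmor the local file is found; elsewhere the HTTPS URL is returned.
location, is_local = pick_source(
    "/home/lops-oh-fair2adapt/riomar/GAMAR/GAMAR_1h_inst_Y2006M01.nc",
    "https://data-fair2adapt.ifremer.fr/riomar/GAMAR/GAMAR_1h_inst_Y2006M01.nc",
)
print(location, is_local)
```

When `is_local` is `False`, the returned URL would still need to be opened through `fsspec`, as in the cells above.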

In [11]:
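# Replace the longitude/latitude values with those from the grid file,
# keeping the dimensions of the existing variables
ds["nav_lon_rho"] = ds["nav_lon_rho"] * 0 + grid.lon_rho.data
ds["nav_lat_rho"] = ds["nav_lat_rho"] * 0 + grid.lat_rho.data
# Ocean mask: True where temp at the first time step and first s_rho level is not NaN
ds["ocean_mask"] = ds.temp.isel(time_counter=0, s_rho=0).notnull()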
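
To illustrate how the mask is derived, here is a small sketch on synthetic data. It assumes that land cells are stored as NaN in `temp`; the horizontal dimension names `y_rho` and `x_rho` and the array sizes are made up for the example and may differ from the real file.

```python
import numpy as np
import xarray as xr

# Synthetic temperature field: 2 time steps, 2 vertical levels, 2 x 3 grid.
# NaN marks land (an assumption about the model output).
temp = np.ones((2, 2, 2, 3))
temp[:, :, 0, 0] = np.nan  # a single land cell at (y=0, x=0)
demo = xr.Dataset({"temp": (("time_counter", "s_rho", "y_rho", "x_rho"), temp)})

# True where the first time step / first level holds a valid value.
demo["ocean_mask"] = demo["temp"].isel(time_counter=0, s_rho=0).notnull()

n_ocean = int(demo["ocean_mask"].sum())  # number of ocean cells
mask_dims = demo["ocean_mask"].dims      # the mask is purely horizontal
print(n_ocean, mask_dims)
```

Selecting one time step and one level drops those two dimensions, so the resulting mask only varies over the horizontal grid.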

## Save to Zarr

In [12]:
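smallpath = "/home/lops-oh-fair2adapt/riomar/small.zarr"

# Fall back to the current directory when the Datarmor directory is not available
if not Path(smallpath).parent.is_dir():
    smallpath = "./small.zarr"

# Keep the first 5 time steps, with one chunk per time step
small = ds[['temp','ocean_mask']].chunk({'time_counter':1}).isel(time_counter=slice(0,5))
small.to_zarr(smallpath, mode='w')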

<xarray.backends.zarr.ZarrStore at 0x74922df79740>

## Open the newly created Zarr to check it

In [13]:
ds = xr.open_dataset(smallpath, engine='zarr')
ds

