Hack4RiOMAR

Icon of the RiOMar project
DGGS for FAIR2Adapt case studiesWorkathon Overview

Event Highlights & Outcomes

Hack4RiOMar summary and outcomes

The event was attended in person by six motivated participants, who collaborated to tackle the challenges at hand and advance the FAIR2Adapt RiOMar Case Study. We would like to thank the external experts who joined the hackathon, despite not being part of the FAIR2Adapt project, and attended at their own expense.

Participants

Name, organisation, Github username:

Short summary of the Hack4RiOMar workathon

Jupyter Notebooks

We created 20 Jupyter notebooks. Some are still in progress or contain issues we have discovered (e.g., these notebooks did not run successfully).

12 notebooks have successfully run, showcasing and explaining the progress and results of each planned task. Let's dig into each planned task and review the progress:

  1. Create virtualiZarr of RiOMar Data: The goal was to use virtualzarr and icechunk to create a single Virtual RiOMar dataset using VirtualiZarr, aiming to make the RiOMar data accessible from anywhere.
    • This approach is very promising and a Jupyter notebook was developed during the workathon.
    • However, we did not figure out how to use icechunk on non-amazon s3 buckets (the documentation is out-of-sync with the newest alpha version which made our work very difficult).
    • We managed to use Icechunk to write the Virtual Zarr to a local filesystem (we can then push it on any S3 bucket) but we realized that icechunk cannot yet store references to https (RiOMar data is currently stored on Datamor e.g. IFREMER HPC cluster and made accessible to everyone via https).
    • As an alternative solution, we used kerchunk with virtualzarr and write to a local kerchunk parquet and upload it afterwards to a location where averyone can access it (pangeo-eosc S3, or Datamor "portal")
  2. Create an example to demonstrate how to read RiOMar data and perform regridding to a DGGS (Healpix) grid:
  3. Regrid other types of data such as NorESM data and/or GEBCO gridded bathymetry data and/or Argo Kinetic Energy Product to DGGS (Healpix) to assess the robustness of the transformation procedure: NorESM (Norwegian Earth System Modelling) data will be used for another FAIR2Adapt case study. The GEBCO dataset is very popular and would be very useful for the FAIR2Adapt project. Transforming Argo Kinetic Energy Product into DGGS (Healpix) would help to understand if the procedure we have developed is robust. Our goal would be to write a short recipe that could be part of the data publishing pipeline we will develop within the FAIR2Adapt project.
  4. Create Conda Environment: the goal is to be able to use the same conda environment on different platform, including on Datarmor (HPC cluster of IFREMER). The plan is to create two environments: one with Cubed and one with Dask.
  5. Apply regridding to DGGS with different resolution to test multilevel (/multiscale) zarr storage format
  6. Parallelize the transformation procedure and test it on the same datasets used in 3.
  7. Create a RO-Crate for RiOMar Data: Explore the metadata available within the RiOMar data (checking if they truly follow the CF-Convention) and review the preliminary metadata work conducted within the RiOMar project to assess the current state and identify what needs to be done to make the RiOMar data FAIR.
    • Creation of RO-Crate from a RiOMar sample dataset
    • We also tested the new NIRD archive which we plan to use within the FAIR2Adapt project. We archived a sample RiOMAr dataset into the new test archive and created a Jupyter notebook to show how to create a RO-Crate from RiOMar dataset archived on NIRD. This notebook will not work for most users since we have used a test instance of the new archive where only the NIRD archive team and a few test users have access to. It was however very useful to identify which metadata is missing from the RiOMar data (for instance, a unique identifier of the data creator, clear information on the provenance).
  8. Create a STAC Item for RiOMar Dataset

Where to find our work