
Example to load data from NRDA

Learn how to load data from the Norwegian Research Data Archive with rocrate

Simula Research Laboratory

Introduction

The Norwegian Research Data Archive (NIRD RDA), managed by Sigma2, is Norway's national open-access repository for research data, built on the open-source CKAN platform. It aims to support the FAIR principles by enabling the discovery, access, and reuse of datasets across scientific domains. Holding nearly 1,000 TB of data, the archive facilitates Open Science by providing persistent identifiers and rich metadata.

Below, we demonstrate how to access a dataset from the NIRD RDA using the rocrate Python library: we use the RO-Crate metadata to locate a NetCDF file, then open and plot it.

Setup

  • Install the required Python packages;
  • Import the necessary libraries.
pip install rocrate cmcrameri
Requirement already satisfied: rocrate in /srv/conda/envs/notebook/lib/python3.12/site-packages (0.13.0)
Collecting cmcrameri
  Using cached cmcrameri-1.9-py3-none-any.whl.metadata (4.6 kB)
Requirement already satisfied: requests in /srv/conda/envs/notebook/lib/python3.12/site-packages (from rocrate) (2.32.3)
Requirement already satisfied: arcp==0.2.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from rocrate) (0.2.1)
Requirement already satisfied: jinja2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from rocrate) (3.1.5)
Requirement already satisfied: python-dateutil in /srv/conda/envs/notebook/lib/python3.12/site-packages (from rocrate) (2.9.0)
Requirement already satisfied: click in /srv/conda/envs/notebook/lib/python3.12/site-packages (from rocrate) (8.1.8)
Requirement already satisfied: matplotlib in /srv/conda/envs/notebook/lib/python3.12/site-packages (from cmcrameri) (3.10.0)
Requirement already satisfied: numpy in /srv/conda/envs/notebook/lib/python3.12/site-packages (from cmcrameri) (2.0.2)
Requirement already satisfied: packaging in /srv/conda/envs/notebook/lib/python3.12/site-packages (from cmcrameri) (24.2)
Requirement already satisfied: MarkupSafe>=2.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from jinja2->rocrate) (3.0.2)
Requirement already satisfied: contourpy>=1.0.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->cmcrameri) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->cmcrameri) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->cmcrameri) (4.55.4)
Requirement already satisfied: kiwisolver>=1.3.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->cmcrameri) (1.4.8)
Requirement already satisfied: pillow>=8 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->cmcrameri) (11.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->cmcrameri) (3.2.1)
Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from python-dateutil->rocrate) (1.17.0)
Requirement already satisfied: charset_normalizer<4,>=2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->rocrate) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->rocrate) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->rocrate) (1.26.19)
Requirement already satisfied: certifi>=2017.4.17 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests->rocrate) (2024.12.14)
Using cached cmcrameri-1.9-py3-none-any.whl (277 kB)
Installing collected packages: cmcrameri
Successfully installed cmcrameri-1.9
Note: you may need to restart the kernel to use updated packages.
import requests
import tempfile
import json
import os
from rocrate.rocrate import ROCrate
from rocrate.model.person import Person

import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cmcrameri

Input Parameters

Currently, only temporary credentials are available. You need to visit the archive to obtain a valid link and then download the RO-Crate promptly, because the credentials expire quickly (the example URL below sets X-Amz-Expires=60, i.e. it is valid for 60 seconds after it is issued).

Visit the dataset at https://data.archive-sandbox.sigma2.no/dataset/relative-humidity-over-small-sub-region2 and check its metadata to obtain a link with fresh credentials.

# URL of the RO-Crate metadata file
url = "https://s3.nird.sigma2.no/archive-sandbox-ro/3888d382-6269-479c-9448-701ad6e3fa74_metadata/dataset_metadata_10.82969_2025.pitlyjt0.json?response-content-disposition=attachment&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=TRnFBoNN9N9QfuRYw5mX%2F20250518%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250518T175123Z&X-Amz-Expires=60&X-Amz-SignedHeaders=host&X-Amz-Signature=f089662d8f3c95f9fcde3fc9da0d00b7ff8a7ce2d0c2b8598a94d6f06f9ef30a"
# Output directory and file path
data_dir = "../data"
metadata_path = os.path.join(data_dir, "ro-crate-metadata.json")
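Because the link is short-lived, it can help to check whether it has already expired before attempting the download. The following is a minimal sketch, assuming the link is an AWS Signature Version 4 pre-signed URL carrying the X-Amz-Date and X-Amz-Expires query parameters, as the URL above does. The helper name presigned_url_expiry and the shortened example URL are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs


def presigned_url_expiry(url):
    """Return the UTC expiry time encoded in a pre-signed URL, or None."""
    query = parse_qs(urlparse(url).query)
    if "X-Amz-Date" not in query or "X-Amz-Expires" not in query:
        return None
    # X-Amz-Date is the signing time (UTC); X-Amz-Expires is a duration in seconds
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))


# Hypothetical, shortened URL with the same timing parameters as the one above
example = ("https://example.org/metadata.json"
           "?X-Amz-Date=20250518T175123Z&X-Amz-Expires=60")
expiry = presigned_url_expiry(example)
expired = datetime.now(timezone.utc) > expiry
print(f"Link expires at {expiry.isoformat()} (expired: {expired})")
```

If the link has expired, request a fresh one from the dataset page rather than retrying the download.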

Retrieve RO-Crate for a given dataset

# Create data directory if it doesn't exist
os.makedirs(data_dir, exist_ok=True)

if not os.path.exists(metadata_path):
    # Download metadata
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to download metadata: {response.status_code} - {response.text}")

    # Save as data/ro-crate-metadata.json
    with open(metadata_path, "wb") as f:
        f.write(response.content)
    print(f"Metadata saved to: {metadata_path}")
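Before handing the downloaded file to rocrate, a quick structural check can catch a truncated or unexpected response. The sketch below assumes the file follows the RO-Crate 1.1 convention: a JSON-LD document whose @graph contains a metadata descriptor with @id "ro-crate-metadata.json" that points to the root dataset through its about property. The function name find_root_dataset and the tiny in-memory document are hypothetical and are not taken from the archive.

```python
import json


def find_root_dataset(metadata):
    """Return the root data entity of an RO-Crate metadata document (a dict)."""
    graph = metadata.get("@graph")
    if not isinstance(graph, list):
        raise ValueError("Not an RO-Crate: missing '@graph' list")
    # Index all entities by their identifier
    by_id = {entity["@id"]: entity for entity in graph if "@id" in entity}
    descriptor = by_id.get("ro-crate-metadata.json")
    if descriptor is None:
        raise ValueError("Missing metadata descriptor 'ro-crate-metadata.json'")
    # The descriptor's 'about' property references the root dataset
    return by_id[descriptor["about"]["@id"]]


# Hypothetical minimal crate used only to illustrate the expected layout
sample = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Example dataset"}
  ]
}
""")
root = find_root_dataset(sample)
print(root["name"])
```

To apply the same check to the downloaded file, load it with json.load and pass the resulting dictionary to the function.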

Load RO-Crate to access collection

crate = ROCrate(data_dir)
root_dataset = crate.root_dataset

# Print metadata
print("=== Dataset Metadata ===")
print(f"Name: {root_dataset.get('name', 'Unnamed')}")
print(f"Description: {root_dataset.get('description', 'No description')}")
print(f"DOI: {root_dataset.get('identifier', 'No DOI')}")
print(f"Author: {root_dataset.get('author', 'Unknown')}")
print(f"License: {root_dataset.get('license', {}).get('@id', 'No license')}")
print(f"Date Published: {root_dataset.get('datePublished', 'Unknown')}")
print(f"Temporal Coverage: {root_dataset.get('temporalCoverage', 'Unknown')}")
location_entity = crate.dereference(root_dataset.get('location'))
print(f"Geospatial Coverage: {location_entity.get('polygon', 'Unknown') if location_entity else 'Unknown'}")

# Process hasPart files
print("\n=== Data Files ===")
files = []
for part in root_dataset.get('hasPart', []):
    part_id = part if isinstance(part, str) else getattr(part, 'id', None)
    if not part_id:
        continue
    part_entity = crate.dereference(part_id)
    if not part_entity or "File" not in part_entity.type:
        continue
    file_url = part_entity.id
    if not (file_url.endswith(".nc") or file_url.endswith(".zarr")):
        continue
    print(f"File part of the collection: {file_url}")
    # Only record the URL here; the file is opened with xarray in the next step
    files.append(file_url)
=== Dataset Metadata ===
Name: Relative humidity over small sub-region
Description: rh_mean_july_1980_2018_small.nc

Relative humidity (%) monthly values for July (year 1980 and 2018).
DOI: https://doi.org/10.82969/2025.pitlyjt0
Author: ['annef@simula.no']
License: https://creativecommons.org/licenses/by/4.0/
Date Published: 2025-05-18
Temporal Coverage: 1980-06-30T21:00:00.000Z/2018-06-30T21:00:00.000Z
Geospatial Coverage: Unknown

=== Data Files ===
File part of the collection: https://data.archive-sandbox.sigma2.no/dataset/3888d382-6269-479c-9448-701ad6e3fa74/download/rh_mean_july_1980_2018_small.nc
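The temporal coverage printed above is an ISO 8601 interval, with start and end separated by a slash. The sketch below shows one way to turn it into two timezone-aware datetime objects, assuming both ends use the exact UTC format shown in the output (millisecond precision with a trailing Z). The function name parse_temporal_coverage is hypothetical.

```python
from datetime import datetime, timezone


def parse_temporal_coverage(value):
    """Split a 'start/end' ISO 8601 interval into two UTC datetimes."""
    start_text, end_text = value.split("/")
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 1980-06-30T21:00:00.000Z
    start = datetime.strptime(start_text, fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(end_text, fmt).replace(tzinfo=timezone.utc)
    return start, end


start, end = parse_temporal_coverage(
    "1980-06-30T21:00:00.000Z/2018-06-30T21:00:00.000Z"
)
print(f"Coverage spans {start.date()} to {end.date()}")
```

Other archives may use open-ended or date-only intervals, which this sketch does not handle.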

Access netCDF file from the dataset collection

# Choose the reader from the extension of the file that is actually opened
dset = xr.open_dataset(files[0], engine="h5netcdf") if files[0].endswith(".nc") else xr.open_zarr(files[0])
dset
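The choice between the NetCDF and Zarr readers above relies on the URL ending in ".nc" or ".zarr". That test breaks as soon as a URL carries a query string, for instance a pre-signed link. The following sketch inspects only the path component of the URL. The helper storage_format and the example URLs are hypothetical and not part of the archive's API.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def storage_format(url):
    """Guess 'netcdf' or 'zarr' from a URL path, ignoring any query string."""
    path = urlparse(url).path.rstrip("/")
    suffix = PurePosixPath(path).suffix.lower()
    formats = {".nc": "netcdf", ".zarr": "zarr"}
    if suffix not in formats:
        raise ValueError(f"Unsupported file type: {suffix or 'none'}")
    return formats[suffix]


# Hypothetical URLs for illustration
print(storage_format("https://example.org/download/data.nc?X-Amz-Expires=60"))
print(storage_format("https://example.org/store.zarr/"))
```

The returned label can then drive the choice between xr.open_dataset and xr.open_zarr.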

Plot Relative Humidity

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), subplot_kw={'projection': ccrs.PlateCarree()})

# Time points and human-readable labels
times = dset['time'].values
time_labels = [pd.Timestamp(t).strftime('%Y-%m-%d') for t in times]

# Plot for 1980-07-01
im1 = dset.sel(time=times[0]).R.plot(ax=ax1, cmap='Blues', vmin=0, vmax=100,
                                     cbar_kwargs={'label': 'Relative Humidity (%)'})
ax1.coastlines()
ax1.add_feature(cfeature.BORDERS, linestyle=':')
ax1.set_title(f"Relative Humidity ({time_labels[0]})")
ax1.set_xlabel('Longitude')
ax1.set_ylabel('Latitude')
ax1.gridlines(draw_labels=True, dms=True, x_inline=False, y_inline=False)

# Plot for 2018-07-01
im2 = dset.sel(time=times[1]).R.plot(ax=ax2, cmap='Blues', vmin=0, vmax=100,
                                     cbar_kwargs={'label': 'Relative Humidity (%)'})
ax2.coastlines()
ax2.add_feature(cfeature.BORDERS, linestyle=':')
ax2.set_title(f"Relative Humidity ({time_labels[1]})")
ax2.set_xlabel('Longitude')
ax2.set_ylabel('Latitude')
ax2.gridlines(draw_labels=True, dms=True, x_inline=False, y_inline=False)

# Adjust layout and display
plt.tight_layout()
plt.show()
<Figure size 1200x500 with 4 Axes>