Example to load data from NRDA
Learn how to load data from the Norwegian Research Data Archive with rocrate
Introduction¶
The Norwegian Research Data Archive (NIRD RDA), managed by Sigma2, is the Norwegian national open-access repository for research data, built on the open-source CKAN platform. It aims to support the FAIR principles by enabling the discovery, access, and reuse of datasets across scientific domains. With nearly 1,000 TB of data, the archive facilitates Open Science by providing persistent identifiers and rich metadata.
Below, we demonstrate how to access a dataset from the NIRD RDA with the rocrate Python library, using the RO-Crate metadata to retrieve and process a NetCDF file.
Setup¶
- Install the requirements (e.g. Python packages);
- Import the necessary libraries.
pip install rocrate cmcrameri
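The import cell that follows relies on more packages than the two named in the `pip install` line (xarray, pandas, matplotlib, cartopy and requests are imported, and the `h5netcdf` engine is used later). As a minimal sketch, assuming the module names listed in `REQUIRED_MODULES` match your environment, the check below reports which of them cannot be found before you run the rest of the notebook. The helper name `missing_modules` is hypothetical and not part of any library.

```python
from importlib.util import find_spec

# Modules imported or used later in this example (assumed list, adjust as needed)
REQUIRED_MODULES = [
    "requests", "rocrate", "xarray", "pandas",
    "matplotlib", "cartopy", "cmcrameri", "h5netcdf",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Anything reported as missing can then be added to the `pip install` command.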
import requests
import tempfile
import json
import os
from rocrate.rocrate import ROCrate
from rocrate.model.person import Person
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cmcrameri
Input Parameters¶
Currently, only temporary credentials are available. You need to access the archive to obtain a valid link and download the RO-Crate promptly, as the credentials expire after a few minutes.
Visit the dataset at https://
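Because the link is a pre-signed S3 URL, its lifetime can be read from the query string: `X-Amz-Date` holds the signing time (UTC) and `X-Amz-Expires` the validity in seconds. The sketch below, which uses only the standard library, computes the expiry time so that you can tell whether a copied link is still usable. The helper names and the `example_url` are hypothetical; the URL only mimics the query layout of a real link.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url):
    """Return the UTC time at which a SigV4 pre-signed URL stops being valid."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


def is_expired(url, now=None):
    """True if the link has expired at `now` (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)


# Hypothetical link with the same query parameters as a real pre-signed URL
example_url = (
    "https://example.org/metadata.json"
    "?X-Amz-Date=20250518T175123Z&X-Amz-Expires=60"
)
print(presigned_url_expiry(example_url))  # 2025-05-18 17:52:23+00:00
print(is_expired(example_url))
```

If `is_expired` returns `True`, request a fresh link from the archive before running the download cell.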
# URL of the RO-Crate metadata file
url = "https://s3.nird.sigma2.no/archive-sandbox-ro/3888d382-6269-479c-9448-701ad6e3fa74_metadata/dataset_metadata_10.82969_2025.pitlyjt0.json?response-content-disposition=attachment&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=TRnFBoNN9N9QfuRYw5mX%2F20250518%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250518T175123Z&X-Amz-Expires=60&X-Amz-SignedHeaders=host&X-Amz-Signature=f089662d8f3c95f9fcde3fc9da0d00b7ff8a7ce2d0c2b8598a94d6f06f9ef30a"
# Output directory and file path
data_dir = "../data"
metadata_path = os.path.join(data_dir, "ro-crate-metadata.json")
Retrieve RO-Crate for a given dataset¶
# Create data directory if it doesn't exist
os.makedirs(data_dir, exist_ok=True)
if not os.path.exists(metadata_path):
    # Download metadata
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception(f"Failed to download metadata: {response.status_code} - {response.text}")
    # Save as data/ro-crate-metadata.json
    with open(metadata_path, "wb") as f:
        f.write(response.content)
    print(f"Metadata saved to: {metadata_path}")
Load RO-Crate to access collection¶
crate = ROCrate(data_dir)
root_dataset = crate.root_dataset
# Print metadata
print("=== Dataset Metadata ===")
print(f"Name: {root_dataset.get('name', 'Unnamed')}")
print(f"Description: {root_dataset.get('description', 'No description')}")
print(f"DOI: {root_dataset.get('identifier', 'No DOI')}")
print(f"Author: {root_dataset.get('author', 'Unknown')}")
print(f"License: {root_dataset.get('license', {}).get('@id', 'No license')}")
print(f"Date Published: {root_dataset.get('datePublished', 'Unknown')}")
print(f"Temporal Coverage: {root_dataset.get('temporalCoverage', 'Unknown')}")
location_entity = crate.dereference(root_dataset.get('location'))
print(f"Geospatial Coverage: {location_entity.get('polygon', 'Unknown') if location_entity else 'Unknown'}")
# Process hasPart files
print("\n=== Data Files ===")
files = []
for part in root_dataset.get('hasPart', []):
    # A part may be a plain identifier or an entity object
    part_id = part if isinstance(part, str) else getattr(part, 'id', None)
    if not part_id:
        continue
    part_entity = crate.dereference(part_id)
    if not part_entity or "File" not in part_entity.type:
        continue
    file_url = part_entity.id
    # Keep only NetCDF and Zarr data files
    if not (file_url.endswith(".nc") or file_url.endswith(".zarr")):
        continue
    print(f"File part of the collection: {file_url}")
    try:
        # The URL is only recorded here; the file itself is opened with xarray in the next section
        files.append(file_url)
        print(f"\n=== Dataset found and opened with xarray ({file_url}) ===")
    except Exception as e:
        print(f"Failed to open {file_url}: {e}")
=== Dataset Metadata ===
Name: Relative humidity over small sub-region
Description: rh_mean_july_1980_2018_small.nc
Relative humidity (%) monthly values for July (year 1980 and 2018).
DOI: https://doi.org/10.82969/2025.pitlyjt0
Author: ['annef@simula.no']
License: https://creativecommons.org/licenses/by/4.0/
Date Published: 2025-05-18
Temporal Coverage: 1980-06-30T21:00:00.000Z/2018-06-30T21:00:00.000Z
Geospatial Coverage: Unknown
=== Data Files ===
File part of the collection: https://data.archive-sandbox.sigma2.no/dataset/3888d382-6269-479c-9448-701ad6e3fa74/download/rh_mean_july_1980_2018_small.nc
=== Dataset found and opened with xarray (https://data.archive-sandbox.sigma2.no/dataset/3888d382-6269-479c-9448-701ad6e3fa74/download/rh_mean_july_1980_2018_small.nc) ===
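To make the traversal above less opaque, the following sketch shows what it amounts to on the underlying JSON-LD, using only the standard `json` module. The metadata text is a hypothetical, heavily trimmed RO-Crate (it is not the NIRD RDA record), and `as_list` is an illustrative helper: the metadata descriptor points to the root dataset through `about`, and the root dataset lists its files under `hasPart`.

```python
import json

# Hypothetical, heavily trimmed RO-Crate metadata (not the real NIRD RDA record)
metadata_text = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Example dataset",
     "hasPart": [{"@id": "https://example.org/data/sample.nc"},
                 {"@id": "https://example.org/data/readme.txt"}]},
    {"@id": "https://example.org/data/sample.nc", "@type": "File"},
    {"@id": "https://example.org/data/readme.txt", "@type": "File"}
  ]
}
"""


def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


graph = json.loads(metadata_text)["@graph"]
entities = {entity["@id"]: entity for entity in graph}

# The metadata descriptor points at the root dataset through "about"
root = entities[entities["ro-crate-metadata.json"]["about"]["@id"]]

# Collect the File entities under hasPart that look like NetCDF or Zarr data
data_urls = []
for ref in as_list(root.get("hasPart", [])):
    entity = entities.get(ref["@id"], {})
    is_file = "File" in as_list(entity.get("@type", []))
    if is_file and ref["@id"].endswith((".nc", ".zarr")):
        data_urls.append(ref["@id"])

print(root["name"])  # Example dataset
print(data_urls)     # ['https://example.org/data/sample.nc']
```

In practice the rocrate library handles this resolution, so the sketch is mainly useful for inspecting a metadata file by hand.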
Access netCDF file from the dataset collection¶
# Pick the reader from the extension of the file that is actually opened
data_url = files[0]
dset = xr.open_dataset(data_url, engine="h5netcdf") if data_url.endswith(".nc") else xr.open_zarr(data_url)
dset
Plot Relative Humidity¶
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), subplot_kw={'projection': ccrs.PlateCarree()})
# Time points and human-readable labels
times = dset['time'].values
time_labels = [pd.Timestamp(t).strftime('%Y-%m-%d') for t in times]
# Plot for 1980-07-01
im1 = dset.sel(time=times[0]).R.plot(ax=ax1, cmap='Blues', vmin=0, vmax=100,
                                     cbar_kwargs={'label': 'Relative Humidity (%)'})
ax1.coastlines()
ax1.add_feature(cfeature.BORDERS, linestyle=':')
ax1.set_title(f"Relative Humidity ({time_labels[0]})")
ax1.set_xlabel('Longitude')
ax1.set_ylabel('Latitude')
ax1.gridlines(draw_labels=True, dms=True, x_inline=False, y_inline=False)
# Plot for 2018-07-01
im2 = dset.sel(time=times[1]).R.plot(ax=ax2, cmap='Blues', vmin=0, vmax=100,
                                     cbar_kwargs={'label': 'Relative Humidity (%)'})
ax2.coastlines()
ax2.add_feature(cfeature.BORDERS, linestyle=':')
ax2.set_title(f"Relative Humidity ({time_labels[1]})")
ax2.set_xlabel('Longitude')
ax2.set_ylabel('Latitude')
ax2.gridlines(draw_labels=True, dms=True, x_inline=False, y_inline=False)
# Adjust layout and display
plt.tight_layout()
plt.show()