🌊 Welcome!

Hands-On with the EDITO Data API

Learn to explore, search, and use marine data from the EDITO Data Lake

👨‍🏫 Presented by Samuel Fooks (VLIZ)

For all the PDFs and code, check out the workshop GitHub repository

Funded by the European Union

🌍 What is EDITO?

EDITO stands for the European Digital Twin of the Ocean.

🧭 It is a European infrastructure to:

  • Integrate marine data, models, and services
  • Support marine policy (e.g. the Green Deal)
  • Help connect EU/national initiatives and citizen science

🌐 Offers:

  • Open API access to curated datasets
  • Analysis-ready formats (Zarr, Parquet, COG)
  • Tools to publish, process, and visualize ocean data

Data in EDITO

The data available in the EU DTO consists of a STAC (SpatioTemporal Asset Catalog) as well as data storage on S3 buckets.

EDITO Data Lake

🗄️ EDITO Data Storage

The EDITO Data Lake uses modern cloud storage to host public datasets. It offers:

  • S3-compatible object storage
  • Access via URL, anonymous or authenticated
  • High-performance, cloud-native data formats

🌍 Explore: 38 million occurrence records
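
To make "access via URL" concrete, here is a minimal sketch that splits an S3 HTTPS URL into the pieces an S3 client needs. It uses the EUROBIS GeoParquet URL shown later in this workshop. The helper name split_s3_url is hypothetical, and the path-style layout (https://endpoint/bucket/key) is an assumption about how this endpoint is organised.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> dict:
    """Split a path-style S3 HTTPS URL into endpoint, bucket, and object key.

    Assumes the layout https://<endpoint>/<bucket>/<key...> (an assumption,
    not something the slides state explicitly).
    """
    parsed = urlparse(url)
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    return {
        "endpoint": f"{parsed.scheme}://{parsed.netloc}",
        "bucket": bucket,
        "key": key,
    }


# URL of the EUROBIS GeoParquet export used later in this workshop
url = (
    "https://s3.waw3-1.cloudferro.com/emodnet/biology/eurobis_occurrence_data/"
    "eurobis_occurrences_geoparquet_2024-10-01.parquet"
)
parts = split_s3_url(url)
print(parts["endpoint"])
print(parts["bucket"])
print(parts["key"])
```

With s3fs, the endpoint would typically go into client_kwargs={"endpoint_url": ...} and the bucket/key into the path, alongside anon=True for public data.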

🗂️ EDITO STAC

EDITO offers a standardized STAC (SpatioTemporal Asset Catalog) built on CMEMS and EMODnet data, designed to integrate diverse marine and environmental datasets.

  • 🌐 Based on the OGC STAC API for easy discovery and access
  • 🌍 Integrates data from multiple domains (ocean, climate, biodiversity)
  • 🔎 Search by time, space, and type, with direct links to S3-hosted assets
  • 🤝 Supports both human users and automated workflows

A gateway to an interoperable ocean of FAIR data
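
A search by time and space can be sketched as a JSON request body. This assumes the EDITO API implements the standard STAC API item search (POST to <base>/search), which these slides do not confirm. The function name, the bounding box, and the collection id are all hypothetical.

```python
import json
from datetime import date


def build_search_body(bbox, start, end, collections=None, limit=10):
    """Build a JSON body for a STAC API item search.

    bbox is (west, south, east, north) in degrees; start and end are dates.
    """
    west, south, east, north = bbox
    if south > north:
        raise ValueError("bbox south must not exceed north")
    body = {
        "bbox": [west, south, east, north],
        # Closed interval covering both days in full, in UTC
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "limit": limit,
    }
    if collections:
        body["collections"] = list(collections)
    return body


body = build_search_body(
    bbox=(-10.0, 35.0, 5.0, 45.0),          # hypothetical area of interest
    start=date(2024, 1, 1),
    end=date(2024, 1, 31),
    collections=["example-collection-id"],  # hypothetical collection id
)
print(json.dumps(body, indent=2))
```

Restricting the search to specific collections is one way to express the "type" part of the query.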

🌍 What is STAC?

STAC = SpatioTemporal Asset Catalog

A community standard for:

  • Describing Earth-observation data
  • Providing metadata for geospatial assets

Used across satellites, models, and in-situ data.

📚 Learn more: stacspec.org

🧱 STAC Structure

🔹 Catalogs – High-level groupings (e.g., "All CMEMS data")
🔹 Collections – Thematic datasets (e.g., temperature, sea level)
🔹 Items – Individual entries with time + space (e.g., the record for 2024-01-01)
🔹 Assets – Actual data files: GeoTIFF, Zarr, Parquet...

Each has consistent metadata (bbox, datetime, etc.)
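
The hierarchy can be pictured as linked JSON documents. The records below are hand-written, abbreviated, and hypothetical (real STAC objects carry more required fields, and the ids, hrefs, and media type are placeholders), but the link relations "child" and "item" and the bbox/datetime/assets fields follow the STAC spec.

```python
# Hypothetical, abbreviated records that mimic the shape of STAC JSON
catalog = {
    "type": "Catalog",
    "id": "example-catalog",
    "links": [{"rel": "child", "href": "./example-collection/collection.json"}],
}
collection = {
    "type": "Collection",
    "id": "example-collection",
    "links": [{"rel": "item", "href": "./example-item-2024-01-01.json"}],
}
item = {
    "type": "Feature",  # a STAC Item is a GeoJSON Feature
    "id": "example-item-2024-01-01",
    "geometry": {"type": "Point", "coordinates": [2.9, 51.2]},
    "bbox": [2.9, 51.2, 2.9, 51.2],
    "properties": {"datetime": "2024-01-01T00:00:00Z"},
    "links": [],
    "assets": {
        "data": {
            "href": "https://example.org/data/2024-01-01.parquet",
            "type": "application/vnd.apache.parquet",  # example media type
        }
    },
}


def linked(obj, rel):
    """Return the hrefs of all links with the given relation type."""
    return [link["href"] for link in obj["links"] if link["rel"] == rel]


def summarize_item(item):
    """Pull out the metadata every Item shares: id, bbox, datetime, asset hrefs."""
    return {
        "id": item["id"],
        "bbox": item["bbox"],
        "datetime": item["properties"]["datetime"],
        "assets": {name: asset["href"] for name, asset in item["assets"].items()},
    }


print(linked(catalog, "child"))    # catalog -> collection
print(linked(collection, "item"))  # collection -> item
print(summarize_item(item))
```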

🔍 Use the EDITO STAC Viewer

viewer.dive.edito.eu

We can follow the STAC structure down to the EUROBIS database exported as Parquet:

Catalog -> Catalog -> Collection -> Item
EMODnet -> Biodiversity -> Occurrence data -> Occurrence data eurobis database observations

DEMO Using STAC Viewer

You can also view the catalog in your browser with STAC Browser: radiantearth.github.io/stac-browser

Search EDITO STAC via the API

Base URL for STAC:

https://api.dive.edito.eu/data/

📖 Docs: Interact with Data API
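
Starting from the base URL, the usual endpoints can be derived as in the sketch below. The /collections path appears later in this workshop. The /search and /collections/{id}/items paths follow the STAC API convention and are assumed here, not confirmed for EDITO. The helper name stac_endpoints is hypothetical.

```python
from urllib.parse import quote

BASE_URL = "https://api.dive.edito.eu/data/"  # base URL from the slide above


def stac_endpoints(base_url, collection_id=None):
    """Derive conventional STAC API endpoint URLs from a base URL."""
    base = base_url.rstrip("/")
    urls = {
        "root": base + "/",
        "collections": f"{base}/collections",
        "search": f"{base}/search",  # assumed: standard STAC API item search
    }
    if collection_id is not None:
        cid = quote(collection_id, safe="")  # escape characters unsafe in a path
        urls["collection"] = f"{base}/collections/{cid}"
        urls["items"] = f"{base}/collections/{cid}/items"
    return urls


for name, url in stac_endpoints(BASE_URL, "example-collection-id").items():
    print(f"{name:12s} {url}")
```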

What is ARCO Data?

ARCO = Analysis Ready Cloud Optimized

EDITO adopts modern cloud-friendly formats:

  • High performance
  • Scalable access
  • Efficient for machine learning, large analytics

Let's explore each format!

🧊 Zarr Format

Zarr is used for chunked N-dimensional arrays (like NetCDF but cloud-native)

✅ Ideal for model outputs, time series, climate reanalyses
✅ Works well with xarray, kerchunk, zarr-python

🔗 zarr.readthedocs.io

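import xarray as xr  # needs the zarr package installed as the storage backend

# Open the remote store lazily: only metadata is read here,
# data chunks are fetched when values are actually accessed
ds = xr.open_zarr("https://s3...zarr/", consolidated=True)
print(ds)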
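
Why chunking matters for cloud access: a read only has to fetch the chunks that overlap the requested window. The sketch below computes which chunks a selection touches. The chunk sizes and the (time, lat, lon) layout are hypothetical, and chunks_touched is an illustrative helper, not part of zarr or xarray.

```python
from itertools import product


def chunks_touched(chunks, selection):
    """List the chunk-grid indices that a rectangular selection overlaps.

    chunks:    chunk length along each dimension, e.g. (10, 100, 100)
    selection: one (start, stop) pair per dimension, stop exclusive
    """
    ranges = []
    for size, (start, stop) in zip(chunks, selection):
        if not 0 <= start < stop:
            raise ValueError("each selection needs 0 <= start < stop")
        first = start // size        # chunk holding the first element
        last = (stop - 1) // size    # chunk holding the last element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# Hypothetical (time, lat, lon) array stored in chunks of 10 x 100 x 100
chunks = (10, 100, 100)
# One time step over a 150 x 150 spatial window
touched = chunks_touched(chunks, [(0, 1), (50, 200), (50, 200)])
print(len(touched), "chunks to fetch:", touched)
```

Here only 4 chunks are needed, however large the full array is.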

🗃️ Parquet and GeoParquet

Parquet = columnar tabular format, very efficient
GeoParquet = Parquet + geospatial metadata

✅ Good for point observations, events, tracks, etc.
✅ Efficient for large queries and spatial joins

🔗 parquet.apache.org
🔗 geoparquet.org
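
The "geospatial metadata" part of GeoParquet is a small JSON document stored in the file's key-value metadata under the key "geo". The sketch below parses a hand-written, hypothetical example shaped like GeoParquet 1.0 metadata. It is not taken from any EDITO file, and describe_geo_metadata is an illustrative helper.

```python
import json

# Hypothetical "geo" metadata string, shaped like GeoParquet 1.0 file metadata
geo_json = json.dumps({
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Point"],
            "bbox": [-10.0, 35.0, 5.0, 45.0],
        }
    },
})


def describe_geo_metadata(geo_json):
    """Summarise the primary geometry column described by the metadata."""
    meta = json.loads(geo_json)
    primary = meta["primary_column"]
    column = meta["columns"][primary]
    return {
        "primary_column": primary,
        "encoding": column["encoding"],
        "geometry_types": column["geometry_types"],
        "bbox": column.get("bbox"),  # bbox is optional in the spec
    }


info = describe_geo_metadata(geo_json)
print(info)
```

With pyarrow, such a string would typically be found in the schema metadata of a GeoParquet file, under the bytes key b"geo".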

📍 Access Parquet/GeoParquet via Arrow (Python)

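import pyarrow.dataset as ds
import s3fs

# Anonymous (public) access, no credentials needed.
# For a non-AWS endpoint, also pass client_kwargs={"endpoint_url": ...}
fs = s3fs.S3FileSystem(anon=True)
dataset = ds.dataset("s3://...your-parquet-folder...",
                     filesystem=fs, format="parquet")

# Read only the first rows; to_table() on the whole dataset
# would load every record into memory
df = dataset.head(5).to_pandas()
print(df)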

Let's explore the EDITO STAC and find an ARCO dataset from Biodiversity

viewer.dive.edito.eu

Reading parquet

Let's go read that Parquet file:
https://s3.waw3-1.cloudferro.com/emodnet/biology/eurobis_occurrence_data/eurobis_occurrences_geoparquet_2024-10-01.parquet

Using a preconfigured service on EDITO: explore_data/view_parquet

🔍 Exploring STAC via the API (Python)

import pystac_client

# Open the STAC API root (the base URL), not the /collections endpoint
url = "https://api.dive.edito.eu/data/"
client = pystac_client.Client.open(url)
collections = list(client.get_collections())

print("Found collections:", len(collections))
for col in collections[:5]:
    print(col.id, ":", col.title)
    # A collection can hold many items, so only look at the first few
    for i, item in enumerate(col.get_items()):
        if i >= 3:
            break
        # "title" is optional in item properties; fall back to the item id
        print(item.properties.get("title", item.id))
        print(item.assets)
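
Once an item's assets are printed, the next step is usually to pick out the ARCO files. A minimal sketch follows, working on a plain dict shaped like item.to_dict()["assets"] in pystac. The asset keys and the example.org URLs are hypothetical, and matching on file suffix is a simplifying assumption (asset media types could be checked instead).

```python
from urllib.parse import urlparse

# File suffixes treated as ARCO formats in this sketch
ARCO_SUFFIXES = {".zarr": "zarr", ".parquet": "parquet"}


def arco_assets(assets):
    """Map asset key -> (format, href) for assets that look like Zarr or Parquet."""
    found = {}
    for key, asset in assets.items():
        # Zarr stores are directories, so ignore a trailing slash
        path = urlparse(asset["href"]).path.rstrip("/").lower()
        for suffix, fmt in ARCO_SUFFIXES.items():
            if path.endswith(suffix):
                found[key] = (fmt, asset["href"])
    return found


# Hypothetical assets dict (keys are made up; the Parquet URL is the
# EUROBIS export from this workshop)
assets = {
    "parquet": {
        "href": "https://s3.waw3-1.cloudferro.com/emodnet/biology/"
                "eurobis_occurrence_data/"
                "eurobis_occurrences_geoparquet_2024-10-01.parquet"
    },
    "zarr": {"href": "https://example.org/store/sst.zarr/"},
    "thumbnail": {"href": "https://example.org/preview.png"},
}
selected = arco_assets(assets)
for key, (fmt, href) in selected.items():
    print(key, fmt, href)
```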

🧪 🔍 Exploring STAC via the API (R)

library(rstac)

stac_endpoint <- "https://api.dive.edito.eu/data/"
collections <- stac(stac_endpoint) %>%
  rstac::collections() %>%
  get_request()

length(collections$collections)  # how many

👉 R packages like arrow, sf, terra also help with asset processing.

📌 Recap: What You Can Now Do

✅ Understand the EDITO API and data stack
✅ Find and filter collections/items
✅ Read Parquet or Zarr data with Python or R

🧭 Go explore: my-ocean.dive.edito.eu
viewer.dive.edito.eu
💬 Questions?
📧 Reach us at: edito-infra-dev@mercator-ocean.eu
🔗 Docs: Interact with EDITO Data

🌊 Happy exploring!
