Hint

You can run this notebook in a live session with Binder Helmholtz.

  • Binder Helmholtz is available only to users with Helmholtz credentials. When building the environment, choose the Python scipy option to avoid a 404 Bad request error.

Recognizer#

Datasets in THREDDS were classified into nine distinct, plausible scenarios. The Recognizer identifies which scenario a given catalog matches and also reports the depth of nested datasets (i.e., datasets that contain subdirectories). The nine scenarios are as follows:

First scenario (Nested):#

In this scenario, the catalogRef tags sit directly beneath the dataset element tag, with no separate dataset next to the catalogRefs.

Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/era5/sfc/single/catalog.xml

XML-based: image0

HTML-based: image1

Second scenario (Nested):#

The catalogRefs sit directly below the catalog element tag rather than inside a dataset element tag, and no separate dataset appears next to them.

Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/sensor_catalog_ext.xml

XML-based: image2

HTML-based: image3
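The difference between the first and second scenarios comes down to which element is the parent of the catalogRef tags. The following is a minimal sketch of that check, not the Recognizer's actual implementation. The two XML strings are hand-written, hypothetical catalogs that only mimic the layouts described above, and `catalog_ref_parent` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical catalog: catalogRefs nested inside a dataset (first scenario layout).
FIRST = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="single" ID="single">
    <catalogRef xlink:href="daily/catalog.xml" xlink:title="daily" name=""/>
    <catalogRef xlink:href="hourly/catalog.xml" xlink:title="hourly" name=""/>
  </dataset>
</catalog>"""

# Hypothetical catalog: catalogRefs directly under catalog (second scenario layout).
SECOND = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <catalogRef xlink:href="a/catalog.xml" xlink:title="a" name=""/>
  <catalogRef xlink:href="b/catalog.xml" xlink:title="b" name=""/>
</catalog>"""


def catalog_ref_parent(xml_text):
    """Return 'dataset', 'catalog', or None depending on where catalogRefs sit."""
    root = ET.fromstring(xml_text)
    # catalogRefs that are direct children of a top-level dataset element
    if root.findall("t:dataset/t:catalogRef", NS):
        return "dataset"
    # catalogRefs that are direct children of the catalog root
    if root.findall("t:catalogRef", NS):
        return "catalog"
    return None


print(catalog_ref_parent(FIRST))   # dataset
print(catalog_ref_parent(SECOND))  # catalog
```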

Third scenario (Nested):#

This scenario resembles the first, except that a separate dataset tag sits next to the catalogRef tags.

Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/chirps/catalog.xml

XML-based: image4

HTML-based: image5
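What separates the third scenario from the first is a dataset tag sharing a parent with the catalogRef tags. As a rough illustration (a sketch under assumptions, not how the Recognizer is actually written), the count of such sibling datasets can be computed with the standard library. The XML below is a hypothetical, hand-written catalog, and `datasets_beside_catalog_refs` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical catalog: one dataset tag next to the catalogRefs (third scenario layout).
THIRD = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="chirps" ID="chirps">
    <dataset name="readme.txt" ID="chirps/readme.txt" urlPath="chirps/readme.txt"/>
    <catalogRef xlink:href="daily/catalog.xml" xlink:title="daily" name=""/>
    <catalogRef xlink:href="monthly/catalog.xml" xlink:title="monthly" name=""/>
  </dataset>
</catalog>"""


def datasets_beside_catalog_refs(xml_text):
    """Count dataset elements that share a parent dataset with catalogRefs."""
    root = ET.fromstring(xml_text)
    count = 0
    for parent in root.findall("t:dataset", NS):
        # Only parents that actually hold catalogRefs are of interest.
        if parent.findall("t:catalogRef", NS):
            count += len(parent.findall("t:dataset", NS))
    return count


print(datasets_beside_catalog_refs(THIRD))  # 1
```

With this helper, a count of 0 matches the first scenario's description and a count of 1 matches the third's.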

Fourth scenario:#

An empty dataset.

Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/bio_geo_chem_catalog_ext.xml

XML-based: image6

HTML-based: image7

Fifth scenario:#

In this scenario there are no catalogRef tags; every tag is a dataset tag. In other words, the parent dataset tag contains a collection of dataset tags.

Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/climate/raster/global/chelsa/v1.2/catalog.html

XML-based: image8

HTML-based: image9
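The fifth scenario's layout (no catalogRef anywhere, a parent dataset holding only dataset tags) can be expressed as a simple structural test. The snippet below is a sketch under assumptions rather than the Recognizer's own logic. The sample XML is hypothetical and hand-written, and `only_child_datasets` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical catalog: a parent dataset that contains only dataset tags.
FIFTH = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="v1.2" ID="chelsa/v1.2">
    <dataset name="a.nc" ID="chelsa/v1.2/a.nc" urlPath="chelsa/v1.2/a.nc"/>
    <dataset name="b.nc" ID="chelsa/v1.2/b.nc" urlPath="chelsa/v1.2/b.nc"/>
    <dataset name="c.nc" ID="chelsa/v1.2/c.nc" urlPath="chelsa/v1.2/c.nc"/>
  </dataset>
</catalog>"""


def only_child_datasets(xml_text):
    """True if no catalogRef exists anywhere and the parent dataset holds datasets."""
    root = ET.fromstring(xml_text)
    has_refs = root.find(".//t:catalogRef", NS) is not None
    children = root.findall("t:dataset/t:dataset", NS)
    return (not has_refs) and len(children) > 0


print(only_child_datasets(FIFTH))  # True
```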

Sixth scenario:#

A single dataset.

Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/era5/sfc/single/daily/catalog.html?dataset=era5_sfc_0.25_single/daily/ERA5_daily_sp_1981.nc

XML-based: image10

HTML-based: image11

Seventh scenario:#

An aggregated dataset.

Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/swabian_moses_2021.xml?dataset=swabian_moses_aggregation

XML-based: image12

HTML-based: image13

Eighth scenario (Nested):#

A mix of catalogRef elements and dataset tags that are not nested within a parent dataset tag. It resembles the second scenario, except that individual dataset tags appear next to the catalogRefs.

Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/transfer.xml

XML-based: image14

HTML-based: image15

Ninth scenario (Nested):#

In this scenario, more than one individual dataset tag is located outside the parent dataset tag, next to the catalogRef elements. It resembles the third scenario, except that multiple dataset tags appear alongside the catalogRefs.

Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/hydrogfd/v3.0/catalog.xml

XML-based: image16

HTML-based: image17

The following examples show the Recognizer applied to each scenario:

from tds2stac import Recognizer

# First case
Recognizer(
    "https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/era5/sfc/single/catalog.html",
    nested_check=True,
)

# Second case

Recognizer(
    "https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/sensor_catalog_ext.html",
    nested_check=True,
)

# Third case

Recognizer(
    "https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/chirps/catalog.html",
    nested_check=True,
)


# Fourth case
Recognizer("https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/bio_geo_chem_catalog_ext.html")


# Fifth case
Recognizer("https://thredds.imk-ifu.kit.edu/thredds/catalog/climate/raster/global/chelsa/v1.2/catalog.html")


# Sixth case
Recognizer("https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/era5/sfc/single/daily/catalog.html?dataset=era5_sfc_0.25_single/daily/ERA5_daily_sp_1979.nc")


# Seventh case
Recognizer("https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/swabian_moses_2021.html?dataset=swabian_moses_aggregation")


# Eighth case
Recognizer(
    "https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/transfer.html",
    nested_check=True,
)


# Ninth case
Recognizer(
    "https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/hydrogfd/v3.0/catalog.html",
    nested_check=True,
)


# Finding random case
Recognizer(
    "https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/reg_clim_sys_catalog_ext.html",
    nested_check=True,
)
Output:

('First Scenario', 1)
('Second Scenario', 2)
('Third Scenario', 2)
('Fourth Scenario', 0)
('Fifth Scenario', 0)
('Sixth Scenario', 0)
('Seventh Scenario', 0)
('Eighth Scenario', 1)
('Ninth Scenario', 0)
('Second Scenario', 5)
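The scenario descriptions mostly link to the XML form of each catalog, while the Recognizer calls use the HTML form of the same path. When inspecting a catalog by hand, it can help to switch between the two. The helper below is a small sketch that assumes the server exposes an .xml counterpart at the same path as the .html page, as the example URLs in this document suggest; `to_xml_url` is an illustrative name and not part of tds2stac.

```python
from urllib.parse import urlsplit, urlunsplit


def to_xml_url(url):
    """Swap a trailing .html in the URL path for .xml, keeping any query string."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")] + ".xml"
    return urlunsplit(parts._replace(path=path))


print(to_xml_url(
    "https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/transfer.html"
))
# https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/transfer.xml
```

Only the path is rewritten, so a `?dataset=...` query such as the one in the seventh case is passed through unchanged.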