{ "cells": [ { "cell_type": "markdown", "id": "4bd19066-a142-4f41-8269-39c8904e1bde", "metadata": {}, "source": [ "\n", "Recognizer\n", "=============\n", "\n", "Classifying the datasets served by THREDDS yields nine distinct scenarios. The `Recognizer` identifies which of these scenarios a given catalog belongs to and also assesses the depth of nested datasets, i.e., datasets that contain subdirectories. The nine scenarios are as follows:\n", "\n", "\n", "## First scenario (Nested): \n", "The `catalogRef` elements sit directly beneath a `dataset` element, and no separate dataset appears alongside them.\n", "\n", "Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/era5/sfc/single/catalog.xml \n", "\n", "XML-based | HTML-based\n", ":-------------------------:|:-------------------------:\n", "![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/first_scenario_xml.png) | ![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/first_scenario_html.png)\n", "\n", "## Second scenario (Nested): \n", "The `catalogRef` elements sit directly beneath the root `catalog` element rather than inside a `dataset` element. As in the first scenario, no separate dataset appears alongside them. 
\n", "\n", "Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/sensor_catalog_ext.xml\n", "\n", "XML-based | HTML-based\n", ":-------------------------:|:-------------------------:\n", "![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/second_scenario_xml.png) | ![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/second_scenario_html.png)\n", "\n", "## Third scenario (Nested): \n", "Similar to the first scenario, except that a single separate `dataset` element sits alongside the `catalogRef` elements.\n", "\n", "Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/chirps/catalog.xml\n", "\n", "XML-based | HTML-based\n", ":-------------------------:|:-------------------------:\n", "![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/third_scenario_xml.png) | ![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/third_scenario_html.png)\n", "\n", "\n", "## Fourth scenario: \n", "An empty dataset.\n", "\n", "Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/bio_geo_chem_catalog_ext.xml\n", "\n", "XML-based | HTML-based\n", ":-------------------------:|:-------------------------:\n", "![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/fourth_scenario_xml.png) | ![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/fourth_scenario_html.png)\n", "\n", "\n", "## Fifth scenario: \n", "There are no `catalogRef` elements; all child elements are `dataset` elements. 
In other words, the parent `dataset` element contains a collection of `dataset` elements.\n", "\n", "Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/climate/raster/global/chelsa/v1.2/catalog.html\n", "\n", "XML-based | HTML-based\n", ":-------------------------:|:-------------------------:\n", "![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/fifth_scenario_xml.png) | ![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/fifth_scenario_html.png)\n", "\n", "## Sixth scenario: \n", "A single dataset.\n", "\n", "Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/era5/sfc/single/daily/catalog.html?dataset=era5_sfc_0.25_single/daily/ERA5_daily_sp_1981.nc\n", "\n", "XML-based | HTML-based\n", ":-------------------------:|:-------------------------:\n", "![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/sixth_scenario_xml.png) | ![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/sixth_scenario_html.png)\n", "\n", "## Seventh scenario: \n", "An aggregated dataset.\n", "\n", "Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/swabian_moses_2021.xml?dataset=swabian_moses_aggregation\n", "\n", "XML-based | HTML-based\n", ":-------------------------:|:-------------------------:\n", "![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/seventh_scenario_xml.png) | ![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/seventh_scenario_html.png)\n", "\n", "## Eighth scenario (Nested): \n", "`catalogRef` elements and `dataset` elements sit side by side, without a parent `dataset` element enclosing them. 
This resembles the second scenario, except that separate `dataset` elements sit alongside the `catalogRef` elements.\n", "\n", "Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/transfer.xml\n", "\n", "XML-based | HTML-based\n", ":-------------------------:|:-------------------------:\n", "![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/eighth_scenario_xml.png) | ![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/eighth_scenario_html.png)\n", "\n", "## Ninth scenario (Nested): \n", "More than one separate `dataset` element lies outside the parent `dataset` element, alongside the `catalogRef` elements. This resembles the third scenario, except that there are several separate `dataset` elements instead of exactly one.\n", "\n", "Example: https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/hydrogfd/v3.0/catalog.xml\n", "\n", "XML-based | HTML-based\n", ":-------------------------:|:-------------------------:\n", "![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/ninth_scenario_xml.png) | ![](https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/-/raw/main/docs/_static/ninth_scenario_html.png)\n" ] }, { "cell_type": "markdown", "id": "16e20cf6-23f5-4a9f-b1c7-3531452c3947", "metadata": {}, "source": [ "**The following examples illustrate how the `Recognizer` works:**" ] }, { "cell_type": "code", "execution_count": null, "id": "638d8b71-f591-46d1-aa6b-ceb19086aead", "metadata": {}, "outputs": [], "source": [ "from tds2stac import Recognizer\n", "\n", "# First case\n", "Recognizer(\n", "    \"https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/era5/sfc/single/catalog.html\",\n", "    nested_check=True,\n", ")\n", "\n", "# 
Second case\n", "Recognizer(\n", "    \"https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/sensor_catalog_ext.html\",\n", "    nested_check=True,\n", ")\n", "\n", "# Third case\n", "Recognizer(\n", "    \"https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/chirps/catalog.html\",\n", "    nested_check=True,\n", ")\n", "\n", "\n", "# Fourth case\n", "Recognizer(\"https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/bio_geo_chem_catalog_ext.html\")\n", "\n", "\n", "# Fifth case\n", "Recognizer(\"https://thredds.imk-ifu.kit.edu/thredds/catalog/climate/raster/global/chelsa/v1.2/catalog.html\")\n", "\n", "\n", "# Sixth case\n", "Recognizer(\"https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/era5/sfc/single/daily/catalog.html?dataset=era5_sfc_0.25_single/daily/ERA5_daily_sp_1979.nc\")\n", "\n", "\n", "# Seventh case\n", "Recognizer(\"https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/swabian_moses_2021.html?dataset=swabian_moses_aggregation\")\n", "\n", "\n", "# Eighth case\n", "Recognizer(\n", "    \"https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/transfer.html\",\n", "    nested_check=True,\n", ")\n", "\n", "\n", "# Ninth case\n", "Recognizer(\n", "    \"https://thredds.imk-ifu.kit.edu/thredds/catalog/regclim/raster/global/hydrogfd/v3.0/catalog.html\",\n", "    nested_check=True,\n", ")\n", "\n", "\n", "# A catalog whose scenario is not known in advance\n", "Recognizer(\n", "    \"https://thredds.imk-ifu.kit.edu/thredds/catalog/catalogues/reg_clim_sys_catalog_ext.html\",\n", "    nested_check=True,\n", ")" ] }, { "cell_type": "raw", "id": "ffb15862-407a-45b5-b1fe-da97fc6afab7", "metadata": {}, "source": [ "Output:\n", "\n", "('First Scenario', 1)\n", "('Second Scenario', 2)\n", "('Third Scenario', 2)\n", "('Fourth Scenario', 0)\n", "('Fifth Scenario', 0)\n", "('Sixth Scenario', 0)\n", "('Seventh Scenario', 0)\n", "('Eighth Scenario', 1)\n", "('Ninth Scenario', 0)\n", "('Second Scenario', 5)" ] } ], "metadata": { "kernelspec": { "display_name": 
"Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 5 }