1.2.3. tds2stac.harvester module

class tds2stac.harvester.CollectionHarvester(url: str, recognizer: str | None, subdirs: list | None = [], collection_tuples: list[tuple] | None = None, logger_properties: dict = {}, requests_properties: dict = {})

Bases: object

This class harvests collection-level information from TDS catalogs. Depending on the dataset scenario, it returns the following five variables: collection_id, collection_title, collection_description, collection_url, and collection_subdirs.

Parameters:
  • url (str) – TDS catalog URL address

  • recognizer (str) – scenario number reported by the Recognizer class

  • subdirs (list) – list containing the URL, ID, title, and subdirs of a nested dataset

  • collection_tuples (list) – list of tuples, each holding a STAC collection’s auto-generated ID and the user-defined ID, title, and description

  • logger_properties (dict) – dictionary of logger properties

  • requests_properties (dict) – dictionary of requests properties

collection_description: str | None
collection_id: str | None
collection_id_desc_maker(url: str, collection_tuples: list[tuple] | None = None, recognizer_output: str | None = None)

Derives the collection ID and description from the TDS catalog URL and the user-defined collection_tuples for scenarios 4, 5, 6, 7, and 9.

Parameters:
  • url (str) – TDS catalog URL address

  • collection_tuples (list) – list of tuples, each holding a STAC collection’s auto-generated ID and the user-defined ID, title, and description

  • recognizer_output (str) – scenario output of the Recognizer class
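A minimal sketch of how a lookup against collection_tuples could work, assuming each tuple is ordered as (auto-generated ID, user ID, user title, user description) and that entries are matched on the auto-generated ID. The helper lookup_collection and its fallback to the auto-generated ID are hypothetical; this is not the library's implementation of collection_id_desc_maker.

```python
from typing import Optional

# Assumed layout of one collection_tuples entry (hypothetical):
# (auto_generated_id, user_id, user_title, user_description)
CollectionTuple = tuple[str, str, str, str]


def lookup_collection(
    auto_id: str,
    collection_tuples: Optional[list[CollectionTuple]] = None,
) -> tuple[str, Optional[str], Optional[str]]:
    """Return (id, title, description) for a collection.

    If auto_id matches the first element of a tuple, the user-defined
    values are returned. Otherwise the auto-generated ID is kept and
    title and description are None.
    """
    for auto, user_id, user_title, user_desc in collection_tuples or []:
        if auto == auto_id:
            return user_id, user_title, user_desc
    return auto_id, None, None
```

Matching on the auto-generated ID keeps user overrides optional: collections without an entry simply keep their generated identifiers.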

collection_subdirs: list | None
collection_title: str | None
collection_tuples: list[tuple] | None

list of tuples, each holding a STAC collection’s auto-generated ID and the user-defined ID, title, and description

collection_url: str | None
logger_properties: dict

dictionary of logger properties; see Logger for more information

recognizer: str | None

scenario output of the Recognizer class

requests_properties: dict

dictionary of requests properties; see requests_properties for more information. Defaults to an empty dictionary.

subdirs: list | None

list containing the URL, ID, title, and subdirs of a nested dataset
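Because subdirs can itself contain subdirs, a nested catalog forms a tree. The sketch below walks such a tree depth-first, assuming each entry is a list [url, id, title, children] whose children use the same layout; both that layout and the helper walk_subdirs are hypothetical.

```python
from typing import Iterator


def walk_subdirs(subdirs: list, depth: int = 0) -> Iterator[tuple[int, str, str, str]]:
    """Yield (depth, url, id, title) for every entry, depth-first.

    Assumes (hypothetically) that each entry is [url, id, title, children],
    where children is a list with the same layout, an empty list, or None.
    """
    for url, id_, title, children in subdirs:
        yield depth, url, id_, title
        # Recurse into nested datasets one level deeper
        yield from walk_subdirs(children or [], depth + 1)
```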

url: str

TDS catalog URL address and starting point of harvesting, e.g. https://thredds.atmohub.kit.edu/thredds/catalog/caribic/IAGOS-CARIBIC_MS_files_collection_20231017/catalog.html (*)
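THREDDS serves each catalog both as an HTML page (catalog.html) and as a machine-readable XML document (catalog.xml). The helper below is a hypothetical illustration, not part of tds2stac: it rewrites the HTML form of a catalog URL, such as the example above, into its XML counterpart. Whether the harvester performs this step itself is not stated here.

```python
from urllib.parse import urlsplit, urlunsplit


def to_xml_catalog_url(url: str) -> str:
    """Return the catalog.xml form of a THREDDS catalog URL.

    A path ending in '.html' is rewritten to end in '.xml'; any other URL
    is returned unchanged. Query strings and fragments are preserved.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")] + ".xml"
    return urlunsplit(parts._replace(path=path))
```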

class tds2stac.harvester.ItemHarvester(url: str, elem: Element, harvesting_vars: dict, web_service_dict: dict | None, datetime_after: datetime | None = None, datetime_before: datetime | None = None, spatial_information: list | None = None, temporal_format_by_dataname: str | None = None, extension_properties: dict | None = None, linestring: bool = False, requests_properties: dict = {}, logger_properties: dict = {})

Bases: object

This class harvests Item information from TDS catalogs. It returns a dictionary of harvesting variables whose contents depend on the dataset scenario and the activated extensions.

Parameters:
  • url (str) – TDS catalog URL address

  • elem (Element) – XML element of the dataset in the catalog

  • harvesting_vars (dict) – dictionary of harvesting variables to be filled

  • web_service_dict (dict) – web service that the user wants to harvest from

  • datetime_after (str) – only data after this datetime are harvested (optional)

  • datetime_before (str) – only data before this datetime are harvested (optional)

  • spatial_information (list) – spatial information of 2D datasets, e.g. [minx, maxx, miny, maxy], or of 1D datasets, e.g. [x, y] (optional)

  • temporal_format_by_dataname (str) – datetime format for datasets whose names contain a datetime, e.g. `e%y%m%d%H.%M%S%f` (optional)

  • extension_properties (dict) – dictionary of extension properties (optional)

  • linestring (bool) – if True, a LineString is generated instead of a Polygon (optional)

  • requests_properties (dict) – dictionary of requests properties

  • logger_properties (dict) – dictionary of logger properties

datetime_after: str | None

only data after this datetime are harvested

datetime_before: str | None

only data before this datetime are harvested
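A minimal sketch of the time-window check implied by datetime_after and datetime_before, using datetime objects as in the ItemHarvester signature. The helper in_time_window is hypothetical, and treating both bounds as exclusive is an assumption rather than documented behavior.

```python
from datetime import datetime
from typing import Optional


def in_time_window(
    item_time: datetime,
    datetime_after: Optional[datetime] = None,
    datetime_before: Optional[datetime] = None,
) -> bool:
    """Return True if item_time lies inside the optional time window.

    A missing bound leaves that side of the window open. Both bounds are
    treated as exclusive here (an assumption).
    """
    if datetime_after is not None and item_time <= datetime_after:
        return False
    if datetime_before is not None and item_time >= datetime_before:
        return False
    return True
```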

elem: Element

XML element of the dataset, taken from the catalog XML file that is being harvested

extension_properties: dict | None

dictionary of extension properties (optional)

harvesting_vars: dict

dictionary of harvesting variables to be filled

linestring: bool

if True, a LineString is generated instead of a Polygon (optional)

logger_properties: dict

dictionary of logger properties; see Logger for more information

requests_properties: dict

dictionary of requests properties; see requests_properties for more information. Defaults to an empty dictionary.

spatial_information: list | None

spatial information of 2D datasets, e.g. [minx, maxx, miny, maxy], or of 1D datasets, e.g. [x, y] (optional)
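The sketch below turns spatial_information into a STAC bbox and a GeoJSON geometry, assuming the documented orders ([minx, maxx, miny, maxy] for 2D, [x, y] for 1D). STAC orders a 2D bbox as [minx, miny, maxx, maxy], so the values have to be reshuffled. The helper spatial_to_geojson is hypothetical and does not cover the LineString option controlled by linestring.

```python
def spatial_to_geojson(spatial_information: list[float]) -> tuple[list[float], dict]:
    """Return (bbox, geometry) for a STAC Item.

    Assumes the documented input orders: [minx, maxx, miny, maxy] for 2D
    datasets and [x, y] for 1D datasets.
    """
    if len(spatial_information) == 2:
        x, y = spatial_information
        # A single position: zero-extent bbox and a Point geometry
        return [x, y, x, y], {"type": "Point", "coordinates": [x, y]}
    if len(spatial_information) == 4:
        minx, maxx, miny, maxy = spatial_information
        # Closed exterior ring, counterclockwise as recommended by RFC 7946
        ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
        return [minx, miny, maxx, maxy], {"type": "Polygon", "coordinates": [ring]}
    raise ValueError("expected [x, y] or [minx, maxx, miny, maxy]")
```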

temporal_format_by_dataname: str | None

datetime format for datasets whose names contain a datetime, e.g. `e%y%m%d%H.%M%S%f` (optional)
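A sketch of parsing a datetime from a dataset name with the example format above, using datetime.strptime. The helper datetime_from_dataname, the example file name, and stripping a '.nc' suffix before parsing are assumptions for illustration; the exact matching used by tds2stac is not described here.

```python
from datetime import datetime


def datetime_from_dataname(name: str, fmt: str, suffix: str = ".nc") -> datetime:
    """Parse the datetime embedded in a dataset file name.

    The suffix (assumed '.nc') is removed if present; the rest of the name
    must match fmt exactly, otherwise datetime.strptime raises ValueError.
    """
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return datetime.strptime(name, fmt)
```

With the example format, the name e23101712.3005123456.nc parses to 2023-10-17 12:30:05.123456.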

url: str

TDS catalog URL address and starting point of harvesting, e.g. https://thredds.atmohub.kit.edu/thredds/catalog/caribic/IAGOS-CARIBIC_MS_files_collection_20231017/catalog.html (*)

web_service_dict: dict | None

web service that the user wants to harvest from