1.2.3. tds2stac.harvester module

class tds2stac.harvester.CollectionHarvester(url: str, recognizer: str | None, subdirs: list | None = [], collection_tuples: list[tuple] | None = None, logger_properties: dict = {}, requests_properties: dict = {})

Bases: object

This class harvests Collection data from TDS catalogs. Depending on the dataset scenario, it returns values for the following five variables: collection_id, collection_title, collection_description, collection_url, and collection_subdirs.

Parameters:

url (str) – TDS catalog URL address
recognizer (str) – status scenario number of the Recognizer
subdirs (list) – a list of the URL, ID, title, and subdirs of a nested dataset
collection_tuples (list) – a list of tuples, each holding a STAC collection's auto-generated ID followed by the user-defined ID, title, and description
logger_properties (dict) – dictionary of logger properties
requests_properties (dict) – dictionary of requests properties
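For illustration, the sketch below shows one way to hold and look up collection_tuples entries, assuming each tuple carries the four fields in the order listed above. The CollectionOverride type, the index_collection_tuples helper, and the sample values are hypothetical and not part of tds2stac.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class CollectionOverride(NamedTuple):
    """Hypothetical named view of one collection_tuples entry."""

    auto_id: str  # auto-generated STAC collection ID
    user_id: str  # ID chosen by the user
    user_title: str  # title chosen by the user
    user_description: str  # description chosen by the user


def index_collection_tuples(
    collection_tuples: Optional[Iterable[Tuple[str, str, str, str]]],
) -> Dict[str, CollectionOverride]:
    """Map each auto-generated ID to its user-defined overrides (illustrative only)."""
    index: Dict[str, CollectionOverride] = {}
    for entry in collection_tuples or []:
        if len(entry) != 4:
            raise ValueError(f"expected 4 fields, got {len(entry)}: {entry!r}")
        override = CollectionOverride(*entry)
        index[override.auto_id] = override
    return index


# Hypothetical example values.
collection_tuples = [
    ("caribic_ms_files", "iagos-caribic-ms", "IAGOS-CARIBIC MS files", "User-defined description"),
]
overrides = index_collection_tuples(collection_tuples)
print(overrides["caribic_ms_files"].user_title)
```

Keying by the auto-generated ID keeps the lookup cheap when many collections are harvested and only a few carry user-defined metadata.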

collection_description: str | None

collection_id: str | None

collection_id_desc_maker(url: str, collection_tuples: list[tuple] | None = None, recognizer_output: str | None = None)

Gets the collection ID and description from the TDS catalog URL and the pre-defined collection_tuples, for Recognizer scenarios 4, 5, 6, 7, and 9.
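A rough sketch of the idea: derive an ID from the catalog URL and let a matching entry in collection_tuples override it. The path-joining rule, the function names, and the lower-casing are assumptions made for illustration; tds2stac's actual ID rule is not described here.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlparse


def collection_id_from_url(url: str) -> str:
    """Hypothetical rule: join the catalog path segments after '/catalog/'."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "catalog" in parts:
        # Drop everything up to and including the TDS 'catalog' segment.
        parts = parts[parts.index("catalog") + 1 :]
    if parts and parts[-1] in ("catalog.html", "catalog.xml"):
        parts = parts[:-1]  # drop the catalog file name itself
    return "_".join(parts).lower() or "root"


def collection_id_and_description(
    url: str, collection_tuples: Optional[List[Tuple[str, str, str, str]]] = None
) -> Tuple[str, Optional[str]]:
    """Prefer a user-defined ID and description when the auto-generated ID matches."""
    auto_id = collection_id_from_url(url)
    for auto, user_id, _user_title, user_description in collection_tuples or []:
        if auto == auto_id:
            return user_id, user_description
    return auto_id, None


url = "https://thredds.atmohub.kit.edu/thredds/catalog/caribic/IAGOS-CARIBIC_MS_files_collection_20231017/catalog.html"
print(collection_id_and_description(url))
```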

Parameters:

url (str) – TDS catalog URL address
collection_tuples (list) – a list of tuples, each holding a STAC collection's auto-generated ID followed by the user-defined ID, title, and description
recognizer_output (str) – status scenario output of the Recognizer class

collection_subdirs: list | None

collection_title: str | None

collection_tuples: list[tuple] | None

A list of tuples, each holding a STAC collection's auto-generated ID followed by the user-defined ID, title, and description.

collection_url: str | None

logger_properties: dict

Dictionary of logger properties. For more information, see Logger.

recognizer: str | None

Status scenario output of the Recognizer class.

requests_properties: dict

Dictionary of requests properties. For more information, see requests_properties. Defaults to an empty dictionary.

subdirs: list | None

A list of the URL, ID, title, and subdirs of a nested dataset.
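Nested catalogs in a THREDDS catalog are linked through catalogRef elements. As a rough sketch of where the URL, ID, and title of a nested dataset could come from, the following parses those elements with the standard library. The dictionary layout, the function name, and the sample catalog are assumptions; recursing into each url to fill the nested subdirs is left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from urllib.parse import urljoin

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical catalog.xml content with one nested catalog.
SAMPLE_CATALOG = f"""<catalog xmlns="{THREDDS_NS}" xmlns:xlink="{XLINK_NS}">
  <dataset name="caribic" ID="caribic">
    <catalogRef xlink:href="sub1/catalog.xml" xlink:title="Sub 1" ID="sub1" name=""/>
  </dataset>
</catalog>"""


def list_catalog_refs(catalog_xml: str, catalog_url: str) -> List[Dict[str, Optional[str]]]:
    """Return url, id, and title for every catalogRef in a THREDDS catalog."""
    root = ET.fromstring(catalog_xml)
    refs = []
    for ref in root.iter(f"{{{THREDDS_NS}}}catalogRef"):
        href = ref.get(f"{{{XLINK_NS}}}href", "")
        refs.append(
            {
                # Relative hrefs are resolved against the parent catalog URL.
                "url": urljoin(catalog_url, href),
                "id": ref.get("ID"),
                "title": ref.get(f"{{{XLINK_NS}}}title"),
            }
        )
    return refs


print(list_catalog_refs(SAMPLE_CATALOG, "https://example.org/thredds/catalog/caribic/catalog.xml"))
```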

url: str

TDS catalog URL address and the starting point of harvesting, e.g. https://thredds.atmohub.kit.edu/thredds/catalog/caribic/IAGOS-CARIBIC_MS_files_collection_20231017/catalog.html (*)
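The example URL points at the HTML view of a catalog. THREDDS also serves each catalog as XML at the same path under catalog.xml, which is the form a harvester would normally parse. A minimal sketch of that rewrite, assuming the path ends in catalog.html; the function name is hypothetical, and whether tds2stac performs this step internally is not stated here.

```python
from urllib.parse import urlsplit, urlunsplit


def to_xml_catalog_url(url: str) -> str:
    """Point a catalog.html URL at the machine-readable catalog.xml (assumed convention)."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("catalog.html"):
        path = path[: -len(".html")] + ".xml"
    # urlunsplit keeps scheme, host, query, and fragment unchanged.
    return urlunsplit(parts._replace(path=path))


print(to_xml_catalog_url(
    "https://thredds.atmohub.kit.edu/thredds/catalog/caribic/IAGOS-CARIBIC_MS_files_collection_20231017/catalog.html"
))
```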

class tds2stac.harvester.ItemHarvester(url: str, elem: Element, harvesting_vars: dict, web_service_dict: dict | None, datetime_after: datetime | None = None, datetime_before: datetime | None = None, spatial_information: list | None = None, temporal_format_by_dataname: str | None = None, extension_properties: dict | None = None, linestring: bool = False, requests_properties: dict = {}, logger_properties: dict = {})

Bases: object

This class harvests information about an Item from TDS data catalogs. It returns a dictionary of harvesting variables whose content depends on the dataset scenario and the activated extensions.

Parameters:

url (str) – TDS catalog URL address
elem (Element) – XML element of the dataset
harvesting_vars (dict) – dictionary of harvesting variables to be filled
web_service_dict (dict) – web services to harvest from
datetime_after (datetime) – only data after this datetime is harvested (optional)
datetime_before (datetime) – only data before this datetime is harvested (optional)
spatial_information (list) – spatial information of 2D datasets, e.g. [minx, maxx, miny, maxy], or of 1D datasets, e.g. [x, y] (optional)
temporal_format_by_dataname (str) – datetime format for datasets whose names contain a datetime, e.g. `e%y%m%d%H.%M%S%f` (optional)
extension_properties (dict) – dictionary of extension properties (optional)
linestring (bool) – if True, a LineString is created instead of a Polygon (optional)
requests_properties (dict) – dictionary of requests properties (optional)
logger_properties (dict) – dictionary of logger properties
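The elem parameter is an XML element taken from the catalog. A hedged sketch of how such elements can be obtained with xml.etree.ElementTree: it treats dataset elements that carry a urlPath attribute as individual data files. The function name and sample catalog are hypothetical, and tds2stac may select elements differently (the signature only says Element).

```python
import xml.etree.ElementTree as ET
from typing import Iterator

THREDDS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"

# Hypothetical catalog.xml with two data files inside one collection.
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <dataset name="collection" ID="collection">
    <dataset name="file_a.nc" ID="collection/file_a.nc" urlPath="collection/file_a.nc"/>
    <dataset name="file_b.nc" ID="collection/file_b.nc" urlPath="collection/file_b.nc"/>
  </dataset>
</catalog>"""


def leaf_datasets(catalog_xml: str) -> Iterator[ET.Element]:
    """Yield dataset elements that reference actual data via urlPath."""
    root = ET.fromstring(catalog_xml)
    for elem in root.iter(f"{THREDDS}dataset"):
        if elem.get("urlPath") is not None:
            yield elem


for elem in leaf_datasets(SAMPLE_CATALOG):
    print(elem.get("name"), elem.get("urlPath"))
```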

datetime_after: str | None

Only data after this datetime is harvested.

datetime_before: str | None

Only data before this datetime is harvested.
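A minimal sketch of a time-window check. Because the attributes are typed str | None while the signature uses datetime | None, it accepts either ISO 8601 strings or naive datetime objects. Treating both bounds as exclusive is an assumption, as are the function names.

```python
from datetime import datetime
from typing import Optional, Union

DateLike = Union[str, datetime, None]


def _as_datetime(value: DateLike) -> Optional[datetime]:
    """Accept ISO 8601 strings or datetime objects (naive datetimes assumed)."""
    return datetime.fromisoformat(value) if isinstance(value, str) else value


def in_time_window(value: datetime, datetime_after: DateLike = None, datetime_before: DateLike = None) -> bool:
    """True if value is strictly after datetime_after and strictly before datetime_before."""
    after, before = _as_datetime(datetime_after), _as_datetime(datetime_before)
    if after is not None and value <= after:
        return False
    if before is not None and value >= before:
        return False
    return True


print(in_time_window(datetime(2023, 10, 17, 12), "2023-10-17T00:00:00", "2023-10-18T00:00:00"))
```

Mixing timezone-aware and naive datetimes would raise a TypeError on comparison, so a real implementation would need to normalize time zones first.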

elem: Element

XML element of the dataset to be harvested.

extension_properties: dict | None

Dictionary of extension properties (optional).

harvesting_vars: dict

Dictionary of harvesting variables to be filled.

linestring: bool

If True, a LineString is created instead of a Polygon (optional).
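As an illustration only, the sketch below shows the difference between the two geometries for a set of sampled positions: a GeoJSON LineString that follows the positions in order, or a Polygon around their bounding box. The function name and the bounding-box choice for the Polygon case are assumptions, not a description of tds2stac's internals.

```python
from typing import Dict, List, Sequence


def footprint(lons: Sequence[float], lats: Sequence[float], linestring: bool = False) -> Dict:
    """Hypothetical: GeoJSON geometry for sampled positions (lon/lat order)."""
    if len(lons) != len(lats) or not lons:
        raise ValueError("lons and lats must be non-empty and of equal length")
    if linestring:
        if len(lons) < 2:
            raise ValueError("a LineString needs at least two positions")
        # Follow the positions in order, e.g. along a track.
        coords: List[List[float]] = [[x, y] for x, y in zip(lons, lats)]
        return {"type": "LineString", "coordinates": coords}
    # Otherwise enclose all positions in their bounding box (counterclockwise ring).
    minx, maxx, miny, maxy = min(lons), max(lons), min(lats), max(lats)
    ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
    return {"type": "Polygon", "coordinates": [ring]}


print(footprint([8.0, 9.0, 10.0], [49.0, 50.0, 49.5], linestring=True))
```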

logger_properties: dict

Dictionary of logger properties. For more information, see Logger.

requests_properties: dict

Dictionary of requests properties. For more information, see requests_properties. Defaults to an empty dictionary.

spatial_information: list | None

Spatial information of 2D datasets, e.g. [minx, maxx, miny, maxy], or of 1D datasets, e.g. [x, y] (optional).
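The documented order [minx, maxx, miny, maxy] differs from the [minx, miny, maxx, maxy] order that STAC uses for a 2D bbox. A minimal sketch of the conversion, assuming longitude/latitude values; the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def spatial_to_stac(spatial_information: Sequence[float]) -> Tuple[List[float], Dict]:
    """Turn [minx, maxx, miny, maxy] or [x, y] into a STAC bbox and GeoJSON geometry."""
    if len(spatial_information) == 4:
        minx, maxx, miny, maxy = spatial_information
        # STAC and GeoJSON order a 2D bbox as [minx, miny, maxx, maxy].
        bbox = [minx, miny, maxx, maxy]
        ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
        return bbox, {"type": "Polygon", "coordinates": [ring]}
    if len(spatial_information) == 2:
        x, y = spatial_information
        # A single position becomes a Point with a degenerate bbox.
        return [x, y, x, y], {"type": "Point", "coordinates": [x, y]}
    raise ValueError("expected [minx, maxx, miny, maxy] or [x, y]")


print(spatial_to_stac([5.0, 15.0, 45.0, 55.0])[0])
```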

temporal_format_by_dataname: str | None

Datetime format for datasets whose names contain a datetime, e.g. `e%y%m%d%H.%M%S%f` (optional).
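The example format uses Python strptime directives, so a datetime embedded in a name can be parsed directly. The sketch below assumes a hypothetical file name e23101712.0530123456.nc. The function name and the fallback that strips a trailing file extension are illustrative choices and are not taken from tds2stac.

```python
import os
from datetime import datetime


def datetime_from_dataname(name: str, temporal_format_by_dataname: str) -> datetime:
    """Parse a datetime embedded in a dataset name with a strptime format."""
    try:
        return datetime.strptime(name, temporal_format_by_dataname)
    except ValueError:
        # Retry without a trailing file extension such as '.nc'.
        stem, _extension = os.path.splitext(name)
        return datetime.strptime(stem, temporal_format_by_dataname)


print(datetime_from_dataname("e23101712.0530123456.nc", "e%y%m%d%H.%M%S%f"))
# 2023-10-17 12:05:30.123456
```

Trying the full name first matters here because the format itself contains a dot, which os.path.splitext would otherwise treat as an extension separator.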

url: str

TDS catalog URL address and the starting point of harvesting, e.g. https://thredds.atmohub.kit.edu/thredds/catalog/caribic/IAGOS-CARIBIC_MS_files_collection_20231017/catalog.html (*)

web_service_dict: dict | None

Web services to harvest from.
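In TDS, a dataset is exposed through services such as OPeNDAP, HTTPServer, or WMS, and an access URL is formed by appending the dataset's urlPath to the service base. The sketch below assumes the base paths commonly configured in TDS catalogs and a simple name-to-base mapping. Both the mapping and the access_urls helper are hypothetical, and the real structure of web_service_dict in tds2stac may differ.

```python
from typing import Dict
from urllib.parse import urljoin

# Hypothetical service mapping using commonly configured TDS base paths.
web_service_dict = {
    "OPENDAP": "/thredds/dodsC/",
    "HTTPServer": "/thredds/fileServer/",
    "WMS": "/thredds/wms/",
}


def access_urls(catalog_url: str, url_path: str, services: Dict[str, str]) -> Dict[str, str]:
    """Build one access URL per service from a dataset's urlPath."""
    # An absolute base path replaces the catalog path but keeps scheme and host.
    return {name: urljoin(catalog_url, base + url_path) for name, base in services.items()}


urls = access_urls(
    "https://example.org/thredds/catalog/collection/catalog.xml",
    "collection/file_a.nc",
    web_service_dict,
)
print(urls)
```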