Helper Methods

This module provides utility functions for data subsetting and spatial operations in gdptools.

Overview

The helper methods module contains functions that support the core functionality of gdptools by providing:

  • Spatial Subsetting: Functions to create subset dictionaries for xarray operations

  • Coordinate System Handling: Utilities for managing different coordinate orientations

  • Data Validation: Functions to verify input data structure and dimensions

Key Functions

Spatial Subsetting Functions

These functions create selection dictionaries for xarray operations:

  • build_subset(): General-purpose subsetting with spatial and temporal dimensions

  • build_subset_tiff(): TIFF-specific subsetting with band selection

  • build_subset_tiff_da(): DataArray-specific TIFF subsetting

Validation Functions

  • check_gridded_data_for_dimensions(): Validates input data dimensions

Usage Examples

Basic Spatial Subsetting

import numpy as np
from gdptools.helpers import build_subset

# Define spatial bounds
bounds = np.array([-180, -90, 180, 90])  # [minx, miny, maxx, maxy]

# Create subset dictionary
subset_dict = build_subset(
    bounds=bounds,
    xname='longitude',
    yname='latitude',
    tname='time',
    toptobottom=False,
    date_min='2020-01-01',
    date_max='2020-12-31'
)

# Apply to an existing xarray Dataset (assumed here to be bound to `dataset`)
data_subset = dataset.sel(subset_dict)

TIFF Data Subsetting

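import numpy as np
from gdptools.helpers import build_subset_tiff

# Spatial bounds in [minx, miny, maxx, maxy] order
bounds = np.array([-180, -90, 180, 90])

# Subset TIFF data by space and band
subset_dict = build_subset_tiff(
    bounds=bounds,
    xname='x',
    yname='y',
    toptobottom=True,
    bname='band',
    band=1
)

# raster_data is assumed to be an xarray object already opened from a TIFF
raster_subset = raster_data.sel(subset_dict)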

Data Validation

from gdptools.helpers import check_gridded_data_for_dimensions

# Validate the listed variables of an existing xarray Dataset (`dataset`)
try:
    check_gridded_data_for_dimensions(dataset, ['temperature', 'precipitation'])
    print("Data validation passed")
except KeyError as e:
    print(f"Data validation failed: {e}")

Coordinate System Orientation

The toptobottom parameter describes the orientation of the y-coordinate so that the subsetting functions can adjust for it:

  • True: Y-coordinates increase from north to south (typical for image data)

  • False: Y-coordinates increase from south to north (typical for geographic data)
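Label-based slices in xarray follow the order of the stored coordinate values, so the y bounds have to be given in the same direction as the coordinate itself. The sketch below illustrates that idea in plain Python. The helper y_slice_for is hypothetical and not part of gdptools; it only inspects the coordinate values, and how each ordering corresponds to toptobottom should be checked against your own data.

```python
from typing import Sequence


def y_slice_for(y_values: Sequence[float], miny: float, maxy: float) -> slice:
    """Return a label slice whose direction matches the stored y-coordinate.

    Hypothetical helper for illustration only; it is not part of gdptools.
    """
    if len(y_values) < 2:
        raise ValueError("need at least two y values to determine orientation")
    ascending = y_values[0] < y_values[-1]
    # Ascending coordinates are sliced low-to-high, descending ones high-to-low.
    return slice(miny, maxy) if ascending else slice(maxy, miny)


south_to_north = [-90.0, -45.0, 0.0, 45.0, 90.0]
north_to_south = list(reversed(south_to_north))

ascending_slice = y_slice_for(south_to_north, -10.0, 50.0)
descending_slice = y_slice_for(north_to_south, -10.0, 50.0)
```

Passing a slice in the wrong direction to .sel() typically yields an empty selection rather than an error, which is why the orientation is worth checking up front.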

Best Practices

  1. Always validate bounds: Ensure the bounds array has the form [minx, miny, maxx, maxy]

  2. Check coordinate orientation: Verify the toptobottom parameter matches your data

  3. Validate data dimensions: Use validation functions before processing

  4. Handle temporal data carefully: Ensure date strings are in ISO 8601 format (for example, '2020-01-01')
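The first and last checks above can be done before any gdptools call. The following is a minimal sketch using only the standard library; validate_bounds and validate_iso_date are hypothetical helper names, not gdptools functions, and the strict minx < maxx, miny < maxy rule is an assumption about what counts as a usable bounding box.

```python
from datetime import date
from typing import Sequence


def validate_bounds(bounds: Sequence[float]) -> None:
    """Raise ValueError unless bounds looks like [minx, miny, maxx, maxy]."""
    if len(bounds) != 4:
        raise ValueError(f"expected 4 values, got {len(bounds)}")
    minx, miny, maxx, maxy = bounds
    if minx >= maxx or miny >= maxy:
        raise ValueError("bounds must satisfy minx < maxx and miny < maxy")


def validate_iso_date(value: str) -> date:
    """Parse a YYYY-MM-DD string, raising ValueError if it is malformed."""
    return date.fromisoformat(value)


validate_bounds([-180, -90, 180, 90])
start = validate_iso_date("2020-01-01")
end = validate_iso_date("2020-12-31")
if start > end:
    raise ValueError("date_min must not be later than date_max")
```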

API Reference

Helper functions for data subsetting and validation.

This module provides utility functions that support the core functionality of gdptools by providing:

  • Spatial and temporal subsetting for xarray Datasets and DataArrays.

  • Validation checks for gridded data dimensions.

exception GDPToolsError

Bases: Exception

Base exception for gdptools library errors.

exception STACCatalogError

Bases: GDPToolsError

Exception raised when STAC catalog operations fail.

build_subset(bounds, xname, yname, tname, toptobottom, date_min=None, date_max=None)

Create a dictionary to use with xarray .sel() method to subset by time and space.

Constructs a selection dictionary for xarray subsetting operations that handles both spatial (x, y) and temporal (time) dimensions. Automatically adjusts for coordinate system orientation and provides flexible time range selection.

Parameters:
  • bounds (ndarray[tuple[Any, ...], dtype[float64]]) – Spatial bounds array in format [minx, miny, maxx, maxy].

  • xname (str) – Name of the x-dimension in the dataset.

  • yname (str) – Name of the y-dimension in the dataset.

  • tname (str) – Name of the time dimension in the dataset.

  • toptobottom (bool) – If True, y-coordinates increase from north to south. If False, y-coordinates increase from south to north.

  • date_min (str | None) – Start date for temporal subset (ISO format string). If None, no temporal subsetting is applied.

  • date_max (str | None) – End date for temporal subset (ISO format string). If None and date_min is provided, only the exact date_min is selected.

Returns:

Dictionary for the xarray .sel() method. Keys are dimension names; values are slice objects or exact values.

Return type:

dict[str, object]
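To make the shape of the returned dictionary concrete, here is a rough illustration based only on the parameter descriptions above. It is not the gdptools implementation: sketch_selection is a hypothetical stand-in, and its y_ascending flag is deliberately not named toptobottom, since it simply states whether the stored y values increase along the axis.

```python
from typing import Optional, Sequence


def sketch_selection(
    bounds: Sequence[float],
    xname: str,
    yname: str,
    tname: str,
    y_ascending: bool,
    date_min: Optional[str] = None,
    date_max: Optional[str] = None,
) -> dict:
    """Illustrative stand-in; not the gdptools implementation."""
    minx, miny, maxx, maxy = bounds
    selection = {
        xname: slice(minx, maxx),
        # The y slice runs in the same direction as the stored coordinate.
        yname: slice(miny, maxy) if y_ascending else slice(maxy, miny),
    }
    if date_min is not None:
        # A single date selects exactly that label; two dates select a range.
        selection[tname] = date_min if date_max is None else slice(date_min, date_max)
    return selection


ranged = sketch_selection(
    [-125.0, 24.0, -66.0, 50.0], "longitude", "latitude", "time",
    y_ascending=True, date_min="2020-01-01", date_max="2020-12-31",
)
single_day = sketch_selection(
    [-125.0, 24.0, -66.0, 50.0], "longitude", "latitude", "time",
    y_ascending=False, date_min="2020-01-01",
)
```

A dictionary of this shape can be passed directly to an xarray object's .sel() method.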

Examples

>>> bounds = np.array([-180, -90, 180, 90])
>>> subset_dict = build_subset(
...     bounds, 'longitude', 'latitude', 'time', False,
...     '2020-01-01', '2020-12-31'
... )
>>> data_subset = dataset.sel(subset_dict)

build_subset_tiff(bounds, xname, yname, toptobottom, bname, band)

Create a dictionary to use with xarray .sel() method to subset TIFF data by space and band.

Constructs a selection dictionary for xarray subsetting operations specifically for TIFF/raster data that handles spatial (x, y) dimensions and band selection. Automatically adjusts for coordinate system orientation.

Parameters:
  • bounds (ndarray[tuple[Any, ...], dtype[float64]]) – Spatial bounds array in format [minx, miny, maxx, maxy].

  • xname (str) – Name of the x-dimension in the dataset.

  • yname (str) – Name of the y-dimension in the dataset.

  • toptobottom (bool) – If True, y-coordinates increase from north to south. If False, y-coordinates increase from south to north.

  • bname (str) – Name of the band dimension in the dataset.

  • band (int) – Specific band number to select.

Returns:

Dictionary for the xarray .sel() method. Keys are dimension names; values are slice objects or exact values.

Return type:

Mapping[Any, Any]

Examples

>>> bounds = np.array([-180, -90, 180, 90])
>>> subset_dict = build_subset_tiff(
...     bounds, 'x', 'y', True, 'band', 1
... )
>>> raster_subset = raster_data.sel(subset_dict)

build_subset_tiff_da(bounds, xname, yname, toptobottom)

Create a dictionary to use with xarray .sel() method to subset TIFF DataArray by space.

Constructs a selection dictionary for xarray subsetting operations specifically for TIFF/raster DataArray objects that handles spatial (x, y) dimensions. Automatically adjusts for coordinate system orientation.

Parameters:
  • bounds (ndarray[tuple[Any, ...], dtype[float64]]) – Spatial bounds array in format [minx, miny, maxx, maxy].

  • xname (str) – Name of the x-dimension in the DataArray.

  • yname (str) – Name of the y-dimension in the DataArray.

  • toptobottom (int | bool) – If True or 1, y-coordinates increase from north to south. If False or 0, y-coordinates increase from south to north.

Returns:

Dictionary for the xarray .sel() method. Keys are dimension names; values are slice objects.

Return type:

Mapping[Any, Any]

Examples

>>> bounds = np.array([-180, -90, 180, 90])
>>> subset_dict = build_subset_tiff_da(
...     bounds, 'x', 'y', True
... )
>>> raster_subset = raster_dataarray.sel(subset_dict)

check_gridded_data_for_dimensions(ds, vars)

Check that gridded data has the required dimensions.

Checks each specified DataArray in an xarray Dataset to confirm that it has exactly three dimensions and that the first dimension is ‘time’. This is a prerequisite for many gdptools processing functions.

Parameters:
  • ds (Dataset) – The xarray Dataset to validate.

  • vars (list[str]) – A list of variable names within the dataset to check.

Raises:

KeyError – If any of the specified variables do not have exactly three dimensions or if ‘time’ is not the first dimension.
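The rule described above can be sketched without xarray. In the example below, a plain mapping from variable name to a tuple of dimension names stands in for the per-variable dims of a Dataset. The function check_dims is hypothetical and only mirrors the documented behaviour; it is not the gdptools source.

```python
from typing import Mapping, Sequence, Tuple


def check_dims(var_dims: Mapping[str, Tuple[str, ...]], names: Sequence[str]) -> None:
    """Raise KeyError unless each named variable is 3-D with 'time' first.

    var_dims maps variable names to dimension-name tuples, standing in for
    the per-variable .dims of an xarray Dataset. Hypothetical helper.
    """
    for name in names:
        dims = var_dims[name]  # a missing variable also raises KeyError
        if len(dims) != 3:
            raise KeyError(f"{name} has {len(dims)} dimensions, expected 3")
        if dims[0] != "time":
            raise KeyError(f"{name}: first dimension is {dims[0]!r}, expected 'time'")


layout = {
    "temperature": ("time", "latitude", "longitude"),
    "elevation": ("latitude", "longitude"),
    "transposed": ("latitude", "time", "longitude"),
}
check_dims(layout, ["temperature"])  # passes silently
```

With real data, a variable that fails the check can often be fixed by transposing it so that time comes first, or by adding a length-one time dimension to a static field, before calling the gdptools function.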

get_stac_collection(collection_id)

Fetch a collection from the NHGF STAC catalog.

Attempts a direct API lookup first (single HTTP request). Falls back to a recursive catalog traversal for nested collections whose API path doesn’t match their ID.

Parameters:

collection_id (str) – The collection identifier (e.g., "conus404_daily", "nlcd-LndCov").

Returns:

The pystac Collection object.

Raises:

STACCatalogError – If the collection is not found.

Return type:

Collection
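The "direct lookup first, recursive traversal as a fallback" strategy described for get_stac_collection can be illustrated with in-memory data. Everything in this sketch is hypothetical: the dictionaries stand in for the remote catalog, CollectionNotFound stands in for STACCatalogError, and no pystac or network calls are made.

```python
from typing import Optional


class CollectionNotFound(Exception):
    """Hypothetical stand-in for STACCatalogError."""


def find_nested(node: dict, collection_id: str) -> Optional[dict]:
    """Depth-first search of a nested catalog for a matching id."""
    if node.get("id") == collection_id:
        return node
    for child in node.get("children", []):
        found = find_nested(child, collection_id)
        if found is not None:
            return found
    return None


def get_collection(index: dict, root: dict, collection_id: str) -> dict:
    """Try the flat index first, then fall back to walking the tree."""
    direct = index.get(collection_id)
    if direct is not None:
        return direct
    found = find_nested(root, collection_id)
    if found is None:
        raise CollectionNotFound(f"collection {collection_id!r} not found")
    return found


# Toy data: "nested-item" is reachable only by traversal.
root = {
    "id": "root",
    "children": [
        {"id": "top-level"},
        {"id": "group", "children": [{"id": "nested-item"}]},
    ],
}
index = {"top-level": root["children"][0]}

top = get_collection(index, root, "top-level")
nested = get_collection(index, root, "nested-item")
```

Trying the cheap lookup first keeps the common case to a single request, while the traversal covers collections whose location does not match their identifier.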