Getting Started

This guide covers installing gdptools, configuring PROJ for offline use, the core concepts, logging, and the available example notebooks.

Installation

Quick Installation

The easiest way to install gdptools is via conda or pip:

# Via conda (recommended)
conda install -c conda-forge gdptools

# Via pip
pip install gdptools

Development Installation

For development, or to use the latest unreleased features, install from the source repository:

git clone https://code.usgs.gov/wma/nhgf/toolsteam/gdptools.git
cd gdptools
conda env create -f environment.yml
conda activate gdptools
poetry install
pre-commit install --install-hooks
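After either install, a quick check from Python shows which copy of gdptools the active environment imports. This is useful when an editable development install and a conda-installed release coexist. The helper below is a hypothetical convenience that uses only the standard library (importlib.metadata and importlib.util). It is not part of gdptools.

```python
import importlib.metadata
import importlib.util
from typing import Dict, Optional


def describe_install(package: str = "gdptools") -> Dict[str, Optional[str]]:
    """Report the installed version and import location of a package.

    Both values are None when the package is not installed in the
    active environment.
    """
    try:
        version: Optional[str] = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None
    # find_spec returns None for a top-level name that cannot be imported.
    spec = importlib.util.find_spec(package)
    location = spec.origin if spec is not None else None
    return {"version": version, "location": location}


if __name__ == "__main__":
    info = describe_install()
    if info["location"] is None:
        print("gdptools is not installed in this environment")
    else:
        print(f"gdptools {info['version']} imported from {info['location']}")
```

For a development install, the reported location should point inside your gdptools clone.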

Offline CRS configuration

pyproj downloads grid-shift files the first time you reproject to certain CRSs. If you work behind a firewall or on an air-gapped network, provide those grids locally and disable network fetches:

  1. Install the grid bundle on a machine with internet access. The most reliable option is the proj-data package from conda-forge:

    mamba install -c conda-forge proj-data
    

    Copy the resulting share/proj directory (under $CONDA_PREFIX in the environment you installed into, or Library\share\proj on Windows) to the offline machine (for example /opt/proj/share/proj).

  2. Point PROJ at that directory and disable remote downloads before running gdptools:

    export PROJ_NETWORK=OFF
    export PROJ_DATA=/opt/proj/share/proj  # or PROJ_LIB for older PROJ builds
    export PROJ_CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt  # optional: CA bundle for PROJ network access behind TLS interception (only used if networking is re-enabled)
    

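    On Windows PowerShell, use setx PROJ_NETWORK OFF and setx PROJ_DATA C:\proj\share\proj. Because setx only affects new sessions, open a new terminal before running gdptools (or also run $env:PROJ_NETWORK = "OFF" and $env:PROJ_DATA = "C:\proj\share\proj" in the current one).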

  3. Verify your configuration:

    python - <<'PY'
    from pyproj import datadir, network
    
    print("PROJ data dir:", datadir.get_data_dir())
    print("Network enabled:", network.is_network_enabled())
    PY
    

When these variables are set, gdptools CRS helpers raise a clear error that points users to these steps, instead of waiting on blocked downloads.
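PROJ reads these variables when pyproj initializes, so it can help to check them before importing anything that uses pyproj. The function below is a hypothetical preflight sketch that uses only the standard library. It assumes the PROJ_NETWORK and PROJ_DATA/PROJ_LIB variables shown above. It treats the presence of proj.db, the database PROJ keeps in its data directory, as the sign of a usable data directory. It does not confirm that the specific grid files you need are present; the pyproj check in step 3 remains the authoritative test.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_offline_proj(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems with an offline PROJ setup (empty if none)."""
    problems: List[str] = []

    # Stricter than PROJ itself: require an explicit OFF (any case).
    if env.get("PROJ_NETWORK", "").upper() != "OFF":
        problems.append("PROJ_NETWORK is not set to OFF")

    # Newer PROJ reads PROJ_DATA; older builds use PROJ_LIB.
    data_var = env.get("PROJ_DATA") or env.get("PROJ_LIB")
    if not data_var:
        problems.append("neither PROJ_DATA nor PROJ_LIB is set")
        return problems

    # The variable may list several directories separated by os.pathsep.
    dirs = [Path(p) for p in data_var.split(os.pathsep) if p]
    if not any((d / "proj.db").is_file() for d in dirs):
        problems.append(f"no proj.db found in {data_var}")
    return problems


if __name__ == "__main__":
    issues = check_offline_proj()
    print("\n".join(issues) if issues else "Offline PROJ settings look consistent.")
```

Run it in the same shell session where you exported the variables, so that os.environ reflects them.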

Core Concepts

Spatial Weight Calculation

gdptools calculates area-weighted intersections between:

  • Gridded datasets (NetCDF, Zarr) and polygon geometries

  • Two sets of polygon geometries (watershed-to-county, etc.)
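At their core, the weights are normalized intersection areas. The code below is a conceptual sketch, not gdptools' implementation. It uses axis-aligned boxes in place of real grid-cell and polygon geometries so the arithmetic is easy to follow. Real workflows intersect arbitrary polygons, and the areas should be measured in an equal-area projection. Here each weight is a cell's share of the target's covered area, so the weights sum to 1.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def area_weights(cells: Dict[str, Box], target: Box) -> Dict[str, float]:
    """Each overlapping cell's fraction of the target's covered area.

    Returns an empty dict if the target overlaps no cell.
    """
    areas = {cid: overlap_area(box, target) for cid, box in cells.items()}
    total = sum(areas.values())
    if total == 0:
        return {}
    return {cid: a / total for cid, a in areas.items() if a > 0}
```

Consider two unit cells side by side and a target box covering the right half of the first cell and all of the second. `area_weights({"a": (0, 0, 1, 1), "b": (1, 0, 2, 1)}, (0.5, 0, 2, 1))` gives weights of 1/3 and 2/3, so cell values of 10 and 40 aggregate to a weighted mean of 30.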

Processing Workflow

  1. Data Input: Load your gridded data and target geometries

  2. Weight Generation: Calculate spatial intersection weights

  3. Aggregation: Apply statistical operations using the weights

  4. Output: Export results in multiple formats
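To make the four steps concrete without external dependencies, here is a toy sketch in plain Python. The cell IDs, polygon IDs, dates, and values are invented. The weights are assumed to be precomputed and normalized, and missing data are ignored. On real datasets, gdptools classes such as WeightGen and AggGen handle these steps.

```python
import csv
import io
from typing import Dict, List

# 1. Data input: two grid cells, three daily values each (invented numbers
#    standing in for a variable read from NetCDF or Zarr).
dates = ["2020-01-01", "2020-01-02", "2020-01-03"]
grid_values: Dict[str, List[float]] = {
    "cell_a": [10.0, 12.0, 14.0],
    "cell_b": [40.0, 42.0, 44.0],
}

# 2. Weight generation: normalized area weights for each target polygon
#    (hard-coded here; hypothetical polygon IDs).
weights: Dict[str, Dict[str, float]] = {
    "poly_1": {"cell_a": 0.25, "cell_b": 0.75},
    "poly_2": {"cell_a": 1.0},
}


def aggregate(values: Dict[str, List[float]],
              wts: Dict[str, Dict[str, float]]) -> Dict[str, List[float]]:
    """3. Aggregation: area-weighted mean per polygon and time step.

    Assumes each polygon's weights already sum to 1.
    """
    n_steps = len(dates)
    return {
        poly: [sum(w * values[cell][t] for cell, w in cell_w.items())
               for t in range(n_steps)]
        for poly, cell_w in wts.items()
    }


def to_csv(results: Dict[str, List[float]]) -> str:
    """4. Output: long-format CSV text, one row per polygon and date."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "date", "value"])
    for poly, series in results.items():
        for date, value in zip(dates, series):
            writer.writerow([poly, date, value])
    return buf.getvalue()


results = aggregate(grid_values, weights)
print(to_csv(results))
```

Because the weights are computed once and reused for every time step, weight generation is kept as a separate step from aggregation.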

Configuring Logging

gdptools uses Python’s standard logging module. By default it emits no log output, because the library registers a NullHandler, as recommended for libraries.

Enabling log output

To see log messages, configure logging in your application or notebook:

import logging

logging.basicConfig(level=logging.INFO)

This enables INFO-level output from all libraries. To control gdptools independently from other packages, set its logger directly:

# Show detailed gdptools output while silencing other libraries
logging.basicConfig(level=logging.WARNING)
logging.getLogger("gdptools").setLevel(logging.DEBUG)
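To send gdptools messages to their own destination or format, you can attach a handler to the gdptools logger instead of relying on the root configuration. This standard-library sketch assumes gdptools modules log through child loggers of "gdptools" (the usual logging.getLogger(__name__) pattern). The logger name gdptools.example and its message are hypothetical stand-ins for a real library record. An in-memory stream replaces a log file so the result is easy to inspect.

```python
import io
import logging

# In-memory destination; swap for logging.FileHandler("gdptools.log")
# to write to a file instead.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

gdptools_logger = logging.getLogger("gdptools")
gdptools_logger.setLevel(logging.DEBUG)
gdptools_logger.addHandler(handler)
# Keep these records away from root handlers (e.g. from basicConfig).
gdptools_logger.propagate = False

# Hypothetical child logger simulating a message from inside the library.
logging.getLogger("gdptools.example").debug("weights computed")
print(stream.getvalue(), end="")
```

Setting propagate to False stops the same records from also reaching a root handler installed by logging.basicConfig, which would otherwise print them a second time.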

Log levels

gdptools emits messages at the following levels:

| Level | What you’ll see |
|---|---|
| DEBUG | Internal state, geometry validation details, intersection data |
| INFO | Workflow milestones, timing summaries, data dimensions, weight-gen progress |
| WARNING | Recoverable issues (e.g., antimeridian wrapping, CRS validation fallbacks) |
| ERROR | Failures that precede an exception being raised |

See the logging demo notebook for a hands-on walkthrough.
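WARNING-level messages report recoverable issues rather than raising, so a batch script may want to review them after a run. One approach, sketched below with only the standard library, is a small handler that keeps WARNING and higher records in memory. The ListHandler class is hypothetical. The logger name and warning text only simulate what the library might emit, and the sketch assumes gdptools logs through child loggers of "gdptools".

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collect log records in memory so a script can inspect them later."""

    def __init__(self, level: int = logging.WARNING) -> None:
        super().__init__(level)
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


collector = ListHandler(level=logging.WARNING)  # drops DEBUG and INFO records
gdptools_logger = logging.getLogger("gdptools")
gdptools_logger.setLevel(logging.INFO)  # INFO still reaches any other handlers
gdptools_logger.addHandler(collector)

# Simulated library messages (hypothetical logger name and text).
lib_log = logging.getLogger("gdptools.example")
lib_log.info("starting weight generation")
lib_log.warning("CRS validation fell back to default")

warnings_seen = [r.getMessage() for r in collector.records]
print(warnings_seen)
```

Filtering on the handler rather than the logger means other handlers can still receive INFO output while the collector keeps only the warnings.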

Examples

The following tables summarize the example notebooks available in the documentation, grouped by data source and workflow.

ClimateR-Catalog Examples

These tutorials demonstrate how to use ClimRCatData to access and process climate data from the ClimateR-Catalog. See the ClimateR-Catalog documentation for a table of some of the common datasets available in the catalog.

| Example Notebook | Description | Link |
|---|---|---|
| GridMET Grid-to-Polygon | Aggregates daily gridMET climate data to HUC12 polygons. | View Notebook |
| GridMET DRB Grid-to-Polygon | Aggregates daily gridMET climate data to Delaware River Basin (DRB) HUC12 polygons. | View Notebook |
| 3DEP Grid-to-Line | Interpolates 3DEP elevation data from a GeoTIFF along NHD stream segments. | View Notebook |
| GridMET Grid-to-Line | Interpolates daily gridMET variables along stream flowlines. | View Notebook |

NHGF STAC Examples

These tutorials demonstrate how to use NHGFStacData to access and process gridded data from the USGS NHGF STAC catalog. See the NHGF STAC catalog for a table of some of the common datasets it contains.

| Example Notebook | Description | Link |
|---|---|---|
| CONUS404 Daily Data Grid-to-Polygon | Aggregates daily CONUS404 data to HUC12 polygons using NHGFStacData. | View Notebook |
| CONUS404 Daily Data Grid-to-Line | Interpolates gridded data along lines using NHGFStacData. | View Notebook |
| NLCD Land Cover Zonal Statistics | Computes zonal statistics from a GeoTIFF-backed STAC collection of categorical NLCD land cover. | View Notebook |

Non-Catalog Examples

For custom datasets not in the ClimateR-Catalog or NHGF STAC, you can use UserCatData to access data from OPeNDAP endpoints or other sources.

| Example Notebook | Description | Link |
|---|---|---|
| GridMET Non-Catalog | Demonstrates using UserCatData with a non-catalog OPeNDAP endpoint for GridMET data. | View Notebook |
| CONUS404 Daily Non-Catalog | Demonstrates using UserCatData as an alternative to NHGFStacData for NHGF STAC data. Can be used as a template for reading in data from other STAC catalogs. | View Notebook |

Polygon-to-Polygon Examples

For workflows involving two sets of polygons, such as watershed-to-county or county-to-state, use the WeightGenP2P class to calculate intersection weights. Area-weighted aggregation can then be performed as shown in the second example below, which covers extensive and intensive variables.

| Example Notebook | Description | Link |
|---|---|---|
| Polygon-to-polygon weight calculation | Calculates the intersection weights between source and target polygons using the WeightGenP2P class. | View Notebook |
| Area-Weighted Aggregation of Polygonal Datasets, including intensive and extensive variables | Aggregates an idealized set of polygons, each with extensive and intensive variables. | View Notebook |
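The distinction matters because the two kinds of variables are combined differently. An extensive variable, such as a population total, is split among targets in proportion to how much of each source polygon falls in each target, so totals are preserved. An intensive variable, such as a rate or density, is averaged, weighted by how much of each target each source covers. The plain-Python sketch below uses invented intersection areas and IDs. It illustrates the arithmetic only and is not how WeightGenP2P or the notebook implements it.

```python
from typing import Dict, Tuple

# Invented intersection areas (e.g. km^2 in an equal-area projection):
# (source_id, target_id) -> overlap area.
overlaps: Dict[Tuple[str, str], float] = {
    ("s1", "t1"): 30.0,
    ("s1", "t2"): 70.0,
    ("s2", "t2"): 50.0,
}
source_area = {"s1": 100.0, "s2": 50.0}
population = {"s1": 1000.0, "s2": 400.0}  # extensive: a count
density = {"s1": 10.0, "s2": 8.0}         # intensive: people per unit area


def aggregate_extensive(values: Dict[str, float]) -> Dict[str, float]:
    """Split each source total by the fraction of the source in each target."""
    out: Dict[str, float] = {}
    for (src, tgt), area in overlaps.items():
        out[tgt] = out.get(tgt, 0.0) + values[src] * area / source_area[src]
    return out


def aggregate_intensive(values: Dict[str, float]) -> Dict[str, float]:
    """Area-weighted mean over the part of each target covered by sources."""
    num: Dict[str, float] = {}
    den: Dict[str, float] = {}
    for (src, tgt), area in overlaps.items():
        num[tgt] = num.get(tgt, 0.0) + values[src] * area
        den[tgt] = den.get(tgt, 0.0) + area
    return {tgt: num[tgt] / den[tgt] for tgt in num}


print(aggregate_extensive(population))
print(aggregate_intensive(density))
```

With these inputs, target t2 receives 700 + 400 = 1100 people and an area-weighted density of 1100/120 ≈ 9.17. This matches dividing t2's population by its covered area, and the total population of 1400 is preserved.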

Rasters

gdptools supports raster data processing with multiple computational engines. For standard zonal statistics, choose between serial, parallel, dask, or exactextract engines. The exactextract engine uses the exactextract library for high-performance computation with fractional pixel coverage.

| Example Notebook | Description | Link |
|---|---|---|
| Raster Zonal Statistics | Demonstrates zonal statistics using serial, parallel, and exactextract engines for continuous and categorical rasters. | View Notebook |
| Configuring Logging | Shows how to enable, customize, and filter gdptools log output. | View Notebook |
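Fractional pixel coverage matters most for small or irregular zones. The toy sketch below compares two approaches. A cell-center rule counts a pixel only if its center falls inside the zone. Coverage-weighted averaging, the idea behind exactextract, weights each pixel by the fraction of it inside the zone. The sketch uses unit pixels and rectangular zones only. It is not the exactextract implementation, and it makes no claim about which rule the other gdptools engines use.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def coverage_fraction(pixel: Box, zone: Box) -> float:
    """Fraction of a pixel's area that lies inside a rectangular zone."""
    w = min(pixel[2], zone[2]) - max(pixel[0], zone[0])
    h = min(pixel[3], zone[3]) - max(pixel[1], zone[1])
    pixel_area = (pixel[2] - pixel[0]) * (pixel[3] - pixel[1])
    return max(w, 0.0) * max(h, 0.0) / pixel_area


def zonal_means(raster: List[List[float]], zone: Box) -> Tuple[float, float]:
    """Return (cell-center mean, coverage-weighted mean) for unit pixels.

    Pixel (row, col) spans x in [col, col + 1] and y in [row, row + 1].
    Assumes the zone overlaps the raster; the center mean is NaN if no
    pixel center falls inside the zone.
    """
    center_vals: List[float] = []
    wsum = vsum = 0.0
    for row, line in enumerate(raster):
        for col, value in enumerate(line):
            cx, cy = col + 0.5, row + 0.5
            if zone[0] <= cx <= zone[2] and zone[1] <= cy <= zone[3]:
                center_vals.append(value)
            frac = coverage_fraction((col, row, col + 1, row + 1), zone)
            wsum += frac
            vsum += frac * value
    center_mean = sum(center_vals) / len(center_vals) if center_vals else float("nan")
    return center_mean, vsum / wsum


print(zonal_means([[0.0, 10.0, 100.0]], (0.6, 0.0, 2.2, 1.0)))
```

In this example only the middle pixel's center lies in the zone, so the cell-center mean is 10. The coverage-weighted mean also counts the 0.4 and 0.2 partial pixels and gives (0 × 0.4 + 10 × 1 + 100 × 0.2) / 1.6 = 18.75.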

Which class should I use?

  • Grid → Polygon: ClimRCatData or UserCatData + WeightGen + AggGen. Set weight_gen_crs=6931 and choose the engine (serial, parallel, or dask). Start with a modest jobs value: each worker loads the source dataset, so jobs=-1 can quickly exhaust memory.

  • Polygon → Polygon: WeightGenP2P (treat intensive and extensive variables differently when aggregating).

  • Rasters: UserTiffData + ZonalGen/WeightedZonalGen (zonal statistics; no weight generation step).
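Following the memory advice for the Grid → Polygon case, you can estimate a rough starting value for jobs from free memory and the in-memory size of the source dataset. The helper below is a hypothetical rule of thumb, not a gdptools function. It assumes each worker holds its own copy of the dataset. The overhead factor for temporary arrays is an arbitrary guess that you should tune for your data.

```python
import math


def suggest_jobs(available_gb: float, dataset_gb: float, cpu_count: int,
                 overhead: float = 1.5) -> int:
    """Largest worker count whose dataset copies fit in available memory.

    Each worker is budgeted dataset_gb * overhead (overhead is an assumed
    allowance for temporaries). The result never exceeds cpu_count and is
    at least 1.
    """
    per_worker = dataset_gb * overhead
    by_memory = math.floor(available_gb / per_worker)
    return max(1, min(cpu_count, by_memory))
```

For example, with 32 GB free, a 4 GB dataset, and 16 CPUs, `suggest_jobs(32, 4, 16)` returns 5, because each worker is budgeted 6 GB.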