Getting Started

This guide covers installing gdptools, configuring CRS data for offline use, the core concepts, logging, and the example notebooks.

Installation

Quick Installation

The easiest way to install gdptools is via conda or pip:

# Via conda (recommended)
conda install -c conda-forge gdptools

# Via pip
pip install gdptools

Development Installation

For development or to get the latest features:

git clone https://code.usgs.gov/wma/nhgf/toolsteam/gdptools.git
cd gdptools
conda env create -f environment.yml
conda activate gdptools
poetry install
pre-commit install --install-hooks

Offline CRS configuration

pyproj downloads grid-shift files the first time you reproject to certain CRSs. If you work behind a firewall or on an air-gapped network, provide those grids locally and disable network fetches:

Install the grid bundle on a machine with internet access. The most reliable option is the proj-data package from conda-forge:
```
mamba install -c conda-forge proj-data
```
Copy the resulting share/proj directory to the offline machine (for example /opt/proj/share/proj).

Point PROJ at that directory and disable remote downloads before running gdptools:

export PROJ_NETWORK=OFF
export PROJ_DATA=/opt/proj/share/proj  # or PROJ_LIB for older PROJ builds
export PROJ_CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt  # optional, fixes TLS interception

On Windows PowerShell, use setx PROJ_NETWORK OFF and setx PROJ_DATA C:\proj\share\proj. setx only affects sessions started afterwards, so open a new terminal before running gdptools.

Verify your configuration:

python - <<'PY'
from pyproj import datadir, network

print("PROJ data dir:", datadir.get_data_dir())
print("Network enabled:", network.is_network_enabled())
PY

With these variables set, gdptools CRS helpers report a clear error that points to the steps above instead of waiting on blocked downloads.
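
Where exporting shell variables is inconvenient (a notebook, for example), the same settings can be applied from Python before pyproj is first imported. This is a minimal sketch: the helper name `offline_proj_env` is hypothetical, and the path is the example path used above. pyproj chooses its data directory at import time and may prefer a bundled directory depending on how it was installed, so confirm the result with the verification snippet above.

```python
import os


def offline_proj_env(data_dir: str) -> dict[str, str]:
    """Return the environment variables that keep PROJ offline.

    Hypothetical helper, not part of gdptools or pyproj.
    """
    return {
        "PROJ_NETWORK": "OFF",  # disable remote grid downloads
        "PROJ_DATA": data_dir,  # read by current PROJ releases
        "PROJ_LIB": data_dir,   # read by older PROJ builds
    }


# Apply the settings before the first `import pyproj` in this process.
os.environ.update(offline_proj_env("/opt/proj/share/proj"))
```

Setting both PROJ_DATA and PROJ_LIB is a convenience so the same code works across PROJ versions.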

Core Concepts

Spatial Weight Calculation

gdptools calculates area-weighted intersections between:

Gridded datasets (NetCDF, Zarr) and polygon geometries
Two sets of polygon geometries (watershed-to-county, etc.)
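
To make the idea of an area-weighted intersection concrete, the sketch below computes weights for axis-aligned rectangles in plain Python. It does not use gdptools: the `Box` type and both functions are hypothetical, rectangles stand in for real polygons, and normalizing by the target area is an assumed convention chosen for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a grid cell or polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def overlap_area(a: Box, b: Box) -> float:
    """Area shared by two boxes (0.0 when they do not overlap)."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(width, 0.0) * max(height, 0.0)


def area_weights(cells: dict[str, Box], target: Box) -> dict[str, float]:
    """Weight of each cell = intersection area / target area."""
    weights = {}
    for name, cell in cells.items():
        shared = overlap_area(cell, target)
        if shared > 0:
            weights[name] = shared / target.area
    return weights


# Two unit grid cells side by side and a target that straddles both.
cells = {"c0": Box(0, 0, 1, 1), "c1": Box(1, 0, 2, 1)}
target = Box(0.5, 0.0, 1.25, 1.0)
weights = area_weights(cells, target)
print(weights)  # c0 gets about two thirds of the weight, c1 about one third
```

Because the target is fully covered by the two cells, the weights sum to 1.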

Processing Workflow

Data Input: Load your gridded data and target geometries
Weight Generation: Calculate spatial intersection weights
Aggregation: Apply statistical operations using the weights
Output: Export results in multiple formats
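
The four steps can be traced on toy data in plain Python. This sketch is not the gdptools API: the cell names, values, and the `weighted_mean` helper are made up, and renormalizing over cells with valid data is one possible way to treat missing values.

```python
import math


def weighted_mean(values: dict[str, float], weights: dict[str, float]) -> float:
    """Area-weighted mean over cells, skipping cells with missing (NaN) data."""
    numerator = 0.0
    denominator = 0.0
    for cell, weight in weights.items():
        value = values.get(cell, math.nan)
        if math.isnan(value):
            continue
        numerator += weight * value
        denominator += weight
    return numerator / denominator if denominator > 0 else math.nan


# 1. Data input: one time step of a toy grid (c2 has no data).
grid_values = {"c0": 10.0, "c1": 16.0, "c2": math.nan}
# 2. Weight generation: assumed to have been computed already.
weights = {"c0": 0.5, "c1": 0.25, "c2": 0.25}
# 3. Aggregation: apply a statistic using the weights.
result = weighted_mean(grid_values, weights)
# 4. Output: here, a single CSV-style row.
print(f"poly_1,{result:.2f}")  # poly_1,12.00
```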

Configuring Logging

gdptools uses Python’s standard logging module. By default it emits no log output because the library registers a NullHandler, which is the recommended practice for libraries.

Enabling log output

To see log messages, configure logging in your application or notebook:

import logging

logging.basicConfig(level=logging.INFO)

This enables INFO-level output from all libraries. To control gdptools independently from other packages, set its logger directly:

import logging

# Show detailed gdptools output while silencing other libraries
logging.basicConfig(level=logging.WARNING)
logging.getLogger("gdptools").setLevel(logging.DEBUG)
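
To send gdptools messages to their own destination with a custom format, a handler can be attached to the "gdptools" logger. The sketch below uses only the standard library; the child logger name `gdptools.example` and the message are made up to simulate library output, and the in-memory stream stands in for a file or stderr.

```python
import io
import logging

stream = io.StringIO()  # stand-in for a log file or sys.stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

gdp_logger = logging.getLogger("gdptools")
gdp_logger.setLevel(logging.INFO)
gdp_logger.addHandler(handler)
gdp_logger.propagate = False  # avoid duplicates if the root logger has a handler too

# Simulated library message; "gdptools.example" is a hypothetical child logger.
logging.getLogger("gdptools.example").info("weights ready")
print(stream.getvalue(), end="")  # INFO gdptools.example: weights ready
```

Child loggers inherit the level and handlers of "gdptools", so one handler covers the whole package.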

Log levels

gdptools emits messages at the following levels:

| Level | What you’ll see |
| --- | --- |
| `DEBUG` | Internal state, geometry validation details, intersection data |
| `INFO` | Workflow milestones, timing summaries, data dimensions, weight-gen progress |
| `WARNING` | Recoverable issues (e.g., antimeridian wrapping, CRS validation fallbacks) |
| `ERROR` | Failures that precede an exception being raised |
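
A logger's level acts as a threshold: messages below it are dropped. The sketch below shows this with simulated messages on a hypothetical child logger, `gdptools.level_demo`; it does not call gdptools itself.

```python
import logging


class ListHandler(logging.Handler):
    """Collect level names so the effect of a threshold is easy to inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.levels.append(record.levelname)


demo = logging.getLogger("gdptools.level_demo")  # hypothetical child logger
collector = ListHandler()
demo.addHandler(collector)
demo.propagate = False
demo.setLevel(logging.WARNING)  # threshold under test

for level in (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR):
    demo.log(level, "simulated message")

print(collector.levels)  # ['WARNING', 'ERROR']
```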

See the logging demo notebook for a hands-on walkthrough.

Examples

The following tables summarize the example notebooks available in the documentation.

These tutorials demonstrate how to use ClimRCatData to access and process climate data from the ClimateR-Catalog. See the ClimateR-Catalog documentation for a table of some of the common datasets available in the catalog.

| Example Notebook | Description | Link |
| --- | --- | --- |
| GridMET Grid-to-Polygon | Aggregates daily GridMET climate data to HUC12 polygons. | View Notebook |
| GridMET DRB Grid-to-Polygon | Aggregates daily gridMET climate data to Delaware River Basin HUC12 polygons. | View Notebook |
| 3DEP Grid-to-Line | Interpolates 3DEP elevation data from a GeoTIFF along NHD stream segments. | View Notebook |
| GridMET Grid-to-Line | Interpolates daily gridMET variables along stream flowlines. | View Notebook |

NHGF STAC Examples

These tutorials demonstrate how to use NHGFStacData to access and process climate data from the USGS NHGF STAC Catalog. See the NHGF STAC Catalog for a table of some of the common datasets available in the catalog.

| Example Notebook | Description | Link |
| --- | --- | --- |
| CONUS404 Daily Data Grid-to-Polygon | Aggregates daily CONUS404 data to HUC12 polygons using `NHGFStacData`. | View Notebook |
| CONUS404 Daily Data Grid-to-Line | Interpolates gridded data along lines using `NHGFStacData`. | View Notebook |
| NLCD Land Cover Zonal Statistics | GeoTIFF-backed STAC collection for categorical land cover classification. | View Notebook |

Non-Catalog Examples

For custom datasets not in the ClimateR-Catalog or NHGF STAC, you can use UserCatData to access data from OPeNDAP endpoints or other sources.

| Example Notebook | Description | Link |
| --- | --- | --- |
| GridMET Non-Catalog | Demonstrates using `UserCatData` with a non-catalog OPeNDAP endpoint for GridMET data. | View Notebook |
| CONUS404 Daily Non-Catalog | Demonstrates using `UserCatData` as an alternative to `NHGFStacData` for NHGF STAC data. Can be used as a template for reading in data for other STAC catalogs. | View Notebook |

Polygon-to-Polygon Examples

For workflows involving two sets of polygons, such as watershed-to-county or county-to-state, use the WeightGenP2P class to calculate intersection weights. Area-weighted aggregation can then be performed as shown in the second example listed here, which covers extensive and intensive variables.

| Example Notebook | Description | Link |
| --- | --- | --- |
| Polygon-to-polygon weight calculation | Calculate the intersection weights between source and target polygons. Uses `WeightGenP2P` class. | View Notebook |
| Area-Weighted Aggregation of Polygonal Datasets, including `intensive` and `extensive` variables | Aggregates an idealized set of polygons each with extensive and intensive variables. | View Notebook |
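
The difference between extensive and intensive variables can be shown with a few lines of arithmetic. The sketch below uses made-up areas and values and plain Python rather than gdptools; it assumes each variable is uniformly distributed within its source polygon.

```python
# Two source polygons (A, B) intersected with one target polygon.
# All numbers are illustrative; areas are in km^2.
source_area = {"A": 100.0, "B": 50.0}
intersect_area = {"A": 25.0, "B": 50.0}  # part of each source inside the target
population = {"A": 4000.0, "B": 1000.0}  # extensive: scales with area
temperature = {"A": 10.0, "B": 16.0}     # intensive: does not scale with area

# Extensive: transfer the share of each source that falls inside the target.
target_population = sum(
    population[s] * intersect_area[s] / source_area[s] for s in source_area
)

# Intensive: average the source values, weighted by intersection area.
target_temperature = sum(
    temperature[s] * intersect_area[s] for s in source_area
) / sum(intersect_area.values())

print(target_population, target_temperature)  # 2000.0 14.0
```

Summing an intensive variable, or averaging an extensive one, would give a physically meaningless result, which is why the two cases need different treatment.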

Rasters

gdptools supports raster data processing with multiple computational engines. For standard zonal statistics, choose among the serial, parallel, dask, and exactextract engines. The exactextract engine uses the exactextract library for high-performance computation with fractional pixel coverage.

| Example Notebook | Description | Link |
| --- | --- | --- |
| Raster Zonal Statistics | Demonstrates zonal statistics using serial, parallel, and exactextract engines for continuous and categorical rasters. | View Notebook |
| Configuring Logging | Shows how to enable, customize, and filter gdptools log output. | View Notebook |
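
Fractional pixel coverage means each pixel contributes in proportion to the share of its area inside the zone. The sketch below illustrates the idea for a rectangular zone over a 2x2 raster in plain Python. It does not call exactextract or gdptools, and the `coverage_fraction` helper is hypothetical.

```python
def coverage_fraction(
    col: int, row: int, zone: tuple[float, float, float, float]
) -> float:
    """Fraction of the unit pixel [col, col+1] x [row, row+1] inside the zone."""
    xmin, ymin, xmax, ymax = zone
    width = min(col + 1, xmax) - max(col, xmin)
    height = min(row + 1, ymax) - max(row, ymin)
    return max(width, 0.0) * max(height, 0.0)


raster = [
    [1.0, 3.0],
    [5.0, 7.0],
]  # indexed as raster[row][col]
zone = (0.0, 0.0, 1.5, 1.0)  # covers pixel (0, 0) fully and pixel (1, 0) by half

numerator = denominator = 0.0
for row, line in enumerate(raster):
    for col, value in enumerate(line):
        fraction = coverage_fraction(col, row, zone)
        numerator += fraction * value
        denominator += fraction

coverage_weighted_mean = numerator / denominator
print(coverage_weighted_mean)  # roughly 1.67
```

Counting both touched pixels equally would give 2.0, so the half-covered pixel pulls the mean less when coverage fractions are used.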

Which class should I use?

Grid → Polygon: ClimRCatData or UserCatData + WeightGen + AggGen. Set weight_gen_crs=6931, choose the engine (serial, parallel, or dask), and start with a modest jobs value: each worker loads the source dataset, so jobs=-1 can quickly exhaust memory.
Polygon → Polygon: WeightGenP2P (handle intensive vs extensive stats accordingly).
Rasters: UserTiffData + ZonalGen/WeightedZonalGen (zonal statistics; no weight generation step).
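
Since each worker loads its own copy of the source dataset, a rough memory budget can suggest a starting jobs value. The helper below is a hypothetical sketch, not part of gdptools, and the 50% headroom is an arbitrary assumption to adjust for your system.

```python
import os
from typing import Optional


def modest_jobs(
    dataset_gb: float, available_gb: float, cpu_count: Optional[int] = None
) -> int:
    """Largest worker count whose dataset copies fit in half of available memory.

    Hypothetical helper; assumes dataset_gb > 0 and one dataset copy per worker.
    """
    cpus = cpu_count or os.cpu_count() or 1
    by_memory = int((available_gb * 0.5) // dataset_gb)
    return max(1, min(cpus, by_memory))


# A 4 GB dataset on a 16-core machine with 32 GB free.
print(modest_jobs(dataset_gb=4.0, available_gb=32.0, cpu_count=16))  # 4
```

The result is only a starting point; measure actual memory use and adjust the jobs value from there.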