Getting Started#
This guide covers installing gdptools, its core concepts, logging configuration, and the example notebooks that ship with the documentation.
Installation#
Quick Installation#
The easiest way to install gdptools is via conda or pip:
# Via conda (recommended)
conda install -c conda-forge gdptools
# Via pip
pip install gdptools
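After either install route, a quick check from Python can confirm that the package is visible in the active environment. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name for illustration, not part of gdptools.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("gdptools")
    print("gdptools version:", found if found else "not installed")
```

If this reports "not installed", the most likely cause is that a different conda environment or interpreter is active than the one used for the install.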
Development Installation#
For development or to get the latest features:
git clone https://code.usgs.gov/wma/nhgf/toolsteam/gdptools.git
cd gdptools
conda env create -f environment.yml
conda activate gdptools
poetry install
pre-commit install --install-hooks
Offline CRS configuration#
pyproj downloads grid-shift files the first time you reproject to certain CRSs. If you work behind a
firewall or on an air-gapped network, provide those grids locally and disable network fetches:
Install the grid bundle on a machine with internet access. The most reliable option is the proj-data package from conda-forge:
mamba install -c conda-forge proj-data
Copy the resulting share/proj directory to the offline machine (for example /opt/proj/share/proj).
Point PROJ at that directory and disable remote downloads before running gdptools:
export PROJ_NETWORK=OFF
export PROJ_DATA=/opt/proj/share/proj  # or PROJ_LIB for older PROJ builds
export PROJ_CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt  # optional, fixes TLS interception
On Windows PowerShell, use setx PROJ_NETWORK OFF and setx PROJ_DATA C:\proj\share\proj.
Verify your configuration:
python - <<'PY'
from pyproj import datadir, network
print("PROJ data dir:", datadir.get_data_dir())
print("Network enabled:", network.is_network_enabled())
PY
When these variables are set, gdptools CRS helpers surface a clear error that points users to the
same steps instead of waiting on blocked downloads.
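The environment settings described in this section can also be checked from Python before a long run starts. The following is a minimal sketch using only the standard library; `check_offline_proj_env` is a hypothetical helper, not a gdptools or pyproj function, and it only inspects the variables named above.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_offline_proj_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems with the offline PROJ settings (empty if none)."""
    problems = []
    # The steps above set PROJ_NETWORK=OFF to disable remote grid downloads.
    if env.get("PROJ_NETWORK", "").upper() != "OFF":
        problems.append("PROJ_NETWORK is not set to OFF")
    # Newer PROJ builds read PROJ_DATA; older ones read PROJ_LIB.
    data_dir = env.get("PROJ_DATA") or env.get("PROJ_LIB")
    if not data_dir:
        problems.append("neither PROJ_DATA nor PROJ_LIB is set")
    elif not Path(data_dir).is_dir():
        problems.append(f"PROJ data directory does not exist: {data_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_offline_proj_env():
        print("warning:", problem)
```

Passing a plain dictionary instead of `os.environ` makes it easy to try the check against a proposed configuration without changing the current shell.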
Core Concepts#
Spatial Weight Calculation#
gdptools calculates area-weighted intersections between:
Gridded datasets (NetCDF, Zarr) and polygon geometries
Two sets of polygon geometries (watershed-to-county, etc.)
Processing Workflow#
Data Input: Load your gridded data and target geometries
Weight Generation: Calculate spatial intersection weights
Aggregation: Apply statistical operations using the weights
Output: Export results in multiple formats
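The workflow above can be illustrated with a small, library-free sketch. It is a conceptual example only: grid cells and the target geometry are simplified to axis-aligned boxes, and the helper names (`overlap_area`, `cell_weights`, `weighted_mean`) are hypothetical. gdptools itself works with real polygon geometries and its own classes.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def cell_weights(cells: Dict[str, Box], target: Box) -> Dict[str, float]:
    """Weight generation: each cell's share of the total overlap with the target."""
    areas = {cid: overlap_area(box, target) for cid, box in cells.items()}
    total = sum(areas.values())
    if total == 0.0:
        return {}  # the target does not touch any cell
    return {cid: area / total for cid, area in areas.items() if area > 0.0}


def weighted_mean(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregation: area-weighted mean of the cell values."""
    return sum(values[cid] * w for cid, w in weights.items())


# Data input: four unit grid cells and one target geometry (a box here).
cells = {
    "c00": (0.0, 0.0, 1.0, 1.0),
    "c10": (1.0, 0.0, 2.0, 1.0),
    "c01": (0.0, 1.0, 1.0, 2.0),
    "c11": (1.0, 1.0, 2.0, 2.0),
}
values = {"c00": 10.0, "c10": 20.0, "c01": 30.0, "c11": 40.0}
target = (0.5, 0.5, 2.0, 1.0)

weights = cell_weights(cells, target)    # weight generation
result = weighted_mean(values, weights)  # aggregation
print(weights, result)                   # output
```

Here only the two lower cells overlap the target, with weights of 1/3 and 2/3, so the aggregated value is about 16.67. Weights depend only on geometry, which is why they can be computed once and reused for every time step of a dataset.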
Configuring Logging#
gdptools uses Python’s standard logging module. By default it emits no log output because the library registers a NullHandler, which is the recommended practice for libraries.
Enabling log output#
To see log messages, configure logging in your application or notebook:
import logging
logging.basicConfig(level=logging.INFO)
This enables INFO-level output from all libraries. To control gdptools independently from other packages, set its logger directly:
# Show detailed gdptools output while silencing other libraries
logging.basicConfig(level=logging.WARNING)
logging.getLogger("gdptools").setLevel(logging.DEBUG)
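To keep gdptools messages separate from the rest of an application, a handler can also be attached directly to the `gdptools` logger. The sketch below uses only the standard logging module. The `gdptools.demo` child logger and the in-memory buffer are stand-ins chosen for illustration; in real use the records come from gdptools itself and the handler would usually write to a file or to stderr.

```python
import io
import logging

# A dedicated handler on the "gdptools" logger; records from child loggers
# such as "gdptools.demo" (a hypothetical name) propagate up to it.
buffer = io.StringIO()  # stand-in for a log file or sys.stderr
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

gdp_logger = logging.getLogger("gdptools")
gdp_logger.addHandler(handler)
gdp_logger.setLevel(logging.INFO)

# Simulated library messages; only the INFO record passes the level check.
logging.getLogger("gdptools.demo").info("weights calculated")
logging.getLogger("gdptools.demo").debug("hidden at INFO level")

print(buffer.getvalue(), end="")
```

Because the level is set on the `gdptools` logger rather than on the root logger, other libraries keep whatever verbosity the application configured for them.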
Log levels#
gdptools emits messages at the following levels:
| Level | What you’ll see |
|---|---|
| DEBUG | Internal state, geometry validation details, intersection data |
| INFO | Workflow milestones, timing summaries, data dimensions, weight-gen progress |
| WARNING | Recoverable issues (e.g., antimeridian wrapping, CRS validation fallbacks) |
| ERROR | Failures that precede an exception being raised |
See the logging demo notebook for a hands-on walkthrough.
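A logger set to a given level emits that level and everything more severe. The short sketch below makes this concrete for the four standard levels DEBUG, INFO, WARNING, and ERROR. The logger name `gdptools.levels_demo` and the helper `visible_levels` are hypothetical and exist only for this illustration.

```python
import logging

LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR"]


def visible_levels(threshold: int) -> list:
    """Names of the levels that a logger set to `threshold` would emit."""
    logger = logging.getLogger("gdptools.levels_demo")  # hypothetical child logger
    logger.setLevel(threshold)
    return [name for name in LEVELS if logger.isEnabledFor(getattr(logging, name))]


for name in LEVELS:
    print(f"{name:8s}->", visible_levels(getattr(logging, name)))
```

Setting the level to WARNING, for example, keeps recoverable issues and failures visible while hiding progress and debugging detail.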
Examples#
The following tables summarize the example notebooks available in the documentation.
ClimateR-Catalog Examples#
These tutorials demonstrate how to use ClimRCatData to access and process climate data from the ClimateR-Catalog. See the ClimateR-Catalog documentation for a table of some of the common datasets available in the catalog.
| Example Notebook | Description | Link |
|---|---|---|
| GridMET Grid-to-Polygon | Aggregates daily GridMET climate data to HUC12 polygons. | |
| GridMET DRB Grid-to-Polygon | Aggregates daily gridMET climate data to Delaware River Basin HUC12 polygons. | |
| 3DEP Grid-to-Line | Interpolates 3DEP elevation data from a GeoTIFF along NHD stream segments. | |
| GridMET Grid-to-Line | Interpolates daily gridMET variables along stream flowlines. | |
NHGF STAC Examples#
These tutorials demonstrate how to use NHGFStacData to access and process climate data from the USGS NHGF STAC Catalog. See the NHGF STAC Catalog for a table of some of the common datasets available in the catalog.
| Example Notebook | Description | Link |
|---|---|---|
| CONUS404 Daily Data Grid-to-Polygon | Aggregates daily CONUS404 data to HUC12 polygons using | |
| CONUS404 Daily Data Grid-to-Line | Interpolates gridded data along lines using | |
| NLCD Land Cover Zonal Statistics | GeoTIFF-backed STAC collection for categorical land cover classification. | |
Non-Catalog Examples#
For custom datasets not in the ClimateR-Catalog or NHGF STAC, you can use UserCatData to access data from OPeNDAP endpoints or other sources.
| Example Notebook | Description | Link |
|---|---|---|
| GridMET Non-Catalog | Demonstrates using | |
| CONUS404 Daily Non-Catalog | Demonstrates using | |
Polygon-to-Polygon Examples#
For workflows involving two sets of polygons, such as watershed-to-county or county-to-state, use the WeightGenP2P class to calculate intersection weights. Area-weighted aggregation can then be performed as demonstrated in the second example, which covers extensive and intensive variables.
| Example Notebook | Description | Link |
|---|---|---|
| Polygon-to-polygon weight calculation | Calculate the intersection weights between source and target polygons. Uses | |
| Area-Weighted Aggregation of Polygonal Datasets, including | Aggregates an idealized set of polygons each with extensive and intensive variables. | |
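The distinction between extensive and intensive variables determines how intersection areas are turned into an aggregated value. The sketch below applies the standard areal-interpolation formulas to hand-written overlap records. It is a conceptual illustration with hypothetical helper names and made-up numbers, not a description of how gdptools implements the calculation.

```python
from typing import List, Tuple

# Each record: (source value, source polygon area, area of overlap with the target)
Overlap = Tuple[float, float, float]


def aggregate_intensive(overlaps: List[Overlap]) -> float:
    """Intensive variable (e.g. temperature): overlap-area-weighted mean."""
    total_overlap = sum(inter for _, _, inter in overlaps)
    return sum(value * inter for value, _, inter in overlaps) / total_overlap


def aggregate_extensive(overlaps: List[Overlap]) -> float:
    """Extensive variable (e.g. population): sum of each source's overlapping share."""
    return sum(value * inter / src_area for value, src_area, inter in overlaps)


# Two source polygons intersecting one target polygon.
overlaps = [
    (100.0, 4.0, 1.0),  # value 100, source area 4, 1 unit of it inside the target
    (200.0, 2.0, 2.0),  # value 200, source area 2, entirely inside the target
]

print(aggregate_intensive(overlaps))
print(aggregate_extensive(overlaps))
```

With these inputs the intensive result is about 166.67 (a weighted average), while the extensive result is 225 (a quarter of the first source plus all of the second). Treating a count or total as intensive, or a rate or density as extensive, gives a plausible-looking but wrong answer.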
Rasters#
gdptools supports raster data processing with multiple computational engines. For standard zonal statistics, choose among the serial, parallel, dask, and exactextract engines. The exactextract engine uses the exactextract library for high-performance computation with fractional pixel coverage.
| Example Notebook | Description | Link |
|---|---|---|
| Raster Zonal Statistics | Demonstrates zonal statistics using serial, parallel, and exactextract engines for continuous and categorical rasters. | |
| Configuring Logging | Shows how to enable, customize, and filter gdptools log output. | |
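Fractional pixel coverage means that a pixel partly inside a zone contributes in proportion to the covered share of its area, rather than being counted fully or not at all. The sketch below contrasts that idea with a simple pixel-center rule, using axis-aligned boxes in place of real geometries. The helpers are hypothetical, and this is not the exactextract algorithm, which handles arbitrary polygons.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def coverage_fraction(pixel: Box, zone: Box) -> float:
    """Fraction of the pixel's area that lies inside the zone."""
    width = min(pixel[2], zone[2]) - max(pixel[0], zone[0])
    height = min(pixel[3], zone[3]) - max(pixel[1], zone[1])
    overlap = max(width, 0.0) * max(height, 0.0)
    pixel_area = (pixel[2] - pixel[0]) * (pixel[3] - pixel[1])
    return overlap / pixel_area


def center_inside(pixel: Box, zone: Box) -> bool:
    """True if the pixel center falls inside the zone."""
    cx = (pixel[0] + pixel[2]) / 2
    cy = (pixel[1] + pixel[3]) / 2
    return zone[0] <= cx <= zone[2] and zone[1] <= cy <= zone[3]


# A strip of three unit pixels with values, and a zone covering 1.75 of them.
pixels = [
    ((0.0, 0.0, 1.0, 1.0), 10.0),
    ((1.0, 0.0, 2.0, 1.0), 20.0),
    ((2.0, 0.0, 3.0, 1.0), 60.0),
]
zone = (0.0, 0.0, 1.75, 1.0)

fractions = [coverage_fraction(box, zone) for box, _ in pixels]
fraction_mean = sum(f * v for f, (_, v) in zip(fractions, pixels)) / sum(fractions)

center_values = [v for box, v in pixels if center_inside(box, zone)]
center_mean = sum(center_values) / len(center_values)

print(fractions, fraction_mean, center_mean)
```

The coverage-weighted mean here is about 14.29, while the center rule gives 15.0 because it counts the second pixel in full. The gap grows when zones are small relative to the pixel size.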
Which class should I use?
Grid → Polygon: ClimRCatData or UserCatData + WeightGen + AggGen (set weight_gen_crs=6931; choose engine via serial|parallel|dask; start with a modest jobs value: each worker loads the source dataset, so jobs=-1 can quickly exhaust memory).
Polygon → Polygon: WeightGenP2P (handle intensive vs extensive stats accordingly).
Rasters: UserTiffData + ZonalGen / WeightedZonalGen (zonal statistics; no weight generation step).
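Because each parallel worker loads its own copy of the source dataset, one rough way to pick a starting jobs value is to cap the worker count by how many copies fit in memory. The sketch below is a rule of thumb only: `suggest_jobs` is a hypothetical helper, not part of gdptools, and the dataset and memory sizes are estimates the caller must supply.

```python
import os
from typing import Optional


def suggest_jobs(dataset_gb: float, available_gb: float,
                 cpu_count: Optional[int] = None) -> int:
    """Cap the worker count by how many dataset copies fit in memory."""
    cpus = cpu_count or os.cpu_count() or 1
    fits_in_memory = int(available_gb // dataset_gb)
    # Always allow at least one worker, never more than the CPU count.
    return max(1, min(cpus, fits_in_memory))


if __name__ == "__main__":
    # Assumed numbers: a 4 GB dataset on a 32-core machine with 16 GB free.
    print(suggest_jobs(dataset_gb=4.0, available_gb=16.0, cpu_count=32))
```

In this example memory, not core count, is the limiting factor, which is the situation the warning about jobs=-1 describes. Leaving some headroom below the suggested value is sensible, since intermediate arrays also consume memory.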