# ![gdptools](assets/gdptools_logo.png)

[![PyPI](https://img.shields.io/pypi/v/gdptools.svg)](https://pypi.org/project/gdptools/)
[![conda](https://anaconda.org/conda-forge/gdptools/badges/version.svg)](https://anaconda.org/conda-forge/gdptools)
[![Latest Release](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools/-/badges/release.svg)](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools/-/releases)

[![Status](https://img.shields.io/pypi/status/gdptools.svg)](https://pypi.org/project/gdptools/)
[![Python Version](https://img.shields.io/pypi/pyversions/gdptools)](https://pypi.org/project/gdptools)

[![License](https://img.shields.io/pypi/l/gdptools)](https://creativecommons.org/publicdomain/zero/1.0/legalcode)

[![Read the documentation at https://gdptools.readthedocs.io/](https://img.shields.io/readthedocs/gdptools/latest.svg?label=Read%20the%20Docs)](https://gdptools.readthedocs.io/)
[![pipeline status](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools/badges/main/pipeline.svg)](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools/-/commits/main)
[![coverage report](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools/badges/main/coverage.svg)](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools/-/commits/main)

[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://code.usgs.gov/pre-commit/pre-commit)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Poetry](https://img.shields.io/badge/poetry-enabled-blue)](https://python-poetry.org/)
[![Conda](https://img.shields.io/badge/conda-enabled-green)](https://anaconda.org/)

**gdptools** is a Python package for calculating area-weighted statistics and spatial interpolations between gridded datasets and vector geometries. It provides efficient tools for **grid-to-polygon**, **grid-to-line**, and **polygon-to-polygon** interpolations with support for multiple data catalogs and custom datasets.

```{figure} assets/Welcom_fig.png
:alt: Example grid-to-polygon interpolation
:width: 100%

Example grid-to-polygon interpolation. A) HUC12 basins for Delaware River Watershed. B) Gridded monthly water evaporation amount (mm) from TerraClimate dataset. C) Area-weighted-average interpolation of gridded TerraClimate data to HUC12 polygons.
```

## 🚀 Key Features

- **Multiple Interpolation Methods**: Grid-to-polygon, grid-to-line, and polygon-to-polygon area-weighted statistics
- **Catalog Integration**: Built-in support for the U.S. Geological Survey's NHGF STAC catalog, Mike Johnson's ClimateR catalog, and custom metadata
- **Flexible Data Sources**: Works with any xarray-compatible gridded data and geopandas vector (line and polygon) data
- **Scalable Processing**: Serial, parallel, and Dask-based computation methods
- **Multiple Output Formats**: NetCDF, CSV, Parquet, and in-memory results
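- **Extensive vs Intensive Variables**: Polygon-to-polygon operations distinguish extensive variables (totals such as population, which are split in proportion to area) from intensive variables (rates or averages such as temperature, which are area-averaged)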
- **Intelligent Spatial Processing**: Automatic reprojection to equal-area coordinate systems and efficient spatial subsetting
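
The extensive/intensive distinction above can be illustrated with a minimal sketch in plain Python. It follows the conventional areal-interpolation definitions and is not the gdptools API: the function `transfer_to_target` and its arguments are hypothetical names introduced only for this example, and gdptools' own normalization may differ.

```python
def transfer_to_target(source_values, source_areas, overlap_areas, kind):
    """Transfer values from source polygons to a single target polygon.

    source_values[i] : value carried by source polygon i
    source_areas[i]  : full area of source polygon i
    overlap_areas[i] : area of source polygon i that falls inside the target
    kind             : "intensive" or "extensive" (hypothetical labels)
    """
    if kind == "intensive":
        # Rates and averages: area-weighted mean over the covered part of the target.
        covered = sum(overlap_areas)
        if covered == 0:
            raise ValueError("target does not overlap any source polygon")
        weighted = sum(v * a for v, a in zip(source_values, overlap_areas))
        return weighted / covered
    if kind == "extensive":
        # Totals: each source contributes the share of its value that lies in the target.
        return sum(
            v * a / s for v, a, s in zip(source_values, overlap_areas, source_areas)
        )
    raise ValueError(f"unknown variable kind: {kind!r}")


# Two source polygons with areas 10 and 20; the target overlaps 5 units of each.
mean_temperature = transfer_to_target([10.0, 20.0], [10.0, 20.0], [5.0, 5.0], "intensive")
total_population = transfer_to_target([100.0, 400.0], [10.0, 20.0], [5.0, 5.0], "extensive")
```

Treating a total as an average (or the reverse) gives a plausible-looking but wrong number, which is why the variable type has to be stated explicitly.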

## 🌍 Spatial Processing & Performance

gdptools automatically handles complex geospatial transformations to ensure accurate and efficient calculations:

### Automatic Reprojection

- **Equal-Area Projections**: Both source gridded data and target geometries are automatically reprojected to a common equal-area coordinate reference system (default: EPSG:6931, Equal-Area Scalable Earth Grid (EASE-Grid))
- **Accurate Area Calculations**: Equal-area projections ensure that area-weighted statistics are calculated correctly, regardless of the original coordinate systems
- **Flexible CRS Options**: Users can specify alternative projection systems via the `weight_gen_crs` parameter
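
To see why an equal-area projection matters for the points above, the following sketch computes the ground area of a latitude/longitude grid cell on a sphere. It is an illustration only and does not use gdptools; `cell_area_km2` is a hypothetical helper, and the spherical Earth with a 6371 km radius is a simplifying assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def cell_area_km2(lat_south, lat_north, dlon_deg):
    """Ground area of a lat/lon cell on a sphere: R^2 * dlon * (sin(lat_n) - sin(lat_s))."""
    phi_south = math.radians(lat_south)
    phi_north = math.radians(lat_north)
    dlon = math.radians(dlon_deg)
    return EARTH_RADIUS_KM**2 * dlon * (math.sin(phi_north) - math.sin(phi_south))


# Two cells that are both 1 degree by 1 degree in geographic coordinates.
equator_cell = cell_area_km2(0.0, 1.0, 1.0)
northern_cell = cell_area_km2(60.0, 61.0, 1.0)
ratio = northern_cell / equator_cell  # roughly one half
```

Cells of identical size in degrees cover very different ground areas, so weights computed from overlaps in degree space would over-represent high-latitude cells. Computing overlaps in an equal-area CRS avoids that bias.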

### Efficient Spatial Subsetting

- **Bounding Box Optimization**: Gridded datasets are automatically subset to the bounding box of the target geometries plus a buffer
- **Smart Buffering**: Buffer size is calculated as twice the maximum grid resolution to ensure complete coverage
- **Memory Efficiency**: Only the necessary spatial extent is loaded into memory, reducing processing time and memory usage for large datasets
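
The subsetting rule described above can be sketched as follows. This is a plain-Python illustration, not gdptools code: `buffered_bounds` and `subset_indices` are hypothetical helpers, and reading "maximum grid resolution" as the larger of the x and y cell sizes is an assumption.

```python
def buffered_bounds(bounds, x_res, y_res):
    """Expand (minx, miny, maxx, maxy) by twice the larger grid-cell size."""
    minx, miny, maxx, maxy = bounds
    buffer = 2.0 * max(abs(x_res), abs(y_res))
    return (minx - buffer, miny - buffer, maxx + buffer, maxy + buffer)


def subset_indices(coords, lower, upper):
    """Indices of grid coordinates that fall inside [lower, upper]."""
    return [i for i, c in enumerate(coords) if lower <= c <= upper]


# Target geometries span 76W-74W and 39N-42N; the grid has 0.25-degree cells.
minx, miny, maxx, maxy = buffered_bounds((-76.0, 39.0, -74.0, 42.0), 0.25, 0.25)

# Keep only the longitude columns of a coarse global axis that fall in the window.
longitudes = [-180.0 + 0.25 * i for i in range(1440)]
kept_columns = subset_indices(longitudes, minx, maxx)
```

Only the selected rows and columns need to be read, which is what keeps memory use small for continental or global grids.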

## 📦 Installation

### Quick Installation

```bash
# Via conda (recommended)
conda install -c conda-forge gdptools

# Via pip
pip install gdptools
```

**→ [Complete installation guide with development setup](getting_started.md)**

## 🚀 Quick Start

**→ [Complete getting started guide with examples](getting_started.md)**

## 🔧 Core Components

**→ [Complete API reference](reference.md)**

### Data Classes

- **`ClimRCatData`**: Interface with ClimateR catalog datasets
- **`NHGFStacData`**: Interface with NHGF STAC catalog datasets
- **`UserCatData`**: Custom user-defined gridded datasets
- **`UserTiffData`**: GeoTIFF/raster data interface

### Processing Classes

- **`WeightGen`**: Calculate spatial intersection weights
- **`AggGen`**: Perform area-weighted aggregations
- **`InterpGen`**: Grid-to-line interpolation along vector paths
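
The split between weight generation and aggregation listed above can be illustrated with a small conceptual sketch. It uses axis-aligned boxes in place of real geometries and does not call gdptools: `overlap_area`, `compute_weights`, and `aggregate` are hypothetical names, and the actual `WeightGen` and `AggGen` interfaces are documented in the API reference.

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes given as (minx, miny, maxx, maxy)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def compute_weights(cells, polygon):
    """Step 1: fraction of the polygon's covered area that lies in each grid cell."""
    areas = {cell_id: overlap_area(box, polygon) for cell_id, box in cells.items()}
    total = sum(areas.values())
    if total == 0.0:
        raise ValueError("polygon does not intersect the grid")
    return {cell_id: area / total for cell_id, area in areas.items() if area > 0.0}


def aggregate(weights, values):
    """Step 2: area-weighted mean of the gridded values for one time step."""
    return sum(weight * values[cell_id] for cell_id, weight in weights.items())


# A 2 x 2 grid of unit cells and one target box covering cell "a" and half of "b".
cells = {
    "a": (0.0, 0.0, 1.0, 1.0),
    "b": (1.0, 0.0, 2.0, 1.0),
    "c": (0.0, 1.0, 1.0, 2.0),
    "d": (1.0, 1.0, 2.0, 2.0),
}
polygon = (0.0, 0.0, 1.5, 1.0)

weights = compute_weights(cells, polygon)  # computed once

# The same weights are reused for every time step of the gridded variable.
time_steps = [
    {"a": 3.0, "b": 6.0, "c": 0.0, "d": 0.0},
    {"a": 9.0, "b": 0.0, "c": 5.0, "d": 5.0},
]
series = [aggregate(weights, step) for step in time_steps]
```

Because the weights depend only on geometry, they can be computed once and applied to any number of time steps or variables on the same grid.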

## 📚 Documentation Structure

- **[Getting Started](getting_started.md)**: Installation, core concepts, and first examples
- **[API Reference](reference.md)**: Complete documentation of all classes and functions

## 🗄️ Data Sources

gdptools integrates with multiple climate and environmental data sources:

- **[ClimateR Catalog](catalog_datasets.md)**: TerraClimate, GridMET, Daymet, PRISM, and more
- **[NHGF STAC Catalog](nhgf_stac_datasets.md)**: Cloud-optimized CONUS404, observational data, climate projections
- **[Custom Datasets](getting_started.md)**: Your own NetCDF, Zarr, or GeoTIFF files

## 💡 Use Cases

- **Climate Analysis**: Aggregate weather/climate data over watersheds, counties, or custom regions
- **Environmental Monitoring**: Calculate zonal statistics from satellite imagery and gridded datasets
- **Hydrological Modeling**: Transfer data between different spatial frameworks (HUCs, model grids, etc.)
- **Impact Assessment**: Interpolate climate projections to administrative boundaries
- **Research Applications**: Process custom model outputs and observational datasets

**→ [See detailed tutorials and examples](getting_started.md)**

## 🤝 Contributing

We welcome contributions! Please see our [contributing guide](contributing.md) for development setup, testing procedures, code style guidelines, and issue reporting.

## 📄 License

This project is in the public domain. See [LICENSE](license.md) for details.

## ⚠️ Disclaimer

This software is preliminary or provisional and is subject to revision. See our full [disclaimer](disclaimer.md) for important usage information.

## 🙏 Acknowledgments

gdptools integrates with several excellent open-source projects:

- **[xarray](https://docs.xarray.dev/)**: Multi-dimensional array processing
- **[geopandas](https://geopandas.org/)**: Geospatial data manipulation
- **[HyRiver](https://docs.hyriver.io/)**: Hydrologic data access (pynhd, pygeohydro)
- **[STAC](https://stacspec.org/)**: Spatiotemporal asset catalogs
- **[ClimateR](https://github.com/mikejohnson51/climateR-catalogs)**: Climate data catalogs

---

**Questions?** Open an issue on our [GitLab repository](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools) or check the documentation for detailed examples and API reference.
