Changelog#

All notable changes to gdptools are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Current Version: 0.3.12#

0.3.12 (2026-04-17)#

  • Security: Upgraded 4 locked dependencies to resolve 13 CVEs: aiohttp (10 CVEs), pygments (CVE-2026-4539), pytest (CVE-2025-71176), and requests (CVE-2026-25645).

  • Fixed: UserCatData and NHGFStacZarrData no longer mutate the caller’s source dataset when rotating 0–360° longitude into -180…180°. Constructing multiple instances from a single shared xr.Dataset now works; previously the second instance failed with KeyError on a non-monotonic index (#97).

  • Fixed: AggGen.calculate_agg() now returns a CF-1.8-compliant xr.Dataset matching the structure written by NetCDFWriter: adds the crs scalar variable referenced by grid_mapping, along with lat/lon centroid coordinates in EPSG:4326 (#97).

  • Internal: Extracted a shared build_cf_dataset() helper (new module gdptools.agg._cf_dataset) so AggGen._gen_xarray_return and NetCDFWriter.create_out_file cannot drift.

  • Internal: Extracted a _rotate_longitude_if_needed helper in gdptools.data.user_data, removing the duplicated rotation block between UserCatData.__init__ and NHGFStacZarrData.__init__.

  • Internal: NetCDFWriter no longer leaks a process-wide UserWarning filter via warnings.filterwarnings(...). The centroid-computation suppression is now scoped to a warnings.catch_warnings() block inside build_cf_dataset, so other code running after a NetCDF save sees unchanged warning behavior.

  • Internal: Removed stale hvplot entry from dev group in uv.lock; hvplot is a docs-only dependency and was never listed in pyproject.toml dev group.
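
The difference between a process-wide warning filter and a scoped one, as described in the NetCDFWriter entry above, can be sketched with the standard library alone. Here `noisy_centroid` and `centroid_scoped` are hypothetical stand-ins; the actual suppression inside build_cf_dataset is not reproduced.

```python
import warnings


def noisy_centroid():
    """Hypothetical stand-in for a centroid call that emits a UserWarning."""
    warnings.warn("centroid computed in a geographic CRS", UserWarning)
    return (0.0, 0.0)


def centroid_scoped():
    """Suppress the warning only for the duration of this call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        return noisy_centroid()


# Record what a caller would see during and after the scoped call.
with warnings.catch_warnings(record=True) as seen:
    warnings.simplefilter("always")
    centroid_scoped()   # suppressed inside the helper
    inside = len(seen)
    noisy_centroid()    # filters were restored on exit, so this one is reported
    after = len(seen)
```

Because `warnings.catch_warnings()` restores the previous filter list on exit, code that runs afterwards is unaffected, which a bare `warnings.filterwarnings(...)` call at module or method level does not guarantee.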

0.3.11 (2026-02-27)#

  • Deprecated: serial, parallel, and dask zonal engines now emit FutureWarning. Use zonal_engine="exactextract" instead, which provides better performance and handles large datasets with bounded memory.

  • Changed: Moved dask.distributed from top-level import to lazy import inside ZonalEngineDask.weighted_zonal_stats(), so the distributed package is no longer required at import time.

  • Removed: safety dev dependency (redundant with pip-audit; its transitive nltk dependency had CVE-2025-14009).

  • Removed: Unused .safety-policy.yml configuration file.

  • Documentation: Updated zonal statistics and logging demo notebooks with deprecation notices and revised engine comparison table.
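
As a rough illustration of the deprecation above, the sketch below shows how a deprecated engine name might trigger a FutureWarning, and how a test suite can turn that warning into an error to locate remaining uses. `select_engine` and `DEPRECATED_ENGINES` are hypothetical names, not part of gdptools.

```python
import warnings

# Engine names listed as deprecated in this release.
DEPRECATED_ENGINES = {"serial", "parallel", "dask"}


def select_engine(zonal_engine="exactextract"):
    """Hypothetical dispatcher: warn on deprecated engine names, then return the name."""
    if zonal_engine in DEPRECATED_ENGINES:
        warnings.warn(
            f"zonal_engine={zonal_engine!r} is deprecated; use 'exactextract' instead.",
            FutureWarning,
            stacklevel=2,
        )
    return zonal_engine


# Escalating FutureWarning to an error makes deprecated usage easy to find in tests.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    select_engine("exactextract")  # no warning
    try:
        select_engine("dask")
        flagged = False
    except FutureWarning:
        flagged = True
```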

0.3.10 (2026-02-21)#

  • Fixed: Read coordinate names (x, y, time) from STAC cube:dimensions metadata instead of hardcoding, enabling support for collections with non-standard dimension names.

  • Fixed: Longitude rotation in utils._get_cat_data_from_url used hardcoded "lon" instead of the actual coordinate name variable.
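
The two fixes above can be sketched together: look up coordinate names from `cube:dimensions` rather than assuming "lon", then rotate the longitude values under whatever name was found. This is a minimal sketch, assuming dimension entries carry the `type` and `axis` fields of the STAC datacube extension; the helper names and the dict-of-lists stand-in for a dataset are hypothetical.

```python
def coord_names(cube_dimensions):
    """Map STAC datacube dimensions to the x, y, and time coordinate names."""
    names = {}
    for name, dim in cube_dimensions.items():
        if dim.get("type") == "spatial" and dim.get("axis") in ("x", "y"):
            names[dim["axis"]] = name
        elif dim.get("type") == "temporal":
            names["time"] = name
    return names


def rotate_longitude(lons):
    """Return a new, sorted list with 0..360 longitudes mapped into -180..180."""
    return sorted(((lon + 180.0) % 360.0) - 180.0 for lon in lons)


cube_dimensions = {
    "longitude": {"type": "spatial", "axis": "x"},
    "latitude": {"type": "spatial", "axis": "y"},
    "valid_time": {"type": "temporal"},
}
names = coord_names(cube_dimensions)

# Stand-in for dataset coordinates; a real dataset would reorder its data as well.
coords = {"longitude": [0.0, 90.0, 180.0, 270.0], "latitude": [10.0, 20.0]}
x_name = names["x"]
rotated = {**coords, x_name: rotate_longitude(coords[x_name])}
```

Building `rotated` as a new mapping leaves `coords` untouched, so a second consumer of the same source still sees the original 0 to 360 values.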

0.3.9 (2026-02-21)#

  • Added: Structured logging throughout all source modules, replacing ~86 print() calls with logger.<level>().

  • Added: Logging configuration guide in Getting Started docs and logging_demo.ipynb example notebook.

  • Fixed: Format spec {:0.04} corrected to {:0.4f} in zonal_engines.

  • Fixed: Removed leftover debug print() calls.

  • Fixed: Typo “anitmeridian” → “antimeridian” in utils.

  • CI: Added cross-platform conda-forge dependency validation job.

  • CI: Updated lockfile for yanked virtualenv 20.37.0 → 20.38.0.

  • CI: Fixed pip-audit to use --skip-editable instead of --strict (editable local installs aren’t on PyPI).

  • Security: Upgraded bokeh (CVE-2026-21883), pillow (CVE-2026-25990), and nbconvert (CVE-2025-53000).
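
The format-spec fix above is easy to misread, so here is a small standalone comparison of the two specs. The sample value is arbitrary and is not taken from gdptools.

```python
value = 123456.789

# No presentation type: ".04" requests 4 significant digits in general format,
# which switches to scientific notation for large values.
general = "{:0.04}".format(value)

# "f" presentation type: 4 digits after the decimal point in fixed notation.
fixed = "{:0.4f}".format(value)

print(general)  # 1.235e+05
print(fixed)    # 123456.7890
```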

0.3.8 (2026-02-18)#

  • Fixed: Pydantic ValidationError in CatClimRItem when asset or scenario fields contain float('nan') from the ClimateR catalog parquet.

  • Changed: Switched test fixtures and expected values from TerraClimate to gridMET (tmmx/tmmn/pr) after upstream TerraClimate v1.1 reprocessing broke expected values.

  • Changed: Replaced TerraClimate example notebooks with gridMET equivalents: gridmet_drb_polygon.ipynb (was terraclime_et.ipynb) and gridmet_grid_to_line.ipynb (was Terraclimate-Grid-to-Line.ipynb).

  • Documentation: Expanded NHGFStacData documentation to describe the factory + subclass hierarchy (NHGFStacZarrData, NHGFStacTiffData) with autoclass directives and usage examples for both Zarr and GeoTIFF collections.

  • Documentation: Added NLCD notebook to table of contents and examples table in Getting Started guide.

  • Documentation: Added all user data classes to the API Reference autosummary.

0.3.7 (2026-02-16)#

  • Added: GeoTIFF-backed STAC collection support via new NHGFStacTiffData class. NHGFStacData now auto-detects collection format (Zarr or GeoTIFF) and returns the appropriate handler transparently.

  • Added: NHGFStacData_NLCD.ipynb example notebook demonstrating categorical zonal statistics of NLCD land cover classes for HUC12 basins.

  • Added: _stac_href_to_url() helper for robust S3/HTTPS/bare-path URL construction from STAC asset hrefs.

  • Added: Fast-path direct URL lookup in get_stac_collection() to avoid slow recursive STAC catalog traversal and API rate limiting.

  • Added: Per-link error handling in get_stac_collection() to skip broken STAC child links instead of aborting the entire traversal.

  • Performance: Remote GeoTIFFs are opened with rasterio windowed reading, fetching only the tiles covering the target polygons via HTTP range requests instead of loading the full raster into memory.

  • Fixed: CI shell redirection bug where python>=${PYTHON_REQ} was parsed as file redirection in .gitlab-ci.yml.

  • Fixed: Timezone handling in STAC item selection now normalizes both sides to UTC instead of stripping timezone info.

  • Fixed: Updated test_interp_gen_with_climater expected values after TerraClimate v1.1 upstream reprocessing (ERA5, 2026-02-06).

  • Security: Upgraded 8 locked dependencies to resolve 15 CVEs (aiohttp, cryptography, distributed, filelock, pip, urllib3, virtualenv, wheel).
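
The timezone fix above can be illustrated with the standard library. This sketch assumes naive datetimes are meant as UTC, which is an assumption of the example rather than documented gdptools behavior; `to_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    """Return dt as an aware UTC datetime; naive inputs are assumed to be UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


item_time = datetime(2026, 1, 1, 0, 0, tzinfo=timezone(timedelta(hours=-5)))
query_time = datetime(2026, 1, 1, 5, 0)  # naive, assumed UTC

# Stripping tzinfo compares wall-clock values and misses the match.
stripped_match = item_time.replace(tzinfo=None) == query_time

# Normalizing both sides to UTC compares the same instant.
utc_match = to_utc(item_time) == to_utc(query_time)
```

Here `stripped_match` is False while `utc_match` is True, because midnight at UTC-5 and 05:00 UTC are the same instant.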

0.3.6 (2026-01-02)#

  • Added: GitLab CI job conda-forge-validate to test dependency resolution against conda-forge with strict channel priority, catching compatibility issues before feedstock submission.

  • Changed: Bumped minimum Python version from 3.10 to 3.11 due to exactextract>=0.3.0 dependency chain requiring pyproj>=3.7.2, which only has conda builds for Python 3.11+.

  • Changed: Updated pyproj minimum version from 3.3.0 to 3.7.2 to match exactextract requirements.

  • Changed: Removed upper bounds on dependencies for better conda-forge compatibility:

    • xarray>=2024.7.0 (removed <2025.0.0)

    • rasterio>=1.2.9 (removed <1.5.0)

    • rioxarray>=0.15 (removed <0.21)

    • pystac>=1.10 (removed <2.0)

    • statsmodels>=0.14 (removed <1.0)

    • fastparquet>=2024.2 (removed <=2024.5)

    • pyarrow>=10.0.0 (removed <18.0.0)

  • Changed: Migrated the development setup from Poetry to uv exclusively, removing the conda dependency from the development workflow.

  • Fixed: Conda-forge packaging failures caused by strict channel priority and upper bound constraints.

  • Documentation: Updated README.md and environment-examples.yml to reflect Python 3.11+ requirement.

0.3.5 (2026-01-01)#

  • Added: Spatial partitioning for parallel weight generation engine using Hilbert curves to improve cache locality and reduce memory fragmentation.

  • Added: Test coverage for calculate_weights(intersections=True) in polygon-to-polygon weight generation.

  • Added: @pytest.mark.slow markers for network-dependent tests (STAC catalog, notebooks).

  • Added: tests-full nox session to run all tests including slow ones; default tests session now skips slow tests (~2.5 min vs ~7 min).

  • Performance: Optimized area-weighted statistics methods in stats_methods.py:

    • MAWeightedMean and MAWeightedStd: 2-3x speedup by vectorizing masked array operations.

    • MASum, MAMin, MAMax: 2.5-11x speedup by replacing np.ma.masked_array with np.nansum, np.nanmin, np.nanmax.

    • SerialAgg.calc_agg: Reduced overhead by extracting a single time slice before iterating polygon-by-polygon.

  • Fixed: source_poly_idx string indexing bug in WeightGenP2P, where source_poly_idx[0] on a string returned the first character instead of the column name.

  • Removed: test_coverage_summary.py (zero functional value).

  • Removed: Redundant date format tests in test_serial.py (already covered in test_weight_agg_gen.py).

  • Changed: Renamed test functions in test_dask.py from test_parallel_* to test_dask_* for clarity.
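
For readers unfamiliar with the Hilbert-curve partitioning mentioned above, the sketch below uses the textbook grid-cell-to-curve-position conversion and then splits sorted cells into contiguous chunks. It is not the gdptools implementation; the function name, grid order, and sample cells are illustrative assumptions.

```python
def hilbert_index(order, x, y):
    """Map cell (x, y) on a 2**order by 2**order grid to its Hilbert-curve position."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Sort hypothetical polygon centroids (already snapped to grid cells) by curve
# position, then cut the sorted list into contiguous chunks, one per worker.
cells = [(3, 0), (0, 0), (2, 2), (0, 3), (1, 1), (3, 3)]
ordered = sorted(cells, key=lambda c: hilbert_index(2, *c))
chunks = [ordered[i:i + 2] for i in range(0, len(ordered), 2)]
```

Cells that are adjacent along the curve are also adjacent on the grid, so each chunk tends to cover a compact region, which is the property that helps cache locality.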

0.3.4 (2025-12-31)#

  • Added: Vectorized _get_cells_poly_fast function in utils_optimized.py providing 100-280x speedup for regular projected grids with 1D coordinates.

  • Added: Memory-safe chunked mode (mode="chunked") for processing very large grids without exhausting memory.

  • Added: estimate_memory_gb() helper to predict memory requirements before processing.

  • Added: Benchmark script scripts/test_cells_poly_optimization.py for performance testing.

  • Fixed: Renovate configuration error by removing unsupported uv manager from matchManagers.

0.3.3 (2025-12-30)#

  • Added: exactextract zonal statistics engine for high-performance raster-to-polygon statistics using the exactextract library.

  • Added: Support for categorical data in exactextract engine with fraction-based output matching serial/parallel/dask engines.

  • Added: Exactextract examples in the zonal statistics notebook (docs/Examples/Rasters/zonal_stats.ipynb).

  • Added: Tests verifying exactextract output format consistency with existing engines.

  • Changed: Zonal statistics engines now output consistent column names across all engines:

    • Continuous: count, mean, std, min, 25%, 50%, 75%, max, sum

    • Categorical: integer category columns + count

0.3.2 (2025-12-30)#

  • Changed: Migrated from conda/poetry to uv for dependency management.

  • Performance: Optimized spatial intersection calculations in calc_weight_engines.py using contained/boundary partitioning with shapely.within() to skip expensive intersection() calls for fully contained polygons.

  • Performance: Optimized pixel weight calculation in zonal_engines.py with same contained/boundary partitioning strategy.

  • Fixed: Consistent n_jobs=-1 default behavior across all engine functions (Parallel and Dask now both default to cpu_count()/2).

  • Fixed: Coverage session failure in CI due to missing Cython source files.

  • Fixed: pip-audit now ignores CVE-2025-53000 until an nbconvert fix is released.

  • Fixed: rioxarray dependency updated to >=0.15,<0.21.

  • CI: Temporarily disabled mypy and lint jobs pending uv migration stabilization.
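
The n_jobs=-1 default described above amounts to a small resolution step. In this sketch, `resolve_n_jobs` is a hypothetical helper; the integer division and the floor of one worker are assumptions, since the entry only states cpu_count()/2.

```python
import os


def resolve_n_jobs(n_jobs=-1):
    """Hypothetical helper: -1 means half the available CPUs, never fewer than one."""
    if n_jobs == -1:
        # os.cpu_count() can return None, so fall back to a single CPU.
        return max(1, (os.cpu_count() or 1) // 2)
    return n_jobs


workers = resolve_n_jobs()       # half the CPUs on this machine
explicit = resolve_n_jobs(4)     # explicit values pass through unchanged
```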

0.3.1 (2025-12-18)#

  • Added: get_stac_collection() helper function to fetch collections from the NHGF STAC catalog with recursive search.

  • Added: STACCatalogError exception class for STAC catalog access errors.

  • Fixed: urllib3 CVE-2025-66418 and CVE-2025-66471 by pinning urllib3>=2.6.0.

  • Fixed: STAC-dependent tests are now resilient to rate limiting and catalog unavailability, using xfail.

  • Changed: Modernized .gitlab-ci.yml: consolidated to single stage, added dependency-aware caching, removed redundant apt-get upgrade.

  • Changed: Updated nox -s coverage to generate coverage.xml for GitLab CI artifacts.

  • Changed: Removed explicit pyogrio from environment files (transitive dependency via geopandas).

  • Changed: Switched from mamba to conda in CI (libmamba is now the default solver in miniforge3).

  • Documentation: Updated NHGF STAC example notebooks (NHGFStacData_CONUS404, NHGFStacData_Grid_to_Line) to use get_stac_collection() helper instead of direct pystac calls.

0.3.0 (2025-11-26)#

  • Added: gdptools.depreciation_utils.deprecate_kwargs helper so high-level classes retain backwards compatibility while emitting structured warnings for renamed parameters.

  • Changed: Standardized keyword names across the data-prep classes and documented the legacy aliases that now trigger deprecation warnings:

    • ClimRCatData: cat_dict → source_cat_dict, f_feature → target_gdf, id_feature → target_id, period → source_time_period.

    • UserCatData: ds → source_ds, proj_ds → source_crs, x_coord → source_x_coord, y_coord → source_y_coord, t_coord → source_t_coord, var → source_var, f_feature → target_gdf, proj_feature → target_crs, id_feature → target_id, period → source_time_period.

    • NHGFStacData: collection → source_collection, var → source_var, f_feature → target_gdf, id_feature → target_id, period → source_time_period.

    • UserTiffData: ds → source_ds, proj_ds → source_crs, x_coord → source_x_coord, y_coord → source_y_coord, t_coord → source_t_coord, var → source_var, f_feature → target_gdf, proj_feature → target_crs, id_feature → target_id, period → source_time_period.

  • Changed: Updated the nox -s lint session to execute pre-commit run --all-files, ensuring local lint checks match the enforced pre-commit workflow.

  • Documentation: Expanded helper and data-class documentation to call out the canonical keyword names, the new deprecation behavior, and the preferred CRS guidance for weight generation.
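
A keyword-renaming helper of the kind described in this release could look roughly like the decorator below, which forwards a legacy name such as f_feature to its canonical name target_gdf and warns on the way. The signature, the DeprecationWarning category, and the `prepare` function are assumptions for illustration; the actual helper in gdptools may differ.

```python
import functools
import warnings


def deprecate_kwargs(renames):
    """Illustrative decorator: accept legacy keyword names, warn, forward the new names."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in renames.items():
                if old in kwargs:
                    if new in kwargs:
                        raise TypeError(f"Pass either {old!r} or {new!r}, not both.")
                    warnings.warn(
                        f"{old!r} is deprecated; use {new!r} instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecate_kwargs({"f_feature": "target_gdf", "id_feature": "target_id"})
def prepare(target_gdf, target_id):
    """Hypothetical stand-in for a data-prep class constructor."""
    return {"target_gdf": target_gdf, "target_id": target_id}


# Calling with the legacy names still works but records one warning per name.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = prepare(f_feature="basins.shp", id_feature="huc12")
```

Existing call sites keep working while each legacy keyword produces its own warning, which gives users a clear migration path before the aliases are removed.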

0.2.20 (2024-XX-XX)#

  • Changed: Broadly relaxed dependencies.

  • Changed: Bumped pydantic dependency to >= 2.0.0.

0.2.18#

  • Added: NHGFStacData class to interface with the NHGF Stac Catalog.

  • Added: NHGFStacData example use cases to documentation.

  • Fixed: Bug in SerialAgg by loading subsetted data before aggregating, improving performance. This restores the original behavior, which was inadvertently changed in a previous commit.

0.2.11 (2024-08-21)#

  • Changed: Updated categorical zonal stats to return the fraction of each category in each polygon.

  • Added: Precision parameter to zonal stats such that the number of significant digits in the output can be set.
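
To make the categorical output above concrete, the sketch below computes the fraction of each category among the pixels of one polygon. It ignores partial pixel coverage and assumes a non-empty pixel list; `category_fractions` and the sample land-cover codes are hypothetical.

```python
from collections import Counter


def category_fractions(pixels):
    """Return the fraction of pixels in each category, plus the pixel count."""
    counts = Counter(pixels)
    total = len(pixels)
    row = {category: n / total for category, n in sorted(counts.items())}
    row["count"] = total
    return row


# Hypothetical land-cover codes for the pixels inside one polygon.
row = category_fractions([11, 11, 41, 41, 41, 41, 81, 81])
```

For this input the fractions are 0.25, 0.5, and 0.25 for categories 11, 41, and 81, with a count of 8.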

0.2.10 (2024-07-18)#

  • Added: New class NHGFStacData as an interface to the NHGF Stac Catalog (still in development).

0.2.9 (2024-04-06)#

  • Added: Ability to specify the precision of the output.

0.2.8 (2024-04-03)#

  • Added: sum and masked_sum statistical methods.

0.2.5 (2023-11-01)#

  • Fixed: Bug in WeightGenP2P. Target polygons are now dissolved by the specified target_poly_idx. The generated weights file should have a unique set of source ids for each target id.

0.2.2 (2023-08-08)#

  • Fixed: Bug in output of AggGen. “parallel” and “dask” engines were not writing the feature_ids with the output.

0.0.1 (2022-03-22)#

  • Added: Original starting version.

Quick Version Info#

Recent Highlights#

New Features#

  • Enhanced parallel processing capabilities

  • Improved STAC catalog integration

  • Better error handling and validation

  • Expanded statistical functions

Performance Improvements#

  • Optimized spatial indexing

  • Reduced memory footprint for large datasets

  • Faster coordinate transformations

  • Improved Dask integration

Documentation Enhancements#

  • Comprehensive API documentation

  • Interactive examples and tutorials

  • Better error message explanations

  • Enhanced getting started guide

Installation Matrix#

Python Version | Status       | Installation
3.9            | ✅ Supported | conda install gdptools
3.10           | ✅ Supported | conda install gdptools
3.11           | ✅ Supported | conda install gdptools
3.12           | ✅ Supported | conda install gdptools
3.13           | ⚠️ Testing   | pip install gdptools

Dependencies#

Core Dependencies#

  • geopandas >= 0.12.0

  • pandas >= 1.5.0

  • numpy >= 1.21.0

  • xarray >= 2024.7.0

  • shapely >= 2.0.0

  • pyproj >= 3.7.2

Optional Dependencies#

  • dask[distributed] for distributed computing

  • exactextract for high-performance zonal statistics

  • bokeh for interactive visualizations

  • holoviews for advanced plotting

Platform Support#

Platform | Status          | Notes
Linux    | ✅ Full Support | Recommended for production
macOS    | ✅ Full Support | Intel and Apple Silicon
Windows  | ⚠️ Limited      | Some features may require WSL

Getting Help#

Contributing#

We welcome contributions! See our contributing guide for:

  • How to set up a development environment

  • Code style guidelines

  • Testing requirements

  • Documentation standards