Polygon-to-polygon weight calculation.#

This notebook demonstrates generating weights, or creating a cross-walk between a source GeoDataFrame and a target GeoDataFrame. The example here demonstrates calculating the weights from HUC-12 to HUC-8 polygons encompassing the Delaware River Basin. A second notebook ./Extensive_vs_intensive_variables.ipynb demonstrates how to use the weights to interpolate values from HUC-12 to HUC-8 polygons, for both extensive and intensive variables.

Open Source Python Tools#

  • gdptools: Geo Data Processing Tools (GDPTools) a python package for geospatial interpolation.

  • Geopandas Geopandas a python package for geospatial dataframes.

Data#

  • pynhd pynhd provides access to several hydro-linked datasets including those used here: WaterData and NLDI. We use the NLDI to get the basin geometry upstream of USGS gage Delaware River ad Del Mem Bridge at Wilmington De (01482100). Using the geometry of that basin we can extract a set up HUC12 and HUC08 basins for demonstration purposes.

Maintenance Schedule#

  • This notebook is part of the gdptools demo’s and will be updated with new versions of gdptools.

Description#

A common workflow in geospatial hydrology is to interpolate geospatial values from one set of polygons to another. For example, watershed characteristics associated with HUC12’s could be interpolated to HUC08’s by using an area-weighted statistic. This notebook demonstrates a function of gdptools for calculating the weights or cross-walk for calculating that interpolation. for an example of how to use the weights to interpolate values from HUC-12 to HUC-8 polygons, see the notebook ./Extensive_vs_intensive_variables.ipynb.

Getting Started#

Directions for creating an environment to run this notebook.

Requirements:#

  • Conda-based package manager such as miniconda or mambaforge.

Create conda/mamba environment.#

conda env create -f environment-examples.yml
conda activate gdptools-examples
jupyter lab
import warnings

import hvplot.pandas
import pandas as pd
from bokeh.io import export_svgs
from pynhd import NLDI, WaterData
from pygeohydro import watershed
from gdptools import WeightGenP2P

warnings.filterwarnings("ignore")

Generate example data using pynhd functions#

wbd = watershed.WBD("huc4")
delaware_basin = wbd.byids(field="huc4", fids="0204")
huc12_basins = WaterData('wbd12').bygeom(delaware_basin.iloc[0].geometry)
huc12_basins = huc12_basins[huc12_basins['huc12'].str.startswith(('020401', '020402'))]
huc08_basins = WaterData('wbd08').bygeom(delaware_basin.iloc[0].geometry)
huc08_basins = huc08_basins[huc08_basins['huc8'].str.startswith(('0204'))]

Plot basins#

from holoviews.element.tiles import EsriTerrain
drb_12 = huc12_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude",
    title="Delaware River HUC12 basins", xlim=(-78.0, -73.0), aspect='equal'
)
drb_08 = huc08_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude",
    title="Delaware River HUC08 basins", xlim=(-78.0, -73.0), aspect='equal'
)
EsriTerrain() * drb_12 + EsriTerrain() * drb_08

Use gdptools WeightGenP2P to calculate weights or cross-walk between huc12 and huc8 polygons#

weight_gen = WeightGenP2P(
    target_poly=huc08_basins,
    target_poly_idx="huc8",
    source_poly=huc12_basins,
    source_poly_idx="huc12",
    method="serial",
    weight_gen_crs=6931,
    output_file='prl_out.csv',
    jobs=4
)
weights = weight_gen.calculate_weights()

Inspect weights#

When considering the calculated areal weights we are considering the intersection areas of the source polygons (huc12) with the target polygons (huc8). The weights are calculated as the area of the intersection between the source and target polygons divided by the area of the target polygon. This results in a normalized area weight that represents the fraction of the source polygon that falls within the target polygon, normalized by the area of the target polygon. We provide also provide the area of the source polygon that falls within the target polygon, and the area of the target polygon which is necessary when using the weights to interpolate extensive variables.

The weights are stored in a pandas DataFrame (weights) or written to a .csv (prl_out.csv) with the following columns:

  • source_id: The ID of the source polygon (huc12).

  • target_id: The ID of the target polygon (huc8).

  • source_area: The area of the source polygon (huc12_area).

  • target_area: The area of the target polygon (huc8_area).

  • area_weight: The area of the source polygon that falls within the target polygon.

  • normalized_area_weight: The area-weighted fraction of the source polygon that falls within the target polygon. The weights represent the fraction of the source polygon that overlaps with the target polygon, normalized by the area of the target polygon. For more context the normalized area weight is calculated as follows:

\[ \text{normalized\_area\_weight} = \frac{\text{area\_weight}}{\text{huc8\_area}} \]
weights
huc12 huc8 huc12_area huc8_area area_weight normalized_area_weight
0 020401040104 02040101 1.212940e+08 3.082427e+09 0.000000e+00 0.000000
1 020401020302 02040101 1.291692e+08 3.082427e+09 0.000000e+00 0.000000
2 020401020101 02040101 7.839078e+07 3.082427e+09 0.000000e+00 0.000000
3 020401040103 02040101 6.995639e+07 3.082427e+09 0.000000e+00 0.000000
4 020401010101 02040101 8.331299e+07 3.082427e+09 8.331299e+07 0.027028
... ... ... ... ... ... ...
702 020402070605 02040303 4.952600e+07 2.717515e+09 0.000000e+00 0.000000
703 020402070604 02040303 5.428396e+07 2.717515e+09 0.000000e+00 0.000000
704 020402070603 02040303 6.139302e+07 2.717515e+09 0.000000e+00 0.000000
705 020402070601 02040303 7.482310e+07 2.717515e+09 0.000000e+00 0.000000
706 020402040000 02040303 1.073500e+09 2.717515e+09 0.000000e+00 0.000000

707 rows × 6 columns

Check weights sum to 1.0#

A good check on the calculation of the weights is to ensure that the sum of the normalized area weights for each target polygon sums to 1.0. This indicates that the weights are correctly normalized and that the total area of the source polygons is fully accounted for within the target polygons.

pd.options.display.float_format = '{:,.2f}'.format
weights.groupby("huc8").sum("normalized_area_weight")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 2
      1 pd.options.display.float_format = '{:,.2f}'.format
----> 2 weights.groupby("huc8").sum("normalized_area_weight")

File ~/.asdf/installs/python/3.12.10/lib/python3.12/site-packages/pandas/core/groupby/groupby.py:3065, in GroupBy.sum(self, numeric_only, min_count, skipna, engine, engine_kwargs)
   3060 else:
   3061     # If we are grouping on categoricals we want unobserved categories to
   3062     # return zero, rather than the default of NaN which the reindexing in
   3063     # _agg_general() returns. GH #31422
   3064     with com.temp_setattr(self, "observed", True):
-> 3065         result = self._agg_general(
   3066             numeric_only=numeric_only,
   3067             min_count=min_count,
   3068             alias="sum",
   3069             npfunc=np.sum,
   3070             skipna=skipna,
   3071         )
   3073     return result

File ~/.asdf/installs/python/3.12.10/lib/python3.12/site-packages/pandas/core/groupby/groupby.py:1708, in GroupBy._agg_general(self, numeric_only, min_count, alias, npfunc, **kwargs)
   1698 @final
   1699 def _agg_general(
   1700     self,
   (...)   1706     **kwargs,
   1707 ):
-> 1708     result = self._cython_agg_general(
   1709         how=alias,
   1710         alt=npfunc,
   1711         numeric_only=numeric_only,
   1712         min_count=min_count,
   1713         **kwargs,
   1714     )
   1715     return result.__finalize__(self.obj, method="groupby")

File ~/.asdf/installs/python/3.12.10/lib/python3.12/site-packages/pandas/core/groupby/groupby.py:1777, in GroupBy._cython_agg_general(self, how, alt, numeric_only, min_count, **kwargs)
   1764 @final
   1765 def _cython_agg_general(
   1766     self,
   (...)   1773     # Note: we never get here with how="ohlc" for DataFrameGroupBy;
   1774     #  that goes through SeriesGroupBy
   1776     if not is_bool(numeric_only):
-> 1777         raise ValueError("numeric_only accepts only Boolean values")
   1779     data = self._get_data_to_aggregate(numeric_only=numeric_only, name=how)
   1781     def array_func(values: ArrayLike) -> ArrayLike:

ValueError: numeric_only accepts only Boolean values