Polygon-to-polygon weight calculation.

Polygon-to-polygon weight calculation.#

This notebook demonstrates generating weights, or creating a cross-walk between a source GeoDataFrame and a target GeoDataFrame. The example here demonstrates calculating the weights from HUC-12 to HUC-8 polygons encompassing the Delaware River Basin. A second notebook ./Extensive_vs_intensive_variables.ipynb demonstrates how to use the weights to interpolate values from HUC-12 to HUC-8 polygons, for both extensive and intensive variables.

Open Source Python Tools#

gdptools: Geo Data Processing Tools (GDPTools) a python package for geospatial interpolation.
Geopandas Geopandas a python package for geospatial dataframes.

Data#

pynhd pynhd provides access to several hydro-linked datasets including those used here: WaterData and NLDI. We use the NLDI to get the basin geometry upstream of USGS gage Delaware River ad Del Mem Bridge at Wilmington De (01482100). Using the geometry of that basin we can extract a set up HUC12 and HUC08 basins for demonstration purposes.
- WaterData
- NLDI

Maintenance Schedule#

This notebook is part of the gdptools demo’s and will be updated with new versions of gdptools.

Description#

A common workflow in geospatial hydrology is to interpolate geospatial values from one set of polygons to another. For example, watershed characteristics associated with HUC12’s could be interpolated to HUC08’s by using an area-weighted statistic. This notebook demonstrates a function of gdptools for calculating the weights or cross-walk for calculating that interpolation. for an example of how to use the weights to interpolate values from HUC-12 to HUC-8 polygons, see the notebook ./Extensive_vs_intensive_variables.ipynb.

Getting Started#

Directions for creating an environment to run this notebook.

Requirements:#

Conda-based package manager such as miniconda or mambaforge.

Create conda/mamba environment.#

Download environment-examples.yml
To create the conda environment follow the steps below.

conda env create -f environment-examples.yml
conda activate gdptools-examples
jupyter lab

import warnings

import hvplot.pandas
import pandas as pd
from bokeh.io import export_svgs
from pynhd import NLDI, WaterData
from pygeohydro import watershed
from gdptools import WeightGenP2P

warnings.filterwarnings("ignore")

Generate example data using pynhd functions#

wbd = watershed.WBD("huc4")
delaware_basin = wbd.byids(field="huc4", fids="0204")
huc12_basins = WaterData('wbd12').bygeom(delaware_basin.iloc[0].geometry)
huc12_basins = huc12_basins[huc12_basins['huc12'].str.startswith(('020401', '020402'))]
huc08_basins = WaterData('wbd08').bygeom(delaware_basin.iloc[0].geometry)
huc08_basins = huc08_basins[huc08_basins['huc8'].str.startswith(('0204'))]

Plot basins#

from holoviews.element.tiles import EsriTerrain
drb_12 = huc12_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude",
    title="Delaware River HUC12 basins", xlim=(-78.0, -73.0), aspect='equal'
)
drb_08 = huc08_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude",
    title="Delaware River HUC08 basins", xlim=(-78.0, -73.0), aspect='equal'
)
EsriTerrain() * drb_12 + EsriTerrain() * drb_08

Use gdptools WeightGenP2P to calculate weights or cross-walk between huc12 and huc8 polygons#

weight_gen = WeightGenP2P(
    target_poly=huc08_basins,
    target_poly_idx="huc8",
    source_poly=huc12_basins,
    source_poly_idx="huc12",
    method="serial",
    weight_gen_crs=6931,
    output_file='prl_out.csv',
    jobs=4
)
weights = weight_gen.calculate_weights()

Inspect weights#

When considering the calculated areal weights we are considering the intersection areas of the source polygons (huc12) with the target polygons (huc8). The weights are calculated as the area of the intersection between the source and target polygons divided by the area of the target polygon. This results in a normalized area weight that represents the fraction of the source polygon that falls within the target polygon, normalized by the area of the target polygon. We provide also provide the area of the source polygon that falls within the target polygon, and the area of the target polygon which is necessary when using the weights to interpolate extensive variables.

The weights are stored in a pandas DataFrame (weights) or written to a .csv (prl_out.csv) with the following columns:

source_id: The ID of the source polygon (huc12).
target_id: The ID of the target polygon (huc8).
source_area: The area of the source polygon (huc12_area).
target_area: The area of the target polygon (huc8_area).
area_weight: The area of the source polygon that falls within the target polygon.
normalized_area_weight: The area-weighted fraction of the source polygon that falls within the target polygon. The weights represent the fraction of the source polygon that overlaps with the target polygon, normalized by the area of the target polygon. For more context the normalized area weight is calculated as follows:

\[ \text{normalized\_area\_weight} = \frac{\text{area\_weight}}{\text{huc8\_area}} \]

weights

	huc12	huc8	huc12_area	huc8_area	area_weight	normalized_area_weight
0	020401040104	02040101	1.212940e+08	3.082427e+09	0.000000e+00	0.000000
1	020401020302	02040101	1.291692e+08	3.082427e+09	0.000000e+00	0.000000
2	020401020101	02040101	7.839078e+07	3.082427e+09	0.000000e+00	0.000000
3	020401040103	02040101	6.995639e+07	3.082427e+09	0.000000e+00	0.000000
4	020401010101	02040101	8.331299e+07	3.082427e+09	8.331299e+07	0.027028
...	...	...	...	...	...	...
702	020402070605	02040303	4.952600e+07	2.717515e+09	0.000000e+00	0.000000
703	020402070604	02040303	5.428396e+07	2.717515e+09	0.000000e+00	0.000000
704	020402070603	02040303	6.139302e+07	2.717515e+09	0.000000e+00	0.000000
705	020402070601	02040303	7.482310e+07	2.717515e+09	0.000000e+00	0.000000
706	020402040000	02040303	1.073500e+09	2.717515e+09	0.000000e+00	0.000000

707 rows × 6 columns

Check weights sum to 1.0#

A good check on the calculation of the weights is to ensure that the sum of the normalized area weights for each target polygon sums to 1.0. This indicates that the weights are correctly normalized and that the total area of the source polygons is fully accounted for within the target polygons.

pd.options.display.float_format = '{:,.2f}'.format
weights.groupby("huc8").sum("normalized_area_weight")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 2
pd.options.display.float_format = '{:,.2f}'.format
----> 2 weights.groupby("huc8").sum("normalized_area_weight")

File ~/.asdf/installs/python/3.12.10/lib/python3.12/site-packages/pandas/core/groupby/groupby.py:3065, in GroupBy.sum(self, numeric_only, min_count, skipna, engine, engine_kwargs)
else:
   # If we are grouping on categoricals we want unobserved categories to
   # return zero, rather than the default of NaN which the reindexing in
   # _agg_general() returns. GH #31422
   with com.temp_setattr(self, "observed", True):
-> 3065         result = self._agg_general(
           numeric_only=numeric_only,
           min_count=min_count,
           alias="sum",
           npfunc=np.sum,
           skipna=skipna,
       )
   return result

File ~/.asdf/installs/python/3.12.10/lib/python3.12/site-packages/pandas/core/groupby/groupby.py:1708, in GroupBy._agg_general(self, numeric_only, min_count, alias, npfunc, **kwargs)
@final
def _agg_general(
   self,
   (...)   1706     **kwargs,
):
-> 1708     result = self._cython_agg_general(
       how=alias,
       alt=npfunc,
       numeric_only=numeric_only,
       min_count=min_count,
       **kwargs,
   )
   return result.__finalize__(self.obj, method="groupby")

File ~/.asdf/installs/python/3.12.10/lib/python3.12/site-packages/pandas/core/groupby/groupby.py:1777, in GroupBy._cython_agg_general(self, how, alt, numeric_only, min_count, **kwargs)
@final
def _cython_agg_general(
   self,
   (...)   1773     # Note: we never get here with how="ohlc" for DataFrameGroupBy;
   #  that goes through SeriesGroupBy
   if not is_bool(numeric_only):
-> 1777         raise ValueError("numeric_only accepts only Boolean values")
   data = self._get_data_to_aggregate(numeric_only=numeric_only, name=how)
   def array_func(values: ArrayLike) -> ArrayLike:

ValueError: numeric_only accepts only Boolean values