---
jupytext:
  formats: ipynb,md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.18.1
kernelspec:
  display_name: readthedocs
  language: python
  name: python3
---

# Polygon-to-polygon weight calculation.

This notebook demonstrates generating weights, or creating a cross-walk between a source GeoDataFrame and a target GeoDataFrame. The example here demonstrates calculating the weights from HUC-12 to HUC-8 polygons encompassing the Delaware River Basin. A second notebook [./Extensive_vs_intensive_variables.ipynb](./Extensive_vs_intensive_variables.ipynb) demonstrates how to use the weights to interpolate values from HUC-12 to HUC-8 polygons, for both extensive and intensive variables.

## Open Source Python Tools

- [gdptools](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools): Geo Data Processing Tools (GDPTools) a python package for geospatial interpolation.
- [Geopandas](https://geopandas.org/en/stable/) Geopandas a python package for geospatial dataframes.

## Data

- [pynhd](https://docs.hyriver.io/readme/pynhd.html) pynhd provides access to several hydro-linked datasets including those used here: WaterData and NLDI. We use the NLDI to get the basin geometry upstream of USGS gage Delaware River ad Del Mem Bridge at Wilmington De (01482100). Using the geometry of that basin we can extract a set up HUC12 and HUC08 basins for demonstration purposes.
  - [WaterData](https://api.water.usgs.gov/geoserver/wmadata/ows?service=WFS&version=1.0.0&request=GetCapabilities)
  - [NLDI](https://waterdata.usgs.gov/blog/nldi-intro/)

## Maintenance Schedule

- This notebook is part of the gdptools demo's and will be updated with new versions of gdptools.

## Description

A common workflow in geospatial hydrology is to interpolate geospatial values from one set of polygons to another. For example, watershed characteristics associated with HUC12's could be interpolated to HUC08's by using an area-weighted statistic. This notebook demonstrates a function of gdptools for calculating the weights or cross-walk for calculating that interpolation. for an example of how to use the weights to interpolate values from HUC-12 to HUC-8 polygons, see the notebook [./Extensive_vs_intensive_variables.ipynb](./Extensive_vs_intensive_variables.ipynb).

## Getting Started

Directions for creating an environment to run this notebook.

### Requirements:

- Conda-based package manager such as miniconda or mambaforge.

### Create conda/mamba environment.

- Download [environment-examples.yml](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools/-/raw/develop/environment-examples.yml?inline=false)
- To create the conda environment follow the steps below.

```shell
conda env create -f environment-examples.yml
conda activate gdptools-examples
jupyter lab
```

```{code-cell} ipython3
import warnings

import hvplot.pandas
import pandas as pd
from bokeh.io import export_svgs
from pynhd import NLDI, WaterData
from pygeohydro import watershed
from gdptools import WeightGenP2P

warnings.filterwarnings("ignore")
```

## Generate example data using [pynhd](https://docs.hyriver.io/readme/pynhd.html) functions

```{code-cell} ipython3
wbd = watershed.WBD("huc4")
delaware_basin = wbd.byids(field="huc4", fids="0204")
huc12_basins = WaterData('wbd12').bygeom(delaware_basin.iloc[0].geometry)
huc12_basins = huc12_basins[huc12_basins['huc12'].str.startswith(('020401', '020402'))]
huc08_basins = WaterData('wbd08').bygeom(delaware_basin.iloc[0].geometry)
huc08_basins = huc08_basins[huc08_basins['huc8'].str.startswith(('0204'))]
```

## Plot basins

```{code-cell} ipython3
from holoviews.element.tiles import EsriTerrain
drb_12 = huc12_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude",
    title="Delaware River HUC12 basins", xlim=(-78.0, -73.0), aspect='equal'
)
drb_08 = huc08_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude",
    title="Delaware River HUC08 basins", xlim=(-78.0, -73.0), aspect='equal'
)
EsriTerrain() * drb_12 + EsriTerrain() * drb_08
```

## Use gdptools WeightGenP2P to calculate weights or cross-walk between huc12 and huc8 polygons

```{code-cell} ipython3
weight_gen = WeightGenP2P(
    target_poly=huc08_basins,
    target_poly_idx="huc8",
    source_poly=huc12_basins,
    source_poly_idx="huc12",
    method="serial",
    weight_gen_crs=6931,
    output_file='prl_out.csv',
    jobs=4
)
weights = weight_gen.calculate_weights()
```

## Inspect weights

When considering the calculated areal weights we are considering the intersection areas of the source polygons (huc12) with the target polygons (huc8). The weights are calculated as the area of the intersection between the source and target polygons divided by the area of the target polygon. This results in a normalized area weight that represents the fraction of the source polygon that falls within the target polygon, normalized by the area of the target polygon.  We provide also provide the area of the source polygon that falls within the target polygon, and the area of the target polygon which is necessary when using the weights to interpolate extensive variables.

The weights are stored in a pandas DataFrame (`weights`) or written to a .csv (`prl_out.csv`) with the following columns:
- `source_id`: The ID of the source polygon (huc12).
- `target_id`: The ID of the target polygon (huc8).
- `source_area`: The area of the source polygon (huc12_area).
- `target_area`: The area of the target polygon (huc8_area).
-  `area_weight`: The area of the source polygon that falls within the target polygon.
- `normalized_area_weight`: The area-weighted fraction of the source polygon that falls within the target polygon.
The weights represent the fraction of the source polygon that overlaps with the target polygon, normalized by the area of the target polygon.  For more context the normalized area weight is calculated as follows:

$$
\text{normalized\_area\_weight} = \frac{\text{area\_weight}}{\text{huc8\_area}}
$$

```{code-cell} ipython3
weights
```

## Check weights sum to 1.0

A good check on the calculation of the weights is to ensure that the sum of the normalized area weights for each target polygon sums to 1.0. This indicates that the weights are correctly normalized and that the total area of the source polygons is fully accounted for within the target polygons.

```{code-cell} ipython3
pd.options.display.float_format = '{:,.2f}'.format
weights.groupby("huc8").sum("normalized_area_weight")
```

```{code-cell} ipython3

```
