---
jupytext:
  formats: ipynb,md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.18.1
kernelspec:
  display_name: readthedocs
  language: python
  name: python3
---

# gdptools example without the OPeNDAP catalog - gridMET data

```{code-cell} ipython3
import logging
import warnings

import hvplot.pandas
import hvplot.xarray
import numpy as np
import pandas as pd
import xarray as xr
from pynhd import NLDI, WaterData
from pygeohydro import watershed

from gdptools import AggGen, UserCatData, WeightGen

warnings.filterwarnings("ignore")
```

## Create GeoDataFrame of Delaware River Basin HUC12s

In this cell we use clients to several https://api.water.usgs.gov tools:

- Tool: USGS [NLDI](https://waterdata.usgs.gov/blog/nldi-intro/) Client: [pynhd.NLDI](https://docs.hyriver.io/readme/pynhd.html)
- Tool: USGS [GeoServer](https://api.water.usgs.gov/geoserver/wmadata/ows?service=WFS&version=1.0.0&request=GetCapabilities) Client: [pynhd.WaterData](https://docs.hyriver.io/readme/pynhd.html)

More examples of using the HyRiver pynhd package can be found [here](https://docs.hyriver.io/examples/notebooks/nhdplus.html)

```{code-cell} ipython3
# Delaware River Basin HUC12s
# --------------------------------
wbd = watershed.WBD("huc4")
delaware_basin = wbd.byids(field="huc4", fids="0204")
huc12_basins = WaterData("wbd12").bygeom(delaware_basin.iloc[0].geometry)
huc12_basins = huc12_basins[huc12_basins["huc12"].str.startswith(("020401", "020402"))]
huc12_basins.head()
```

```{code-cell} ipython3
from holoviews.element.tiles import EsriTerrain

drb = huc12_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
    title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal'
)
EsriTerrain() * drb
```

## Load gridMET OPeNDAP endpoint with xarray

- Note currently the gridMET data require '#fillmismatch' appended to the end of the OPeNDAP endpoint. For [example](https://github.com/Unidata/netcdf4-python/issues/929#issuecomment-602091340)

```{code-cell} ipython3
gridmet_URL = [
    'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_met_tmmx_1979_CurrentYear_CONUS.nc#fillmismatch',
    'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_met_tmmn_1979_CurrentYear_CONUS.nc#fillmismatch'
]
data = xr.open_mfdataset(gridmet_URL)
data
```

## Calculate weights

- To determine an area-weighted statistic for each HUC12 polygon, the area of all grid cells (a Shapely Polygon object representing each cell of the grid).
- The weight of each cell contributing to a HUC12 polygon is the cell-intersected-area / HUC12 polygon-area.
- Not all all data readable by xarray are made the same. Therefor, when using non-OPeNDAP catalog endpoints, the user must supply some metadata about the xarray data and GeoDataFrame data.

### Enable logging

`gdptools` uses Python's standard `logging` module. By default it emits no
output. Configuring the `gdptools` logger before running weight generation and
aggregation lets you monitor progress, timing, and data dimensions throughout
the workflow. Here we set `gdptools` to `INFO` while keeping third-party
libraries at `WARNING` to avoid noisy output from rasterio, fiona, etc.

```{code-cell} ipython3
# Configure logging: gdptools at INFO, everything else at WARNING
logging.basicConfig(
    level=logging.WARNING,
    format="%(name)s | %(levelname)s | %(message)s",
    force=True,
)
logging.getLogger("gdptools").setLevel(logging.INFO)
```

```{code-cell} ipython3
data['crs'].attrs.get('spatial_ref')
```

```{code-cell} ipython3
# metadata for calculating weights

data_crs = data['crs'].attrs.get('spatial_ref')
x_coord = "lon"
y_coord = "lat"
t_coord = "day"
sdate = "1979-01-01"
edate = "1979-01-07"
var = ["daily_maximum_temperature", "daily_minimum_temperature"]
shp_crs = 4326
shp_poly_idx = "huc12"
wght_gen_crs = 6931

user_data = UserCatData(
    source_ds=data,
    source_crs=data_crs,
    source_x_coord=x_coord,
    source_y_coord=y_coord,
    source_t_coord=t_coord,
    source_var=var,
    target_gdf=huc12_basins,
    target_crs=shp_crs,
    target_id=shp_poly_idx,
    source_time_period=[sdate, edate]
)

wght_gen = WeightGen(
    user_data=user_data,
    method="serial",
    output_file='example_weights.csv',
    weight_gen_crs=6931,
)

wdf = wght_gen.calculate_weights()
```

To illustrate the format of the saved weights file it's opened here, however in the next step we use the returned datafame.

```{code-cell} ipython3
pd.read_csv('example_weights.csv')
```

## Run the grid-to-polygon area weighted average interpolation using the calculated weights.

- Future releases will add additional area-weighted statistics. Currently only the mean is returned.
- If more that one variable were available in the dataset then the run_weights function could be embedded in a loop that indexes each variable.

### run_weights Output

Run weighs returns:

- ngdf is the GeoDataFrame passed to run_weights. huc12_basins maybe reordered when running weights so the re-ordered GeoDataFrame is returned.
- vals are the area-weighted interpolated values ordered by the ngdf.index

```{code-cell} ipython3
agg_gen = AggGen(
    user_data=user_data,
    stat_method="masked_mean",
    agg_engine="serial",
    agg_writer="csv",
    weights='example_weights.csv',
    out_path='.',
    file_prefix="test_gm_nc"
)
ngdf, ds_out = agg_gen.calculate_agg()
```

```{code-cell} ipython3
ds_out
```

```{code-cell} ipython3
pd.read_csv('test_gm_nc.csv', index_col=0)
```

```{code-cell} ipython3
aggdata = agg_gen.agg_data
ds_ss = aggdata.get("daily_maximum_temperature").da
```

## Plot the gridded data and interpolated data

```{code-cell} ipython3
var = "daily_maximum_temperature"
time_step = "1979-01-07"
tmin = np.nanmin(ds_ss.sel(day=time_step).values)
tmax = np.nanmax(ds_ss.sel(day=time_step).values)
print(tmin,tmax)
```

```{code-cell} ipython3
ds_out[var].shape
```

```{code-cell} ipython3
ngdf[var] = ds_out[var].sel(time=time_step)
gridmet_agg = ngdf.to_crs(4326).hvplot(
    c='daily_maximum_temperature', geo=True, coastline='50m', frame_width=300, alpha=1, cmap='viridis', clim=(tmin, tmax),
    xlabel="longitude", ylabel="latitude", clabel="daily_maximum_temperature (K)", xlim=(-76.8, -73.8),
    title="HUC12 area-weighted average", aspect='equal'
)
gridmet_raw = ds_ss.isel(day=6).hvplot.quadmesh(
    'lon', 'lat', 'daily_maximum_temperature', cmap='viridis', alpha=1, grid=True, geo=True, coastline='50m', frame_width=300, clim=(tmin, tmax),
    xlabel="longitude", ylabel="latitude", clabel="daily_maximum_temperature (K)", xlim=(-76.8, -73.8),
    title="Gridmet daily max temperature", aspect='equal'
)
```

```{code-cell} ipython3
(EsriTerrain() * gridmet_raw) + (EsriTerrain() * gridmet_agg) + (EsriTerrain() * gridmet_raw * gridmet_agg)
```

```{code-cell} ipython3

```