---
jupytext:
  formats: ipynb,md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.18.1
kernelspec:
  display_name: readthedocs
  language: python
  name: python3
---

# 3DEP Elevation Grid-to-line intersection

## Introduction

In this example, a set of NHD stream-segments accessed using the [HyRiver package](https://docs.hyriver.io/index.html#citation). A USGS 3DEP Elevation dataset is pulled in using the [ClimateR-Catalog](https://mikejohnson51.github.io/opendap.catalog/authors.html#citation). Finally, [gdptools](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools) is used to interpolate the 3DEP elevation to the NHD stream-segments.

<div>
    <br>
<img src="../../assets/3DEP_grid2line.PNG" width="400"/>
</div>

## Geo Data Processing Tools (gdptools)

gdptools is a python package for calculating area-weighted statistics of grid-to-polygon. grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the OPeNDAP catalog to aid generalizing gdptools processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray, however some of the metadata encapsulated in the json entries will have to be added by the user. Examples can be found in Non-Catalog Examples.

## Open Source Python Tools

- [xarray](http://xarray.pydata.org/en/stable/index.html): Package to work with labelled multi-dimensional arrays.
- [geopandas](http://xarray.pydata.org/en/stable/index.html): Package to work with geospatial data.

## Data

- [3DEP](https://www.sciencebase.gov/catalog/item/4f70a58ce4b058caae3f8ddb): The USGS 3D Elevation Program (3DEP) Datasets are the primary elevation data product produced and distributed by the USGS.
- [NHDPlus_V2](https://www.sciencebase.gov/catalog/item/61f8b96bd34e622189c32a4e): National Hydrography Dataset Plus Version 2, a selection of flowline geometries from HUC 0708

## Maintenance Schedule

- The intent of this notebook is simply a demonstration of an application or workflow. As we progress in the build-out of NGHF project tools, this notebook may be updated or replaced. There is no other planned maintenance at this time.

## Description

This workflow demonstrates the grid to line interpoltion functionality from the gdptools package. The purpose of this tool is to attribute gridded data to line geometries. This is done by interpolating points along each line geometry and extracting the underlaying gridded data values at those points. From these point values, statistics are calculated and attributed to the original input lines.

The inputs to the line interpolation tool are a Geopandas geodataframe containing the line geometries, and a Pandas dataframe containing information about the gridMET gridded data. The line interpolation tool can process multi-dimensional xarray data, outputing the statistics organized by each time segment. Also, multiple gridded datasets can be processed, so for a gridMET example, three different climate variables can be accessed and processed.

The outputs are a dataframe containing the statistics for each line geometries calculated for each gridded dataset at each time segment and a geodataframe containing the interpolated point geometries which have the linearly interpolated values from the gridded dataset. Depending on the user's inputs, the sample points can be interpolated from the line geometry at regular intervals, or the line vertices can be used instead.

## Create conda environment

- gdptools-examples can be downloaded from the project [repo](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools/-/raw/develop/environment-examples.yml?inline=false)

```{code-cell} ipython3
import warnings

import geoviews as gv
import pandas as pd
import pynhd
from pynhd import NLDI, WaterData

from gdptools import InterpGen, UserTiffData

warnings.filterwarnings("ignore")

gv.extension("bokeh")
```

### Create target lines

The target lines will be the vector dataset to which GDPTools will attribute with gridded data values.
The PyNHD package is used to access stream geometries upstream from the USGS stream gage 06730200.

```{code-cell} ipython3
nldi = NLDI()
station_id = "06730200"

flw_main = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamMain",
    source="flowlines",
    distance=1000,
)

nhdp_mr = WaterData("nhdflowline_network")
comids = [int(c) for c in flw_main.nhdplus_comid.to_list()]
nhd_mr_fl = nhdp_mr.byid("comid", comids)
# nhd_mr_fl.columns.to_list()
flw = pynhd.prepare_nhdplus(nhd_mr_fl, 0, 0, purge_non_dendritic=True) # .rename(columns={"comid": "ID"}) # , "tonode": "toID"
strm_segs = pynhd.topoogical_sort(flw, id_col="hydroseq", toid_col="dnhydroseq")

# copy the sorted comids into a list and remove the last element
# List will be used to sort ouput into topological order later
sort_ids = strm_segs[0][:-1]

# create a new dataframe with the sorted comids
mr_fl = nhd_mr_fl.set_index("hydroseq").loc[sort_ids]
mr_fl = mr_fl.reset_index()
mr_fl.head()
```

### Plot the target lines

```{code-cell} ipython3
gv.tile_sources.EsriTerrain * gv.Path(mr_fl, vdims="hydroseq").opts(
    color="hydroseq",
    cmap= "category20",
    width=800,
    height=450,
    line_width=5,
    show_legend=True,
    colorbar=True,
    title="NHDPlus MR Flowlines upstream of USGS Gage 06730200",
)
```

### Define source data

This is the gridded dataset from which GDPTools will be pulling values along the target lines.

```{code-cell} ipython3
import pprint
climater_cat = "https://github.com/mikejohnson51/climateR-catalogs/releases/download/June-2024/catalog.parquet"
cat = pd.read_parquet(climater_cat)

_id = "USGS 3DEP"
_asset = "10m CONUS DEM"
_vars = ["elevation"]
# create list of catalog parameters for each variable
cat_params = [
    cat.query("id == @_id & asset == @_asset & variable == @_vars").to_dict(orient="records")[0]
    for _var in _vars
]
# create dictionary of variable names and catalog parameters
cat_dict = dict(zip(_vars, cat_params))
pprint.pprint(cat_dict)
```

### Load the source data

```{code-cell} ipython3
import rioxarray as rxr
ds_3dep = rxr.open_rasterio(cat_dict["elevation"]["URL"])
ds_3dep
```

### Define `UserTiffData` object

This data class ingests the data used for zonal stats processing of geotiffs.
The inputs are:

- (ds): the gridded dataset as an Xarray object
- (proj_ds): the projection EPSG number for the gridded datset
- (x_coord and y_coord): the name on the X and Y coords of the gridded dataset
- (f_feature): the geodataframe containing the lines
- (id_feature): the name of the ID column of the lines table (there can be no duplicates!)

```{code-cell} ipython3
user_data = UserTiffData(
    source_ds=ds_3dep,
    source_crs=4269,
    source_x_coord="x",
    source_y_coord="y",
    target_gdf=mr_fl,
    target_id="hydroseq",
    bname="band",
    band=1,
    source_var="tiff",
)
```

#### Define `InterpGen` object and run the interpolation

The `InterpGen` class takes the gridded and vector datasets from the `UserTiffData` class and when the
`calc_interp` function is called, it preforms the grid to line interpolation regularly spaced intervals.
The inputs for the `InterpGen` class are:

- (user_data): an instance of a `UserData` class
- (pt_spacing): the spacing in meters inbetween the interpolation points
- (stat): the statistics to calculate from the interpolated points
- (interp_method): the Xarray `dataset.interp()` method for interpolating the sample points

```{code-cell} ipython3
interp_object1 = InterpGen(
    user_data,
    pt_spacing=100,
    stat='all',
    interp_method="linear"
)
stats, pts = interp_object1.calc_interp()
```

#### Plot the sub-setted 3DEP 10m DEM

This is the gridded dataset from which the interpolated plots have pull their values.

```{code-cell} ipython3
ds_ss = user_data.get_source_subset(key="").to_dataset(name="elevation")

import hvplot.xarray
ds_ss.elevation.hvplot.quadmesh(
    x="x",
    y="y",
    geo=True,
    project=True,
    rasterize=True,
    cmap="viridis",
    width=600,
    height=300,
    aspect="equal",
    clabel="3DEP 10m DEM",
    xlabel="Longitude",
    ylabel="Latitude",
    title="3DEP 10m DEM",
)
```

#### Display the grided values for each of the interpolated points

```{code-cell} ipython3
pts.head()
```

#### Display the statistics for each stream segment

```{code-cell} ipython3
stats.head()
```

#### Plot the mean elevation by stream segment

```{code-cell} ipython3
import geopandas as gpd
mr_fl = mr_fl.merge(stats, on="hydroseq", how='outer')
pmin = mr_fl["mean"].min()
pmax = mr_fl["mean"].max()
```

```{code-cell} ipython3
gv.tile_sources.EsriTerrain * gv.Path(
        mr_fl,
        vdims=["hydroseq", "mean", "median"]
    ).opts(
            width=800,
            height=450,
            aspect="equal",
            color="mean",
            cmap="viridis",
            line_width=5,
            colorbar=True,
            clim=(pmin, pmax),
            clabel="Mean Elevation (m)",
            title="Mean Elevation by Stream Segment",
            # tools=['hover']
        )
```

### Plot the interpolated data as a scatter plot by comid

```{code-cell} ipython3
import plotly.graph_objects as go
fig = go.Figure()
pts = pts.groupby("hydroseq")
for index in range(len(pts)):
    group = pts.get_group(sort_ids[index])
    fig.add_trace(go.Scatter(x=group["dist"], y=group["values"], mode='markers', marker=dict(size=5, color=group["values"], colorscale='Viridis', cmin=pmin, cmax=pmax), name=str(sort_ids[index])))
fig.update_layout(title="Stream-segment Interpolated Elevation 10m DEM", xaxis_title="Distance (m)", yaxis_title="Elevation (m)")
fig.show()
```

```{code-cell} ipython3

```