3DEP Elevation Grid-to-line intersection#
Introduction#
In this example, a set of NHD stream-segments accessed using the HyRiver package. A USGS 3DEP Elevation dataset is pulled in using the ClimateR-Catalog. Finally, gdptools is used to interpolate the 3DEP elevation to the NHD stream-segments.
Geo Data Processing Tools (gdptools)#
gdptools is a python package for calculating area-weighted statistics of grid-to-polygon. grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the OPeNDAP catalog to aid generalizing gdptools processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray, however some of the metadata encapsulated in the json entries will have to be added by the user. Examples can be found in Non-Catalog Examples.
Open Source Python Tools#
Data#
3DEP: The USGS 3D Elevation Program (3DEP) Datasets are the primary elevation data product produced and distributed by the USGS.
NHDPlus_V2: National Hydrography Dataset Plus Version 2, a selection of flowline geometries from HUC 0708
Maintenance Schedule#
The intent of this notebook is simply a demonstration of an application or workflow. As we progress in the build-out of NGHF project tools, this notebook may be updated or replaced. There is no other planned maintenance at this time.
Description#
This workflow demonstrates the grid to line interpoltion functionality from the gdptools package. The purpose of this tool is to attribute gridded data to line geometries. This is done by interpolating points along each line geometry and extracting the underlaying gridded data values at those points. From these point values, statistics are calculated and attributed to the original input lines.
The inputs to the line interpolation tool are a Geopandas geodataframe containing the line geometries, and a Pandas dataframe containing information about the gridMET gridded data. The line interpolation tool can process multi-dimensional xarray data, outputing the statistics organized by each time segment. Also, multiple gridded datasets can be processed, so for a gridMET example, three different climate variables can be accessed and processed.
The outputs are a dataframe containing the statistics for each line geometries calculated for each gridded dataset at each time segment and a geodataframe containing the interpolated point geometries which have the linearly interpolated values from the gridded dataset. Depending on the user’s inputs, the sample points can be interpolated from the line geometry at regular intervals, or the line vertices can be used instead.
Create conda environment#
gdptools-examples can be downloaded from the project repo
import warnings
import geoviews as gv
import pandas as pd
import pynhd
from pynhd import NLDI, WaterData
from gdptools import InterpGen, UserTiffData
warnings.filterwarnings("ignore")
gv.extension("bokeh")
Create target lines#
The target lines will be the vector dataset to which GDPTools will attribute with gridded data values. The PyNHD package is used to access stream geometries upstream from the USGS stream gage 06730200.
nldi = NLDI()
station_id = "06730200"
flw_main = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamMain",
source="flowlines",
distance=1000,
)
nhdp_mr = WaterData("nhdflowline_network")
comids = [int(c) for c in flw_main.nhdplus_comid.to_list()]
nhd_mr_fl = nhdp_mr.byid("comid", comids)
# nhd_mr_fl.columns.to_list()
flw = pynhd.prepare_nhdplus(nhd_mr_fl, 0, 0, purge_non_dendritic=True) # .rename(columns={"comid": "ID"}) # , "tonode": "toID"
strm_segs = pynhd.topoogical_sort(flw, id_col="hydroseq", toid_col="dnhydroseq")
# copy the sorted comids into a list and remove the last element
# List will be used to sort ouput into topological order later
sort_ids = strm_segs[0][:-1]
# create a new dataframe with the sorted comids
mr_fl = nhd_mr_fl.set_index("hydroseq").loc[sort_ids]
mr_fl = mr_fl.reset_index()
mr_fl.head()
| hydroseq | geometry | comid | fdate | resolution | gnis_id | gnis_name | lengthkm | reachcode | flowdir | ... | qc_12 | vc_12 | qe_12 | ve_12 | lakefract | surfarea | rareahload | rpuid | vpuid | enabled | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 550197285.0 | MULTILINESTRING ((-105.63211 40.05203, -105.62... | 2889852 | 2000-12-22T05:00:00Z | Medium | 178467 | North Boulder Creek | 0.373 | 10190005000122 | With Digitized | ... | 0.376 | -9998.00000 | 0.376 | -9998.00000 | 1.0 | 50667.042715 | 10.004544 | 10c | 10L | 1 |
| 1 | 550124544.0 | MULTILINESTRING ((-105.62824 40.05363, -105.62... | 2889186 | 1999-05-28T04:00:00Z | Medium | 178467 | North Boulder Creek | 0.552 | 10190005000121 | With Digitized | ... | 0.498 | 0.57306 | 0.498 | 0.57306 | NaN | NaN | NaN | 10c | 10L | 1 |
| 2 | 550099070.0 | MULTILINESTRING ((-105.62207 40.05388, -105.62... | 2889770 | 1999-05-28T04:00:00Z | Medium | 178467 | North Boulder Creek | 0.478 | 10190005000120 | With Digitized | ... | 0.592 | -9998.00000 | 0.592 | -9998.00000 | 1.0 | 52090.136727 | 6.691098 | 10c | 10L | 1 |
| 3 | 550084363.0 | MULTILINESTRING ((-105.61719 40.05556, -105.61... | 2889188 | 1999-05-28T04:00:00Z | Medium | 178467 | North Boulder Creek | 0.396 | 10190005000119 | With Digitized | ... | 0.659 | 0.62836 | 0.659 | 0.62836 | NaN | NaN | NaN | 10c | 10L | 1 |
| 4 | 550074675.0 | MULTILINESTRING ((-105.61573 40.05258, -105.61... | 2889772 | 1999-05-28T04:00:00Z | Medium | 178467 | North Boulder Creek | 0.257 | 10190005000118 | With Digitized | ... | 0.785 | -9998.00000 | 0.785 | -9998.00000 | 1.0 | 48345.911611 | 4.708286 | 10c | 10L | 1 |
5 rows × 138 columns
Plot the target lines#
gv.tile_sources.EsriTerrain * gv.Path(mr_fl, vdims="hydroseq").opts(
color="hydroseq",
cmap= "category20",
width=800,
height=450,
line_width=5,
show_legend=True,
colorbar=True,
title="NHDPlus MR Flowlines upstream of USGS Gage 06730200",
)
Define source data#
This is the gridded dataset from which GDPTools will be pulling values along the target lines.
import pprint
climater_cat = "https://github.com/mikejohnson51/climateR-catalogs/releases/download/June-2024/catalog.parquet"
cat = pd.read_parquet(climater_cat)
_id = "USGS 3DEP"
_asset = "10m CONUS DEM"
_vars = ["elevation"]
# create list of catalog parameters for each variable
cat_params = [
cat.query("id == @_id & asset == @_asset & variable == @_vars").to_dict(orient="records")[0]
for _var in _vars
]
# create dictionary of variable names and catalog parameters
cat_dict = dict(zip(_vars, cat_params))
pprint.pprint(cat_dict)
{'elevation': {'T_name': nan,
'URL': '/vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/USGS_Seamless_DEM_13.vrt',
'X1': -180.0005555560936,
'X_name': nan,
'Xn': 180.00000026239937,
'Y1': -180.0005555560936,
'Y_name': nan,
'Yn': 72.0005555562949,
'asset': '10m CONUS DEM',
'crs': '+proj=longlat +datum=NAD83 +no_defs',
'description': '10m CONUS DEM',
'dim_order': nan,
'duration': nan,
'ensemble': nan,
'id': 'USGS 3DEP',
'interval': nan,
'model': nan,
'nT': nan,
'ncols': 3888006.0,
'nrows': 939612.0,
'resX': 9.259259266022042e-05,
'resY': 9.25925926599094e-05,
'scenario': nan,
'tiled': '',
'toptobottom': None,
'type': 'vrt',
'units': 'meters',
'variable': 'elevation',
'varname': 'elevation'}}
Load the source data#
import rioxarray as rxr
ds_3dep = rxr.open_rasterio(cat_dict["elevation"]["URL"])
ds_3dep
<xarray.DataArray (band: 1, y: 939612, x: 3888006)> Size: 15TB
[3653217093672 values with dtype=float32]
Coordinates:
* band (band) int64 8B 1
* y (y) float64 8MB 72.0 72.0 72.0 72.0 ... -15.0 -15.0 -15.0 -15.0
* x (x) float64 31MB -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
spatial_ref int64 8B 0
Attributes:
_FillValue: -999999.0
scale_factor: 1.0
add_offset: 0.0Define UserTiffData object#
This data class ingests the data used for zonal stats processing of geotiffs. The inputs are:
(ds): the gridded dataset as an Xarray object
(proj_ds): the projection EPSG number for the gridded datset
(x_coord and y_coord): the name on the X and Y coords of the gridded dataset
(f_feature): the geodataframe containing the lines
(id_feature): the name of the ID column of the lines table (there can be no duplicates!)
user_data = UserTiffData(
source_ds=ds_3dep,
source_crs=4269,
source_x_coord="x",
source_y_coord="y",
target_gdf=mr_fl,
target_id="hydroseq",
bname="band",
band=1,
source_var="tiff",
)
Define InterpGen object and run the interpolation#
The InterpGen class takes the gridded and vector datasets from the UserTiffData class and when the
calc_interp function is called, it preforms the grid to line interpolation regularly spaced intervals.
The inputs for the InterpGen class are:
(user_data): an instance of a
UserDataclass(pt_spacing): the spacing in meters inbetween the interpolation points
(stat): the statistics to calculate from the interpolated points
(interp_method): the Xarray
dataset.interp()method for interpolating the sample points
interp_object1 = InterpGen(
user_data,
pt_spacing=100,
stat='all',
interp_method="linear"
)
stats, pts = interp_object1.calc_interp()
Plot the sub-setted 3DEP 10m DEM#
This is the gridded dataset from which the interpolated plots have pull their values.
ds_ss = user_data.get_source_subset(key="").to_dataset(name="elevation")
import hvplot.xarray
ds_ss.elevation.hvplot.quadmesh(
x="x",
y="y",
geo=True,
project=True,
rasterize=True,
cmap="viridis",
width=600,
height=300,
aspect="equal",
clabel="3DEP 10m DEM",
xlabel="Longitude",
ylabel="Latitude",
title="3DEP 10m DEM",
)
Display the grided values for each of the interpolated points#
pts.head()
| pt | band | spatial_ref | values | dist | hydroseq | geometry | varname | |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 1560.632596 | 0.0 | 550025946.0 | POINT (-105.18739 40.05048) | tiff |
| 1 | 1 | 1 | 0 | 1560.127862 | 100.0 | 550025946.0 | POINT (-105.18633 40.05041) | tiff |
| 2 | 2 | 1 | 0 | 1560.555034 | 200.0 | 550025946.0 | POINT (-105.18528 40.05026) | tiff |
| 3 | 3 | 1 | 0 | 1559.350482 | 300.0 | 550025946.0 | POINT (-105.18424 40.05007) | tiff |
| 4 | 4 | 1 | 0 | 1558.909865 | 400.0 | 550025946.0 | POINT (-105.18319 40.05005) | tiff |
Display the statistics for each stream segment#
stats.head()
| index | hydroseq | varname | mean | median | std | min | max | |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 550025946.0 | tiff | 1558.948776 | 1558.909865 | 1.177185 | 1556.995965 | 1560.632596 |
| 1 | 0 | 550026201.0 | tiff | 1562.467624 | 1562.329515 | 1.200751 | 1560.514595 | 1564.282919 |
| 2 | 0 | 550026469.0 | tiff | 1566.994821 | 1567.276414 | 1.405481 | 1564.551665 | 1569.605381 |
| 3 | 0 | 550026743.0 | tiff | 1570.061471 | 1569.926945 | 1.177150 | 1568.664712 | 1572.449691 |
| 4 | 0 | 550027025.0 | tiff | 1572.780278 | 1572.780278 | 0.221672 | 1572.558607 | 1573.001950 |
Plot the mean elevation by stream segment#
import geopandas as gpd
mr_fl = mr_fl.merge(stats, on="hydroseq", how='outer')
pmin = mr_fl["mean"].min()
pmax = mr_fl["mean"].max()
gv.tile_sources.EsriTerrain * gv.Path(
mr_fl,
vdims=["hydroseq", "mean", "median"]
).opts(
width=800,
height=450,
aspect="equal",
color="mean",
cmap="viridis",
line_width=5,
colorbar=True,
clim=(pmin, pmax),
clabel="Mean Elevation (m)",
title="Mean Elevation by Stream Segment",
# tools=['hover']
)
Plot the interpolated data as a scatter plot by comid#
import plotly.graph_objects as go
fig = go.Figure()
pts = pts.groupby("hydroseq")
for index in range(len(pts)):
group = pts.get_group(sort_ids[index])
fig.add_trace(go.Scatter(x=group["dist"], y=group["values"], mode='markers', marker=dict(size=5, color=group["values"], colorscale='Viridis', cmin=pmin, cmax=pmax), name=str(sort_ids[index])))
fig.update_layout(title="Stream-segment Interpolated Elevation 10m DEM", xaxis_title="Distance (m)", yaxis_title="Elevation (m)")
fig.show()