3DEP Elevation Grid-to-line intersection#

Introduction#

In this example, a set of NHD stream-segments accessed using the HyRiver package. A USGS 3DEP Elevation dataset is pulled in using the ClimateR-Catalog. Finally, gdptools is used to interpolate the 3DEP elevation to the NHD stream-segments.


Geo Data Processing Tools (gdptools)#

gdptools is a python package for calculating area-weighted statistics of grid-to-polygon. grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the OPeNDAP catalog to aid generalizing gdptools processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray, however some of the metadata encapsulated in the json entries will have to be added by the user. Examples can be found in Non-Catalog Examples.

Open Source Python Tools#

  • xarray: Package to work with labelled multi-dimensional arrays.

  • geopandas: Package to work with geospatial data.

Data#

  • 3DEP: The USGS 3D Elevation Program (3DEP) Datasets are the primary elevation data product produced and distributed by the USGS.

  • NHDPlus_V2: National Hydrography Dataset Plus Version 2, a selection of flowline geometries from HUC 0708

Maintenance Schedule#

  • The intent of this notebook is simply a demonstration of an application or workflow. As we progress in the build-out of NGHF project tools, this notebook may be updated or replaced. There is no other planned maintenance at this time.

Description#

This workflow demonstrates the grid to line interpoltion functionality from the gdptools package. The purpose of this tool is to attribute gridded data to line geometries. This is done by interpolating points along each line geometry and extracting the underlaying gridded data values at those points. From these point values, statistics are calculated and attributed to the original input lines.

The inputs to the line interpolation tool are a Geopandas geodataframe containing the line geometries, and a Pandas dataframe containing information about the gridMET gridded data. The line interpolation tool can process multi-dimensional xarray data, outputing the statistics organized by each time segment. Also, multiple gridded datasets can be processed, so for a gridMET example, three different climate variables can be accessed and processed.

The outputs are a dataframe containing the statistics for each line geometries calculated for each gridded dataset at each time segment and a geodataframe containing the interpolated point geometries which have the linearly interpolated values from the gridded dataset. Depending on the user’s inputs, the sample points can be interpolated from the line geometry at regular intervals, or the line vertices can be used instead.

Create conda environment#

  • gdptools-examples can be downloaded from the project repo

import warnings

import geoviews as gv
import pandas as pd
import pynhd
from pynhd import NLDI, WaterData

from gdptools import InterpGen, UserTiffData

warnings.filterwarnings("ignore")

gv.extension("bokeh")

Create target lines#

The target lines will be the vector dataset to which GDPTools will attribute with gridded data values. The PyNHD package is used to access stream geometries upstream from the USGS stream gage 06730200.

nldi = NLDI()
station_id = "06730200"

flw_main = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamMain",
    source="flowlines",
    distance=1000,
)

nhdp_mr = WaterData("nhdflowline_network")
comids = [int(c) for c in flw_main.nhdplus_comid.to_list()]
nhd_mr_fl = nhdp_mr.byid("comid", comids)
# nhd_mr_fl.columns.to_list()
flw = pynhd.prepare_nhdplus(nhd_mr_fl, 0, 0, purge_non_dendritic=True) # .rename(columns={"comid": "ID"}) # , "tonode": "toID"
strm_segs = pynhd.topoogical_sort(flw, id_col="hydroseq", toid_col="dnhydroseq")

# copy the sorted comids into a list and remove the last element
# List will be used to sort ouput into topological order later
sort_ids = strm_segs[0][:-1]

# create a new dataframe with the sorted comids
mr_fl = nhd_mr_fl.set_index("hydroseq").loc[sort_ids]
mr_fl = mr_fl.reset_index()
mr_fl.head()
hydroseq geometry comid fdate resolution gnis_id gnis_name lengthkm reachcode flowdir ... qc_12 vc_12 qe_12 ve_12 lakefract surfarea rareahload rpuid vpuid enabled
0 550197285.0 MULTILINESTRING ((-105.63211 40.05203, -105.62... 2889852 2000-12-22T05:00:00Z Medium 178467 North Boulder Creek 0.373 10190005000122 With Digitized ... 0.376 -9998.00000 0.376 -9998.00000 1.0 50667.042715 10.004544 10c 10L 1
1 550124544.0 MULTILINESTRING ((-105.62824 40.05363, -105.62... 2889186 1999-05-28T04:00:00Z Medium 178467 North Boulder Creek 0.552 10190005000121 With Digitized ... 0.498 0.57306 0.498 0.57306 NaN NaN NaN 10c 10L 1
2 550099070.0 MULTILINESTRING ((-105.62207 40.05388, -105.62... 2889770 1999-05-28T04:00:00Z Medium 178467 North Boulder Creek 0.478 10190005000120 With Digitized ... 0.592 -9998.00000 0.592 -9998.00000 1.0 52090.136727 6.691098 10c 10L 1
3 550084363.0 MULTILINESTRING ((-105.61719 40.05556, -105.61... 2889188 1999-05-28T04:00:00Z Medium 178467 North Boulder Creek 0.396 10190005000119 With Digitized ... 0.659 0.62836 0.659 0.62836 NaN NaN NaN 10c 10L 1
4 550074675.0 MULTILINESTRING ((-105.61573 40.05258, -105.61... 2889772 1999-05-28T04:00:00Z Medium 178467 North Boulder Creek 0.257 10190005000118 With Digitized ... 0.785 -9998.00000 0.785 -9998.00000 1.0 48345.911611 4.708286 10c 10L 1

5 rows × 138 columns

Plot the target lines#

gv.tile_sources.EsriTerrain * gv.Path(mr_fl, vdims="hydroseq").opts(
    color="hydroseq",
    cmap= "category20",
    width=800,
    height=450,
    line_width=5,
    show_legend=True,
    colorbar=True,
    title="NHDPlus MR Flowlines upstream of USGS Gage 06730200",
)

Define source data#

This is the gridded dataset from which GDPTools will be pulling values along the target lines.

import pprint
climater_cat = "https://github.com/mikejohnson51/climateR-catalogs/releases/download/June-2024/catalog.parquet"
cat = pd.read_parquet(climater_cat)

_id = "USGS 3DEP"
_asset = "10m CONUS DEM"
_vars = ["elevation"]
# create list of catalog parameters for each variable
cat_params = [
    cat.query("id == @_id & asset == @_asset & variable == @_vars").to_dict(orient="records")[0]
    for _var in _vars
]
# create dictionary of variable names and catalog parameters
cat_dict = dict(zip(_vars, cat_params))
pprint.pprint(cat_dict)
{'elevation': {'T_name': nan,
               'URL': '/vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/USGS_Seamless_DEM_13.vrt',
               'X1': -180.0005555560936,
               'X_name': nan,
               'Xn': 180.00000026239937,
               'Y1': -180.0005555560936,
               'Y_name': nan,
               'Yn': 72.0005555562949,
               'asset': '10m CONUS DEM',
               'crs': '+proj=longlat +datum=NAD83 +no_defs',
               'description': '10m CONUS DEM',
               'dim_order': nan,
               'duration': nan,
               'ensemble': nan,
               'id': 'USGS 3DEP',
               'interval': nan,
               'model': nan,
               'nT': nan,
               'ncols': 3888006.0,
               'nrows': 939612.0,
               'resX': 9.259259266022042e-05,
               'resY': 9.25925926599094e-05,
               'scenario': nan,
               'tiled': '',
               'toptobottom': None,
               'type': 'vrt',
               'units': 'meters',
               'variable': 'elevation',
               'varname': 'elevation'}}

Load the source data#

import rioxarray as rxr
ds_3dep = rxr.open_rasterio(cat_dict["elevation"]["URL"])
ds_3dep
<xarray.DataArray (band: 1, y: 939612, x: 3888006)> Size: 15TB
[3653217093672 values with dtype=float32]
Coordinates:
  * band         (band) int64 8B 1
  * y            (y) float64 8MB 72.0 72.0 72.0 72.0 ... -15.0 -15.0 -15.0 -15.0
  * x            (x) float64 31MB -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
    spatial_ref  int64 8B 0
Attributes:
    _FillValue:    -999999.0
    scale_factor:  1.0
    add_offset:    0.0

Define UserTiffData object#

This data class ingests the data used for zonal stats processing of geotiffs. The inputs are:

  • (ds): the gridded dataset as an Xarray object

  • (proj_ds): the projection EPSG number for the gridded datset

  • (x_coord and y_coord): the name on the X and Y coords of the gridded dataset

  • (f_feature): the geodataframe containing the lines

  • (id_feature): the name of the ID column of the lines table (there can be no duplicates!)

user_data = UserTiffData(
    source_ds=ds_3dep,
    source_crs=4269,
    source_x_coord="x",
    source_y_coord="y",
    target_gdf=mr_fl,
    target_id="hydroseq",
    bname="band",
    band=1,
    source_var="tiff",
)

Define InterpGen object and run the interpolation#

The InterpGen class takes the gridded and vector datasets from the UserTiffData class and when the calc_interp function is called, it preforms the grid to line interpolation regularly spaced intervals. The inputs for the InterpGen class are:

  • (user_data): an instance of a UserData class

  • (pt_spacing): the spacing in meters inbetween the interpolation points

  • (stat): the statistics to calculate from the interpolated points

  • (interp_method): the Xarray dataset.interp() method for interpolating the sample points

interp_object1 = InterpGen(
    user_data,
    pt_spacing=100,
    stat='all',
    interp_method="linear"
)
stats, pts = interp_object1.calc_interp()

Plot the sub-setted 3DEP 10m DEM#

This is the gridded dataset from which the interpolated plots have pull their values.

ds_ss = user_data.get_source_subset(key="").to_dataset(name="elevation")

import hvplot.xarray
ds_ss.elevation.hvplot.quadmesh(
    x="x",
    y="y",
    geo=True,
    project=True,
    rasterize=True,
    cmap="viridis",
    width=600,
    height=300,
    aspect="equal",
    clabel="3DEP 10m DEM",
    xlabel="Longitude",
    ylabel="Latitude",
    title="3DEP 10m DEM",
)

Display the grided values for each of the interpolated points#

pts.head()
pt band spatial_ref values dist hydroseq geometry varname
0 0 1 0 1560.632596 0.0 550025946.0 POINT (-105.18739 40.05048) tiff
1 1 1 0 1560.127862 100.0 550025946.0 POINT (-105.18633 40.05041) tiff
2 2 1 0 1560.555034 200.0 550025946.0 POINT (-105.18528 40.05026) tiff
3 3 1 0 1559.350482 300.0 550025946.0 POINT (-105.18424 40.05007) tiff
4 4 1 0 1558.909865 400.0 550025946.0 POINT (-105.18319 40.05005) tiff

Display the statistics for each stream segment#

stats.head()
index hydroseq varname mean median std min max
0 0 550025946.0 tiff 1558.948776 1558.909865 1.177185 1556.995965 1560.632596
1 0 550026201.0 tiff 1562.467624 1562.329515 1.200751 1560.514595 1564.282919
2 0 550026469.0 tiff 1566.994821 1567.276414 1.405481 1564.551665 1569.605381
3 0 550026743.0 tiff 1570.061471 1569.926945 1.177150 1568.664712 1572.449691
4 0 550027025.0 tiff 1572.780278 1572.780278 0.221672 1572.558607 1573.001950

Plot the mean elevation by stream segment#

import geopandas as gpd
mr_fl = mr_fl.merge(stats, on="hydroseq", how='outer')
pmin = mr_fl["mean"].min()
pmax = mr_fl["mean"].max()
gv.tile_sources.EsriTerrain * gv.Path(
        mr_fl,
        vdims=["hydroseq", "mean", "median"]
    ).opts(
            width=800,
            height=450,
            aspect="equal",
            color="mean",
            cmap="viridis",
            line_width=5,
            colorbar=True,
            clim=(pmin, pmax),
            clabel="Mean Elevation (m)",
            title="Mean Elevation by Stream Segment",
            # tools=['hover']
        )

Plot the interpolated data as a scatter plot by comid#

import plotly.graph_objects as go
fig = go.Figure()
pts = pts.groupby("hydroseq")
for index in range(len(pts)):
    group = pts.get_group(sort_ids[index])
    fig.add_trace(go.Scatter(x=group["dist"], y=group["values"], mode='markers', marker=dict(size=5, color=group["values"], colorscale='Viridis', cmin=pmin, cmax=pmax), name=str(sort_ids[index])))
fig.update_layout(title="Stream-segment Interpolated Elevation 10m DEM", xaxis_title="Distance (m)", yaxis_title="Elevation (m)")
fig.show()