gridMET Grid to Line Interpolation#

Introduction#

In this example, the ClimateR-catalog (CRC), developed by Mike Johnson [Johnson, 2023] is used to discover a gridded dataset of interest. The CRC encompasses a wealth of data sources and the metadata required to simplify the process of grid-to-polygon area weighted intersections. The CRC here uses a parquet catalog with a schema of common metadata for geospatial analysis. In the example presented here we use daily maximum temperature, daily minimum temperature, and precipitation from the dataset gridMET [Abatzoglou et al., 2018].

Geo Data Processing Tools (gdptools)#

gdptools is a python package for calculating area-weighted statistics of grid-to-polygon. grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the OPeNDAP catalog to aid generalizing gdptools processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray, however some of the metadata encapsulated in the json entries will have to be added by the user. Examples can be found in Non-Catalog Examples.

Open Source Python Tools#

  • xarray: Package to work with labelled multi-dimensional arrays.

  • geopandas: Package to work with geospatial data.

Data#

  • gridMET: A dataset of daily high-spatial resolution (~4-km, 1/24th degree) surface meteorological data covering the contiguous US from 1979-yesterday.

  • NHDPlus_V2: National Hydrography Dataset Plus Version 2, a selection of flowline geometries from HUC 0708

Maintenance Schedule#

  • The intent of this notebook is simply a demonstration of an application or workflow. As we progress in the build-out of NGHF project tools, this notebook may be updated or replaced. There is no other planned maintenance at this time.

Description#

This workflow demonstrates the grid to line interpoltion functionality from the gdptools package. The purpose of this tool is to attribute gridded data to line geometries. This is done by interpolating points along each line geometry and extracting the underlaying gridded data values at those points. From this point values, statistics are calculated and attributed to the original input lines.

The inputs to the line interpolation tool are a Geopandas geodataframe containing the line geometries, and a Pandas dataframe containing information about the gridMET gridded data. The line interpolation tool can process multi-dimensional xarray data, outputing the statistics organized by each time segment. Also, multiple gridded datasets can be processed, so for the gridMET example, three different climate variables will be accessed and processed.

The outputs are a dataframe containing the statistics for each line geometries calculated for each gridded dataset at each time segment, as well as a geodataframe containing the interpolated point geometries which have the linearly interpolated values from the gridded dataset. Depending on the user’s inputs, the sample points can be interpolated from the line geometry at regular intervals, or the line vertices can be used instead.

Create Conda Environment#

  1. Download the gdptools-examples environment file from here.

  2. Navigate to the folder containing the environment.yml file:

    cd gdptools
    
  3. Create the Conda environment:

    conda env create -f gdptools-examples.yml
    
  4. Activate the Conda environment:

    conda activate gdptools-examples
    
  5. Navigate to the folder containing the notebook:

    cd docs/Examples
    
  6. Run Jupyter Notebook:

    jupyter notebook
    
import warnings

import geopandas as gpd
import pandas as pd

from gdptools import ClimRCatData, InterpGen

warnings.filterwarnings("ignore")

Get lines to query data#

Open a lines shapefile as a geodataframe. Use these vector geometries to access underlaying gridded data values and calculate statistics for.

# Open line shapefile
gdf = gpd.read_file('./test_lines/test_lines_4326.shp')

# Select just a few flowlines
gdf = gdf[(gdf.permanent_ == '155843758') | (gdf.permanent_ == '155843335') | (gdf.permanent_ == '155842105')]
gdf.plot()

Query the climateR-catalog#

The variables tmmx, tmmn, pr are available in the gridMET dataset. Pandas is used to query the dataset.

  • In the following steps the parquet catalog file is imported into pandas.

  • Using the pandas query function to search for the gridMET dataset for each variable of interest and return the metadata for the parameter and grid catalog entries.

climater_cat = "https://github.com/mikejohnson51/climateR-catalogs/releases/download/June-2024/catalog.parquet"
cat = pd.read_parquet(climater_cat)
cat.head()

Pandas query function example#

_id = "gridmet"
_varname = "tmmx"

# an example query returns a pandas dataframe.
tc = cat.query("id == @_id & variable == @_varname")
tc

Query the climateR-catalog for variables tmmx, tmmn, pr#

Query the catalog for each variable of interest. The pandas query() function returns a pandas DataFrame but we want just the dictionary representation of the catalog entry so it’s converted to a dictionary with format dict[str, dict[str, Any]], where the first key is the variable of interest and the value is a dictionary of the resulting climateR-catalog data.

# Create a dictionary of parameter dataframes for each variable
tvars = ["tmmx", "tmmn", "pr"]
cat_params = [cat.query("id == @_id & variable == @_var").to_dict(orient="records")[0] for _var in tvars]

cat_dict = dict(zip(tvars, cat_params))

# Output an example of the cat_param entry for "tmmx".
cat_dict.get("tmmx")

Initiate a ClimateR Catalog Data object#

The ClimRCatData class takes the geodataframe and dictionary created above, which contain all the necessary information for accessing and processing the gridded datasets.

# ClimateR Data object, same data class used with polygon aggregation
user_data = ClimRCatData(
    source_cat_dict=cat_dict, target_gdf=gdf, target_id="permanent_", source_time_period=["1980-01-01", "1980-12-31"]
)
user_data.source_cat_dict['tmmx']

Initialize an InterpGen object#

This kicks off the line interpolation alogrithm.

# InterpGen object
interp_object = InterpGen(user_data, pt_spacing=100, stat='all')

Lastly, the calculations actually occur when the calc_interp function is run on the InterGen object. The outputs are two dataframes, one containing the stats for the lines and the other containing the points used to sample the gridded data.

# calculate the stats and interpolated points
stats1, pts1 = interp_object.calc_interp()
pts1

Filter output data#

Since we processed three different gridded datasets, each of which contained data for seven days, and three line geometries, we need to filter the data down to one of each before we display it.

# Select subset of data to display
time = '1980-10-01'
ID = '155842105'
varname = 'daily_maximum_temperature'
varname2 = 'tmmx'
# Define column names
ID_column = "permanent_"
stat_time_column = "date"
stat_varname_column = "varname"
mask = (pts1["day"] == time) & (pts1["permanent_"] == ID) & (pts1["varname"] == varname)
pts = pts1[mask]
# Select gridded data by variable name
da = interp_object._interp_data[varname2][ID].da
pts

With in the InterpGen object that was create, there is an attribute called interp_data. Within this attribute is stored the clipped gridded data (da) from which the values were extracted at the interpolated sample points.

# 'Hidden' gridded data which values are extracted from
da

Display Data#

Below is a simple plot showing the values of the inerpolated points plotted against the distance of the points.

# Plot values of the interpolated points
pts.plot(kind='scatter', x = 'dist', y='values')
stats1
# Get stats for line by select for time, geometry, and grid variable
stats1[(stats1[stat_time_column]==time) & (stats1[ID_column]==ID) & (stats1[stat_varname_column]==varname)]

And here is an interactivate map displaying the gridded data, interpolated points and line geometry.

# Display data
import holoviews as hv
from holoviews.element.tiles import EsriTerrain

hv.extension("bokeh")

# Define the plots
line = gdf[gdf[ID_column] == ID].hvplot(
    frame_width=600,
    frame_height=500,
    geo=True,
    alpha=0.8,
    xlabel="longitude",
    ylabel="latitude",
    color="red",
    line_width=3,
)

# For the sample points, try without the hatch parameters
sample_points = pts.hvplot.points(
    x="geometry",
    geo=True,
    frame_width=600,
    frame_height=500,
    alpha=0.8,
    c="values",
    cmap="viridis",
    xlabel="longitude",
    ylabel="latitude",
    colorbar=True,
    title="Interpolated Points",
    aspect="equal",
    size=50,
)

gridded_data = (
    da.rio.write_crs(4326)
    .isel(day=9)
    .hvplot.quadmesh(
        "lon",
        "lat",
        varname,
        alpha=0.5,
        geo=True,
        grid=False,
        frame_width=600,
        cmap="viridis",
        xlabel="longitude",
        ylabel="latitude",
        colorbar=True,
        title=varname,
        aspect="equal",
    )
)

# Combine plots using the * operator for overlay
overlay = EsriTerrain() * gridded_data * sample_points
# overlay = EsriTerrain() * line

# Display the overlay
overlay