TerraClimate Grid to Line Interpolation#

Introduction#

In this example, the ClimateR-catalog (CRC), developed by Mike Johnson [Johnson, 2023], is used to discover a gridded dataset of interest. The CRC encompasses a wealth of data sources and the metadata required to simplify the process of grid-to-polygon area-weighted intersections. The CRC is a json catalog whose schema provides common metadata for geospatial analysis. In the example presented here, we use monthly water evaporation amount from the TerraClimate dataset [Abatzoglou et al., 2018].

Geo Data Processing Tools (gdptools)#

gdptools is a Python package for calculating area-weighted statistics of grid-to-polygon, grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the climateR-catalog to help generalize its processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray; however, some of the metadata encapsulated in the json catalog entries will have to be supplied by the user. Examples can be found in Non-Catalog Examples.
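As a conceptual illustration only (not gdptools' actual implementation), an area-weighted statistic combines each intersecting grid cell's value with that cell's share of the overlap with the target geometry. A minimal sketch with made-up numbers:

```python
# Hypothetical values for three grid cells intersecting a polygon,
# and the (made-up) intersection area between the polygon and each cell.
cell_values = [10.0, 12.0, 8.0]
overlap_areas = [0.5, 0.3, 0.2]

# Area-weighted mean: weight each cell's value by its share of the total overlap area.
weighted_mean = (
    sum(v * a for v, a in zip(cell_values, overlap_areas)) / sum(overlap_areas)
)
# weighted_mean is approximately 10.2
```

The same weighting idea underlies the polygon, line, and polygon-to-polygon interpolation methods mentioned above.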

Open Source Python Tools#

  • xarray: Package to work with labelled multi-dimensional arrays.

  • geopandas: Package to work with geospatial data.

Data#

  • TerraClimate: A dataset of monthly, high-spatial-resolution (~4-km, 1/24th degree) climate and climatic water balance data for global terrestrial surfaces, from 1958 to the present.

  • NHDPlus_V2: National Hydrography Dataset Plus Version 2, a selection of flowline geometries from HUC 0708

Maintenance Schedule#

  • The intent of this notebook is simply a demonstration of an application or workflow. As we progress in the build-out of NGHF project tools, this notebook may be updated or replaced. There is no other planned maintenance at this time.

Description#

This workflow demonstrates the grid-to-line interpolation functionality of the gdptools package. The purpose of this tool is to attribute gridded data to line geometries. This is done by interpolating points along each line geometry and extracting the underlying gridded data values at those points. From these point values, statistics are calculated and attributed to the original input lines.
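The point-interpolation step described above can be sketched in plain Python (a simplified illustration with a hypothetical helper; gdptools' own implementation works on projected geometries and differs in detail):

```python
import math

def interpolate_points(vertices, spacing):
    """Sample (x, y) points every `spacing` units of distance along a polyline."""
    points = [vertices[0]]
    leftover = spacing  # distance remaining until the next sample point
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0
        while pos + leftover <= seg_len:
            pos += leftover
            t = pos / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            leftover = spacing
        leftover -= seg_len - pos
    return points

pts = interpolate_points([(0, 0), (300, 0)], 100)
# pts holds sample points at distances 0, 100, 200, and 300 along the line
```

At each sampled point the underlying grid cell's value would then be extracted, which is the role the gridded dataset plays below.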

The inputs to the line interpolation tool are a GeoPandas geodataframe containing the line geometries and a Pandas dataframe containing information about the gridded data. The line interpolation tool can process multi-dimensional xarray data, outputting the statistics organized by each time step. Multiple gridded datasets can also be processed; in this example, three different TerraClimate variables will be accessed and processed.

The outputs are a dataframe containing the statistics for each line geometry, calculated for each gridded dataset at each time step, as well as a geodataframe containing the interpolated point geometries with the linearly interpolated values from the gridded dataset. Depending on the user's inputs, the sample points can be interpolated from the line geometry at regular intervals, or the line vertices can be used instead.
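The per-line statistics in the output dataframe resemble a pandas groupby aggregation over the sampled point values. A minimal sketch with hypothetical data (column names borrowed from the outputs shown later):

```python
import pandas as pd

# Hypothetical sampled-point values for two line IDs; the output stats
# dataframe aggregates values like these per line (and per time step).
pts_df = pd.DataFrame({
    "permanent_": ["a", "a", "b", "b"],
    "values": [1.0, 3.0, 2.0, 4.0],
})
stats = pts_df.groupby("permanent_")["values"].agg(
    ["mean", "median", "std", "min", "max"]
)
```

This mirrors the shape of the stats dataframe produced by the tool, with one row of statistics per line geometry.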

Getting Started#

Directions for creating an environment to run this notebook.

Requirements#

  • Linux OS or Windows OS

  • Conda-based package manager such as miniconda or mamba

  • A copy of the NHDFlowline.shp file which can be retrieved from the sciencebase link above under Data

Create conda environment#

import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
import pandas as pd
import hvplot.pandas
import hvplot.xarray
import warnings
import xarray as xr

from gdptools import ClimRCatData
from gdptools import InterpGen

warnings.filterwarnings("ignore")

Get lines to query data#

Open a lines shapefile as a geodataframe. These vector geometries are used to access the underlying gridded data values and calculate statistics.

# Open line shapefile
gdf = gpd.read_file('./test_lines/test_lines.shp')

# Select just a few flowlines
gdf = gdf[gdf.permanent_.isin(['155843758', '155843335', '155842105'])]

Query the climateR-catalog#

The variables aet, pet, and PDSI are available in the TerraClimate dataset. Pandas is used to query the catalog.

  • In the following steps, the json catalog file is imported into pandas.

  • The pandas query function is used to search the catalog for the TerraClimate entry for each variable of interest and return the json metadata for each catalog entry.

climater_cat = "https://mikejohnson51.github.io/climateR-catalogs/catalog.json"
cat = pd.read_json(climater_cat)
cat.head()
id asset URL type varname variable description units model ensemble ... Xn Y1 Yn resX resY ncols nrows crs toptobottom tiled
0 maca_day agg_macav2metdata_huss_BNU-ESM_r1i1p1_historic... http://thredds.northwestknowledge.net:8080/thr... opendap specific_humidity huss Daily Mean Near-Surface Specific Humidity kg kg-1 BNU-ESM r1i1p1 ... -67.0648 25.0631 49.396 0.0417 0.0417 1386.0 585.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 1.0 T
1 maca_day agg_macav2metdata_huss_CNRM-CM5_r1i1p1_histori... http://thredds.northwestknowledge.net:8080/thr... opendap specific_humidity huss Daily Mean Near-Surface Specific Humidity kg kg-1 CNRM-CM5 r1i1p1 ... -67.0648 25.0631 49.396 0.0417 0.0417 1386.0 585.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 1.0 T
2 maca_day agg_macav2metdata_huss_CSIRO-Mk3-6-0_r1i1p1_hi... http://thredds.northwestknowledge.net:8080/thr... opendap specific_humidity huss Daily Mean Near-Surface Specific Humidity kg kg-1 CSIRO-Mk3-6-0 r1i1p1 ... -67.0648 25.0631 49.396 0.0417 0.0417 1386.0 585.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 1.0 T
3 maca_day agg_macav2metdata_huss_bcc-csm1-1_r1i1p1_histo... http://thredds.northwestknowledge.net:8080/thr... opendap specific_humidity huss Daily Mean Near-Surface Specific Humidity kg kg-1 bcc-csm1-1 r1i1p1 ... -67.0648 25.0631 49.396 0.0417 0.0417 1386.0 585.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 1.0 T
4 maca_day agg_macav2metdata_huss_CanESM2_r1i1p1_historic... http://thredds.northwestknowledge.net:8080/thr... opendap specific_humidity huss Daily Mean Near-Surface Specific Humidity kg kg-1 CanESM2 r1i1p1 ... -67.0648 25.0631 49.396 0.0417 0.0417 1386.0 585.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 1.0 T

5 rows × 28 columns

Pandas query function example#

_id = "terraclim"
_varname = "aet"

# an example query returns a pandas dataframe.
tc = cat.query(
    "id == @_id & variable == @_varname"
)
tc
id asset URL type varname variable description units model ensemble ... Xn Y1 Yn resX resY ncols nrows crs toptobottom tiled
1374 terraclim agg_terraclimate_aet_1958_CurrentYear_GLOBE http://thredds.northwestknowledge.net:8080/thr... opendap aet aet water_evaporation_amount mm NaN NaN ... 179.9792 89.9792 -89.9792 0.0417 0.0417 8640.0 4320.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 0.0

1 rows × 28 columns

Query the climateR-catalog for variables aet, pet, PDSI#

Query the catalog for each variable of interest. The pandas query() function returns a pandas DataFrame, but we want just the dictionary representation of the json catalog entry, so each result is converted to a dictionary. The resulting cat_dict has the format dict[str, dict[str, Any]], where each key is a variable of interest and each value is a dictionary of the corresponding climateR-catalog entry.

# Create a dictionary of parameter dataframes for each variable
tvars = ["aet", "pet", "PDSI"]
cat_params = [cat.query("id == @_id & variable == @_var").to_dict(orient="records")[0] for _var in tvars]

cat_dict = dict(zip(tvars, cat_params))

# Output an example of the cat_param.json entry for "aet".
cat_dict.get("aet")
{'id': 'terraclim',
 'asset': 'agg_terraclimate_aet_1958_CurrentYear_GLOBE',
 'URL': 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_aet_1958_CurrentYear_GLOBE.nc',
 'type': 'opendap',
 'varname': 'aet',
 'variable': 'aet',
 'description': 'water_evaporation_amount',
 'units': 'mm',
 'model': nan,
 'ensemble': nan,
 'scenario': 'total',
 'T_name': 'time',
 'duration': '1958-01-01/2021-12-01',
 'interval': '1 months',
 'nT': 768.0,
 'X_name': 'lon',
 'Y_name': 'lat',
 'X1': -179.9792,
 'Xn': 179.9792,
 'Y1': 89.9792,
 'Yn': -89.9792,
 'resX': 0.0417,
 'resY': 0.0417,
 'ncols': 8640.0,
 'nrows': 4320.0,
 'crs': '+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs',
 'toptobottom': 0.0,
 'tiled': ''}

Initiate a ClimateR Catalog Data object#

The ClimRCatData class takes the geodataframe and the dictionary created above, which together contain all the necessary information for accessing and processing the gridded datasets.

# ClimateR Data object, same data class used with polygon aggregation
user_data = ClimRCatData(
    cat_dict=cat_dict,
    f_feature=gdf,
    id_feature='permanent_',
    period=["1980-01-01", "1980-12-31"]
)
user_data.cat_dict['aet']
{'id': 'terraclim',
 'asset': 'agg_terraclimate_aet_1958_CurrentYear_GLOBE',
 'URL': 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_aet_1958_CurrentYear_GLOBE.nc',
 'type': 'opendap',
 'varname': 'aet',
 'variable': 'aet',
 'description': 'water_evaporation_amount',
 'units': 'mm',
 'model': nan,
 'ensemble': nan,
 'scenario': 'total',
 'T_name': 'time',
 'duration': '1958-01-01/2021-12-01',
 'interval': '1 months',
 'nT': 768.0,
 'X_name': 'lon',
 'Y_name': 'lat',
 'X1': -179.9792,
 'Xn': 179.9792,
 'Y1': 89.9792,
 'Yn': -89.9792,
 'resX': 0.0417,
 'resY': 0.0417,
 'ncols': 8640.0,
 'nrows': 4320.0,
 'crs': '+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs',
 'toptobottom': 0.0,
 'tiled': ''}

Initialize an InterpGen object#

This sets up the line interpolation algorithm: pt_spacing controls the spacing of the interpolated sample points, and stat='all' requests all available statistics.

# InterpGen object
interpObject1 = InterpGen(user_data, pt_spacing=100, stat='all')

Lastly, the calculations actually occur when the calc_interp function is run on the InterpGen object. The outputs are two dataframes: one containing the statistics for the lines and the other containing the points used to sample the gridded data.

# calculate the stats and interpolated points
stats1, pts1 = interpObject1.calc_interp()

Filter output data#

Since we processed three different gridded variables, each containing twelve monthly time steps, and three line geometries, we need to filter the data down to a single variable, time step, and geometry before displaying it.

# Select subset of data to display
time = '1980-10-01'
ID = '155842105'
varname = 'aet'
# Define column names
ID_column = 'permanent_'
stat_time_column = 'date'
stat_varname_column = 'varname'

# Narrow down pts to display
pts = pts1.loc[time][pts1.loc[time][ID_column]==ID]
pts = pts[pts['varname']==varname]

# Select gridded data by variable name
da = interpObject1._interp_data[varname][ID].da

Within the InterpGen object that was created, there is an attribute called _interp_data. This attribute stores the clipped gridded data (da) from which the values were extracted at the interpolated sample points.

# 'Hidden' gridded data which values are extracted from
da
<xarray.DataArray 'aet' (time: 12, lat: 6, lon: 6)>
dask.array<getitem, shape=(12, 6, 6), dtype=float32, chunksize=(12, 6, 6), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 42.73 42.69 42.65 42.6 42.56 42.52
  * lon      (lon) float64 -93.73 -93.69 -93.65 -93.6 -93.56 -93.52
  * time     (time) datetime64[ns] 1980-01-01 1980-02-01 ... 1980-12-01
Attributes:
    units:              mm
    description:        Actual Evapotranspiration
    long_name:          water_evaporation_amount
    standard_name:      water_evaporation_amount
    dimensions:         lon lat time
    grid_mapping:       crs
    coordinate_system:  WGS84,EPSG:4326
    _ChunkSizes:        [   1  720 1440]

Display Data#

Below is a simple plot showing the values of the interpolated points plotted against their distance along the line.

# Plot values of the interpolated points
pts.plot(kind='scatter', x = 'dist', y='values')
<Axes: xlabel='dist', ylabel='values'>

Here the stats dataframe is filtered to show the line statistics for our selected time, line and grid.

# Get stats for line by select for time, geometry, and grid variable
stats1[(stats1[stat_time_column]==time) & (stats1[ID_column]==ID) & (stats1[stat_varname_column]==varname)]
date permanent_ varname mean median std min max
9 1980-10-01 155842105 aet 60.353931 60.404891 0.899203 58.662914 61.844751

And here is an interactive map displaying the gridded data, interpolated points, and line geometry.

# Display data
from holoviews.element.tiles import EsriImagery

line = gdf[gdf[ID_column]==ID].to_crs(4326).hvplot(
    geo=True, frame_width=600, frame_height=500, alpha=1, #cmap='viridis', clim=(60, 66),
    xlabel="longitude", ylabel="latitude", xlim=(-94, -93),
)

sample_points = pts.to_crs(4326).hvplot(
    c='values', geo=True, frame_width=600, frame_height=500, alpha=1, # cmap='viridis', clim=(60, 66),
    xlabel="longitude", ylabel="latitude", clabel=varname, xlim=(-94, -93),
    title="Interpolated Points", aspect='equal'
)
gridded_data = da.isel(time=9).hvplot.quadmesh(
    'lon', 'lat', varname,  alpha=1, grid=True, geo=True, frame_width=500, cmap='viridis', clim=(55, 65),
     xlabel="longitude", ylabel="latitude", clabel=varname, xlim=(-94, -93),
    title=varname, aspect='equal'
)

(EsriImagery() * gridded_data * line * sample_points)