TerraClimate Grid to Line Interpolation#

Introduction#

In this example, the ClimateR-catalog (CRC), developed by Mike Johnson [Johnson, 2023], is used to discover a gridded dataset of interest. The CRC encompasses a wealth of data sources and the metadata required to simplify the process of grid-to-polygon area-weighted intersections. The CRC is a json catalog whose schema provides common metadata for geospatial analysis. In the example presented here, we use monthly water evaporation amount from the TerraClimate dataset [Abatzoglou et al., 2018].

Geo Data Processing Tools (gdptools)#

gdptools is a Python package for calculating area-weighted statistics of grid-to-polygon, grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the climateR-catalog to help generalize its processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray; however, some of the metadata encapsulated in the json catalog entries will have to be supplied by the user. Examples can be found in Non-Catalog Examples.
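As a conceptual illustration only (not gdptools' actual implementation), an area-weighted statistic combines each intersecting grid cell's value with that cell's share of the overlap with the target geometry. A minimal sketch with made-up numbers:

```python
# Hypothetical values for three grid cells intersecting a polygon,
# and the (made-up) intersection area between the polygon and each cell.
cell_values = [10.0, 12.0, 8.0]
overlap_areas = [0.5, 0.3, 0.2]

# Area-weighted mean: weight each cell's value by its share of the total overlap area.
weighted_mean = (
    sum(v * a for v, a in zip(cell_values, overlap_areas)) / sum(overlap_areas)
)
# weighted_mean is approximately 10.2
```

The same weighting idea underlies the polygon, line, and polygon-to-polygon interpolation methods mentioned above.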

Open Source Python Tools#

  • xarray: Package to work with labelled multi-dimensional arrays.

  • geopandas: Package to work with geospatial data.

Data#

  • TerraClimate: A dataset of monthly, high-spatial-resolution (~4-km, 1/24th degree) climate and climatic water balance data for global terrestrial surfaces, from 1958 to the present.

  • NHDPlus_V2: National Hydrography Dataset Plus Version 2, a selection of flowline geometries from HUC 0708

Maintenance Schedule#

  • The intent of this notebook is simply a demonstration of an application or workflow. As we progress in the build-out of NGHF project tools, this notebook may be updated or replaced. There is no other planned maintenance at this time.

Description#

This workflow demonstrates the grid-to-line interpolation functionality of the gdptools package. The purpose of this tool is to attribute gridded data to line geometries. This is done by interpolating points along each line geometry and extracting the underlying gridded data values at those points. From these point values, statistics are calculated and attributed to the original input lines.
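The point-interpolation step described above can be sketched in plain Python (a simplified illustration with a hypothetical helper; gdptools' own implementation works on projected geometries and differs in detail):

```python
import math

def interpolate_points(vertices, spacing):
    """Sample (x, y) points every `spacing` units of distance along a polyline."""
    points = [vertices[0]]
    leftover = spacing  # distance remaining until the next sample point
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0
        while pos + leftover <= seg_len:
            pos += leftover
            t = pos / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            leftover = spacing
        leftover -= seg_len - pos
    return points

pts = interpolate_points([(0, 0), (300, 0)], 100)
# pts holds sample points at distances 0, 100, 200, and 300 along the line
```

At each sampled point the underlying grid cell's value would then be extracted, which is the role the gridded dataset plays below.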

The inputs to the line interpolation tool are a GeoPandas geodataframe containing the line geometries and a Pandas dataframe containing information about the gridded data. The line interpolation tool can process multi-dimensional xarray data, outputting the statistics organized by each time step. Multiple gridded datasets can also be processed; in this example, three different TerraClimate variables will be accessed and processed.

The outputs are a dataframe containing the statistics for each line geometry, calculated for each gridded dataset at each time step, as well as a geodataframe containing the interpolated point geometries with the linearly interpolated values from the gridded dataset. Depending on the user's inputs, the sample points can be interpolated from the line geometry at regular intervals, or the line vertices can be used instead.
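The per-line statistics in the output dataframe resemble a pandas groupby aggregation over the sampled point values. A minimal sketch with hypothetical data (column names borrowed from the outputs shown later):

```python
import pandas as pd

# Hypothetical sampled-point values for two line IDs; the output stats
# dataframe aggregates values like these per line (and per time step).
pts_df = pd.DataFrame({
    "permanent_": ["a", "a", "b", "b"],
    "values": [1.0, 3.0, 2.0, 4.0],
})
stats = pts_df.groupby("permanent_")["values"].agg(
    ["mean", "median", "std", "min", "max"]
)
```

This mirrors the shape of the stats dataframe produced by the tool, with one row of statistics per line geometry.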

Getting Started#

Directions for creating an environment to run this notebook.

Requirements#

  • Linux OS or Windows OS

  • Conda-based package manager such as miniconda or mamba

  • A copy of the NHDFlowline.shp file which can be retrieved from the sciencebase link above under Data

Create conda environment#

import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
import pandas as pd
import hvplot.pandas
import hvplot.xarray
import warnings
import xarray as xr

from gdptools import ClimRCatData
from gdptools import InterpGen

warnings.filterwarnings("ignore")

Get lines to query data#

Open a lines shapefile as a geodataframe. These vector geometries are used to access the underlying gridded data values and calculate statistics.

# Open line shapefile
gdf = gpd.read_file('./test_lines/test_lines.shp')

# Select just a few flowlines
gdf = gdf[gdf.permanent_.isin(['155843758', '155843335', '155842105'])]

Query the climateR-catalog#

The variables aet, pet, and PDSI are available in the TerraClimate dataset. Pandas is used to query the catalog.

  • In the following steps, the json catalog file is imported into pandas.

  • The pandas query function is used to search the catalog for the TerraClimate entry for each variable of interest and return the json metadata for each catalog entry.

climater_cat = "https://mikejohnson51.github.io/climateR-catalogs/catalog.json"
cat = pd.read_json(climater_cat)
cat.head()
id asset URL type varname variable description units model ensemble ... Xn Y1 Yn resX resY ncols nrows crs toptobottom tiled
0 maca_day agg_macav2metdata_huss_BNU-ESM_r1i1p1_historic... http://thredds.northwestknowledge.net:8080/thr... opendap specific_humidity huss Daily Mean Near-Surface Specific Humidity kg kg-1 BNU-ESM r1i1p1 ... -67.0648 25.0631 49.396 0.0417 0.0417 1386.0 585.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 1.0 T
1 maca_day agg_macav2metdata_huss_CNRM-CM5_r1i1p1_histori... http://thredds.northwestknowledge.net:8080/thr... opendap specific_humidity huss Daily Mean Near-Surface Specific Humidity kg kg-1 CNRM-CM5 r1i1p1 ... -67.0648 25.0631 49.396 0.0417 0.0417 1386.0 585.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 1.0 T
2 maca_day agg_macav2metdata_huss_CSIRO-Mk3-6-0_r1i1p1_hi... http://thredds.northwestknowledge.net:8080/thr... opendap specific_humidity huss Daily Mean Near-Surface Specific Humidity kg kg-1 CSIRO-Mk3-6-0 r1i1p1 ... -67.0648 25.0631 49.396 0.0417 0.0417 1386.0 585.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 1.0 T
3 maca_day agg_macav2metdata_huss_bcc-csm1-1_r1i1p1_histo... http://thredds.northwestknowledge.net:8080/thr... opendap specific_humidity huss Daily Mean Near-Surface Specific Humidity kg kg-1 bcc-csm1-1 r1i1p1 ... -67.0648 25.0631 49.396 0.0417 0.0417 1386.0 585.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 1.0 T
4 maca_day agg_macav2metdata_huss_CanESM2_r1i1p1_historic... http://thredds.northwestknowledge.net:8080/thr... opendap specific_humidity huss Daily Mean Near-Surface Specific Humidity kg kg-1 CanESM2 r1i1p1 ... -67.0648 25.0631 49.396 0.0417 0.0417 1386.0 585.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 1.0 T

5 rows × 28 columns

Pandas query function example#

_id = "terraclim"
_varname = "aet"

# an example query returns a pandas dataframe.
tc = cat.query(
    "id == @_id & variable == @_varname"
)
tc
id asset URL type varname variable description units model ensemble ... Xn Y1 Yn resX resY ncols nrows crs toptobottom tiled
1374 terraclim agg_terraclimate_aet_1958_CurrentYear_GLOBE http://thredds.northwestknowledge.net:8080/thr... opendap aet aet water_evaporation_amount mm NaN NaN ... 179.9792 89.9792 -89.9792 0.0417 0.0417 8640.0 4320.0 +proj=longlat +a=6378137 +f=0.0033528106647474... 0.0

1 rows × 28 columns

Query the climateR-catalog for variables aet, pet, PDSI#

Query the catalog for each variable of interest. The pandas query() function returns a pandas DataFrame, but we want just the dictionary representation of the json catalog entry, so each result is converted to a dictionary. The resulting cat_dict has the format dict[str, dict[str, Any]], where each key is a variable of interest and each value is a dictionary of the corresponding climateR-catalog entry.

# Create a dictionary of parameter dataframes for each variable
tvars = ["aet", "pet", "PDSI"]
cat_params = [cat.query("id == @_id & variable == @_var").to_dict(orient="records")[0] for _var in tvars]

cat_dict = dict(zip(tvars, cat_params))

# Output an example of the cat_param.json entry for "aet".
cat_dict.get("aet")
{'id': 'terraclim',
 'asset': 'agg_terraclimate_aet_1958_CurrentYear_GLOBE',
 'URL': 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_aet_1958_CurrentYear_GLOBE.nc',
 'type': 'opendap',
 'varname': 'aet',
 'variable': 'aet',
 'description': 'water_evaporation_amount',
 'units': 'mm',
 'model': nan,
 'ensemble': nan,
 'scenario': 'total',
 'T_name': 'time',
 'duration': '1958-01-01/2021-12-01',
 'interval': '1 months',
 'nT': 768.0,
 'X_name': 'lon',
 'Y_name': 'lat',
 'X1': -179.9792,
 'Xn': 179.9792,
 'Y1': 89.9792,
 'Yn': -89.9792,
 'resX': 0.0417,
 'resY': 0.0417,
 'ncols': 8640.0,
 'nrows': 4320.0,
 'crs': '+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs',
 'toptobottom': 0.0,
 'tiled': ''}

Initiate a ClimateR Catalog Data object#

The ClimRCatData class takes the geodataframe and the dictionary created above, which together contain all the necessary information for accessing and processing the gridded datasets.

# ClimateR Data object, same data class used with polygon aggregation
user_data = ClimRCatData(
    cat_dict=cat_dict,
    f_feature=gdf,
    id_feature='permanent_',
    period=["1980-01-01", "1980-12-31"]
)
user_data.cat_dict['aet']
{'id': 'terraclim',
 'asset': 'agg_terraclimate_aet_1958_CurrentYear_GLOBE',
 'URL': 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_aet_1958_CurrentYear_GLOBE.nc',
 'type': 'opendap',
 'varname': 'aet',
 'variable': 'aet',
 'description': 'water_evaporation_amount',
 'units': 'mm',
 'model': nan,
 'ensemble': nan,
 'scenario': 'total',
 'T_name': 'time',
 'duration': '1958-01-01/2021-12-01',
 'interval': '1 months',
 'nT': 768.0,
 'X_name': 'lon',
 'Y_name': 'lat',
 'X1': -179.9792,
 'Xn': 179.9792,
 'Y1': 89.9792,
 'Yn': -89.9792,
 'resX': 0.0417,
 'resY': 0.0417,
 'ncols': 8640.0,
 'nrows': 4320.0,
 'crs': '+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs',
 'toptobottom': 0.0,
 'tiled': ''}

Initialize an InterpGen object#

This sets up the line interpolation algorithm: pt_spacing controls the spacing of the interpolated sample points, and stat='all' requests all available statistics.

# InterpGen object
interpObject1 = InterpGen(user_data, pt_spacing=100, stat='all')

Lastly, the calculations actually occur when the calc_interp function is run on the InterpGen object. The outputs are two dataframes: one containing the statistics for the lines and the other containing the points used to sample the gridded data.

# calculate the stats and interpolated points
stats1, pts1 = interpObject1.calc_interp()

Filter output data#

Since we processed three different gridded variables, each containing twelve monthly time steps, and three line geometries, we need to filter the data down to a single variable, time step, and geometry before displaying it.

# Select subset of data to display
time = '1980-10-01'
ID = '155842105'
varname = 'aet'
# Define column names
ID_column = 'permanent_'
stat_time_column = 'date'
stat_varname_column = 'varname'

# Narrow down pts to display
pts = pts1.loc[time][pts1.loc[time][ID_column]==ID]
pts = pts[pts['varname']==varname]

# Select gridded data by variable name
da = interpObject1._interp_data[varname][ID].da

Within the InterpGen object that was created, there is an attribute called _interp_data. This attribute stores the clipped gridded data (da) from which the values were extracted at the interpolated sample points.

# 'Hidden' gridded data which values are extracted from
da
<xarray.DataArray 'aet' (time: 12, lat: 6, lon: 6)>
dask.array<getitem, shape=(12, 6, 6), dtype=float32, chunksize=(12, 6, 6), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 42.73 42.69 42.65 42.6 42.56 42.52
  * lon      (lon) float64 -93.73 -93.69 -93.65 -93.6 -93.56 -93.52
  * time     (time) datetime64[ns] 1980-01-01 1980-02-01 ... 1980-12-01
Attributes:
    units:              mm
    description:        Actual Evapotranspiration
    long_name:          water_evaporation_amount
    standard_name:      water_evaporation_amount
    dimensions:         lon lat time
    grid_mapping:       crs
    coordinate_system:  WGS84,EPSG:4326
    _ChunkSizes:        [   1  720 1440]

Display Data#

Below is a simple plot showing the values of the interpolated points plotted against their distance along the line.

# Plot values of the interpolated points
pts.plot(kind='scatter', x = 'dist', y='values')
<Axes: xlabel='dist', ylabel='values'>

Here the stats dataframe is filtered to show the line statistics for our selected time, line and grid.

# Get stats for line by select for time, geometry, and grid variable
stats1[(stats1[stat_time_column]==time) & (stats1[ID_column]==ID) & (stats1[stat_varname_column]==varname)]
date permanent_ varname mean median std min max
9 1980-10-01 155842105 aet 60.353931 60.404891 0.899203 58.662914 61.844751

And here is an interactive map displaying the gridded data, interpolated points, and line geometry.

# Display data
from holoviews.element.tiles import EsriImagery

line = gdf[gdf[ID_column]==ID].to_crs(4326).hvplot(
    geo=True, frame_width=600, frame_height=500, alpha=1, #cmap='viridis', clim=(60, 66),
    xlabel="longitude", ylabel="latitude", xlim=(-94, -93),
)

sample_points = pts.to_crs(4326).hvplot(
    c='values', geo=True, frame_width=600, frame_height=500, alpha=1, # cmap='viridis', clim=(60, 66),
    xlabel="longitude", ylabel="latitude", clabel=varname, xlim=(-94, -93),
    title="Interpolated Points", aspect='equal'
)
gridded_data = da.isel(time=9).hvplot.quadmesh(
    'lon', 'lat', varname,  alpha=1, grid=True, geo=True, frame_width=500, cmap='viridis', clim=(55, 65),
     xlabel="longitude", ylabel="latitude", clabel=varname, xlim=(-94, -93),
    title=varname, aspect='equal'
)

(EsriImagery() * gridded_data * line * sample_points)