ClimateR-Catalog - Terraclime data

ClimateR-Catalog - Terraclime data#

Introduction#

In this example, the ClimateR-catalog (CRC), developed by Mike Johnson [Johnson, 2023] is used to discover a gridded dataset of interest. The CRC encompasses a wealth of data sources and the metadata required to simplify the process of grid-to-polygon area weighted intersections. The CRC here uses a json catalog uses a schema with common metadata for geospatial analysis. In the example presented here we use monthly water evaporation amount from the dataset TerraClimate [Abatzoglou et al., 2018].

Geo Data Processing Tools (gdptools)#

gdptools is a python package for calculating area-weighted statistics of grid-to-polygon. grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the CRC to aid generalizing gdptools processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray, however some of the metadata encapsulated in the json entries will have to be added by the user. Examples can be found in Non-Catalog Examples.

NLDI#

We use a client to the U.S. Geological Survey’s Hydro Network-Linked Data Index [Blodgett, 2022] and a python client (pynhd) available through the HyRiver suite of packages [Chegini et al., 2021].

../../_images/TerraClimate_aet_catalog_example.png — Fig. 1 Example grid-to-polygon interpolation of TerraClimate data. A) Huc12 basins for Delaware River Watershed. B) Gridded monthly water evaporation amount (mm) from TerraClimate dataset. C) Area-weighted-average interpolation of gridded TerraClimate data to Huc12 polygons.#

References#

ADPH18: John T. Abatzoglou, Solomon Z. Dobrowski, Sean A. Parks, and Katherine C. Hegewisch. TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958–2015. Scientific Data, January 2018. URL: https://doi.org/10.1038/sdata.2017.191, doi:10.1038/sdata.2017.191.
Blo22: David Blodgett. Hydro-Network Linked Data Index. 2022. URL: https://labs.waterdata.usgs.gov/about-nldi/index.html.
CLL21: Taher Chegini, Hong-Yi Li, and L. Ruby Leung. Hyriver: hydroclimate data retriever. Journal of Open Source Software, 6(66):3175, 2021. URL: https://doi.org/10.21105/joss.03175, doi:10.21105/joss.03175.
Joh23: Mike Johnson. climateR-catalog. 2023. URL: mikejohnson51/climateR-catalogs.

import warnings
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
import pandas as pd
import numpy as np
import xarray as xr
import pyproj
from pynhd import NLDI
from pynhd import WaterData

import hvplot.xarray
import hvplot.pandas

from gdptools import WeightGen
from gdptools import AggGen
from gdptools import ClimRCatData

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.colheader_justify', 'center')
pd.set_option('display.precision', 3)

warnings.filterwarnings("ignore")

Create GeoDataFrame of Delaware River Basin HUC12s#

In this cell we use clients to several https://labs.waterdata.ugsgs.gov tools:

Tool: USGS NLDI Client: pynhd.NLDI
Tool: USGS GeoServer Client: pynhd.WaterData

More examples of using the HyRiver pynhd package can be found here

# USGS gage 01482100 Delaware River at Del Mem Bridge at Wilmington De
gage_id = "01482100"
nldi = NLDI()
del_basins = nldi.get_basins(gage_id)
huc12_basins = WaterData(layer="huc12").bygeom(del_basins.geometry[0])

---------------------------------------------------------------------------
MissingInputError                         Traceback (most recent call last)
Cell In[2], line 5
nldi = NLDI()
del_basins = nldi.get_basins(gage_id)
----> 5 huc12_basins = WaterData(layer="huc12").bygeom(del_basins.geometry[0])

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/pynhd/pynhd.py:676, in WaterData.bygeom(self, geometry, geo_crs, xy, predicate, sort_attr)
def bygeom(
   self,
   geometry: Polygon | MultiPolygon,
   (...)
   sort_attr: str | None = None,
) -> gpd.GeoDataFrame:
   """Get features within a geometry.

   Parameters
   (...)
       The requested features in the given geometry.
   """
--> 676     resp = self.wfs.getfeature_bygeom(
       geometry, geo_crs, always_xy=not xy, predicate=predicate.upper(), sort_attr=sort_attr
   )
   resp = cast("list[dict[str, Any]]", resp)
   return self._to_geodf(resp)

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/pygeoogc/pygeoogc.py:689, in WFS.getfeature_bygeom(self, geometry, geo_crs, always_xy, predicate, sort_attr)
if predicate.upper() not in valid_predicates:
   raise InputValueError("predicate", valid_predicates)
--> 689 return self.getfeature_byfilter(
   f"{predicate.upper()}({geom_name}, {g_wkt})", method="POST", sort_attr=sort_attr
)

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/pygeoogc/pygeoogc.py:790, in WFS.getfeature_byfilter(self, cql_filter, method, sort_attr)
except (IndexError, ValueError) as ex:
   raise ServiceError(resp[0], self.url) from ex
--> 790 payloads = [
   {
       "service": "wfs",
       "version": self.version,
       "outputFormat": self.outformat,
       "request": "GetFeature",
       "typeName": self.layer,
       "srsName": self.crs_str,
       "cql_filter": cql_filter,
       **self.sort_params(sort_attr, nfeatures, i),
   }
   for i in range(0, nfeatures, self.max_nrecords)
]
if method == "GET":
   return self.retrieve([self.url] * len(payloads), [{"params": p} for p in payloads])

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/pygeoogc/pygeoogc.py:799, in <listcomp>(.0)
except (IndexError, ValueError) as ex:
   raise ServiceError(resp[0], self.url) from ex
payloads = [
   {
       "service": "wfs",
       "version": self.version,
       "outputFormat": self.outformat,
       "request": "GetFeature",
       "typeName": self.layer,
       "srsName": self.crs_str,
       "cql_filter": cql_filter,
--> 799         **self.sort_params(sort_attr, nfeatures, i),
   }
   for i in range(0, nfeatures, self.max_nrecords)
]
if method == "GET":
   return self.retrieve([self.url] * len(payloads), [{"params": p} for p in payloads])

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/pygeoogc/core.py:626, in WFSBase.sort_params(self, sort_attr, nfeatures, start_index)
       msg += " Please set the sort_attr manually. Available columns are:\n"
       msg += ", ".join(list(valid_attrs))
--> 626         raise MissingInputError(msg)
if valid_attrs and sort_attr not in valid_attrs:
   raise InputValueError("sort_attr", list(valid_attrs))

MissingInputError: sort_attr is None and no id column found in the schema. Please set the sort_attr manually. Available columns are:
huc12, tohuc, areaacres, areasqkm, hutype, humod, states, noncontrib

huc12_basins.head()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 huc12_basins.head()

NameError: name 'huc12_basins' is not defined

Create a plot of the HUC12 Basins.

from holoviews.element.tiles import EsriTerrain
drb = huc12_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
    title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal'
)
EsriTerrain() * drb

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[4], line 2
      1 from holoviews.element.tiles import EsriTerrain
----> 2 drb = huc12_basins.hvplot(
      3     geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
      4     xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
      5     title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal'
      6 )
      7 EsriTerrain() * drb

NameError: name 'huc12_basins' is not defined

Query the climateR-catalog#

The variables aet, pet, PDSI are available in the TerraClimate dataset. Pandas is used to query the dataset.

In the following steps the json catalog files are imported into pandas.
Using the pandas query function to search for the TerraClimate dataset for each variable of interest and return the json metadata for the parameter and grid catalog entries.

climater_cat = "https://mikejohnson51.github.io/climateR-catalogs/catalog.parquet"
cat = pd.read_parquet(climater_cat)
cat.head()

	id	asset	URL	type	varname	variable	description	units	model	ensemble	scenario	T_name	duration	interval	nT	X_name	Y_name	X1	Xn	Y1	Yn	resX	resY	ncols	nrows	crs	toptobottom	tiled
0	maca_day	agg_macav2metdata_huss_BNU-ESM_r1i1p1_historic...	http://thredds.northwestknowledge.net:8080/thr...	opendap	specific_humidity	huss	Daily Mean Near-Surface Specific Humidity	kg kg-1	BNU-ESM	r1i1p1	historical	time	1950-01-01/2005-12-31	1 days	20454.0	lon	lat	-124.772	-67.065	25.063	49.396	0.042	0.042	1386.0	585.0	+proj=longlat +a=6378137 +f=0.0033528106647474...	True	T
1	maca_day	agg_macav2metdata_huss_CNRM-CM5_r1i1p1_histori...	http://thredds.northwestknowledge.net:8080/thr...	opendap	specific_humidity	huss	Daily Mean Near-Surface Specific Humidity	kg kg-1	CNRM-CM5	r1i1p1	historical	time	1950-01-01/2005-12-31	1 days	20454.0	lon	lat	-124.772	-67.065	25.063	49.396	0.042	0.042	1386.0	585.0	+proj=longlat +a=6378137 +f=0.0033528106647474...	True	T
2	maca_day	agg_macav2metdata_huss_CSIRO-Mk3-6-0_r1i1p1_hi...	http://thredds.northwestknowledge.net:8080/thr...	opendap	specific_humidity	huss	Daily Mean Near-Surface Specific Humidity	kg kg-1	CSIRO-Mk3-6-0	r1i1p1	historical	time	1950-01-01/2005-12-31	1 days	20454.0	lon	lat	-124.772	-67.065	25.063	49.396	0.042	0.042	1386.0	585.0	+proj=longlat +a=6378137 +f=0.0033528106647474...	True	T
3	maca_day	agg_macav2metdata_huss_bcc-csm1-1_r1i1p1_histo...	http://thredds.northwestknowledge.net:8080/thr...	opendap	specific_humidity	huss	Daily Mean Near-Surface Specific Humidity	kg kg-1	bcc-csm1-1	r1i1p1	historical	time	1950-01-01/2005-12-31	1 days	20454.0	lon	lat	-124.772	-67.065	25.063	49.396	0.042	0.042	1386.0	585.0	+proj=longlat +a=6378137 +f=0.0033528106647474...	True	T
4	maca_day	agg_macav2metdata_huss_CanESM2_r1i1p1_historic...	http://thredds.northwestknowledge.net:8080/thr...	opendap	specific_humidity	huss	Daily Mean Near-Surface Specific Humidity	kg kg-1	CanESM2	r1i1p1	historical	time	1950-01-01/2005-12-31	1 days	20454.0	lon	lat	-124.772	-67.065	25.063	49.396	0.042	0.042	1386.0	585.0	+proj=longlat +a=6378137 +f=0.0033528106647474...	True	T

Pandas query function example#

_id = "terraclim"
_varname = "aet"

# an example query returns a pandas dataframe.
tc = cat.query(
    "id == @_id & variable == @_varname"
)
tc

	id	asset	URL	type	varname	variable	description	units	model	ensemble	scenario	T_name	duration	interval	nT	X_name	Y_name	X1	Xn	Y1	Yn	resX	resY	ncols	nrows	crs	toptobottom	tiled
1374	terraclim	agg_terraclimate_aet_1958_CurrentYear_GLOBE	http://thredds.northwestknowledge.net:8080/thr...	opendap	aet	aet	water_evaporation_amount	mm	None	None	total	time	1958-01-01/2021-12-01	1 months	768.0	lon	lat	-179.979	179.979	89.979	-89.979	0.042	0.042	8640.0	4320.0	+proj=longlat +a=6378137 +f=0.0033528106647474...	False

Query the climateR-catalog for variables aet, pet, PDSI#

Query the catalog for each variable of interest. The pandas query() function returns a pandas DataFrame but we want just the dictionary representation of the json catalog entry so it’s converted to a dictionary with format dict[str, dict[str, Any]], where the first key is the variable of interest and the value is a dictionary of the resulting climateR-catalog data.

# Create a dictionary of parameter dataframes for each variable
tvars = ["aet", "pet", "PDSI"]
cat_params = [cat.query("id == @_id & variable == @_var").to_dict(orient="records")[0] for _var in tvars]

cat_dict = dict(zip(tvars, cat_params))

# Output an example of the cat_param.json entry for "aet".
cat_dict.get("aet")

{'id': 'terraclim',
 'asset': 'agg_terraclimate_aet_1958_CurrentYear_GLOBE',
 'URL': 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_aet_1958_CurrentYear_GLOBE.nc',
 'type': 'opendap',
 'varname': 'aet',
 'variable': 'aet',
 'description': 'water_evaporation_amount',
 'units': 'mm',
 'model': None,
 'ensemble': None,
 'scenario': 'total',
 'T_name': 'time',
 'duration': '1958-01-01/2021-12-01',
 'interval': '1 months',
 'nT': 768.0,
 'X_name': 'lon',
 'Y_name': 'lat',
 'X1': -179.97916666666666,
 'Xn': 179.97916666666666,
 'Y1': 89.97916666666667,
 'Yn': -89.97916666666664,
 'resX': 0.041666666666666664,
 'resY': 0.041666666666666664,
 'ncols': 8640.0,
 'nrows': 4320.0,
 'crs': '+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs',
 'toptobottom': False,
 'tiled': ''}

Calculate the weights that will be used in the area-weighted statistics#

A note on the methods applied here:

There are currently three data classes, ClimRCatData, for use with input data derived from the climateR-catalog, ODAPCatData, for use with input data derived from the OPeNDAP catalog, and UserCatData for user-defined input data. The non-catalog examples demonstrate the use of UserCatData, and here ClimRCatData is demonstrated.

WeightGen() is the tool for generating weights. There are several methods available including: “serial”, “parallel”, and “dask”. If generating weights over a scale similar to the DRB huc12 basins here, serial using the serial method is appropriate. If for example weights are generated for huc12 basins over all of CONUS then “parallel” or “dask” (if you have a dask cluster available) will provide a significant speed-up.

The calculate_weights() method has the option to set interesection=True, which will save the intersected polygon geometries to wght_gen and the can be retrieved with the wght_gen.intersections property. We set it to True here so we could use it to evaluate the calculated weights and resulting aggregated values later in this notebook. Normally, one could just calculate weights() and the default value for intersections is False.

user_data = ClimRCatData(
    cat_dict=cat_dict,
    f_feature=huc12_basins,
    id_feature='huc12',
    period=["1980-01-01", "1980-12-31"]
)

wght_gen = WeightGen(
    user_data=user_data,
    method="serial",
    output_file="terraclime_daily_wghts.csv",
    weight_gen_crs=6931
)

wghts = wght_gen.calculate_weights(intersections=True)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 3
user_data = ClimRCatData(
   cat_dict=cat_dict,
----> 3     f_feature=huc12_basins,
   id_feature='huc12',
   period=["1980-01-01", "1980-12-31"]
)
wght_gen = WeightGen(
   user_data=user_data,
   method="serial",
   output_file="terraclime_daily_wghts.csv",
   weight_gen_crs=6931
)
wghts = wght_gen.calculate_weights(intersections=True)

NameError: name 'huc12_basins' is not defined

Returned weights dataframe#

wghts.head()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 wghts.head()

NameError: name 'wghts' is not defined

Calculate the area-weighted averages#

For each variable in the TerraClimate dataset and for each HUC12 in the Delaware River basin
Output the result as a netcdf file.
The data can also be output as a .csv file.

Returned values#

ngdf: The original huc12_basins geodataframe is sorted by user_data.id_feature, and dissolved by user_data.id_feature. So the returned geodataframe is a modified version of the original huc12_basins, and is returned for easy evaluation and for plotting.
ds_out: Xarray dataset representing the aggregated values and dimensioned by (time, user_data.id_feature). Again return for evaluation and plotting purposes.

properties#

agg_data. The user_data prepared for aggregating. Can be used to get the gridded dataset subsetted by time and feature bounding box.

agg_gen = AggGen(
    user_data=user_data,
    stat_method="masked_mean",
    agg_engine="serial",
    agg_writer="netcdf",
    weights='terraclime_daily_wghts.csv',
    out_path='.',
    file_prefix="testing_tc_drb_huc12"
)
ngdf, ds_out = agg_gen.calculate_agg()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[10], line 2
      1 agg_gen = AggGen(
----> 2     user_data=user_data,
      3     stat_method="masked_mean",
      4     agg_engine="serial",
      5     agg_writer="netcdf",
      6     weights='terraclime_daily_wghts.csv',
      7     out_path='.',
      8     file_prefix="testing_tc_drb_huc12"
      9 )
     10 ngdf, ds_out = agg_gen.calculate_agg()

NameError: name 'user_data' is not defined

Return the subset of TerraClimate data#

agg_gen prepares the user_data by subsetting the gridded data by the bounding box of the user.feature data and the user_data.period. This prepared data is available via the agg_gen.agg_data property. The return value is a dict[var, AggData]. To retrieve the prepared data, use the get() method on the dict by variable. The da property is the sub-setted xarray DataArray of the variable “aet”.

processed_data = agg_gen.agg_data
da = processed_data.get("aet").da
da

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 processed_data = agg_gen.agg_data
      2 da = processed_data.get("aet").da
      3 da

NameError: name 'agg_gen' is not defined

Plot the resulting interpolation#

Calulate min/max of value and time=7 to adjust plotting paramters below

var = 'aet'
time_step = "1980-08-01"

vmin = np.nanmin(da.sel(time=time_step).values)
vmax = np.nanmax(da.sel(time=time_step).values)
print(vmin, vmax)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[13], line 1
----> 1 vmin = np.nanmin(da.sel(time=time_step).values)
      2 vmax = np.nanmax(da.sel(time=time_step).values)
      3 print(vmin, vmax)

NameError: name 'da' is not defined

ngdf['aet'] = ds_out[var].sel(time=time_step)
terraclim_agg = ngdf.to_crs(4326).hvplot(
    c='aet', geo=True, coastline='50m', frame_width=300, alpha=1, cmap='viridis', clim=(vmin, vmax),
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", xlim=(-76.8, -73.8),
    title="DRB HUC12 area-weighted average", aspect='equal'
)
terraclim_raw = da.sel(time=time_step).hvplot.quadmesh(
    'lon', 'lat', 'aet', cmap='viridis', alpha=1, grid=True, geo=True, coastline='50m', frame_width=300, clim=(vmin,vmax),
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", xlim=(-76.8, -73.8),
    title="TerraClimate Monthly AET", aspect='equal'
)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[14], line 1
----> 1 ngdf['aet'] = ds_out[var].sel(time=time_step)
      2 terraclim_agg = ngdf.to_crs(4326).hvplot(
      3     c='aet', geo=True, coastline='50m', frame_width=300, alpha=1, cmap='viridis', clim=(vmin, vmax),
      4     xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", xlim=(-76.8, -73.8),
      5     title="DRB HUC12 area-weighted average", aspect='equal'
      6 )
      7 terraclim_raw = da.sel(time=time_step).hvplot.quadmesh(
      8     'lon', 'lat', 'aet', cmap='viridis', alpha=1, grid=True, geo=True, coastline='50m', frame_width=300, clim=(vmin,vmax),
      9     xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", xlim=(-76.8, -73.8),
     10     title="TerraClimate Monthly AET", aspect='equal'
     11 )

NameError: name 'ds_out' is not defined

(EsriTerrain() * terraclim_raw) + (EsriTerrain() * terraclim_agg) # + (EsriTerrain() * drb)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[15], line 1
----> 1 (EsriTerrain() * terraclim_raw) + (EsriTerrain() * terraclim_agg) # + (EsriTerrain() * drb)

NameError: name 'terraclim_raw' is not defined

(EsriTerrain() * terraclim_raw  * terraclim_agg)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 (EsriTerrain() * terraclim_raw  * terraclim_agg)

NameError: name 'terraclim_raw' is not defined

Inspect structure of saved netcdf file#

ncf = xr.open_dataset('testing_tc_drb_huc12.nc')
ncf

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/file_manager.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    210 try:
--> 211     file = self._cache[self._key]
    212 except KeyError:

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
     55 with self._lock:
---> 56     value = self._cache[key]
     57     self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/home/docs/checkouts/readthedocs.org/user_builds/gdptools/checkouts/latest/docs/Examples/ClimateR-Catalog/testing_tc_drb_huc12.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), 'df6084c7-66e0-4709-868f-868168576347']

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
Cell In[17], line 1
----> 1 ncf = xr.open_dataset('testing_tc_drb_huc12.nc')
      2 ncf

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/api.py:572, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    560 decoders = _resolve_decoders_kwargs(
    561     decode_cf,
    562     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    568     decode_coords=decode_coords,
    569 )
    571 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 572 backend_ds = backend.open_dataset(
    573     filename_or_obj,
    574     drop_variables=drop_variables,
    575     **decoders,
    576     **kwargs,
    577 )
    578 ds = _dataset_from_backend_dataset(
    579     backend_ds,
    580     filename_or_obj,
   (...)
    590     **kwargs,
    591 )
    592 return ds

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:644, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
    623 def open_dataset(  # type: ignore[override]  # allow LSP violation, not supporting **kwargs
    624     self,
    625     filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
   (...)
    641     autoclose=False,
    642 ) -> Dataset:
    643     filename_or_obj = _normalize_path(filename_or_obj)
--> 644     store = NetCDF4DataStore.open(
    645         filename_or_obj,
    646         mode=mode,
    647         format=format,
    648         group=group,
    649         clobber=clobber,
    650         diskless=diskless,
    651         persist=persist,
    652         lock=lock,
    653         autoclose=autoclose,
    654     )
    656     store_entrypoint = StoreBackendEntrypoint()
    657     with close_on_error(store):

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:407, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    401 kwargs = dict(
    402     clobber=clobber, diskless=diskless, persist=persist, format=format
    403 )
    404 manager = CachingFileManager(
    405     netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    406 )
--> 407 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:354, in NetCDF4DataStore.__init__(self, manager, group, mode, lock, autoclose)
    352 self._group = group
    353 self._mode = mode
--> 354 self.format = self.ds.data_model
    355 self._filename = self.ds.filepath()
    356 self.is_remote = is_remote_uri(self._filename)

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:416, in NetCDF4DataStore.ds(self)
    414 @property
    415 def ds(self):
--> 416     return self._acquire()

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:410, in NetCDF4DataStore._acquire(self, needs_lock)
    409 def _acquire(self, needs_lock=True):
--> 410     with self._manager.acquire_context(needs_lock) as root:
    411         ds = _nc4_require_group(root, self._group, self._mode)
    412     return ds

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/contextlib.py:119, in _GeneratorContextManager.__enter__(self)
    117 del self.args, self.kwds, self.func
    118 try:
--> 119     return next(self.gen)
    120 except StopIteration:
    121     raise RuntimeError("generator didn't yield") from None

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/file_manager.py:199, in CachingFileManager.acquire_context(self, needs_lock)
    196 @contextlib.contextmanager
    197 def acquire_context(self, needs_lock=True):
    198     """Context manager for acquiring a file."""
--> 199     file, cached = self._acquire_with_cache_info(needs_lock)
    200     try:
    201         yield file

File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/file_manager.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    215     kwargs = kwargs.copy()
    216     kwargs["mode"] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
    218 if self._mode == "w":
    219     # ensure file doesn't get overridden when opened again
    220     self._mode = "a"

File src/netCDF4/_netCDF4.pyx:2353, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:1963, in netCDF4._netCDF4._ensure_nc_success()

FileNotFoundError: [Errno 2] No such file or directory: b'/home/docs/checkouts/readthedocs.org/user_builds/gdptools/checkouts/latest/docs/Examples/ClimateR-Catalog/testing_tc_drb_huc12.nc'

Validation of area-weighted mean#

In the following sections we use some helper functions, which return a information on the intersection calculation to create an illustration of the resulting intersection and calculated value at a polygon within the huc12_basins geodataframe.

The anlysis is performed on one of the huc12_basins polygons using the huc12 id - “020200050301”. It extracts the grid_cells intersected by the huc12 polygon, the resulting intersections of the grid_cells with the huc12 polygon, and the aggregated value of the huc12 polygon for one of the days in the resulting aggregation.

grid_cells (GeoDataFrame): The polygons representing each cell of the Terraclim grid.
intersections (GeoDataFrame): The grid_cells interesections with the huc12_basins polygons.
ngdf_poly (GeoDataFrame): The resorted and dissolved GeoDataFrame from agg_gen.
weights (xarray.Dataset): The generated weights read into xarray Dataset.

time = 7
huc12_id = "020200050301"
grid_cells = wght_gen.grid_cells

weights = pd.read_csv("terraclime_daily_wghts.csv").groupby("huc12").get_group(int(huc12_id))
i_ind = weights.i.values
j_ind = weights.j.values
wght = weights.wght.values

i_max = max(i_ind)
i_min = min(i_ind)
j_max = max(j_ind)
j_min = min(j_ind)

tmp = [grid_cells.query("i_index == @i and j_index == @j") for i,j in zip(i_ind,j_ind)]
grid_cell_sub = pd.concat(tmp)
grid_cell_sub

aet_vals = da.values[:, i_ind, j_ind]
grid_cell_sub["aet"] = aet_vals[time]
min_val = min(grid_cell_sub["aet"].values)
max_val = max(grid_cell_sub["aet"].values)
grid_cell_sub

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[18], line 3
      1 time = 7
      2 huc12_id = "020200050301"
----> 3 grid_cells = wght_gen.grid_cells
      5 weights = pd.read_csv("terraclime_daily_wghts.csv").groupby("huc12").get_group(int(huc12_id))
      6 i_ind = weights.i.values

NameError: name 'wght_gen' is not defined

inter_sect_sub = wght_gen.intersections.groupby("huc12").get_group(huc12_id)
inter_sect_sub.to_crs(4326, inplace=True)
inter_sect_sub.reset_index(drop=True, inplace=True)
inter_sect_sub["aet"] = aet_vals[time]
inter_sect_sub["wght"] = wght

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[19], line 1
----> 1 inter_sect_sub = wght_gen.intersections.groupby("huc12").get_group(huc12_id)
      2 inter_sect_sub.to_crs(4326, inplace=True)
      3 inter_sect_sub.reset_index(drop=True, inplace=True)

NameError: name 'wght_gen' is not defined

ngdf_poly = ngdf.groupby("huc12").get_group("020200050301")


aggvals = xr.open_dataset("testing_tc_drb_huc12.nc")
huc12_index = np.where(aggvals.huc12.values == huc12_id)[0][0]
ngdf_poly["aet"] = aggvals.aet.values[time, huc12_index]

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 ngdf_poly = ngdf.groupby("huc12").get_group("020200050301")
      4 aggvals = xr.open_dataset("testing_tc_drb_huc12.nc")
      5 huc12_index = np.where(aggvals.huc12.values == huc12_id)[0][0]

NameError: name 'ngdf' is not defined

grid_cell_sub["aet"] = da.values[time, i_ind, j_ind]
inter_sect_sub["aet"] = da.values[time, i_ind, j_ind]
ngdf_poly["aet"] = aggvals.aet.values[time,huc12_index]
min_val = min(grid_cell_sub["aet"].values)
max_val = max(grid_cell_sub["aet"].values)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[21], line 1
----> 1 grid_cell_sub["aet"] = da.values[time, i_ind, j_ind]
      2 inter_sect_sub["aet"] = da.values[time, i_ind, j_ind]
      3 ngdf_poly["aet"] = aggvals.aet.values[time,huc12_index]

NameError: name 'da' is not defined

print(
    grid_cell_sub.crs,
    ngdf_poly.crs,
    inter_sect_sub.crs
)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[22], line 2
      1 print(
----> 2     grid_cell_sub.crs,
      3     ngdf_poly.crs,
      4     inter_sect_sub.crs
      5 )

NameError: name 'grid_cell_sub' is not defined

grid_cell_sub.to_crs(4326, inplace=True)
ngdf_poly.to_crs(4326, inplace=True)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[23], line 1
----> 1 grid_cell_sub.to_crs(4326, inplace=True)
      2 ngdf_poly.to_crs(4326, inplace=True)

NameError: name 'grid_cell_sub' is not defined

Plot the results#

Create labels:
- inter_sect_sub: for each intersection, determine the centroid of the cell to position the label. The text of the label will be the weight corresponding to the intersected area.
- gric_cell_sub: For each grid cell determine the centroid of the cell to position the label. The text of the lable will be the aet value associated with the time variable.
- ngdf_poly: Determin the centroid of the polygon to position the label. The text of the label is the aggregated value for the time and huc12_id.
Create plots: Use hvplot to create plots of the 3 geodataframes and add the corresponding labels to each of the three figures.

for idx, row in inter_sect_sub.iterrows():
    cent = row.geometry.centroid
    inter_sect_sub.at[idx, 'lon'] = cent.x
    inter_sect_sub.at[idx, 'lat'] = cent.y
is_labels = inter_sect_sub.hvplot.labels(x='lon', y='lat', text='wght', text_color='r', text_baseline='center')

for idx, row in grid_cell_sub.iterrows():
    cent = row.geometry.centroid
    grid_cell_sub.at[idx, 'lon'] = cent.x
    grid_cell_sub.at[idx, 'lat'] = cent.y
grd_labels = grid_cell_sub.hvplot.labels(x='lon', y='lat', text='aet', text_color='black', text_baseline='center')

for idx, row in ngdf_poly.iterrows():
    cent = row.geometry.centroid
    ngdf_poly.at[idx, 'lon'] = cent.x
    ngdf_poly.at[idx, 'lat'] = cent.y
ngdf_labels = ngdf_poly.hvplot.labels(x='lon', y='lat', text='aet', text_color='black', text_baseline='center')

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[24], line 1
----> 1 for idx, row in inter_sect_sub.iterrows():
      2     cent = row.geometry.centroid
      3     inter_sect_sub.at[idx, 'lon'] = cent.x

NameError: name 'inter_sect_sub' is not defined

from holoviews import opts
gc_p = grid_cell_sub.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))
int_p = inter_sect_sub.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))
gdf_p = ngdf_poly.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))

layout = ((gc_p * grd_labels) + (int_p * is_labels) + (gdf_p * ngdf_labels)).cols(1)
layout.opts(opts.Layout(shared_axes=True))

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[25], line 2
      1 from holoviews import opts
----> 2 gc_p = grid_cell_sub.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))
      3 int_p = inter_sect_sub.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))
      4 gdf_p = ngdf_poly.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))

NameError: name 'grid_cell_sub' is not defined

np.sum(grid_cell_sub.aet.values * inter_sect_sub.wght.values)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[26], line 1
----> 1 np.sum(grid_cell_sub.aet.values * inter_sect_sub.wght.values)

NameError: name 'grid_cell_sub' is not defined