ClimateR-Catalog - Terraclime data#
Introduction#
In this example, the ClimateR-catalog (CRC), developed by Mike Johnson [Johnson, 2023] is used to discover a gridded dataset of interest. The CRC encompasses a wealth of data sources and the metadata required to simplify the process of grid-to-polygon area weighted intersections. The CRC here uses a json catalog uses a schema with common metadata for geospatial analysis. In the example presented here we use monthly water evaporation amount from the dataset TerraClimate [Abatzoglou et al., 2018].
Geo Data Processing Tools (gdptools)#
gdptools is a python package for calculating area-weighted statistics of grid-to-polygon. grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the CRC to aid generalizing gdptools processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray, however some of the metadata encapsulated in the json entries will have to be added by the user. Examples can be found in Non-Catalog Examples.
NLDI#
We use a client to the U.S. Geological Survey’s Hydro Network-Linked Data Index [Blodgett, 2022] and a python client (pynhd) available through the HyRiver suite of packages [Chegini et al., 2021].
References#
- ADPH18
John T. Abatzoglou, Solomon Z. Dobrowski, Sean A. Parks, and Katherine C. Hegewisch. TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958–2015. Scientific Data, January 2018. URL: https://doi.org/10.1038/sdata.2017.191, doi:10.1038/sdata.2017.191.
- Blo22
David Blodgett. Hydro-Network Linked Data Index. 2022. URL: https://labs.waterdata.usgs.gov/about-nldi/index.html.
- CLL21
Taher Chegini, Hong-Yi Li, and L. Ruby Leung. Hyriver: hydroclimate data retriever. Journal of Open Source Software, 6(66):3175, 2021. URL: https://doi.org/10.21105/joss.03175, doi:10.21105/joss.03175.
- Joh23
Mike Johnson. climateR-catalog. 2023. URL: mikejohnson51/climateR-catalogs.
import warnings
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
import pandas as pd
import numpy as np
import xarray as xr
import pyproj
from pynhd import NLDI
from pynhd import WaterData
import hvplot.xarray
import hvplot.pandas
from gdptools import WeightGen
from gdptools import AggGen
from gdptools import ClimRCatData
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.colheader_justify', 'center')
pd.set_option('display.precision', 3)
warnings.filterwarnings("ignore")
Create GeoDataFrame of Delaware River Basin HUC12s#
In this cell we use clients to several https://labs.waterdata.ugsgs.gov tools:
Tool: USGS NLDI Client: pynhd.NLDI
Tool: USGS GeoServer Client: pynhd.WaterData
More examples of using the HyRiver pynhd package can be found here
# USGS gage 01482100 Delaware River at Del Mem Bridge at Wilmington De
gage_id = "01482100"
nldi = NLDI()
del_basins = nldi.get_basins(gage_id)
huc12_basins = WaterData(layer="huc12").bygeom(del_basins.geometry[0])
---------------------------------------------------------------------------
MissingInputError Traceback (most recent call last)
Cell In[2], line 5
3 nldi = NLDI()
4 del_basins = nldi.get_basins(gage_id)
----> 5 huc12_basins = WaterData(layer="huc12").bygeom(del_basins.geometry[0])
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/pynhd/pynhd.py:676, in WaterData.bygeom(self, geometry, geo_crs, xy, predicate, sort_attr)
634 def bygeom(
635 self,
636 geometry: Polygon | MultiPolygon,
(...)
640 sort_attr: str | None = None,
641 ) -> gpd.GeoDataFrame:
642 """Get features within a geometry.
643
644 Parameters
(...)
674 The requested features in the given geometry.
675 """
--> 676 resp = self.wfs.getfeature_bygeom(
677 geometry, geo_crs, always_xy=not xy, predicate=predicate.upper(), sort_attr=sort_attr
678 )
679 resp = cast("list[dict[str, Any]]", resp)
680 return self._to_geodf(resp)
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/pygeoogc/pygeoogc.py:689, in WFS.getfeature_bygeom(self, geometry, geo_crs, always_xy, predicate, sort_attr)
686 if predicate.upper() not in valid_predicates:
687 raise InputValueError("predicate", valid_predicates)
--> 689 return self.getfeature_byfilter(
690 f"{predicate.upper()}({geom_name}, {g_wkt})", method="POST", sort_attr=sort_attr
691 )
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/pygeoogc/pygeoogc.py:790, in WFS.getfeature_byfilter(self, cql_filter, method, sort_attr)
787 except (IndexError, ValueError) as ex:
788 raise ServiceError(resp[0], self.url) from ex
--> 790 payloads = [
791 {
792 "service": "wfs",
793 "version": self.version,
794 "outputFormat": self.outformat,
795 "request": "GetFeature",
796 "typeName": self.layer,
797 "srsName": self.crs_str,
798 "cql_filter": cql_filter,
799 **self.sort_params(sort_attr, nfeatures, i),
800 }
801 for i in range(0, nfeatures, self.max_nrecords)
802 ]
803 if method == "GET":
804 return self.retrieve([self.url] * len(payloads), [{"params": p} for p in payloads])
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/pygeoogc/pygeoogc.py:799, in <listcomp>(.0)
787 except (IndexError, ValueError) as ex:
788 raise ServiceError(resp[0], self.url) from ex
790 payloads = [
791 {
792 "service": "wfs",
793 "version": self.version,
794 "outputFormat": self.outformat,
795 "request": "GetFeature",
796 "typeName": self.layer,
797 "srsName": self.crs_str,
798 "cql_filter": cql_filter,
--> 799 **self.sort_params(sort_attr, nfeatures, i),
800 }
801 for i in range(0, nfeatures, self.max_nrecords)
802 ]
803 if method == "GET":
804 return self.retrieve([self.url] * len(payloads), [{"params": p} for p in payloads])
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/pygeoogc/core.py:626, in WFSBase.sort_params(self, sort_attr, nfeatures, start_index)
624 msg += " Please set the sort_attr manually. Available columns are:\n"
625 msg += ", ".join(list(valid_attrs))
--> 626 raise MissingInputError(msg)
628 if valid_attrs and sort_attr not in valid_attrs:
629 raise InputValueError("sort_attr", list(valid_attrs))
MissingInputError: sort_attr is None and no id column found in the schema. Please set the sort_attr manually. Available columns are:
huc12, tohuc, areaacres, areasqkm, hutype, humod, states, noncontrib
huc12_basins.head()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[3], line 1
----> 1 huc12_basins.head()
NameError: name 'huc12_basins' is not defined
Create a plot of the HUC12 Basins.
from holoviews.element.tiles import EsriTerrain
drb = huc12_basins.hvplot(
geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal'
)
EsriTerrain() * drb
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[4], line 2
1 from holoviews.element.tiles import EsriTerrain
----> 2 drb = huc12_basins.hvplot(
3 geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
4 xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
5 title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal'
6 )
7 EsriTerrain() * drb
NameError: name 'huc12_basins' is not defined
Query the climateR-catalog#
The variables aet, pet, PDSI are available in the TerraClimate dataset. Pandas is used to query the dataset.
In the following steps the json catalog files are imported into pandas.
Using the pandas query function to search for the TerraClimate dataset for each variable of interest and return the json metadata for the parameter and grid catalog entries.
climater_cat = "https://mikejohnson51.github.io/climateR-catalogs/catalog.parquet"
cat = pd.read_parquet(climater_cat)
cat.head()
id | asset | URL | type | varname | variable | description | units | model | ensemble | scenario | T_name | duration | interval | nT | X_name | Y_name | X1 | Xn | Y1 | Yn | resX | resY | ncols | nrows | crs | toptobottom | tiled | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | maca_day | agg_macav2metdata_huss_BNU-ESM_r1i1p1_historic... | http://thredds.northwestknowledge.net:8080/thr... | opendap | specific_humidity | huss | Daily Mean Near-Surface Specific Humidity | kg kg-1 | BNU-ESM | r1i1p1 | historical | time | 1950-01-01/2005-12-31 | 1 days | 20454.0 | lon | lat | -124.772 | -67.065 | 25.063 | 49.396 | 0.042 | 0.042 | 1386.0 | 585.0 | +proj=longlat +a=6378137 +f=0.0033528106647474... | True | T |
1 | maca_day | agg_macav2metdata_huss_CNRM-CM5_r1i1p1_histori... | http://thredds.northwestknowledge.net:8080/thr... | opendap | specific_humidity | huss | Daily Mean Near-Surface Specific Humidity | kg kg-1 | CNRM-CM5 | r1i1p1 | historical | time | 1950-01-01/2005-12-31 | 1 days | 20454.0 | lon | lat | -124.772 | -67.065 | 25.063 | 49.396 | 0.042 | 0.042 | 1386.0 | 585.0 | +proj=longlat +a=6378137 +f=0.0033528106647474... | True | T |
2 | maca_day | agg_macav2metdata_huss_CSIRO-Mk3-6-0_r1i1p1_hi... | http://thredds.northwestknowledge.net:8080/thr... | opendap | specific_humidity | huss | Daily Mean Near-Surface Specific Humidity | kg kg-1 | CSIRO-Mk3-6-0 | r1i1p1 | historical | time | 1950-01-01/2005-12-31 | 1 days | 20454.0 | lon | lat | -124.772 | -67.065 | 25.063 | 49.396 | 0.042 | 0.042 | 1386.0 | 585.0 | +proj=longlat +a=6378137 +f=0.0033528106647474... | True | T |
3 | maca_day | agg_macav2metdata_huss_bcc-csm1-1_r1i1p1_histo... | http://thredds.northwestknowledge.net:8080/thr... | opendap | specific_humidity | huss | Daily Mean Near-Surface Specific Humidity | kg kg-1 | bcc-csm1-1 | r1i1p1 | historical | time | 1950-01-01/2005-12-31 | 1 days | 20454.0 | lon | lat | -124.772 | -67.065 | 25.063 | 49.396 | 0.042 | 0.042 | 1386.0 | 585.0 | +proj=longlat +a=6378137 +f=0.0033528106647474... | True | T |
4 | maca_day | agg_macav2metdata_huss_CanESM2_r1i1p1_historic... | http://thredds.northwestknowledge.net:8080/thr... | opendap | specific_humidity | huss | Daily Mean Near-Surface Specific Humidity | kg kg-1 | CanESM2 | r1i1p1 | historical | time | 1950-01-01/2005-12-31 | 1 days | 20454.0 | lon | lat | -124.772 | -67.065 | 25.063 | 49.396 | 0.042 | 0.042 | 1386.0 | 585.0 | +proj=longlat +a=6378137 +f=0.0033528106647474... | True | T |
Pandas query function example#
_id = "terraclim"
_varname = "aet"
# an example query returns a pandas dataframe.
tc = cat.query(
"id == @_id & variable == @_varname"
)
tc
id | asset | URL | type | varname | variable | description | units | model | ensemble | scenario | T_name | duration | interval | nT | X_name | Y_name | X1 | Xn | Y1 | Yn | resX | resY | ncols | nrows | crs | toptobottom | tiled | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1374 | terraclim | agg_terraclimate_aet_1958_CurrentYear_GLOBE | http://thredds.northwestknowledge.net:8080/thr... | opendap | aet | aet | water_evaporation_amount | mm | None | None | total | time | 1958-01-01/2021-12-01 | 1 months | 768.0 | lon | lat | -179.979 | 179.979 | 89.979 | -89.979 | 0.042 | 0.042 | 8640.0 | 4320.0 | +proj=longlat +a=6378137 +f=0.0033528106647474... | False |
Query the climateR-catalog for variables aet, pet, PDSI#
Query the catalog for each variable of interest. The pandas query() function returns a pandas DataFrame but we want just the dictionary representation of the json catalog entry so it’s converted to a dictionary with format dict[str, dict[str, Any]], where the first key is the variable of interest and the value is a dictionary of the resulting climateR-catalog data.
# Create a dictionary of parameter dataframes for each variable
tvars = ["aet", "pet", "PDSI"]
cat_params = [cat.query("id == @_id & variable == @_var").to_dict(orient="records")[0] for _var in tvars]
cat_dict = dict(zip(tvars, cat_params))
# Output an example of the cat_param.json entry for "aet".
cat_dict.get("aet")
{'id': 'terraclim',
'asset': 'agg_terraclimate_aet_1958_CurrentYear_GLOBE',
'URL': 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_aet_1958_CurrentYear_GLOBE.nc',
'type': 'opendap',
'varname': 'aet',
'variable': 'aet',
'description': 'water_evaporation_amount',
'units': 'mm',
'model': None,
'ensemble': None,
'scenario': 'total',
'T_name': 'time',
'duration': '1958-01-01/2021-12-01',
'interval': '1 months',
'nT': 768.0,
'X_name': 'lon',
'Y_name': 'lat',
'X1': -179.97916666666666,
'Xn': 179.97916666666666,
'Y1': 89.97916666666667,
'Yn': -89.97916666666664,
'resX': 0.041666666666666664,
'resY': 0.041666666666666664,
'ncols': 8640.0,
'nrows': 4320.0,
'crs': '+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs',
'toptobottom': False,
'tiled': ''}
Calculate the weights that will be used in the area-weighted statistics#
A note on the methods applied here:
There are currently three data classes, ClimRCatData, for use with input data derived from the climateR-catalog, ODAPCatData, for use with input data derived from the OPeNDAP catalog, and UserCatData for user-defined input data. The non-catalog examples demonstrate the use of UserCatData, and here ClimRCatData is demonstrated.
WeightGen() is the tool for generating weights. There are several methods available including: “serial”, “parallel”, and “dask”. If generating weights over a scale similar to the DRB huc12 basins here, serial using the serial method is appropriate. If for example weights are generated for huc12 basins over all of CONUS then “parallel” or “dask” (if you have a dask cluster available) will provide a significant speed-up.
The calculate_weights() method has the option to set interesection=True, which will save the intersected polygon geometries to wght_gen and the can be retrieved with the wght_gen.intersections property. We set it to True here so we could use it to evaluate the calculated weights and resulting aggregated values later in this notebook. Normally, one could just calculate weights() and the default value for intersections is False.
user_data = ClimRCatData(
cat_dict=cat_dict,
f_feature=huc12_basins,
id_feature='huc12',
period=["1980-01-01", "1980-12-31"]
)
wght_gen = WeightGen(
user_data=user_data,
method="serial",
output_file="terraclime_daily_wghts.csv",
weight_gen_crs=6931
)
wghts = wght_gen.calculate_weights(intersections=True)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[8], line 3
1 user_data = ClimRCatData(
2 cat_dict=cat_dict,
----> 3 f_feature=huc12_basins,
4 id_feature='huc12',
5 period=["1980-01-01", "1980-12-31"]
6 )
8 wght_gen = WeightGen(
9 user_data=user_data,
10 method="serial",
11 output_file="terraclime_daily_wghts.csv",
12 weight_gen_crs=6931
13 )
15 wghts = wght_gen.calculate_weights(intersections=True)
NameError: name 'huc12_basins' is not defined
Returned weights dataframe#
wghts.head()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[9], line 1
----> 1 wghts.head()
NameError: name 'wghts' is not defined
Calculate the area-weighted averages#
For each variable in the TerraClimate dataset and for each HUC12 in the Delaware River basin
Output the result as a netcdf file.
The data can also be output as a .csv file.
Returned values#
ngdf: The original huc12_basins geodataframe is sorted by user_data.id_feature, and dissolved by user_data.id_feature. So the returned geodataframe is a modified version of the original huc12_basins, and is returned for easy evaluation and for plotting.
ds_out: Xarray dataset representing the aggregated values and dimensioned by (time, user_data.id_feature). Again return for evaluation and plotting purposes.
properties#
agg_data. The user_data prepared for aggregating. Can be used to get the gridded dataset subsetted by time and feature bounding box.
agg_gen = AggGen(
user_data=user_data,
stat_method="masked_mean",
agg_engine="serial",
agg_writer="netcdf",
weights='terraclime_daily_wghts.csv',
out_path='.',
file_prefix="testing_tc_drb_huc12"
)
ngdf, ds_out = agg_gen.calculate_agg()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[10], line 2
1 agg_gen = AggGen(
----> 2 user_data=user_data,
3 stat_method="masked_mean",
4 agg_engine="serial",
5 agg_writer="netcdf",
6 weights='terraclime_daily_wghts.csv',
7 out_path='.',
8 file_prefix="testing_tc_drb_huc12"
9 )
10 ngdf, ds_out = agg_gen.calculate_agg()
NameError: name 'user_data' is not defined
Return the subset of TerraClimate data#
agg_gen prepares the user_data by subsetting the gridded data by the bounding box of the user.feature data and the user_data.period. This prepared data is available via the agg_gen.agg_data property. The return value is a dict[var, AggData]. To retrieve the prepared data, use the get() method on the dict by variable. The da property is the sub-setted xarray DataArray of the variable “aet”.
processed_data = agg_gen.agg_data
da = processed_data.get("aet").da
da
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[11], line 1
----> 1 processed_data = agg_gen.agg_data
2 da = processed_data.get("aet").da
3 da
NameError: name 'agg_gen' is not defined
Plot the resulting interpolation#
Calulate min/max of value and time=7 to adjust plotting paramters below
var = 'aet'
time_step = "1980-08-01"
vmin = np.nanmin(da.sel(time=time_step).values)
vmax = np.nanmax(da.sel(time=time_step).values)
print(vmin, vmax)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[13], line 1
----> 1 vmin = np.nanmin(da.sel(time=time_step).values)
2 vmax = np.nanmax(da.sel(time=time_step).values)
3 print(vmin, vmax)
NameError: name 'da' is not defined
ngdf['aet'] = ds_out[var].sel(time=time_step)
terraclim_agg = ngdf.to_crs(4326).hvplot(
c='aet', geo=True, coastline='50m', frame_width=300, alpha=1, cmap='viridis', clim=(vmin, vmax),
xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", xlim=(-76.8, -73.8),
title="DRB HUC12 area-weighted average", aspect='equal'
)
terraclim_raw = da.sel(time=time_step).hvplot.quadmesh(
'lon', 'lat', 'aet', cmap='viridis', alpha=1, grid=True, geo=True, coastline='50m', frame_width=300, clim=(vmin,vmax),
xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", xlim=(-76.8, -73.8),
title="TerraClimate Monthly AET", aspect='equal'
)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[14], line 1
----> 1 ngdf['aet'] = ds_out[var].sel(time=time_step)
2 terraclim_agg = ngdf.to_crs(4326).hvplot(
3 c='aet', geo=True, coastline='50m', frame_width=300, alpha=1, cmap='viridis', clim=(vmin, vmax),
4 xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", xlim=(-76.8, -73.8),
5 title="DRB HUC12 area-weighted average", aspect='equal'
6 )
7 terraclim_raw = da.sel(time=time_step).hvplot.quadmesh(
8 'lon', 'lat', 'aet', cmap='viridis', alpha=1, grid=True, geo=True, coastline='50m', frame_width=300, clim=(vmin,vmax),
9 xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", xlim=(-76.8, -73.8),
10 title="TerraClimate Monthly AET", aspect='equal'
11 )
NameError: name 'ds_out' is not defined
(EsriTerrain() * terraclim_raw) + (EsriTerrain() * terraclim_agg) # + (EsriTerrain() * drb)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[15], line 1
----> 1 (EsriTerrain() * terraclim_raw) + (EsriTerrain() * terraclim_agg) # + (EsriTerrain() * drb)
NameError: name 'terraclim_raw' is not defined
(EsriTerrain() * terraclim_raw * terraclim_agg)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[16], line 1
----> 1 (EsriTerrain() * terraclim_raw * terraclim_agg)
NameError: name 'terraclim_raw' is not defined
Inspect structure of saved netcdf file#
ncf = xr.open_dataset('testing_tc_drb_huc12.nc')
ncf
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/file_manager.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
210 try:
--> 211 file = self._cache[self._key]
212 except KeyError:
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
55 with self._lock:
---> 56 value = self._cache[key]
57 self._cache.move_to_end(key)
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/home/docs/checkouts/readthedocs.org/user_builds/gdptools/checkouts/latest/docs/Examples/ClimateR-Catalog/testing_tc_drb_huc12.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), 'df6084c7-66e0-4709-868f-868168576347']
During handling of the above exception, another exception occurred:
FileNotFoundError Traceback (most recent call last)
Cell In[17], line 1
----> 1 ncf = xr.open_dataset('testing_tc_drb_huc12.nc')
2 ncf
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/api.py:572, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
560 decoders = _resolve_decoders_kwargs(
561 decode_cf,
562 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...)
568 decode_coords=decode_coords,
569 )
571 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 572 backend_ds = backend.open_dataset(
573 filename_or_obj,
574 drop_variables=drop_variables,
575 **decoders,
576 **kwargs,
577 )
578 ds = _dataset_from_backend_dataset(
579 backend_ds,
580 filename_or_obj,
(...)
590 **kwargs,
591 )
592 return ds
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:644, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
623 def open_dataset( # type: ignore[override] # allow LSP violation, not supporting **kwargs
624 self,
625 filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
(...)
641 autoclose=False,
642 ) -> Dataset:
643 filename_or_obj = _normalize_path(filename_or_obj)
--> 644 store = NetCDF4DataStore.open(
645 filename_or_obj,
646 mode=mode,
647 format=format,
648 group=group,
649 clobber=clobber,
650 diskless=diskless,
651 persist=persist,
652 lock=lock,
653 autoclose=autoclose,
654 )
656 store_entrypoint = StoreBackendEntrypoint()
657 with close_on_error(store):
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:407, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
401 kwargs = dict(
402 clobber=clobber, diskless=diskless, persist=persist, format=format
403 )
404 manager = CachingFileManager(
405 netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
406 )
--> 407 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:354, in NetCDF4DataStore.__init__(self, manager, group, mode, lock, autoclose)
352 self._group = group
353 self._mode = mode
--> 354 self.format = self.ds.data_model
355 self._filename = self.ds.filepath()
356 self.is_remote = is_remote_uri(self._filename)
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:416, in NetCDF4DataStore.ds(self)
414 @property
415 def ds(self):
--> 416 return self._acquire()
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/netCDF4_.py:410, in NetCDF4DataStore._acquire(self, needs_lock)
409 def _acquire(self, needs_lock=True):
--> 410 with self._manager.acquire_context(needs_lock) as root:
411 ds = _nc4_require_group(root, self._group, self._mode)
412 return ds
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/contextlib.py:119, in _GeneratorContextManager.__enter__(self)
117 del self.args, self.kwds, self.func
118 try:
--> 119 return next(self.gen)
120 except StopIteration:
121 raise RuntimeError("generator didn't yield") from None
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/file_manager.py:199, in CachingFileManager.acquire_context(self, needs_lock)
196 @contextlib.contextmanager
197 def acquire_context(self, needs_lock=True):
198 """Context manager for acquiring a file."""
--> 199 file, cached = self._acquire_with_cache_info(needs_lock)
200 try:
201 yield file
File ~/checkouts/readthedocs.org/user_builds/gdptools/conda/latest/lib/python3.9/site-packages/xarray/backends/file_manager.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
215 kwargs = kwargs.copy()
216 kwargs["mode"] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
218 if self._mode == "w":
219 # ensure file doesn't get overridden when opened again
220 self._mode = "a"
File src/netCDF4/_netCDF4.pyx:2353, in netCDF4._netCDF4.Dataset.__init__()
File src/netCDF4/_netCDF4.pyx:1963, in netCDF4._netCDF4._ensure_nc_success()
FileNotFoundError: [Errno 2] No such file or directory: b'/home/docs/checkouts/readthedocs.org/user_builds/gdptools/checkouts/latest/docs/Examples/ClimateR-Catalog/testing_tc_drb_huc12.nc'
Validation of area-weighted mean#
In the following sections we use some helper functions, which return a information on the intersection calculation to create an illustration of the resulting intersection and calculated value at a polygon within the huc12_basins geodataframe.
The anlysis is performed on one of the huc12_basins polygons using the huc12 id - “020200050301”. It extracts the grid_cells intersected by the huc12 polygon, the resulting intersections of the grid_cells with the huc12 polygon, and the aggregated value of the huc12 polygon for one of the days in the resulting aggregation.
grid_cells (GeoDataFrame): The polygons representing each cell of the Terraclim grid.
intersections (GeoDataFrame): The grid_cells interesections with the huc12_basins polygons.
ngdf_poly (GeoDataFrame): The resorted and dissolved GeoDataFrame from agg_gen.
weights (xarray.Dataset): The generated weights read into xarray Dataset.
time = 7
huc12_id = "020200050301"
grid_cells = wght_gen.grid_cells
weights = pd.read_csv("terraclime_daily_wghts.csv").groupby("huc12").get_group(int(huc12_id))
i_ind = weights.i.values
j_ind = weights.j.values
wght = weights.wght.values
i_max = max(i_ind)
i_min = min(i_ind)
j_max = max(j_ind)
j_min = min(j_ind)
tmp = [grid_cells.query("i_index == @i and j_index == @j") for i,j in zip(i_ind,j_ind)]
grid_cell_sub = pd.concat(tmp)
grid_cell_sub
aet_vals = da.values[:, i_ind, j_ind]
grid_cell_sub["aet"] = aet_vals[time]
min_val = min(grid_cell_sub["aet"].values)
max_val = max(grid_cell_sub["aet"].values)
grid_cell_sub
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[18], line 3
1 time = 7
2 huc12_id = "020200050301"
----> 3 grid_cells = wght_gen.grid_cells
5 weights = pd.read_csv("terraclime_daily_wghts.csv").groupby("huc12").get_group(int(huc12_id))
6 i_ind = weights.i.values
NameError: name 'wght_gen' is not defined
inter_sect_sub = wght_gen.intersections.groupby("huc12").get_group(huc12_id)
inter_sect_sub.to_crs(4326, inplace=True)
inter_sect_sub.reset_index(drop=True, inplace=True)
inter_sect_sub["aet"] = aet_vals[time]
inter_sect_sub["wght"] = wght
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[19], line 1
----> 1 inter_sect_sub = wght_gen.intersections.groupby("huc12").get_group(huc12_id)
2 inter_sect_sub.to_crs(4326, inplace=True)
3 inter_sect_sub.reset_index(drop=True, inplace=True)
NameError: name 'wght_gen' is not defined
ngdf_poly = ngdf.groupby("huc12").get_group("020200050301")
aggvals = xr.open_dataset("testing_tc_drb_huc12.nc")
huc12_index = np.where(aggvals.huc12.values == huc12_id)[0][0]
ngdf_poly["aet"] = aggvals.aet.values[time, huc12_index]
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[20], line 1
----> 1 ngdf_poly = ngdf.groupby("huc12").get_group("020200050301")
4 aggvals = xr.open_dataset("testing_tc_drb_huc12.nc")
5 huc12_index = np.where(aggvals.huc12.values == huc12_id)[0][0]
NameError: name 'ngdf' is not defined
grid_cell_sub["aet"] = da.values[time, i_ind, j_ind]
inter_sect_sub["aet"] = da.values[time, i_ind, j_ind]
ngdf_poly["aet"] = aggvals.aet.values[time,huc12_index]
min_val = min(grid_cell_sub["aet"].values)
max_val = max(grid_cell_sub["aet"].values)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[21], line 1
----> 1 grid_cell_sub["aet"] = da.values[time, i_ind, j_ind]
2 inter_sect_sub["aet"] = da.values[time, i_ind, j_ind]
3 ngdf_poly["aet"] = aggvals.aet.values[time,huc12_index]
NameError: name 'da' is not defined
print(
grid_cell_sub.crs,
ngdf_poly.crs,
inter_sect_sub.crs
)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[22], line 2
1 print(
----> 2 grid_cell_sub.crs,
3 ngdf_poly.crs,
4 inter_sect_sub.crs
5 )
NameError: name 'grid_cell_sub' is not defined
grid_cell_sub.to_crs(4326, inplace=True)
ngdf_poly.to_crs(4326, inplace=True)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[23], line 1
----> 1 grid_cell_sub.to_crs(4326, inplace=True)
2 ngdf_poly.to_crs(4326, inplace=True)
NameError: name 'grid_cell_sub' is not defined
Plot the results#
Create labels:
inter_sect_sub: for each intersection, determine the centroid of the cell to position the label. The text of the label will be the weight corresponding to the intersected area.
gric_cell_sub: For each grid cell determine the centroid of the cell to position the label. The text of the lable will be the aet value associated with the time variable.
ngdf_poly: Determin the centroid of the polygon to position the label. The text of the label is the aggregated value for the time and huc12_id.
Create plots: Use hvplot to create plots of the 3 geodataframes and add the corresponding labels to each of the three figures.
for idx, row in inter_sect_sub.iterrows():
cent = row.geometry.centroid
inter_sect_sub.at[idx, 'lon'] = cent.x
inter_sect_sub.at[idx, 'lat'] = cent.y
is_labels = inter_sect_sub.hvplot.labels(x='lon', y='lat', text='wght', text_color='r', text_baseline='center')
for idx, row in grid_cell_sub.iterrows():
cent = row.geometry.centroid
grid_cell_sub.at[idx, 'lon'] = cent.x
grid_cell_sub.at[idx, 'lat'] = cent.y
grd_labels = grid_cell_sub.hvplot.labels(x='lon', y='lat', text='aet', text_color='black', text_baseline='center')
for idx, row in ngdf_poly.iterrows():
cent = row.geometry.centroid
ngdf_poly.at[idx, 'lon'] = cent.x
ngdf_poly.at[idx, 'lat'] = cent.y
ngdf_labels = ngdf_poly.hvplot.labels(x='lon', y='lat', text='aet', text_color='black', text_baseline='center')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[24], line 1
----> 1 for idx, row in inter_sect_sub.iterrows():
2 cent = row.geometry.centroid
3 inter_sect_sub.at[idx, 'lon'] = cent.x
NameError: name 'inter_sect_sub' is not defined
from holoviews import opts
gc_p = grid_cell_sub.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))
int_p = inter_sect_sub.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))
gdf_p = ngdf_poly.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))
layout = ((gc_p * grd_labels) + (int_p * is_labels) + (gdf_p * ngdf_labels)).cols(1)
layout.opts(opts.Layout(shared_axes=True))
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[25], line 2
1 from holoviews import opts
----> 2 gc_p = grid_cell_sub.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))
3 int_p = inter_sect_sub.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))
4 gdf_p = ngdf_poly.hvplot(c='aet', cmap='viridis', clim=(min_val, max_val))
NameError: name 'grid_cell_sub' is not defined
np.sum(grid_cell_sub.aet.values * inter_sect_sub.wght.values)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[26], line 1
----> 1 np.sum(grid_cell_sub.aet.values * inter_sect_sub.wght.values)
NameError: name 'grid_cell_sub' is not defined