Data Classes

User data classes#

Prepare user data for weight generation.

class gdptools.data.user_data.ClimRCatData(*, cat_dict: dict[str, dict[str, Any]], f_feature: Union[str, Path, GeoDataFrame], id_feature: str, period: List[str])#

Instance of UserData using Climate-R catalog data.

Parameters

cat_dict (dict[str, dict[str, Any]]) –
f_feature (Union[str, Path, GeoDataFrame]) –
id_feature (str) –
period (List[str]) –

__init__(*, cat_dict: dict[str, dict[str, Any]], f_feature: Union[str, Path, GeoDataFrame], id_feature: str, period: List[str]) → None#

Initialize ClimRCatData class.

This class wraps the ClimateR-catalogs developed by Mike Johnson, available at mikejohnson51/climateR-catalogs.

The catalog can be queried in pandas to return the dictionary associated with a specific source and variable. The cat_dict argument is keyed by variable name; each value is the corresponding ClimateR-catalog dictionary for that variable.

Parameters

cat_dict (dict[str, dict[str, Any]]) – Parameter metadata from climateR-catalog.
f_feature (Union[str, Path, gpd.GeoDataFrame]) – GeoDataFrame or any path-like object that can be read by geopandas.read_file().
id_feature (str) – Header in GeoDataFrame to use as index for weights.
period (List[str]) – List containing date strings defining start and end time slice for aggregation.

Raises

KeyError – Raises error if id_feature not in f_feature columns.

Return type

None
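The KeyError condition above can be pictured with a small check. This is a minimal sketch, not gdptools' actual code: `check_id_feature` is a hypothetical helper, and a plain list of column names stands in for the columns of the GeoDataFrame.

```python
from typing import Iterable


def check_id_feature(columns: Iterable[str], id_feature: str) -> None:
    """Raise KeyError when id_feature is not among the feature columns."""
    available = list(columns)
    if id_feature not in available:
        raise KeyError(
            f"id_feature {id_feature!r} not found in feature columns: {available}"
        )


# Stand-in for list(gdf.columns) on a real GeoDataFrame (assumed column names).
columns = ["huc12", "name", "geometry"]
check_id_feature(columns, "huc12")  # passes silently

try:
    check_id_feature(columns, "HUC12")  # column names are case-sensitive
except KeyError as err:
    print(f"Rejected: {err}")
```

Checking the column name before constructing the data class gives a clearer error message than waiting for the constructor to fail.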

Example

# Example of using climateR-catalog to prep cat_dict parameter
>>> import pandas as pd
>>> cat_url = "https://mikejohnson51.github.io/climateR-catalogs/catalog.json"
>>> cat = pd.read_json(cat_url)
>>> _id = "terraclim"
>>> cat_vars = ["aet", "pet", "PDSI"]
>>> cat_params = [
... cat.query("id == @_id & variable == @_var")
... .to_dict(orient="records")[0]
... for _var in cat_vars
... ]
>>> cat_dict = dict(zip(cat_vars, cat_params))
>>> cat_dict.get("aet")
{'id': 'terraclim',
 'asset': 'agg_terraclimate_aet_1958_CurrentYear_GLOBE',
 'URL': 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_aet_1958_CurrentYear_GLOBE.nc',
 'type': 'opendap',
 'varname': 'aet',
 'variable': 'aet',
 'description': 'water_evaporation_amount',
 'units': 'mm',
 'model': nan,
 'ensemble': nan,
 'scenario': 'total',
 'T_name': 'time',
 'duration': '1958-01-01/2021-12-01',
 'interval': '1 months',
 'nT': 768.0,
 'X_name': 'lon',
 'Y_name': 'lat',
 'X1': -179.9792,
 'Xn': 179.9792,
 'Y1': 89.9792,
 'Yn': -89.9792,
 'resX': 0.0417,
 'resY': 0.0417,
 'ncols': 8640.0,
 'nrows': 4320.0,
 'crs': '+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs',
 'toptobottom': 0.0,
 'tiled': '
}
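The same variable-to-record mapping can be tried offline. The following is a minimal sketch under stated assumptions: `CATALOG_ROWS` is a made-up, heavily trimmed stand-in for the catalog rows, and `build_cat_dict` is a hypothetical helper that is not part of gdptools. Unlike indexing the first query result directly, it raises a descriptive KeyError when a variable has no catalog entry.

```python
from typing import Any

# Made-up, trimmed stand-in for catalog rows; real rows carry many more fields.
CATALOG_ROWS: list[dict[str, Any]] = [
    {"id": "terraclim", "variable": "aet", "varname": "aet", "units": "mm"},
    {"id": "terraclim", "variable": "pet", "varname": "pet", "units": "mm"},
    {"id": "other_source", "variable": "aet", "varname": "aet", "units": "mm"},
]


def build_cat_dict(
    rows: list[dict[str, Any]], source_id: str, variables: list[str]
) -> dict[str, dict[str, Any]]:
    """Map each requested variable to its first matching catalog row."""
    cat_dict: dict[str, dict[str, Any]] = {}
    for var in variables:
        matches = [r for r in rows if r["id"] == source_id and r["variable"] == var]
        if not matches:
            raise KeyError(f"No catalog entry for id={source_id!r}, variable={var!r}")
        cat_dict[var] = matches[0]
    return cat_dict


cat_dict = build_cat_dict(CATALOG_ROWS, "terraclim", ["aet", "pet"])
print(sorted(cat_dict))
```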

get_class_type() → str#

Return the type of the data class.

Return type: str

get_feature_id() → str#

Return id_feature.

Return type: str

get_source_subset(key: str) → DataArray#

Get subset of source data by key.

Parameters: key (str) – Name of the variable to subset.
Returns: Subset of the source data for the given key.
Return type: xr.DataArray

get_vars() → list[str]#

Return list of param_dict keys, proxy for varnames.

Return type: list[str]

prep_agg_data(key: str) → AggData#

Prepare ClimRCatData data for aggregation methods.

Parameters: key (str) – Name of the variable to prepare.
Returns: An instance of the AggData class
Return type: AggData

prep_interp_data(key: str, poly_id: int) → AggData#

Prep AggData from ClimRCatData.

Parameters

key (str) – Name of the xarray gridded data variable
poly_id (int) – ID number of the GeoDataFrame geometry to clip the gridded data to

Returns

An instance of the AggData class

Return type

AggData

prep_wght_data() → WeightData#

Prepare and return WeightData for weight generation.

Return type: WeightData

class gdptools.data.user_data.ODAPCatData(*, param_dict: dict[str, dict[str, Any]], grid_dict: dict[str, dict[str, Any]], f_feature: Union[str, Path, GeoDataFrame], id_feature: str, period: List[str])#

Instance of UserData using OPeNDAP catalog data.

Parameters

param_dict (dict[str, dict[str, Any]]) –
grid_dict (dict[str, dict[str, Any]]) –
f_feature (Union[str, Path, GeoDataFrame]) –
id_feature (str) –
period (List[str]) –

__init__(*, param_dict: dict[str, dict[str, Any]], grid_dict: dict[str, dict[str, Any]], f_feature: Union[str, Path, GeoDataFrame], id_feature: str, period: List[str]) → None#

Initialize ODAPCatData class.

This class uses the parameter and grid JSON catalogs developed by Mike Johnson, available here:

Params: <https://mikejohnson51.github.io/opendap.catalog/cat_params.json>
Grids: <https://mikejohnson51.github.io/opendap.catalog/cat_grids.json>

These can be queried in pandas to return the dictionary associated with a specific source and variable, as in the OPeNDAP Catalog examples. The param_dict and grid_dict arguments are each keyed by variable name; each value is the dictionary of the corresponding param or grid JSON entry from the OPeNDAP Catalog.

Parameters

param_dict (dict[str, dict]) – Parameter metadata from OPeNDAP catalog.
grid_dict (dict[str, dict]) – Grid metadata from OPeNDAP catalog.
f_feature (Union[str, Path, gpd.GeoDataFrame]) – GeoDataFrame or any path-like object that can be read by geopandas.read_file().
id_feature (str) – Header in GeoDataFrame to use as index for weights.
period (List[str]) – List containing date strings defining start and end time slice for aggregation.

Raises

KeyError – Raises error if id_feature not in f_feature columns.

Return type

None
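The pairing of param_dict and grid_dict described above can be sketched without network access. This is an illustration under stated assumptions, not the library's own workflow: `PARAMS` and `GRIDS` are made-up stand-ins for entries of the two JSON catalogs, `build_odap_dicts` is a hypothetical helper, and linking a parameter entry to its grid entry through a shared `grid_id` is an assumption based on both data classes carrying that field.

```python
from typing import Any

# Made-up stand-ins for entries of cat_params.json and cat_grids.json.
PARAMS: list[dict[str, Any]] = [
    {"id": "demo", "variable": "tmax", "varname": "tmax", "grid_id": 1},
    {"id": "demo", "variable": "tmin", "varname": "tmin", "grid_id": 1},
]
GRIDS: list[dict[str, Any]] = [
    {"grid_id": 1, "X_name": "lon", "Y_name": "lat", "toptobottom": 0},
]


def build_odap_dicts(params, grids, source_id, variables):
    """Return (param_dict, grid_dict), both keyed by variable name."""
    grids_by_id = {g["grid_id"]: g for g in grids}
    param_dict: dict[str, dict[str, Any]] = {}
    grid_dict: dict[str, dict[str, Any]] = {}
    for var in variables:
        param = next(
            (p for p in params if p["id"] == source_id and p["variable"] == var),
            None,
        )
        if param is None:
            raise KeyError(f"No parameter entry for id={source_id!r}, variable={var!r}")
        param_dict[var] = param
        # Assumption: the grid entry is found through the shared grid_id.
        grid_dict[var] = grids_by_id[param["grid_id"]]
    return param_dict, grid_dict


param_dict, grid_dict = build_odap_dicts(PARAMS, GRIDS, "demo", ["tmax", "tmin"])
print(param_dict["tmax"]["varname"], grid_dict["tmax"]["X_name"])
```

Both dictionaries end up with the same keys, which matches the description of the two arguments as keyed by variable name.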

get_class_type() → str#

Return the type of the data class.

Return type: str

get_feature_id() → str#

Return id_feature.

Return type: str

get_source_subset(key: str) → DataArray#

Get data subset from source by key.

Parameters: key (str) – Name of the variable to subset.
Returns: Subset of the source data for the given key.
Return type: xr.DataArray

get_vars() → list[str]#

Return list of param_dict keys, proxy for varnames.

Return type: list[str]

prep_agg_data(key: str) → AggData#

Prepare ODAPCatData data for aggregation methods.

Parameters: key (str) – Name of the variable to prepare.
Returns: An instance of the AggData class
Return type: AggData

prep_interp_data(key: str, poly_id: int) → AggData#

Prep AggData from ODAPCatData.

Parameters

key (str) – Name of the xarray gridded data variable
poly_id (int) – ID number of the GeoDataFrame geometry to clip the gridded data to

Returns

An instance of the AggData class

Return type

AggData

prep_wght_data() → WeightData#

Prepare and return WeightData for weight generation.

Return type: WeightData

class gdptools.data.user_data.UserCatData(*, ds: Union[str, Dataset], proj_ds: Any, x_coord: str, y_coord: str, t_coord: str, var: Union[str, List[str]], f_feature: Union[str, Path, GeoDataFrame], proj_feature: Any, id_feature: str, period: List[str])#

Instance of UserData using minimum input variables to map to ODAPCatData.

Parameters

ds (Union[str, Dataset]) –
proj_ds (Any) –
x_coord (str) –
y_coord (str) –
t_coord (str) –
var (Union[str, List[str]]) –
f_feature (Union[str, Path, GeoDataFrame]) –
proj_feature (Any) –
id_feature (str) –
period (List[str]) –

__init__(*, ds: Union[str, Dataset], proj_ds: Any, x_coord: str, y_coord: str, t_coord: str, var: Union[str, List[str]], f_feature: Union[str, Path, GeoDataFrame], proj_feature: Any, id_feature: str, period: List[str]) → None#

Initialize UserCatData class, which contains data preparation methods based on UserData.

Parameters

ds (Union[str, Path, xr.Dataset]) – Xarray Dataset or str, URL or Path object that can be read by xarray.
proj_ds (Any) – Any object that can be passed to pyproj.crs.CRS.from_user_input for ds
x_coord (str) – String of x coordinate name in ds
y_coord (str) – String of y coordinate name in ds
t_coord (str) – String of time coordinate name in ds
var (Union[str, List[str]]) – List of variables to be used in aggregation. They must be present in ds.
f_feature (Union[str, Path, gpd.GeoDataFrame]) – GeoDataFrame or str, URL or Path object that can be read by geopandas.
proj_feature (Any) – Any object that can be passed to pyproj.crs.CRS.from_user_input for f_feature
id_feature (str) – String of id column name in f_feature.
period (List[str]) – List of two strings that define the start and end of the period to be used in aggregation. The format may be ‘YYYY-MM-DD’ or ‘YYYY-MM-DD HH:MM:SS’, depending on the format of the time coordinate in ds.

Raises

KeyError – Raises error if id_feature not in f_feature columns.

Return type

None
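The two accepted period formats can be checked before the class is constructed. The following is a minimal sketch using only the standard library; `parse_period` is a hypothetical helper, and gdptools itself may validate the period differently or not at all.

```python
from datetime import datetime

# The two formats named in the period description.
_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S")


def parse_period(period: list[str]) -> tuple[datetime, datetime]:
    """Parse a [start, end] pair of date strings and check their order."""
    if len(period) != 2:
        raise ValueError("period must contain exactly two date strings")
    parsed = []
    for text in period:
        for fmt in _FORMATS:
            try:
                parsed.append(datetime.strptime(text, fmt))
                break
            except ValueError:
                continue
        else:
            # No format matched this string.
            raise ValueError(f"Unrecognized date format: {text!r}")
    start, end = parsed
    if start > end:
        raise ValueError("period start must not be after period end")
    return start, end


start, end = parse_period(["1980-01-01", "1980-12-31 12:00:00"])
print(start.isoformat(), end.isoformat())
```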

get_class_type() → str#

Return the type of the data class.

Return type: str

get_feature_id() → str#

Return id_feature.

Return type: str

get_source_subset(key: str) → DataArray#

Get source subset by key.

Parameters: key (str) – Name of the variable to subset.
Returns: Subset of the source data for the given key.
Return type: xr.DataArray

get_vars() → list[str]#

Return list of vars in data.

Return type: list[str]

prep_agg_data(key: str) → AggData#

Prep AggData from UserData.

Parameters: key (str) –
Return type: AggData

prep_interp_data(key: str, poly_id: int) → AggData#

Prep AggData from UserCatData.

Parameters

key (str) – Name of the xarray gridded data variable
poly_id (int) – ID number of the GeoDataFrame geometry to clip the gridded data to

Returns

An instance of the AggData class

Return type

AggData

prep_wght_data() → WeightData#

Prepare and return WeightData for weight generation.

Return type: WeightData

class gdptools.data.user_data.UserData#

Prepare data for different sources for weight generation.

abstract __init__() → None#

Init class.

Return type: None

abstract get_class_type() → str#

Abstract method for returning the type of the data class.

Return type: str

abstract get_feature_id() → str#

Abstract method for returning the id_feature parameter.

Return type: str

abstract get_source_subset(key: str) → DataArray#

Abstract method for getting subset of source data.

Parameters: key (str) –
Return type: DataArray

abstract get_vars() → list[str]#

Return a list of variables.

Return type: list[str]

abstract prep_agg_data(key: str) → AggData#

Abstract method for preparing data for aggregation.

Parameters: key (str) –
Return type: AggData

abstract prep_interp_data(key: str, poly_id: int) → AggData#

Abstract method for preparing data for interpolation.

Parameters

key (str) –
poly_id (int) –

Return type

AggData

abstract prep_wght_data() → WeightData#

Abstract interface for generating weight data.

Return type: WeightData
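The abstract interface above follows the usual abstract-base-class pattern. This is a minimal sketch with the standard `abc` module, assuming that pattern; `SketchUserData` and `InMemoryData` are hypothetical names, and only two of the abstract methods are reproduced to keep the example short.

```python
from abc import ABC, abstractmethod


class SketchUserData(ABC):
    """Hypothetical two-method cut of the UserData interface."""

    @abstractmethod
    def get_feature_id(self) -> str:
        """Return the id_feature parameter."""

    @abstractmethod
    def get_vars(self) -> list[str]:
        """Return a list of variables."""


class InMemoryData(SketchUserData):
    """Toy concrete class; real subclasses also prepare weight and aggregation data."""

    def __init__(self, id_feature: str, variables: list[str]) -> None:
        self._id_feature = id_feature
        self._variables = list(variables)

    def get_feature_id(self) -> str:
        return self._id_feature

    def get_vars(self) -> list[str]:
        return list(self._variables)


data = InMemoryData(id_feature="huc12", variables=["aet", "pet"])
print(data.get_feature_id(), data.get_vars())

try:
    SketchUserData()  # abstract classes cannot be instantiated directly
except TypeError as err:
    print(f"TypeError: {err}")
```

A subclass that omits any abstract method fails at instantiation time with the same TypeError, so a new data source cannot be passed downstream half-implemented.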

class gdptools.data.user_data.UserTiffData(var: str, ds: Union[str, DataArray, Dataset], proj_ds: Any, x_coord: str, y_coord: str, bname: str, band: int, f_feature: Union[str, Path, GeoDataFrame], id_feature: str, proj_feature: Any)#

Instance of UserData for zonal stats processing of geotiffs.

Parameters

var (str) –
ds (Union[str, DataArray, Dataset]) –
proj_ds (Any) –
x_coord (str) –
y_coord (str) –
bname (str) –
band (int) –
f_feature (Union[str, Path, GeoDataFrame]) –
id_feature (str) –
proj_feature (Any) –

__init__(var: str, ds: Union[str, DataArray, Dataset], proj_ds: Any, x_coord: str, y_coord: str, bname: str, band: int, f_feature: Union[str, Path, GeoDataFrame], id_feature: str, proj_feature: Any) → None#

Initialize UserTiffData.

UserTiffData is a structure that aids calculating zonal stats.

Parameters

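var (str) –
ds (Union[str, xr.Dataset]) –
proj_ds (Any) – Any object that can be passed to pyproj.crs.CRS.from_user_input for ds
x_coord (str) – String of x coordinate name in ds
y_coord (str) – String of y coordinate name in ds
bname (str) –
band (int) –
f_feature (Union[str, Path, gpd.GeoDataFrame]) – GeoDataFrame or str, URL or Path object that can be read by geopandas.
id_feature (str) – String of id column name in f_feature.
proj_feature (Any) – Any object that can be passed to pyproj.crs.CRS.from_user_input for f_feature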

Return type

None

get_class_type() → str#

Return the type of the data class.

Return type: str

get_feature_id() → str#

Get Feature id.

Return type: str

get_source_subset(key: str) → DataArray#

Get subset of source data.

Parameters: key (str) – Name of the variable to subset.
Returns: Subset of the source data for the given key.
Return type: xr.DataArray

get_vars() → list[str]#

Return list of varnames.

Return type: list[str]

prep_agg_data(key: str) → AggData#

Prepare data for aggregation or zonal stats.

Parameters: key (str) –
Return type: AggData

prep_interp_data(key: str, poly_id: int) → AggData#

Prep AggData from UserTiffData.

Parameters

key (str) – Name of the xarray gridded data variable
poly_id (int) – ID number of the GeoDataFrame geometry to clip the gridded data to

Returns

An instance of the AggData class

Return type

AggData

prep_wght_data() → WeightData#

Prepare data for weight generation.

Return type: WeightData

Data passed to WeightGen class#

Data classes used in aggregation.

class gdptools.data.weight_gen_data.WeightData(feature: GeoDataFrame, id_feature: str, grid_cells: GeoDataFrame)#

Simple dataclass for transferring prepared user data to the CalcWeightEngine.

Parameters

feature (GeoDataFrame) –
id_feature (str) –
grid_cells (GeoDataFrame) –

Data passed to AggGen class#

class gdptools.data.agg_gen_data.AggData(variable: str, cat_param: CatParams, cat_grid: CatGrids, da: DataArray, feature: GeoDataFrame, id_feature: str, period: List[str])#

AggData is a convenience container of data necessary for aggregation.

Data provided in one of the UserData-derived classes is prepped for aggregation, including subsetting the gridded data by the feature's bounding box and by the selected time period. In addition, if the gridded data is defined between 0 and 360 degrees longitude, it is shifted to -180 to 180 degrees. For each variable in the user_data attribute of either the WeightGen or AggGen classes, a dict of {var: AggData} is generated in the AggGen.calculate_agg() method.

Parameters

variable (str) – Variable name.
cat_param (CatParams) – CatParams data class containing parameter metadata.
cat_grid (CatGrids) – CatGrids data class containing grid metadata.
da (DataArray) – The spatially and temporally subsetted variable DataArray.
feature (GeoDataFrame) – The user-supplied feature file represented as a GeoDataFrame.
id_feature (str) – The feature id (column header) in the GeoDataFrame.
period (List[str]) – A list of dates representing the starting and ending date to process.
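The longitude shift mentioned in the AggData description can be illustrated with plain arithmetic. This is a minimal sketch of the standard wrapping formula, not the library's implementation; `wrap_longitude` is a hypothetical helper. With this formula, 180 degrees maps to -180.

```python
def wrap_longitude(lon: float) -> float:
    """Map a longitude from the 0 to 360 convention to the -180 to 180 convention."""
    return ((lon + 180.0) % 360.0) - 180.0


# Longitudes on a 0 to 360 grid.
lons = [0.0, 90.0, 180.0, 270.0, 359.0]

# Shifted values; re-sorting keeps the coordinate monotonically increasing.
shifted = sorted(wrap_longitude(lon) for lon in lons)
print(shifted)  # [-180.0, -90.0, -1.0, 0.0, 90.0]
```

The re-sort matters because wrapping moves the eastern half of the grid in front of the western half, and a gridded coordinate generally needs to stay monotonic for slicing by bounding box.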

OPeNDAP Catalog data classes#

OPeNDAP catalog data classes.

class gdptools.data.odap_cat_data.CatClimRItem(*, id: str, asset: Optional[str] = None, URL: str, varname: str, variable: Optional[str] = None, description: Optional[str] = None, units: Optional[str] = None, model: Optional[str] = None, ensemble: Optional[str] = None, scenario: Optional[str] = None, T_name: Optional[str] = None, duration: Optional[str] = None, interval: Optional[str] = None, nT: Optional[int] = 0, X_name: str, Y_name: str, X1: float, Xn: float, Y1: float, Yn: float, resX: float, resY: float, ncols: int, nrows: int, crs: str, toptobottom: str, tiled: Optional[str] = None)#

Mike Johnson’s CatClimRItem class.

Source data from which this is derived comes from: https://mikejohnson51.github.io/climateR-catalogs/catalog.json

Parameters

id (str) –
asset (Optional[str]) –
URL (str) –
varname (str) –
variable (Optional[str]) –
description (Optional[str]) –
units (Optional[str]) –
model (Optional[str]) –
ensemble (Optional[str]) –
scenario (Optional[str]) –
T_name (Optional[str]) –
duration (Optional[str]) –
interval (Optional[str]) –
nT (Optional[int]) –
X_name (str) –
Y_name (str) –
X1 (float) –
Xn (float) –
Y1 (float) –
Yn (float) –
resX (float) –
resY (float) –
ncols (int) –
nrows (int) –
crs (str) –
toptobottom (str) –
tiled (Optional[str]) –

class Config#

Interior class to direct pydantic’s behavior.

class gdptools.data.odap_cat_data.CatGrids(*, grid_id: Optional[int] = None, X_name: str, Y_name: str, X1: Optional[float] = None, Xn: Optional[float] = None, Y1: Optional[float] = None, Yn: Optional[float] = None, resX: Optional[float] = None, resY: Optional[float] = None, ncols: Optional[int] = None, nrows: Optional[int] = None, proj: str, toptobottom: int, tile: Optional[str] = None, **extra_data: Any)#

Class representing elements of Mike Johnson’s OPeNDAP catalog grids.

https://mikejohnson51.github.io/opendap.catalog/cat_grids.json

Parameters

grid_id (Optional[int]) –
X_name (str) –
Y_name (str) –
X1 (Optional[float]) –
Xn (Optional[float]) –
Y1 (Optional[float]) –
Yn (Optional[float]) –
resX (Optional[float]) –
resY (Optional[float]) –
ncols (Optional[int]) –
nrows (Optional[int]) –
proj (str) –
toptobottom (int) –
tile (Optional[str]) –
extra_data (Any) –

classmethod get_toptobottom(v: int) → int#

Convert str to int.

Parameters: v (int) –
Return type: int

class gdptools.data.odap_cat_data.CatParams(*, id: Optional[str] = None, URL: str, grid_id: Optional[int] = -1, variable: Optional[str] = None, varname: str, long_name: Optional[str], T_name: Optional[str], duration: Optional[str] = None, units: Optional[str], interval: Optional[str] = None, nT: Optional[int] = 0, tiled: Optional[str] = None, model: Optional[str] = None, ensemble: Optional[str] = None, scenario: Optional[str] = None)#

Class representing elements of Mike Johnson’s OPeNDAP catalog params.

https://mikejohnson51.github.io/opendap.catalog/cat_params.json

Parameters

id (Optional[str]) –
URL (str) –
grid_id (Optional[int]) –
variable (Optional[str]) –
varname (str) –
long_name (Optional[str]) –
T_name (Optional[str]) –
duration (Optional[str]) –
units (Optional[str]) –
interval (Optional[str]) –
nT (Optional[int]) –
tiled (Optional[str]) –
model (Optional[str]) –
ensemble (Optional[str]) –
scenario (Optional[str]) –

classmethod set_grid_id(v: int) → int#

Convert to int.

Parameters: v (int) –
Return type: int

classmethod set_nt(v: int) → int#

Convert to int.

Parameters: v (int) –
Return type: int
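The integer conversions performed by these class methods matter because catalog values read through pandas often arrive as floats or strings, such as 768.0 for nT or 0.0 for toptobottom. The following is a minimal sketch of that kind of coercion; `coerce_int` is a hypothetical helper, and falling back to a default for missing values is an assumption rather than documented gdptools behavior.

```python
import math
from typing import Any


def coerce_int(value: Any, default: int = 0) -> int:
    """Convert a float, numeric string, or int to int; use default when missing."""
    if value is None:
        return default
    number = float(value)  # accepts 768.0, "768.0", "1", and 1 alike
    if math.isnan(number):
        # Assumption: NaN marks a missing catalog value.
        return default
    return int(number)


print(coerce_int(768.0), coerce_int("0.0"), coerce_int(float("nan"), default=-1))
```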

gdptools.data.odap_cat_data.climr_to_odap(climr: CatClimRItem) → Tuple[CatParams, CatGrids]#

Convert a CatClimRItem to a CatParams and CatGrids object.

Parameters: climr (CatClimRItem) – The CatClimRItem object to convert.
Returns: The CatParams and CatGrids objects.
Return type: CatParams, CatGrids
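The conversion can be pictured as splitting one flat catalog record into a parameter part and a grid part. This is a rough sketch with plain dictionaries rather than the pydantic classes, and it is not the body of climr_to_odap: `split_climr_record` is a hypothetical helper, the field lists are taken from the class signatures above, and the renames (description to long_name, crs to proj) are guesses, labeled as such in the code.

```python
from typing import Any

# Field names shared between CatClimRItem and CatParams / CatGrids signatures.
PARAM_FIELDS = (
    "id", "URL", "variable", "varname", "T_name", "duration", "units",
    "interval", "nT", "tiled", "model", "ensemble", "scenario",
)
GRID_FIELDS = (
    "X_name", "Y_name", "X1", "Xn", "Y1", "Yn",
    "resX", "resY", "ncols", "nrows", "toptobottom",
)


def split_climr_record(
    record: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a flat catalog record into (param-like, grid-like) dictionaries."""
    params = {name: record.get(name) for name in PARAM_FIELDS}
    params["long_name"] = record.get("description")  # guessed rename
    grid = {name: record.get(name) for name in GRID_FIELDS}
    grid["proj"] = record.get("crs")  # guessed rename
    return params, grid


# Trimmed record using values from the ClimRCatData example.
record = {
    "id": "terraclim", "varname": "aet", "variable": "aet",
    "description": "water_evaporation_amount", "units": "mm",
    "X_name": "lon", "Y_name": "lat", "resX": 0.0417, "resY": 0.0417,
    "crs": "+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs",
}
params, grid = split_climr_record(record)
print(params["long_name"], grid["X_name"])
```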
