Aggregation#
Aggregation classes and methods.
- gdptools.agg_gen.AGGENGINES#
List of available aggregation engines.
- Parameters
serial – performs weighted-area aggregation by iterating through polygons.
parallel – performs weighted-area aggregation split across a number of jobs.
dask – performs weighted-area aggregation in the presence of a dask client; the number of jobs should be specified.
- Raises
TypeError – If supplied attribute is not one of AGGENGINES.
- Returns
str
- Return type
str
alias of Literal['serial', 'parallel', 'dask']
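A sketch of how a Literal alias like AGGENGINES can be checked at runtime with typing.get_args. This is illustrative only, not gdptools' actual validation code, though it produces the TypeError behavior described above:

```python
from typing import Literal, get_args

# Mirror of the AGGENGINES alias documented above.
AGGENGINES = Literal["serial", "parallel", "dask"]

def check_engine(engine: str) -> str:
    """Raise TypeError if engine is not one of AGGENGINES."""
    if engine not in get_args(AGGENGINES):
        raise TypeError(f"{engine!r} is not one of {get_args(AGGENGINES)}")
    return engine
```

The same pattern applies to AGGWRITERS and STATSMETHODS below.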
- gdptools.agg_gen.AGGWRITERS#
List of available writers applied to the aggregation.
- Parameters
none – Output not written to a file.
csv – Output data in csv format.
parquet – Output data to parquet.gzip file.
netcdf – Output data in netcdf format.
json – Output data as json.
- Raises
TypeError – If supplied attribute is not one of AGGWRITERS.
- Returns
str
- Return type
str
alias of Literal['none', 'csv', 'parquet', 'netcdf', 'json']
- class gdptools.agg_gen.AggGen(user_data: UserData, stat_method: Literal['masked_mean', 'mean', 'masked_std', 'std', 'masked_median', 'median', 'masked_count', 'count', 'masked_min', 'min', 'masked_max', 'max'], agg_engine: Literal['serial', 'parallel', 'dask'], agg_writer: Literal['none', 'csv', 'parquet', 'netcdf', 'json'], weights: Union[str, DataFrame], out_path: Optional[str] = None, file_prefix: Optional[str] = None, append_date: Optional[bool] = False, jobs: Optional[int] = - 1)#
Class for aggregating gridded data to polygons.
- Parameters
user_data (UserData) –
stat_method (Literal['masked_mean', 'mean', 'masked_std', 'std', 'masked_median', 'median', 'masked_count', 'count', 'masked_min', 'min', 'masked_max', 'max']) –
agg_engine (Literal['serial', 'parallel', 'dask']) –
agg_writer (Literal['none', 'csv', 'parquet', 'netcdf', 'json']) –
weights (Union[str, DataFrame]) –
out_path (Optional[str]) –
file_prefix (Optional[str]) –
append_date (Optional[bool]) –
jobs (Optional[int]) –
- __init__(user_data: UserData, stat_method: Literal['masked_mean', 'mean', 'masked_std', 'std', 'masked_median', 'median', 'masked_count', 'count', 'masked_min', 'min', 'masked_max', 'max'], agg_engine: Literal['serial', 'parallel', 'dask'], agg_writer: Literal['none', 'csv', 'parquet', 'netcdf', 'json'], weights: Union[str, DataFrame], out_path: Optional[str] = None, file_prefix: Optional[str] = None, append_date: Optional[bool] = False, jobs: Optional[int] = - 1) None #
__init__ Initialize AggGen.
AggGen is a class for aggregating gridded datasets to polygons using area-weighted statistics. The class is initialized with a user_data object, a stat_method, an agg_engine, an agg_writer, and a weights object. Once the class is initialized, the user can call the AggGen.calculate_agg() method to perform the aggregation over the period defined in the user_data parameter.
The user_data object is one of the catalog data objects (ODAPCatData, ClimateCatData) or a UserCatData object, which requires some additional information that would otherwise be provided by the catalog data objects, such as the names of the coordinates and the projection.
The stat_method is one of the STATSMETHODS, which are either masked or standard methods. If a grid cell contributing to a polygon is missing, the standard statistic returns a missing value, while the masked statistic returns the statistic of all the non-missing cells.
The agg_engine is one of the AGGENGINES.
The agg_writer is one of the AGGWRITERS.
The weights object is either a path to a CSV file containing weights or a pandas DataFrame containing weights.
The out_path, file_prefix, and append_date parameters are optional and control writing the output to a file.
The jobs parameter is optional and is used by both the parallel and dask engines. If jobs is not specified, half the number of processors on the machine will be used.
- Parameters
user_data (UserData) – One of UserCatData, ODAPCatData, or ClimateCatData.
stat_method (STATSMETHODS) – One of STATSMETHODS.
agg_engine (AGGENGINES) – One of AGGENGINES.
agg_writer (AGGWRITERS) – One of AGGWRITERS.
weights (Union[str, pd.DataFrame]) – Either a path to a CSV file containing weights or a pandas DataFrame containing weights.
out_path (Optional[str], optional) – Path to the output file as a string, or None. Defaults to None.
file_prefix (Optional[str], optional) – Prefix for the output file name, or None if not generating an output file. Defaults to None.
append_date (Optional[bool], optional) – If True, the processing date is appended to the file name. Defaults to False.
jobs (Optional[int], optional) – Number of processors used by the parallel and dask engines (dask uses dask bag). If left at the default value (-1), jobs is set to half the number of available processors; because the data must be distributed among processors, this is a reasonable choice. Defaults to -1.
- Return type
None
- property agg_data: dict[str, gdptools.data.agg_gen_data.AggData]#
Return agg_data.
- calculate_agg() Tuple[GeoDataFrame, Dataset] #
Calculate aggregations.
- Returns
The GeoDataFrame of the polygons and the xarray Dataset of the aggregated values.
- Return type
Tuple[gpd.GeoDataFrame, xr.Dataset]
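A minimal usage sketch of the AggGen workflow described above, assuming a prepared catalog data object and a pre-computed weights CSV. The function name, file prefix, and argument values here are hypothetical, and the gdptools import is deferred inside the function so the sketch stands alone:

```python
def aggregate_to_polygons(user_data, weights_csv, out_dir):
    """Sketch: aggregate gridded data to polygons with a masked mean.

    user_data is a prepared UserCatData/ODAPCatData/ClimateCatData object;
    weights_csv is a path to a pre-computed weights CSV (hypothetical name).
    """
    from gdptools.agg_gen import AggGen  # deferred so the sketch stands alone

    agg = AggGen(
        user_data=user_data,
        stat_method="masked_mean",  # masked: ignore missing grid cells
        agg_engine="serial",        # or "parallel"/"dask" with jobs set
        agg_writer="csv",           # one of AGGWRITERS
        weights=weights_csv,
        out_path=out_dir,
        file_prefix="agg_out",      # hypothetical output prefix
    )
    # Returns the polygon GeoDataFrame and the aggregated xarray Dataset.
    gdf, ds = agg.calculate_agg()
    return gdf, ds
```

With the "csv" writer, the aggregated output is also written under out_dir using the given prefix.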
- class gdptools.agg_gen.InterpGen(user_data: Union[ODAPCatData, UserCatData], *, pt_spacing: Optional[Union[float, int]] = 50, stat: str = 'all', interp_method: str = 'linear', mask_data: Optional[bool] = False, output_file: Optional[str] = None, calc_crs: Any = 6931, method: str = 'Serial', jobs: Optional[int] = - 1)#
Class for calculating grid statistics for a polyline geometry.
- Parameters
user_data (Union[ODAPCatData, UserCatData]) –
pt_spacing (Optional[Union[float, int]]) –
stat (str) –
interp_method (str) –
mask_data (Optional[bool]) –
output_file (Optional[str]) –
calc_crs (Any) –
method (str) –
jobs (Optional[int]) –
- __init__(user_data: Union[ODAPCatData, UserCatData], *, pt_spacing: Optional[Union[float, int]] = 50, stat: str = 'all', interp_method: str = 'linear', mask_data: Optional[bool] = False, output_file: Optional[str] = None, calc_crs: Any = 6931, method: str = 'Serial', jobs: Optional[int] = - 1) None #
Initialize the InterpGen class.
- Parameters
user_data (ODAPCatData or UserCatData) – Data class for the input data.
pt_spacing (float) – Optional; numerical value in meters for the spacing of the interpolated sample points (default is 50).
stat (str) – Optional; a string indicating which statistics to calculate during the query. Options: 'all', 'mean', 'median', 'std', 'max', 'min' (default is 'all').
interp_method (str) – Optional; string indicating the xarray interpolation method. Options: 'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'polynomial' (default is 'linear').
mask_data (bool or None) – Optional; when True, nodata values are removed from statistical calculations.
output_file (str or None) – Optional; when a file path is specified, a CSV of the statistics is written to that path. Must end with a .csv file extension.
calc_crs (Any) – Optional; OGC WKT string, Proj.4 string, or int EPSG code. Determines which projection is used for the area-weighted calculations of the line buffer geometry (default is 6931).
method (str) – Optional; indicates which methodology is used to perform the query (currently only the Serial method is available; default is 'Serial').
jobs (Optional[int], optional) – Number of processors used by the parallel and dask methods (dask uses dask bag). If left at the default value (-1), jobs is set to half the number of available processors; because the data must be distributed among processors, this is a reasonable choice. Defaults to -1.
- Return type
None
- calc_interp() Union[Tuple[DataFrame, GeoDataFrame], DataFrame] #
calc_interp Run the interpolation and statistics calculations.
- Returns
A DataFrame of the calculated statistics, or a tuple of the statistics DataFrame and a GeoDataFrame of the interpolated sample points.
- Return type
Union[Tuple[DataFrame, GeoDataFrame], DataFrame]
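A minimal usage sketch of InterpGen, assuming a prepared ODAPCatData or UserCatData object containing a polyline geometry. The function name and output path are hypothetical, and the gdptools import is deferred inside the function so the sketch stands alone:

```python
def interp_line_stats(user_data, out_csv=None):
    """Sketch: compute grid statistics along a polyline with InterpGen.

    user_data is a prepared ODAPCatData or UserCatData object; out_csv is
    an optional output path that must end with .csv (hypothetical name).
    """
    from gdptools.agg_gen import InterpGen  # deferred so the sketch stands alone

    interp = InterpGen(
        user_data,
        pt_spacing=50,           # sample-point spacing in meters
        stat="all",              # 'all', 'mean', 'median', 'std', 'max', 'min'
        interp_method="linear",  # xarray interpolation method
        output_file=out_csv,     # when given, statistics are written as CSV
    )
    # Returns a stats DataFrame, or (stats DataFrame, GeoDataFrame of points).
    return interp.calc_interp()
```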
- gdptools.agg_gen.STATSMETHODS#
List of available aggregation methods.
- The masked methods below account for missing values in the gridded data; the standard methods do not. If there is a missing value in the gridded data, the standard methods will return NaN for that polygon.
- Parameters
masked_mean – masked mean of the data.
mean – mean of the data.
masked_std – masked standard deviation of the data.
std – standard deviation of the data.
masked_median – masked median of the data.
median – median of the data.
masked_count – masked count of the data.
count – count of the data.
masked_min – masked minimum of the data.
min – minimum of the data.
masked_max – masked maximum of the data.
max – maximum of the data.
- Raises
TypeError – If supplied attribute is not one of STATSMETHODS.
- Returns
str
- Return type
str
alias of Literal['masked_mean', 'mean', 'masked_std', 'std', 'masked_median', 'median', 'masked_count', 'count', 'masked_min', 'min', 'masked_max', 'max']
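The masked/standard distinction above mirrors NumPy's nan-aware reductions. A small illustration using NumPy directly (not gdptools) of how one missing grid cell affects the two kinds of statistic:

```python
import numpy as np

# Grid-cell values contributing to one polygon, with one missing cell.
cells = np.array([1.0, 2.0, np.nan, 4.0])

# A standard statistic propagates the missing value: the result is NaN.
standard_mean = cells.mean()

# A masked (nan-aware) statistic uses only the non-missing cells.
masked_mean = np.nanmean(cells)  # (1.0 + 2.0 + 4.0) / 3
```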