Aggregation#

Classes and type aliases for aggregating gridded data to polygon and polyline geometries.

gdptools.agg_gen.AGGENGINES#

List of available aggregation engines.

Parameters
  • serial – performs weighted-area aggregation by iterating through polygons.

  • parallel – performs weighted-area aggregation in parallel across a specified number of jobs.

  • dask – performs weighted-area aggregation in the presence of a dask client; the number of jobs should be specified.

Raises

TypeError – If supplied attribute is not one of AGGENGINES.

Returns

str

Return type

str

alias of Literal['serial', 'parallel', 'dask']

gdptools.agg_gen.AGGWRITERS#

List of available writers applied to the aggregation.

Parameters
  • none – Output not written to a file.

  • csv – Output data in csv format.

  • parquet – Output data to parquet.gzip file.

  • netcdf – Output data in netcdf format.

  • json – Output data as json.

Raises

TypeError – If supplied attribute is not one of AGGWRITERS.

Returns

str

Return type

str

alias of Literal['none', 'csv', 'parquet', 'netcdf', 'json']

class gdptools.agg_gen.AggGen(user_data: UserData, stat_method: Literal['masked_mean', 'mean', 'masked_std', 'std', 'masked_median', 'median', 'masked_count', 'count', 'masked_min', 'min', 'masked_max', 'max'], agg_engine: Literal['serial', 'parallel', 'dask'], agg_writer: Literal['none', 'csv', 'parquet', 'netcdf', 'json'], weights: Union[str, DataFrame], out_path: Optional[str] = None, file_prefix: Optional[str] = None, append_date: Optional[bool] = False, jobs: Optional[int] = -1)#

Class for aggregating grid-to-polygons.

Parameters
  • user_data (UserData) –

  • stat_method (Literal['masked_mean', 'mean', 'masked_std', 'std', 'masked_median', 'median', 'masked_count', 'count', 'masked_min', 'min', 'masked_max', 'max']) –

  • agg_engine (Literal['serial', 'parallel', 'dask']) –

  • agg_writer (Literal['none', 'csv', 'parquet', 'netcdf', 'json']) –

  • weights (Union[str, DataFrame]) –

  • out_path (Optional[str]) –

  • file_prefix (Optional[str]) –

  • append_date (Optional[bool]) –

  • jobs (Optional[int]) –

__init__(user_data: UserData, stat_method: Literal['masked_mean', 'mean', 'masked_std', 'std', 'masked_median', 'median', 'masked_count', 'count', 'masked_min', 'min', 'masked_max', 'max'], agg_engine: Literal['serial', 'parallel', 'dask'], agg_writer: Literal['none', 'csv', 'parquet', 'netcdf', 'json'], weights: Union[str, DataFrame], out_path: Optional[str] = None, file_prefix: Optional[str] = None, append_date: Optional[bool] = False, jobs: Optional[int] = -1) → None#

__init__ Initialize AggGen.

AggGen is a class for aggregating gridded datasets to polygons using area-weighted statistics. The class is initialized with a user_data object, a stat_method, an agg_engine, an agg_writer, and a weights object. Once the class is initialized, the user can call the AggGen.calculate_agg() method to perform the aggregation over the period defined in the user_data parameter.

The user_data object is one of the catalog data objects (ODAPCatData, ClimateCatData) or a UserCatData object, which requires some additional information that would otherwise be provided by the catalog data objects, such as the names of the coordinates and the projection.

The stat_method is one of the STATSMETHODS, which come in standard and masked variants. With a standard method, if any grid cell contributing to a polygon is missing, the statistic returns a missing value for that polygon. The masked variant instead returns the statistic computed over all non-missing cells.

The agg_engine is one of the AGGENGINES.

The agg_writer is one of the AGGWRITERS.

The weights object is either a path to a csv file containing weights or a pandas dataframe containing weights.

The out_path, file_prefix, and append_date parameters are optional and are used when writing the aggregated output to a file.

The jobs parameter is optional, and used for both the parallel and dask engines. If the jobs parameter is not specified then half the number of processors on the machine will be used.
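To illustrate what the weights object drives, the following is a minimal sketch in plain pandas (not gdptools code) of an area-weighted mean for a single time step; the column names (poly_id, cell, wght) are illustrative assumptions, not the exact gdptools schema.

```python
import pandas as pd

# Hypothetical weights table: the fractional weight each intersecting
# grid cell contributes to each polygon (column names are illustrative).
weights = pd.DataFrame(
    {
        "poly_id": ["a", "a", "b"],
        "cell": [0, 1, 1],
        "wght": [0.25, 0.75, 1.0],
    }
)

# Gridded values for a single time step, keyed by cell index.
values = pd.Series({0: 10.0, 1: 20.0})

# Area-weighted mean per polygon: sum(w * v) / sum(w).
joined = weights.assign(val=weights["cell"].map(values))
joined["wv"] = joined["wght"] * joined["val"]
sums = joined.groupby("poly_id")[["wv", "wght"]].sum()
agg = sums["wv"] / sums["wght"]  # polygon "a" -> 17.5, polygon "b" -> 20.0
```

Precomputing the weights table once and reusing it across time steps is what makes the weighted-area approach efficient for long time series.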

Parameters
  • user_data (UserData) – One of UserCatData, ODAPCatData, ClimateCatData

  • stat_method (STATSMETHODS) – One of STATSMETHODS.

  • agg_engine (AGGENGINES) – One of AGGENGINES.

  • agg_writer (AGGWRITERS) – One of AGGWRITERS.

  • weights (Union[str, pd.DataFrame]) – Either a path to a csv file containing weights or a pandas dataframe containing weights.

  • out_path (Optional[Union[str, None]], optional) – Optional path to output file as a string or None. Defaults to None.

  • file_prefix (Optional[Union[str, None]], optional) – Optional string used as a prefix to the file name, or None if not generating an output file. Defaults to None.

  • append_date (Optional[bool], optional) – Optional, True will append processing date to file name. Defaults to False.

  • jobs (Optional[int], optional) – Optional; number of processors used by the parallel or dask engines (dask uses dask bag). If left at the default value (-1), jobs is set to half the number of available processors. Because the data must be distributed among processors, using half the available processors is a reasonable choice. Defaults to -1.
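The default-jobs rule above can be sketched as follows; resolve_jobs is a hypothetical helper written for illustration, not a gdptools function.

```python
import os

def resolve_jobs(jobs: int = -1) -> int:
    # Hypothetical helper mirroring the documented default: a value of
    # -1 resolves to half the available processors (at least 1).
    if jobs > 0:
        return jobs
    return max(1, (os.cpu_count() or 2) // 2)
```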

Return type

None

property agg_data: dict[str, gdptools.data.agg_gen_data.AggData]#

Return agg_data.

calculate_agg() → Tuple[GeoDataFrame, Dataset]#

Calculate aggregations.

Returns

A tuple of the polygon GeoDataFrame and an xarray Dataset of the aggregated values.

Return type

Tuple[gpd.GeoDataFrame, xr.Dataset]

class gdptools.agg_gen.InterpGen(user_data: Union[ODAPCatData, UserCatData], *, pt_spacing: Optional[Union[float, int]] = 50, stat: str = 'all', interp_method: str = 'linear', mask_data: Optional[bool] = False, output_file: Optional[str] = None, calc_crs: Any = 6931, method: str = 'Serial', jobs: Optional[int] = -1)#

Class for calculating grid statistics for a polyline geometry.

Parameters
  • user_data (Union[ODAPCatData, UserCatData]) –

  • pt_spacing (Optional[Union[float, int]]) –

  • stat (str) –

  • interp_method (str) –

  • mask_data (Optional[bool]) –

  • output_file (Optional[str]) –

  • calc_crs (Any) –

  • method (str) –

  • jobs (Optional[int]) –

__init__(user_data: Union[ODAPCatData, UserCatData], *, pt_spacing: Optional[Union[float, int]] = 50, stat: str = 'all', interp_method: str = 'linear', mask_data: Optional[bool] = False, output_file: Optional[str] = None, calc_crs: Any = 6931, method: str = 'Serial', jobs: Optional[int] = -1) → None#

Initiate InterpGen Class.

Parameters
  • user_data (ODAPCatData or UserCatData) – Data Class for input data

  • pt_spacing (float) – Optional; Numerical value in meters for the spacing of the interpolated sample points (default is 50)

  • stat (str) – Optional; A string indicating which statistics to calculate during the query. Options: 'all', 'mean', 'median', 'std', 'max', 'min' (default is 'all')

  • interp_method (str) – Optional; String indicating the xarray interpolation method. The default method is 'linear'. Options: 'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'polynomial'.

  • mask_data (bool or None) – Optional; When True, nodata values are removed from statistical calculations.

  • output_file (str or None) – Optional; When a file path is specified, a CSV of the statistics will be written to that file path. Must end with the .csv extension.

  • calc_crs (Any) – Optional; OGC WKT string, Proj.4 string, or int EPSG code. Determines which projection is used for the area-weighted calculations of the line buffer geometry. (default is 6931)

  • method (str) – Optional; Indicates which methodology to use to perform the query (currently, only the Serial method is available, default is 'Serial')

  • jobs (Optional[int], optional) – Optional; number of processors used by the parallel or dask methods (dask uses dask bag). If left at the default value (-1), jobs is set to half the number of available processors. Because the data must be distributed among processors, using half the available processors is a reasonable choice. Defaults to -1.

Return type

None
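The pt_spacing and interp_method parameters can be pictured with a one-dimensional analogue (illustration only, not gdptools code): sample points are placed at a fixed spacing along the line, gridded values are linearly interpolated onto them (xarray's 'linear' method behaves like np.interp along each axis), and statistics are computed from the sampled values.

```python
import numpy as np

# Place sample points every `spacing` meters along a 200 m line.
spacing = 50.0
line_length = 200.0
pts = np.arange(0.0, line_length + spacing, spacing)  # 0, 50, 100, 150, 200

# Grid-cell centers along the line and their gridded values.
grid_x = np.array([0.0, 100.0, 200.0])
grid_v = np.array([1.0, 3.0, 5.0])

# Linear interpolation of the grid onto the sample points.
sampled = np.interp(pts, grid_x, grid_v)  # [1, 2, 3, 4, 5]

# The kind of summary stats reported for the line.
stats = {
    "mean": sampled.mean(),
    "median": float(np.median(sampled)),
    "min": sampled.min(),
    "max": sampled.max(),
}
```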

calc_interp() → Union[Tuple[DataFrame, GeoDataFrame], DataFrame]#

calc_interp Run the interpolation and stat calculations.

Returns

A DataFrame of the calculated statistics, or a tuple of the statistics DataFrame and a GeoDataFrame of the interpolated sample points, depending on the options selected.

Return type

Union[Tuple[pd.DataFrame, gpd.GeoDataFrame], pd.DataFrame]

gdptools.agg_gen.STATSMETHODS#

List of available statistical methods.

Masked methods account for missing values in the gridded data; standard methods do not. If there is a missing value in the gridded data, the standard methods will return NaN for that polygon.
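The masked/standard distinction works like NumPy's NaN-aware functions; a minimal demonstration:

```python
import numpy as np

# Values of the grid cells contributing to one polygon; one cell is missing.
cells = np.array([2.0, 4.0, np.nan])

standard_mean = cells.mean()      # standard method: the NaN propagates -> NaN
masked_mean = np.nanmean(cells)   # masked method: ignores the missing cell -> 3.0
```

Use a masked method when partial coverage per polygon is acceptable; use a standard method when a missing cell should flag the whole polygon as missing.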

Parameters
  • masked_mean – masked mean of the data.

  • mean – mean of the data.

  • masked_std – masked standard deviation of the data.

  • std – standard deviation of the data.

  • masked_median – masked median of the data.

  • median – median of the data.

  • masked_count – masked count of the data.

  • count – count of the data.

  • masked_min – masked minimum of the data.

  • min – minimum of the data.

  • masked_max – masked maximum of the data.

  • max – maximum of the data.

Raises

TypeError – If supplied attribute is not one of STATSMETHODS.

Returns

str

Return type

str

alias of Literal['masked_mean', 'mean', 'masked_std', 'std', 'masked_median', 'median', 'masked_count', 'count', 'masked_min', 'min', 'masked_max', 'max']