Weight Generation

Calculate weights.

gdptools.weight_gen.WEIGHT_GEN_METHODS#

Methods used in WeightGen class.

  • serial: Iterates through polygons to calculate weights. Sufficient for most cases.

  • parallel: Chunks polygons and distributes them to available processors. Provides a substantial speedup when there is a large number of polygons.

Raises

TypeError – If value is not one of “serial”, “parallel”, or “dask”.

Returns

str

Return type

str

alias of Literal['serial', 'parallel', 'dask']
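As a minimal sketch of how a Literal alias like this one can be checked at runtime, the snippet below re-declares the alias locally (it is not imported from gdptools) and validates a value against it. The helper name check_method is hypothetical, and the validation gdptools performs internally may differ.

```python
from typing import Literal, get_args

# Local copy of the alias documented above; not imported from gdptools.
WEIGHT_GEN_METHODS = Literal["serial", "parallel", "dask"]


def check_method(method: str) -> str:
    """Return method unchanged if it is a member of the alias (hypothetical helper)."""
    allowed = get_args(WEIGHT_GEN_METHODS)
    if method not in allowed:
        raise TypeError(f"method must be one of {allowed}, got {method!r}")
    return method


print(check_method("dask"))
```

Keeping the allowed values in one Literal alias lets static type checkers and a runtime check share a single definition.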

class gdptools.weight_gen.WeightGen(*, user_data: UserData, method: Literal['serial', 'parallel', 'dask'], weight_gen_crs: Any, output_file: Optional[str] = None, jobs: Optional[int] = -1, verbose: Optional[bool] = False)#

Class for weight calculation.

Parameters
  • user_data (UserData) –

  • method (Literal['serial', 'parallel', 'dask']) –

  • weight_gen_crs (Any) –

  • output_file (Optional[str]) –

  • jobs (Optional[int]) –

  • verbose (Optional[bool]) –

__init__(*, user_data: UserData, method: Literal['serial', 'parallel', 'dask'], weight_gen_crs: Any, output_file: Optional[str] = None, jobs: Optional[int] = -1, verbose: Optional[bool] = False) None#

Weight generation class.

The WeightGen class is used to calculate weights for a given UserData object. Once the class is initialized, the weights are calculated via the calculate_weights() method. The weights are returned as a pandas DataFrame and can optionally be saved to a .csv file. The DataFrame has the following columns:

  • target_id: The target polygon id

  • i_index: The i index of the source grid

  • j_index: The j index of the source grid

  • weight: The calculated weight for the target/source pair

As long as the source grid and target polygons remain the same, the weights file can be saved and reused for future calculations, for example when the user wants a different variable or statistic and does not want to recalculate the weights.
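The reuse described above can be illustrated without gdptools. The following minimal sketch, assuming a hypothetical three-row weights table and two small made-up grids, writes the table to CSV text, reads it back, and applies the same weights to two different variables. The helper name weighted_mean and all data values are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical weights table using the four documented columns.
rows = [
    {"target_id": "A", "i_index": 0, "j_index": 0, "weight": 0.25},
    {"target_id": "A", "i_index": 0, "j_index": 1, "weight": 0.75},
    {"target_id": "B", "i_index": 1, "j_index": 1, "weight": 1.0},
]

# Round-trip through CSV text, standing in for a saved weights file.
buf = io.StringIO(newline="")
writer = csv.DictWriter(buf, fieldnames=["target_id", "i_index", "j_index", "weight"])
writer.writeheader()
writer.writerows(rows)
buf.seek(0)
loaded = list(csv.DictReader(buf))


def weighted_mean(weights, grid):
    """Area-weighted mean of grid values for each target_id."""
    num = defaultdict(float)
    den = defaultdict(float)
    for r in weights:
        i, j, w = int(r["i_index"]), int(r["j_index"]), float(r["weight"])
        num[r["target_id"]] += w * grid[i][j]
        den[r["target_id"]] += w
    return {t: num[t] / den[t] for t in num}


# Two made-up variables on the same 2 x 2 grid reuse the same weights.
temperature = [[10.0, 20.0], [30.0, 40.0]]
precip = [[1.0, 2.0], [3.0, 4.0]]
temp_means = weighted_mean(loaded, temperature)
precip_means = weighted_mean(loaded, precip)
print(temp_means, precip_means)
```

Because the weights depend only on geometry, the second variable costs one pass over the table rather than a new intersection calculation.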

Parameters
  • user_data (UserData) – One of UserCatData, ODAPCatData, ClimateCatData

  • method (WEIGHT_GEN_METHODS) – One of WEIGHT_GEN_METHODS

  • weight_gen_crs (Any) – Any projection that can be used by pyproj.CRS.from_user_input

  • output_file (Optional[Union[str, None]], optional) – String of the /path/to/file, or None if no output is desired. Defaults to None.

  • jobs (Optional[int], optional) – Number of processors used by the parallel or dask methods (dask uses dask bag). If left at the default value (-1), jobs is set to half the number of available processors. Because the data must be distributed among processors, using half of them is a reasonable choice. Defaults to -1.

  • verbose (Optional[bool], optional) – If True then extra output is printed. Defaults to False.
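The default for jobs can be sketched as follows. This is an illustration of the rule stated above, not the gdptools implementation: the helper name resolve_jobs is hypothetical, and rounding down with a floor of one worker is an assumption.

```python
import os


def resolve_jobs(jobs: int = -1) -> int:
    """Map the -1 sentinel to half the available processors (assumed rounding)."""
    if jobs == -1:
        # os.cpu_count() may return None, so fall back to 1.
        return max(1, (os.cpu_count() or 1) // 2)
    return jobs


print(resolve_jobs())   # half of the local processor count, at least 1
print(resolve_jobs(4))  # explicit values pass through unchanged
```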

Raises

TypeError – If one of the method arguments does not match WEIGHT_GEN_METHODS

Return type

None

calculate_weights(intersections: bool = False) DataFrame#

Calculate weights.

Parameters

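intersections (bool) – Whether to also calculate the intersection geometries between the target polygons and the source grid cells. Defaults to False.

Returns

DataFrame of weights with the columns target_id, i_index, j_index, and weight.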

Return type

pd.DataFrame

property grid_cells: GeoDataFrame#

Return grid_cells.

property intersections: GeoDataFrame#

Return intersections.

Calculate weights.

gdptools.weight_gen_p2p.WEIGHT_GEN_METHODS#

Methods used in WeightGen class.

  • serial: Iterates through polygons to calculate weights. Sufficient for most cases.

  • parallel: Chunks polygons and distributes them to available processors. Provides a substantial speedup when there is a large number of polygons.

Raises

TypeError – If value is not one of “serial”, “parallel”, or “dask”.

Returns

str

Return type

str

alias of Literal['serial', 'parallel', 'dask']

class gdptools.weight_gen_p2p.WeightGenP2P(*, target_poly: GeoDataFrame, target_poly_idx: str, source_poly: GeoDataFrame, source_poly_idx: str, method: Literal['serial', 'parallel', 'dask'], weight_gen_crs: Any, output_file: Optional[str] = None, jobs: Optional[int] = -1, intersections: Optional[int] = False, verbose: Optional[bool] = False)#

Class for weight calculation.

Parameters
  • target_poly (GeoDataFrame) –

  • target_poly_idx (str) –

  • source_poly (GeoDataFrame) –

  • source_poly_idx (str) –

  • method (Literal['serial', 'parallel', 'dask']) –

  • weight_gen_crs (Any) –

  • output_file (Optional[str]) –

  • jobs (Optional[int]) –

  • intersections (Optional[int]) –

  • verbose (Optional[bool]) –

__init__(*, target_poly: GeoDataFrame, target_poly_idx: str, source_poly: GeoDataFrame, source_poly_idx: str, method: Literal['serial', 'parallel', 'dask'], weight_gen_crs: Any, output_file: Optional[str] = None, jobs: Optional[int] = -1, intersections: Optional[int] = False, verbose: Optional[bool] = False) None#

Weight generation class.

The WeightGenP2P class is used to calculate weights between two polygonal GeoDataFrames. The weights are returned as a pandas DataFrame and can optionally be saved to a .csv file. The DataFrame has the following columns:

  • target_id: The target polygon feature id

  • source_id: The source polygon feature id

  • weight: The calculated weight for the target/source pair

The weight represents the fractional area that the source polygon contributes to the target polygon. If the source polygons are spatially continuous with no overlaps, summing the weights for a given target polygon should result in a value of 1. As long as the source and target polygons remain the same, the weights file can be saved and reused for future calculations, for example when the user wants a different variable or statistic and does not want to recalculate the weights.
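The fractional-area definition can be checked with a small worked example. The sketch below uses axis-aligned rectangles instead of real polygon geometries so that it needs no geospatial library. The rectangles, ids, and helper names are all made up for illustration.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    """Area of a rectangle; degenerate or inverted bounds give 0."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection_area(a: Rect, b: Rect) -> float:
    """Area shared by two axis-aligned rectangles."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


target: Rect = (0.0, 0.0, 2.0, 2.0)
sources: Dict[str, Rect] = {
    "s1": (0.0, 0.0, 1.0, 2.0),  # covers the left half of the target
    "s2": (1.0, 0.0, 3.0, 2.0),  # covers the right half and extends beyond it
}

# weight = shared area / total area of the target polygon
weights = {sid: intersection_area(target, s) / area(target) for sid, s in sources.items()}
print(weights)
print(sum(weights.values()))
```

Here the two sources tile the target without overlapping, so the weights sum to 1 even though s2 extends past the target boundary.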

Parameters
  • target_poly (gpd.GeoDataFrame) – GeoDataFrame containing a column whose name is given by target_poly_idx, and a geometry column.

  • target_poly_idx (str) – String of the feature id that will be tagged in the resulting weights file.

  • source_poly (gpd.GeoDataFrame) – GeoDataFrame containing a column whose name is given by source_poly_idx, and a geometry column.

  • source_poly_idx (str) – String of the feature id that will be tagged in the resulting weights file.

  • method (WEIGHT_GEN_METHODS) – One of WEIGHT_GEN_METHODS

  • weight_gen_crs (Any) – Any projection that can be used by pyproj.CRS.from_user_input

  • output_file (Optional[Union[str, None]], optional) – String of the /path/to/file, or None if no output is desired. Defaults to None.

  • jobs (Optional[int], optional) – Number of processors used by the parallel or dask methods (dask uses dask bag). If left at the default value (-1), jobs is set to half the number of available processors. Because the data must be distributed among processors, using half of them is a reasonable choice. Defaults to -1.

  • intersections (Optional[bool], optional) – If True, the intersection geometries are also calculated (see the intersections property). Defaults to False.

  • verbose (Optional[bool], optional) – If True then extra output is printed. Defaults to False.

Raises

TypeError – If one of the method arguments does not match WEIGHT_GEN_METHODS

Return type

None

calculate_weights() DataFrame#

Calculate weights and return weights dataframe.

Return type

DataFrame

create_wght_df() DataFrame#

Create dataframe from weight components.

Return type

DataFrame

property intersections: GeoDataFrame#

Return intersections.

Abstract Base Class for Template behavior pattern for calculating weights.

class gdptools.weights.calc_weight_engines.CalcWeightEngine#

Abstract Base Class (ABC) implementing the template behavioral pattern.

Abstract Base Class for calculating weights. There are several weight generation methods implemented and they all share a common workflow with different methods for calculating the weights. This ABC defines the calc_weights() workflow, with an @abstractmethod for get_weight_components() where new methods can be plugged in for weight generation.

These methods create a table that describes the intersection between each source polygon and each target polygon. In grid-to-poly weight generation, a source polygon represents a grid cell, and each cell is identified by its i,j index (row, column). In poly-to-poly weight generation, the indexes are simply the index of the source and the index of the target. Each weight is the area of the source/target intersection as a fraction of the total area of the target polygon.

The result is a table that can be used for area-weighted interpolation of the source to the target.

Note: For grid-to-poly weight generation the use of these classes is controlled by the WeightGen class, and for poly-to-poly weight generation these classes are controlled by the WeightGenP2P class; thus most users will not use these classes directly.
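The template pattern described above can be sketched in a few lines. The classes below are simplified stand-ins, not the gdptools classes: WeightEngineSketch and LoopEngineSketch are hypothetical names, and the input is a list of precomputed areas rather than real geometries.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

# (target_id, source_id, shared_area, target_area); made-up input format.
Pair = Tuple[str, str, float, float]


class WeightEngineSketch(ABC):
    """Simplified stand-in for the template pattern; not the gdptools class."""

    def calc_weights(self, pairs: List[Pair]) -> List[Dict[str, object]]:
        # Template method: the workflow is fixed here ...
        self.pairs = pairs
        targets, sources, weights = self.get_weight_components()  # ... this step is pluggable
        return self.create_wght_df(targets, sources, weights)

    def create_wght_df(self, targets, sources, weights):
        # Shared step: assemble the components into table rows.
        return [
            {"target_id": t, "source_id": s, "weight": w}
            for t, s, w in zip(targets, sources, weights)
        ]

    @abstractmethod
    def get_weight_components(self) -> Tuple[List[str], List[str], List[float]]:
        """Subclasses supply the actual weight calculation."""


class LoopEngineSketch(WeightEngineSketch):
    """Hypothetical engine that handles one pair at a time."""

    def get_weight_components(self):
        targets, sources, weights = [], [], []
        for target_id, source_id, shared_area, target_area in self.pairs:
            targets.append(target_id)
            sources.append(source_id)
            weights.append(shared_area / target_area)
        return targets, sources, weights


table = LoopEngineSketch().calc_weights([("A", "s1", 2.0, 4.0), ("A", "s2", 2.0, 4.0)])
print(table)
```

A new engine only has to override get_weight_components(); the surrounding workflow and the table assembly stay in the base class.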

calc_weights(target_poly: GeoDataFrame, target_poly_idx: str, source_poly: GeoDataFrame, source_poly_idx: List[str], source_type: Literal['grid', 'poly'], wght_gen_crs: Any, filename: str = '', intersections: bool = False, jobs: int = -1, verbose: bool = False) Union[Tuple[DataFrame, GeoDataFrame], DataFrame]#

Template method for calculating weights.

Parameters
  • target_poly (gpd.GeoDataFrame) – GeoDataFrame containing the target polygons for weight calculation.

  • target_poly_idx (str) – Column name in the target_poly GeoDataFrame containing unique identifiers for each target polygon.

  • source_poly (gpd.GeoDataFrame) – GeoDataFrame containing the source polygons used in weight calculation.

  • source_poly_idx (List[str]) – List of column names in the source_poly GeoDataFrame containing unique identifiers for each source polygon.

  • source_type (SOURCE_TYPES) – Type of the source polygons, either “grid” or “poly”; determines the format of the output.

  • wght_gen_crs (Any) – Coordinate reference system used when generating weights.

  • filename (str) – Optional filename to save results. If not provided, results won’t be saved.

  • intersections (bool) – Whether to calculate intersections between the target and source polygons. Defaults to False.

  • jobs (int) – Number of parallel jobs to run. If set to -1, all available cores will be used. Defaults to -1.

  • verbose (bool) – If set to True, prints detailed information during execution. Defaults to False.

Returns

Union[pd.DataFrame, Optional[gpd.GeoDataFrame]]: Either a DataFrame containing the calculated weights or, based on certain conditions, that DataFrame together with a GeoDataFrame (the Tuple[DataFrame, GeoDataFrame] alternative in the signature).

Return type

Union[pd.DataFrame, Optional[gpd.GeoDataFrame]]

create_wght_df() DataFrame#

Create dataframe from weight components.

Return type

DataFrame

abstract get_weight_components() Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]#

Abstract method for calculating weights.

Classes that inherit this method will override this method for weight-generation.

If self.source_type == “grid”, the returned tuple contains, in order:

  1. plist: list of target poly ids.

  2. ilist: i-index of grid_cells.

  3. jlist: j-index of grid_cells.

  4. wghtslist: weight values of i,j index of grid_cells.

If self.source_type == “poly”, the returned tuple contains, in order:

  1. plist: list of target poly ids.

  2. splist: list of source poly ids.

  3. wghtslist: weight values of source polygons.
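The ordering of the returned lists matters because they are later combined into table rows. A minimal sketch of that assembly, assuming the column names documented for the weights DataFrames and a hypothetical helper name components_to_rows, looks like this:

```python
from typing import Dict, List, Sequence


def components_to_rows(components: Sequence[list], source_type: str) -> List[Dict[str, object]]:
    """Zip parallel component lists into rows (hypothetical helper)."""
    if source_type == "grid":
        columns = ("target_id", "i_index", "j_index", "weight")
    elif source_type == "poly":
        columns = ("target_id", "source_id", "weight")
    else:
        raise ValueError(f"unknown source_type: {source_type!r}")
    if len(components) != len(columns):
        raise ValueError(f"expected {len(columns)} lists, got {len(components)}")
    # zip(*components) walks the parallel lists one row at a time.
    return [dict(zip(columns, values)) for values in zip(*components)]


# Made-up component lists in the documented order.
grid_rows = components_to_rows((["A", "A"], [0, 0], [0, 1], [0.25, 0.75]), "grid")
poly_rows = components_to_rows((["A", "A"], ["s1", "s2"], [0.5, 0.5]), "poly")
print(grid_rows)
print(poly_rows)
```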

Returns

Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]

Return type

Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]

abstract get_weight_components_and_intesections() Union[Tuple[List[object], List[int], List[int], List[float], GeoDataFrame], Tuple[List[object], List[object], List[float], GeoDataFrame]]#

Abstract method for calculating weights.

Classes that inherit this method will override this method for weight-generation.

Returns

Union[Tuple[List[object], List[int], List[int], List[float], gpd.GeoDataFrame], Tuple[List[object], List[object], List[float], gpd.GeoDataFrame]]

If self.source_type == “grid”, the returned tuple contains, in order:

  1. plist: list of target poly ids.

  2. ilist: i-index of grid_cells.

  3. jlist: j-index of grid_cells.

  4. wghtslist: weight values of i,j index of grid_cells.

  5. GeoDataFrame of intersection geometries.

If self.source_type == “poly”, the returned tuple contains, in order:

  1. plist: list of target poly ids.

  2. splist: list of source poly ids.

  3. wghtslist: weight values of source polygons.

  4. GeoDataFrame of intersection geometries.

Return type

Union[Tuple[List[object], List[int], List[int], List[float], GeoDataFrame], Tuple[List[object], List[object], List[float], GeoDataFrame]]

class gdptools.weights.calc_weight_engines.DaskWghtGenEngine#

Method to generate grid-to-polygon weight.

This class is based on methods provided in the Tobler package. See the area_tables_binning_parallel() method.

Parameters

CalcWeightEngine (ABC) – Abstract Base Class (ABC) employing the Template behavior pattern. The abstract method get_weight_components() provides a way to plug in new weight generation methods.

get_weight_components() Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]#

Template method from CalcWeightEngine class for generating weight components.

Returns

Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]

  • If source_type is “grid”, the tuple contains:
    1. plist: List of target polygon IDs.

    2. ilist: i-index of grid cells.

    3. jlist: j-index of grid cells.

    4. wghtslist: Weight values for i,j index of grid cells.

  • If source_type is “poly”, the tuple contains:
    1. plist: List of target polygon IDs.

    2. splist: List of source polygon IDs.

    3. wghtslist: Weight values of source polygons.

Return type

Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]

get_weight_components_and_intesections() Union[Tuple[List[object], List[int], List[int], List[float], GeoDataFrame], Tuple[List[object], List[object], List[float], GeoDataFrame]]#

Template method from CalcWeightEngine class for generating weight components.

Returns

Union[Tuple[List[object], List[int], List[int], List[float], gpd.GeoDataFrame], Tuple[List[object], List[object], List[float], gpd.GeoDataFrame]]

  • If source_type is “grid”, the tuple contains:
    1. plist: List of target polygon IDs.

    2. ilist: i-index of grid cells.

    3. jlist: j-index of grid cells.

    4. wghtslist: Weight values for i,j index of grid cells.

    5. GeoDataFrame of intersection geometries.

  • If source_type is “poly”, the tuple contains:
    1. plist: List of target polygon IDs.

    2. splist: List of source polygon IDs.

    3. wghtslist: Weight values of source polygons.

    4. GeoDataFrame of intersection geometries.

Return type

Union[Tuple[List[object], List[int], List[int], List[float], GeoDataFrame], Tuple[List[object], List[object], List[float], GeoDataFrame]]

class gdptools.weights.calc_weight_engines.ParallelWghtGenEngine#

Method to generate grid-to-polygon weight using multi-processing.

This class is based on and adapted from methods provided in the Tobler package. See the _area_tables_binning_parallel() method.

Parameters

CalcWeightEngine (ABC) – Abstract Base Class (ABC) employing the Template behavior pattern. The abstract method get_weight_components() provides a way to plug in new weight generation methods.
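The chunk-and-distribute idea behind the parallel method can be sketched as follows. This is not the gdptools or Tobler code: it uses rectangles instead of polygons, every name in it is hypothetical, and it deliberately swaps in a thread pool for the multi-processing the engine is documented to use so that the example runs anywhere. ProcessPoolExecutor offers the same map interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
SOURCE_CELL: Rect = (0.0, 0.0, 1.0, 1.0)  # a single made-up source cell


def chunk(items: Sequence, n_chunks: int) -> List[Sequence]:
    """Split items into at most n_chunks contiguous, near-equal pieces."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def overlap_fraction(target: Rect, source: Rect) -> float:
    """Shared area divided by the target's total area."""
    w = min(target[2], source[2]) - max(target[0], source[0])
    h = min(target[3], source[3]) - max(target[1], source[1])
    target_area = (target[2] - target[0]) * (target[3] - target[1])
    return max(0.0, w) * max(0.0, h) / target_area


def process_chunk(targets):
    """Worker: compute a weight for every target in one chunk."""
    return [(tid, overlap_fraction(rect, SOURCE_CELL)) for tid, rect in targets]


targets = [
    ("t0", (0.0, 0.0, 1.0, 1.0)),
    ("t1", (0.5, 0.0, 1.5, 1.0)),
    ("t2", (0.0, 0.0, 2.0, 2.0)),
    ("t3", (2.0, 2.0, 3.0, 3.0)),
    ("t4", (0.0, 0.5, 1.0, 1.5)),
]

# map() preserves chunk order, so the merged result follows the input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = [row for part in pool.map(process_chunk, chunk(targets, 2)) for row in part]
print(results)
```

Each worker receives a whole chunk rather than a single polygon, which keeps the per-task overhead small relative to the geometry work.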

get_weight_components() Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]#

Template method from CalcWeightEngine class for generating weight components.

Returns

Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]

If self.source_type == “grid”, the returned tuple contains, in order:

  1. plist: list of target poly ids.

  2. ilist: i-index of grid_cells.

  3. jlist: j-index of grid_cells.

  4. wghtslist: weight values of i,j index of grid_cells.

If self.source_type == “poly”, the returned tuple contains, in order:

  1. plist: list of target poly ids.

  2. splist: list of source poly ids.

  3. wghtslist: weight values of source polygons.

Return type

Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]

get_weight_components_and_intesections() Union[Tuple[List[object], List[int], List[int], List[float], GeoDataFrame], Tuple[List[object], list[object], List[float], GeoDataFrame]]#

Template method from CalcWeightEngine class for generating weight components.

Returns

Union[Tuple[List[object], List[int], List[int], List[float], gpd.GeoDataFrame], Tuple[List[object], List[object], List[float], gpd.GeoDataFrame]]

If self.source_type == “grid”, the returned tuple contains, in order:

  1. plist: list of target poly ids.

  2. ilist: i-index of grid_cells.

  3. jlist: j-index of grid_cells.

  4. wghtslist: weight values of i,j index of grid_cells.

  5. GeoDataFrame of intersection geometries.

If self.source_type == “poly”, the returned tuple contains, in order:

  1. plist: list of target poly ids.

  2. splist: list of source poly ids.

  3. wghtslist: weight values of source polygons.

  4. GeoDataFrame of intersection geometries.

Return type

Union[Tuple[List[object], List[int], List[int], List[float], GeoDataFrame], Tuple[List[object], list[object], List[float], GeoDataFrame]]

class gdptools.weights.calc_weight_engines.SerialWghtGenEngine#

Method to generate grid-to-polygon weight.

This class is based on and adapted from methods provided in the Tobler package. See the area_tables_binning() method.

Parameters

CalcWeightEngine (ABC) – Abstract Base Class (ABC) employing the Template behavior pattern. The abstract method get_weight_components() provides a way to plug in new weight generation methods.

area_tables_binning(source_df: GeoDataFrame, target_df: GeoDataFrame, source_type: Literal['grid', 'poly']) Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]#

Construct intersection tables.

Construct area allocation and source-target correspondence tables using a parallel spatial indexing approach. This method and its associated functions are based on and adapted from the Tobler package.

Parameters
  • source_df (gpd.GeoDataFrame) – GeoDataFrame containing input data and polygons

  • target_df (gpd.GeoDataFrame) – GeoDataFrame defining the output geometries

  • source_type (SOURCE_TYPES) – “grid” or “poly” determines output format.

Returns

Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]

If self.source_type == “grid”, the returned tuple contains, in order:

  1. plist: list of poly_idx strings.

  2. ilist: i-index of grid_cells.

  3. jlist: j-index of grid_cells.

  4. wghtslist: weight values of i,j index of grid_cells.

If self.source_type == “poly”, the returned tuple contains, in order:

  1. plist: list of poly_idx strings.

  2. splist: list of source poly ids.

  3. wghtslist: weight values of source polygons.

Return type

Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]
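To give a feel for what a binning-based intersection table involves, here is a generic sketch. It is not Tobler's or gdptools' implementation: it bins axis-aligned rectangles into a coarse grid to find candidate pairs, then computes exact overlaps only for those candidates. All names, the bin size, and the data are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bins_for(rect: Rect, bin_size: float) -> List[Tuple[int, int]]:
    """Return every coarse bin (column, row) touched by the rectangle's bounds."""
    x0, y0 = int(rect[0] // bin_size), int(rect[1] // bin_size)
    x1, y1 = int(rect[2] // bin_size), int(rect[3] // bin_size)
    return [(bx, by) for bx in range(x0, x1 + 1) for by in range(y0, y1 + 1)]


def candidate_pairs(
    sources: Dict[str, Rect], targets: Dict[str, Rect], bin_size: float
) -> Set[Tuple[str, str]]:
    """Pair up sources and targets that share at least one bin."""
    index = defaultdict(list)
    for sid, rect in sources.items():
        for b in bins_for(rect, bin_size):
            index[b].append(sid)
    pairs = set()
    for tid, rect in targets.items():
        for b in bins_for(rect, bin_size):
            for sid in index.get(b, ()):
                pairs.add((sid, tid))
    return pairs


def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap_area(a: Rect, b: Rect) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


# Four unit grid cells plus one distant cell, and one target straddling the four.
sources = {
    "c00": (0.0, 0.0, 1.0, 1.0),
    "c01": (1.0, 0.0, 2.0, 1.0),
    "c10": (0.0, 1.0, 1.0, 2.0),
    "c11": (1.0, 1.0, 2.0, 2.0),
    "far": (10.0, 10.0, 11.0, 11.0),
}
targets = {"A": (0.5, 0.5, 1.5, 1.5)}

table = []
for sid, tid in sorted(candidate_pairs(sources, targets, bin_size=2.0)):
    shared = overlap_area(sources[sid], targets[tid])
    if shared > 0:
        table.append((tid, sid, shared / area(targets[tid])))
print(table)
```

The distant cell never reaches the exact overlap test, which is the point of the index: the expensive geometry step runs only on pairs that could plausibly intersect.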

area_tables_binning_and_intersections(source_df: GeoDataFrame, target_df: GeoDataFrame, source_type: Literal['grid', 'poly']) Union[Tuple[List[object], List[int], List[int], List[float], GeoDataFrame], Tuple[List[object], List[object], List[float], GeoDataFrame]]#

Construct intersection tables.

Construct area allocation and source-target correspondence tables using a parallel spatial indexing approach. This method and its associated functions are based on and adapted from the Tobler package.

Parameters
  • source_df (gpd.GeoDataFrame) – GeoDataFrame containing input data and polygons

  • target_df (gpd.GeoDataFrame) – GeoDataFrame defining the output geometries

  • source_type (SOURCE_TYPES) – “grid” or “poly”

Returns

Union[Tuple[List[object], List[int], List[int], List[float], gpd.GeoDataFrame], Tuple[List[object], List[object], List[float], gpd.GeoDataFrame]]

If self.source_type == “grid”, the returned tuple contains, in order:

  1. plist: list of poly_idx strings.

  2. ilist: i-index of grid_cells.

  3. jlist: j-index of grid_cells.

  4. wghtslist: weight values of i,j index of grid_cells.

  5. gdf: GeoDataFrame of intersection geometries.

If self.source_type == “poly”, the returned tuple contains, in order:

  1. plist: list of poly_idx strings.

  2. splist: list of source poly ids.

  3. wghtslist: weight values of source polygons.

  4. gdf: GeoDataFrame of intersection geometries.

Return type

Union[Tuple[List[object], List[int], List[int], List[float], GeoDataFrame], Tuple[List[object], List[object], List[float], GeoDataFrame]]

get_weight_components() Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]#

Template method from CalcWeightEngine class for generating weight components.

Returns

Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]

If self.source_type == “grid”, the returned tuple contains, in order:

  1. plist: list of target poly ids.

  2. ilist: i-index of grid_cells.

  3. jlist: j-index of grid_cells.

  4. wghtslist: weight values of i,j index of grid_cells.

If self.source_type == “poly”, the returned tuple contains, in order:

  1. plist: list of target poly ids.

  2. splist: list of source poly ids.

  3. wghtslist: weight values of source polygons.

Return type

Union[Tuple[List[object], List[int], List[int], List[float]], Tuple[List[object], List[object], List[float]]]

get_weight_components_and_intesections() Union[Tuple[List[object], List[int], List[int], List[float], GeoDataFrame], Tuple[List[object], List[object], List[float], GeoDataFrame]]#

Template method from CalcWeightEngine class for generating weight components.

Returns

Union[Tuple[List[object], List[int], List[int], List[float], gpd.GeoDataFrame], Tuple[List[object], List[object], List[float], gpd.GeoDataFrame]]

If self.source_type == “grid”, the returned tuple contains, in order:

  1. plist: list of poly_idx strings.

  2. ilist: i-index of grid_cells.

  3. jlist: j-index of grid_cells.

  4. wghtslist: weight values of i,j index of grid_cells.

  5. gdf: GeoDataFrame of intersection geometries.

If self.source_type == “poly”, the returned tuple contains, in order:

  1. plist: list of poly_idx strings.

  2. splist: list of source poly ids.

  3. wghtslist: weight values of source polygons.

  4. gdf: GeoDataFrame of intersection geometries.

Return type

Union[Tuple[List[object], List[int], List[int], List[float], GeoDataFrame], Tuple[List[object], List[object], List[float], GeoDataFrame]]