Make indigenous status lookup — make_ind

Finds the indigenous status of each taxon as the most frequently occurring value in the data, using make_attribute. Unlike a plain call to make_attribute, this function first looks for an indigenous status among the values from a primary data source (if supplied). Taxa with no indigenous values in the primary data source are then assigned a status based on all other data sources. Flags and overrides identifying non-indigenous taxa can also be supplied to correct errors in the data, and the calculation can be restricted to an area of interest (aoi).
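The primary-source-first logic described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the toy records, the "Y"/"N" status values, the data_source column name, and the helpers most_frequent and status_by_taxon are all assumptions made for this example.

```r
# Toy records: taxon, indigenous value and data source (all illustrative).
records <- data.frame(
  original_name = c("Acacia pycnantha", "Acacia pycnantha", "Acacia pycnantha",
                    "Olea europaea", "Olea europaea", "Olea europaea"),
  ind = c("Y", "N", "N", "N", "N", "Y"),
  data_source = c("primary", "other", "other", "other", "other", "other"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent value in a character vector.
most_frequent <- function(x) names(which.max(table(x)))

# Hypothetical helper: one status per taxon from a set of records.
status_by_taxon <- function(d) {
  if (nrow(d) == 0) {
    return(data.frame(original_name = character(), ind = character()))
  }
  res <- tapply(d$ind, d$original_name, most_frequent)
  data.frame(original_name = names(res), ind = as.vector(res))
}

# Step 1: status from the primary data source only.
primary <- status_by_taxon(records[records$data_source == "primary", ])

# Step 2: taxa with no status from the primary source fall back to
# the most frequent value across all other sources.
rest <- records[!records$original_name %in% primary$original_name, ]
fallback <- status_by_taxon(rest)

lookup <- rbind(primary, fallback)
print(lookup)
```

In this toy data the single primary record decides the status of Acacia pycnantha even though the other sources disagree, while Olea europaea, absent from the primary source, takes its majority value from the remaining records.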

make_ind(
  df,
  taxa_col = "original_name",
  ind_col = "ind",
  taxonomy,
  max_guess = "species",
  context = "kingdom",
  remove_strings = c("n/a", "''", "NA", "^\\s*$"),
  primary_data_source = NULL,
  data_source_col = NULL,
  non_ind_terms = NULL,
  common_df = NULL,
  genus_overrides = NULL,
  species_overrides = NULL,
  use_aoi = NULL,
  df_x = "long",
  df_y = "lat",
  crs_df = 4326
)

Arguments

df: Dataframe with taxa_col and ind_col, and optionally x and y coordinate columns (df_x, df_y) if use_aoi is supplied.
taxa_col: Character name of column in df that was passed to make_taxonomy as taxa_col.
ind_col: Character name of column in df that contains the indigenous status values.
taxonomy: List resulting from call to make_taxonomy().
max_guess: Character. If indigenous values are not available for taxa, try guessing from values up to max_guess level of taxonomic hierarchy. See lurank. Note it does not make sense to provide a rank here that is lower than the target_rank provided to make_taxonomy when taxonomy was made.
context: Any other columns in df to maintain throughout summarising.
remove_strings: Character. Any values in ind_col to exclude.
primary_data_source: Character value from data_source_col indicating the name of the primary data source.
data_source_col: Character name of the column in df that identifies the data source of each record. primary_data_source should be one of the values in this column.
non_ind_terms: Character vector of non-indigenous terms found in common names to use for applying non-indigenous status regardless of the indigenous values in the data.
common_df: Data frame containing taxa_col and a 'common' field with common names for each taxon, used to apply non-indigenous status via non_ind_terms.
genus_overrides: Character vector of known non-indigenous genera to apply non-indigenous status to all species within those genera regardless of the indigenous values in the data.
species_overrides: Character vector of known non-indigenous species to apply non-indigenous status regardless of the indigenous values in the data.
use_aoi: sf. Name of sf object for filtering data for indigenous status generation using envClean::filter_geo_range.
df_x: Character. Name of column with x coordinate.
df_y: Character. Name of column with y coordinate.
crs_df: Anything that will return a legitimate crs when passed to the crs argument of st_transform or st_as_sf.
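As a rough illustration of how non_ind_terms, common_df, genus_overrides and species_overrides might combine, the base R sketch below forces a non-indigenous status onto flagged taxa in an existing lookup. It is a sketch under stated assumptions, not the function's actual code: the "Y"/"N" values, the case-insensitive term matching, the genus being the first word of the name, and all example taxa are assumptions made for this example.

```r
# Toy lookup in which every taxon has been (wrongly) recorded as indigenous.
lookup <- data.frame(
  original_name = c("Olea europaea", "Acacia pycnantha",
                    "Pinus radiata", "Rubus ulmifolius"),
  ind = c("Y", "Y", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Common names per taxon (illustrative values).
common_df <- data.frame(
  original_name = c("Olea europaea", "Acacia pycnantha"),
  common = c("European Olive", "Golden Wattle"),
  stringsAsFactors = FALSE
)

non_ind_terms <- c("european", "african")
genus_overrides <- c("Pinus")
species_overrides <- c("Rubus ulmifolius")

# 1. Taxa whose common name contains a non-indigenous term
#    (case-insensitive matching is an assumption).
term_pattern <- paste(non_ind_terms, collapse = "|")
flagged_common <- common_df$original_name[
  grepl(term_pattern, common_df$common, ignore.case = TRUE)
]

# 2. Taxa in an overridden genus (genus assumed to be the first word).
genus <- sub(" .*$", "", lookup$original_name)
flagged_genus <- lookup$original_name[genus %in% genus_overrides]

# 3. Combine with the species overrides and force non-indigenous status,
#    regardless of the value found in the data.
flagged <- unique(c(flagged_common, flagged_genus, species_overrides))
lookup$ind[lookup$original_name %in% flagged] <- "N"
print(lookup)
```

Only Acacia pycnantha keeps its original status here; the other three taxa are overridden by a common-name term, a genus override and a species override respectively.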

Value

Dataframe with one row for each taxon, giving the best guess at its indigenous status based on the values in ind_col.