Finds the indigenous status of each taxon based on the most frequently occurring value in the data, using make_attribute. Unlike a direct call to make_attribute, this function first attempts to assign an indigenous status from values found in a primary data source (if supplied). Taxa with no indigenous values in the primary data source are then given an indigenous status based on all other data sources. In addition, flags and overrides identifying non-indigenous taxa can be provided to overcome errors in the data, and the calculation can be restricted to an area of interest (aoi).
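
The primary-then-fallback logic can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper most_frequent(), the toy records data frame and the "Y"/"N" coding are all hypothetical, and ties go to whichever value sorts first.

```r
# Toy records (hypothetical data): taxon A has one primary record, taxon B has none
records <- data.frame(
  taxa = c("A", "A", "A", "B", "B", "B"),
  ind = c("Y", "N", "N", "N", "N", "Y"),
  data_source = c("primary", "other", "other", "other", "other", "other"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the most frequent value in a character vector
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Step 1: status from the primary data source only
primary <- records[records$data_source == "primary", ]
ind_primary <- vapply(split(primary$ind, primary$taxa), most_frequent, character(1))

# Step 2: for taxa still without a status, use all other data sources
other <- records[
  records$data_source != "primary" & !records$taxa %in% names(ind_primary),
]
ind_other <- vapply(split(other$ind, other$taxa), most_frequent, character(1))

# Named character vector: one status per taxon
result <- c(ind_primary, ind_other)
```

In this toy data, taxon A keeps the status from its single primary record even though "N" is more frequent across all sources, while taxon B falls back to the most frequent value in the other sources.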

make_ind(
  df,
  taxa_col = "original_name",
  ind_col = "ind",
  taxonomy,
  max_guess = "species",
  context = "kingdom",
  remove_strings = c("n/a", "''", "NA", "^\\s*$"),
  primary_data_source = NULL,
  data_source_col = NULL,
  non_ind_terms = NULL,
  common_df = NULL,
  genus_overrides = NULL,
  species_overrides = NULL,
  use_aoi = NULL,
  df_x = "long",
  df_y = "lat",
  crs_df = 4326
)

Arguments

df

Dataframe with taxa_col and ind_col, and optionally x and y coordinate columns (df_x and df_y) if use_aoi is supplied.

taxa_col

Character name of column in df that was passed to make_taxonomy as taxa_col.

ind_col

Character name of column in df that contains the indigenous status values.

taxonomy

List resulting from call to make_taxonomy().

max_guess

Character. If indigenous values are not available for taxa, try guessing from values up to max_guess level of taxonomic hierarchy. See lurank. Note it does not make sense to provide a rank here that is lower than the target_rank provided to make_taxonomy when taxonomy was made.
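
As a rough sketch of what guessing from higher ranks could look like, the base R code below fills a missing status from the most frequent known value in the same genus, then the same family, stopping at max_guess. It makes several assumptions: the three-level ranks vector stands in for lurank, and guess_ind() and the toy taxa data frame are hypothetical rather than part of the package.

```r
# Hypothetical stand-in for the rank lookup, ordered lowest to highest
ranks <- c("species", "genus", "family")

# Toy data: only one species has a known indigenous value
taxa <- data.frame(
  species = c("Aus bus", "Aus cus", "Dus eus"),
  genus = c("Aus", "Aus", "Dus"),
  family = c("Fam1", "Fam1", "Fam1"),
  ind = c("Y", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill missing values from successively higher ranks
guess_ind <- function(taxa, max_guess = "species") {
  ind <- taxa$ind
  known <- !is.na(taxa$ind)
  # Ranks above species, up to and including max_guess
  use_ranks <- ranks[seq_len(match(max_guess, ranks))][-1]
  for (rank in use_ranks) {
    missing <- is.na(ind)
    if (!any(missing)) break
    # Most frequent known value within each group at this rank
    by_group <- vapply(
      split(taxa$ind[known], taxa[[rank]][known]),
      function(x) names(which.max(table(x))),
      character(1)
    )
    ind[missing] <- unname(by_group[taxa[[rank]][missing]])
  }
  ind
}

guess_ind(taxa, max_guess = "genus")
guess_ind(taxa, max_guess = "family")
```

With max_guess = "genus", "Aus cus" inherits a value from its genus while "Dus eus" stays missing; raising max_guess to "family" fills it as well.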

context

Any other columns in df to maintain throughout summarising.

remove_strings

Character. Any values in ind_col to exclude.
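
The default "^\\s*$" suggests the strings are used as regular expressions. Assuming that is so, and assuming they are combined with "|" (an assumption, not confirmed by this page), the effect on a toy vector of ind_col values can be sketched as:

```r
remove_strings <- c("n/a", "''", "NA", "^\\s*$")

# Hypothetical raw values as they might appear in ind_col
values <- c("Y", "N", "n/a", "", " ", "NA", "''", "Y")

# Combine the strings into one pattern and drop anything that matches.
# Patterns without anchors match anywhere in the value, so "NA" would
# also match a longer value that contains it.
pattern <- paste(remove_strings, collapse = "|")
kept <- values[!grepl(pattern, values)]
```

Only the values carrying a usable status survive the filter; empty strings, whitespace and placeholder text are dropped.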

primary_data_source

Character value from data_source_col indicating the name of the primary data source.

data_source_col

Character name of the column in df that identifies the data source of each record. The value given in primary_data_source is looked up in this column.

non_ind_terms

Character vector of non-indigenous terms found in common names to use for applying non-indigenous status regardless of the indigenous values in the data.

common_df

Data frame containing taxa_col and a 'common' column with the common name of each taxon. Taxa whose common name contains any of the non_ind_terms are given non-indigenous status.

genus_overrides

Character vector of known non-indigenous genera to apply non-indigenous status to all species within those genera regardless of the indigenous values in the data.

species_overrides

Character vector of known non-indigenous species to apply non-indigenous status regardless of the indigenous values in the data.
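
How non_ind_terms, genus_overrides and species_overrides might combine can be sketched in base R. Everything here is illustrative: the toy status and common_df data frames, the "Y"/"N" coding, the column name taxa standing in for taxa_col, and taking the first word of a name as its genus are all assumptions, and the package may apply these steps differently.

```r
# Toy per-taxon status before any flags or overrides (hypothetical values)
status <- data.frame(
  taxa = c("Rubus fruticosus", "Acacia pycnantha", "Pinus radiata", "Felis catus"),
  ind = c("Y", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# Common names, available for only some taxa
common_df <- data.frame(
  taxa = c("Rubus fruticosus", "Acacia pycnantha"),
  common = c("European blackberry", "golden wattle"),
  stringsAsFactors = FALSE
)

non_ind_terms <- c("European", "African")
genus_overrides <- c("Pinus")
species_overrides <- c("Felis catus")

# Flag taxa whose common name contains any non-indigenous term
common <- common_df$common[match(status$taxa, common_df$taxa)]
term_flag <- !is.na(common) & grepl(paste(non_ind_terms, collapse = "|"), common)

# Flag taxa by genus (first word of the name) or by full species name
genus <- sub(" .*$", "", status$taxa)
genus_flag <- genus %in% genus_overrides
species_flag <- status$taxa %in% species_overrides

# Any flag forces non-indigenous status, regardless of the data
status$ind[term_flag | genus_flag | species_flag] <- "N"
```

Here the first taxon is flagged through its common name, the third through its genus and the fourth by species name; the second keeps the status found in the data.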

use_aoi

sf. An sf object defining the area of interest. Data are filtered to this area using envClean::filter_geo_range before indigenous status is generated.

df_x

Character. Name of column with x coordinate.

df_y

Character. Name of column with y coordinate.

crs_df

Anything that will return a legitimate crs when passed to the crs argument of sf::st_transform or sf::st_as_sf.

Value

Dataframe with one row for each taxon, giving the best guess at its indigenous status based on the values in ind_col.