Only queries GBIF for taxa not already in taxonomy_file.
get_taxonomy(
df,
taxa_col = "original_name",
taxonomy_file = tempfile(),
force_new = list(original_name = NULL, timediff = as.difftime(26, units = "weeks")),
remove_taxa = c("BOLD:", "dead", "unverified", "annual herb", "annual grass", "\\?"),
remove_strings = c("\\sx\\s.*", "\\sX\\s.*", "\\s\\-\\-\\s.*",
"\\s\\(.*\\)", "\\ssp\\.$", "\\sssp\\.$", "\\sspec\\.$"),
remove_dead = FALSE,
...
)
Dataframe with taxa column.
Character. Name of column with taxa names. Each unique taxon
in this column will appear in the results in a column called original_name.
Character. Path to save results to.
List with two elements: a character vector of taxa, named as per the
taxa_col argument (original_name in the default shown in the usage), and
a difftime (named timediff in the default). If taxonomy_file already
exists, any taxa in the first element that match taxonomy_file will be
requeried. Likewise, any original_name that has not been searched within
the difftime will be requeried. Set either element to NULL to ignore it.
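As a rough illustration of the requery rule described for force_new, the base R sketch below works out which taxa would be looked up again. It is not the package's implementation: the previous data frame, its searched timestamp column and the fixed now value are hypothetical stand-ins for whatever taxonomy_file actually stores, and the element names follow the default shown in the usage (original_name and timediff).

```r
# Hypothetical stand-in for previously saved results; the real contents of
# taxonomy_file may differ (the column name `searched` is an assumption).
previous <- data.frame(
  original_name = c("Acacia pycnantha", "Poa labillardierei", "Themeda triandra"),
  searched = as.POSIXct(c("2023-11-01", "2024-06-01", "2024-06-15"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Same shape as the default in the usage, but forcing one taxon by name.
force_new <- list(
  original_name = "Themeda triandra",
  timediff = as.difftime(26, units = "weeks")
)

# Fixed reference time so the example is reproducible (assumption; a real
# run would presumably compare against the current time).
now <- as.POSIXct("2024-07-01", tz = "UTC")

# Taxa whose last search is older than timediff.
age_weeks <- as.numeric(difftime(now, previous$searched, units = "weeks"))
limit_weeks <- as.numeric(force_new$timediff, units = "weeks")
stale <- previous$original_name[age_weeks > limit_weeks]

# Taxa named explicitly in force_new that already exist in the saved results.
forced <- intersect(force_new$original_name, previous$original_name)

# Everything that would be sent to GBIF again under these assumptions.
requery <- union(forced, stale)
print(requery)
```

Here "Themeda triandra" is requeried because it is named explicitly, and "Acacia pycnantha" because its last search is more than 26 weeks before the reference time.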
Character. Regular expressions to be matched. Any row with a match is removed before searching.
Character. Regular expressions to be matched. Any matching text is removed from the name before searching; the row itself is kept.
Arguments passed to rgbif::name_backbone_checklist().
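The difference between remove_taxa (drops whole rows) and remove_strings (trims text but keeps the row) can be seen with the default patterns from the usage. The sketch below uses only base R; the example names are made up, and collapsing the patterns with "|" is an assumption about how they might be combined, not a statement of how get_taxonomy() applies them internally.

```r
# Made-up input names for illustration only.
taxa <- c(
  "Eucalyptus camaldulensis (River Red Gum)",
  "Acacia sp.",
  "BOLD:AAA1234",
  "Poa x hybrid",
  "annual grass",
  "Banksia marginata?"
)

# Defaults copied from the usage.
remove_taxa <- c("BOLD:", "dead", "unverified", "annual herb", "annual grass", "\\?")
remove_strings <- c("\\sx\\s.*", "\\sX\\s.*", "\\s\\-\\-\\s.*",
                    "\\s\\(.*\\)", "\\ssp\\.$", "\\sssp\\.$", "\\sspec\\.$")

# Step 1: drop any name matching a remove_taxa pattern (row-level filter).
drop <- grepl(paste(remove_taxa, collapse = "|"), taxa)
kept <- taxa[!drop]

# Step 2: strip text matching a remove_strings pattern; the name stays.
cleaned <- gsub(paste(remove_strings, collapse = "|"), "", kept)

print(data.frame(original_name = kept, searched_as = cleaned))
```

With these inputs three names are dropped entirely ("BOLD:AAA1234", "annual grass", "Banksia marginata?") and the remaining three are trimmed to "Eucalyptus camaldulensis", "Acacia" and "Poa".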
Dataframe. Results from envClean::get_gbif_tax(), tweaked so that the
rank column is lowercase and an ordered factor as per envClean::lurank.
Also writes taxonomy_file and gsub("\\.", "_accepted.", taxonomy_file).
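To see what the second file name looks like, the gsub() expression can be run on a few sample paths. The helper name accepted_path() and the paths are hypothetical and used only for illustration. Because gsub() replaces every "." in the string, a path with more than one dot is changed in more than one place, and a path with no dot (which is typically the case for the tempfile() default) comes back unchanged.

```r
# Hypothetical helper wrapping the expression quoted above.
accepted_path <- function(taxonomy_file) {
  gsub("\\.", "_accepted.", taxonomy_file)
}

single_dot <- accepted_path("taxonomy.rds")
multi_dot  <- accepted_path("out/v1.2/taxonomy.rds")
no_dot     <- accepted_path("out/taxonomy")

print(single_dot)  # "taxonomy_accepted.rds"
print(multi_dot)   # "out/v1_accepted.2/taxonomy_accepted.rds"
print(no_dot)      # "out/taxonomy"
```

A taxonomy_file with a single extension and no other dots in its path gives the most predictable pair of file names.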
Common names (vernacularName) are no longer supported here. Use
get_gbif_common() on a downstream result. It may be helpful to keep a
usageKey through the cleaning process for use in getting common names.
Part of the reason for removing that functionality here was the ambiguity
over which key to use, particularly for species versus subspecies.