The goal of `envClean` is to help clean large amounts of unstructured biological data (for further analysis elsewhere).
Not all functions will be relevant to all projects.
If a typical species list from a typical observer is required, then `make_effort_mod()` may be useful to filter out excessively rich or depauperate lists.
If the incoming data combine many data sources, taxonomic alignment via `make_taxonomy()` is likely to be required. If those data sources are likely to contain duplicates, binning records taxonomically, geographically and temporally may be the easiest way to ensure duplicates are removed.
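As an illustration of the binning idea, here is a minimal base R sketch. It deliberately does not call any `envClean` function, because their arguments are not described in this README. The data, the column names (`lat_bin`, `long_bin`, `year`) and the bin sizes are all assumptions made for the example, and the taxa names are assumed to be already aligned.

```r
# Hypothetical records from two sources; all names and values are illustrative.
records <- data.frame(
  source = c("A", "B", "A", "B"),
  taxa   = c("Eucalyptus obliqua", "Eucalyptus obliqua",
             "Acacia pycnantha", "Acacia pycnantha"),
  lat    = c(-35.0012, -35.0009, -34.9000, -34.5000),
  long   = c(138.7004, 138.6998, 138.6000, 138.6000),
  date   = as.Date(c("2019-03-02", "2019-11-20", "2020-01-05", "2020-06-01")),
  stringsAsFactors = FALSE
)

# Geographic bin (assumed size): coordinates rounded to 2 decimal places.
records$lat_bin  <- round(records$lat, 2)
records$long_bin <- round(records$long, 2)

# Temporal bin (assumed size): calendar year.
records$year <- as.integer(format(records$date, "%Y"))

# Keep the first record within each taxa x geographic x temporal bin.
bin_cols <- c("taxa", "lat_bin", "long_bin", "year")
deduped  <- records[!duplicated(records[, bin_cols]), ]
```

The first two records fall in the same taxa, location and year bin, so only the first of them is kept. Coarser bins remove more near-duplicates at the cost of merging records that may be genuinely distinct.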
Some functions could be considered ‘experimental’. `add_cover()` uses principal components analysis on environmental variables to generate a best guess for percentage cover where some records are missing that attribute.
`envClean` is not on CRAN.
You can install the development version from GitHub with:
```r
# install.packages("devtools")
devtools::install_github("Acanthiza/envClean")
```
Load envClean
These are unrelated to `envClean` and are possibly much better documented:
envClean
The following functions and data sets are provided in `envClean`. See https://acanthiza.github.io/envClean/ for more examples.
object | class | description |
---|---|---|
`envClean::add_cover()` | function | Generate best guess of cover for each taxa*context |
`envClean::add_height()` | function | Generate best guess of height for each taxa*context |
`envClean::add_lifeform()` | function | Generate best guess of lifeform for each taxa*context |
`envClean::aoi` | sf and data.frame | Simple feature to define a geographic area of interest. |
`envClean::bin_date()` | function | Add temporal bins to a dataframe |
`envClean::bin_geo_rel()` | function | Add a spatial reliability column, binned to contexts |
`envClean::bin_taxa()` | function | Add `taxa` column |
`envClean::cleaning_summary()` | function | Describe change in taxa, records, visits and sites between cleaning steps |
`envClean::cleaning_text()` | function | Write a sentence describing change in taxa, records, visits and sites between |
`envClean::clean_quotes()` | function | Remove any ’ or ” from specified columns in a dataframe |
`envClean::filter_counts()` | function | Filter any context with fewer instances than a threshold value |
`envClean::filter_geo_range()` | function | Filter a dataframe with e/n or lat/long to an area of interest polygon (sf) |
`envClean::filter_prop()` | function | Filter taxa recorded in less than x percent of contexts |
`envClean::filter_taxa()` | function | Clean/Tidy to one row per taxa*Visit |
`envClean::filter_text_col()` | function | Filter a dataframe column on character string(s) |
`envClean::find_outliers()` | function | Find local outliers |
`envClean::find_taxa()` | function | Find how taxa changed through the cleaning/filtering/tidying process |
`envClean::flag_local_outliers()` | function | Find local outliers |
`envClean::flor_all` | data.frame | Example of data combined from several data sources. |
`envClean::get_taxonomy()` | function | Get GBIF backbone taxonomy |
`envClean::luclean` | tbl_df, tbl and data.frame | Dataframe of cleaning steps |
`envClean::lurank` | tbl_df, tbl and data.frame | Dataframe of taxonomic ranks |
`envClean::make_attribute()` | function | Title |
`envClean::make_con_status()` | function | Make conservation status from existing status codes |
`envClean::make_cover()` | function | Make a single (numeric, proportion) cover column from different sorts of |
`envClean::make_effort_mod()` | function | Distribution of credible values for taxa richness. |
`envClean::make_effort_mod_pca()` | function | Model the effect of principal components on taxa richness. |
`envClean::make_env_pca()` | function | Principal components analysis and various outputs from environmental data |
`envClean::make_gbif_taxonomy()` | function | Make taxonomy lookups |
`envClean::make_ind_status()` | function | Make indigenous status lookup |
`envClean::make_lifeform()` | function | Get unique lifeform across taxa, perhaps including further context |
`envClean::make_subspecies_col()` | function | Make a subspecies column |
`envClean::make_taxonomy()` | function | Get taxonomy via `galah::search_taxa()` |
`envClean::make_unmatched_overrides()` | function | Attempt to find a taxon for names with no match in `galah::search_taxa()` |
`envClean::rec_vis_sit_tax()` | function | How many records, visits, sites and taxa in a dataframe |
`envClean::reduce_geo_rel()` | function | Reduce data frame to a single spatial reliability within a context |
`envClean::taxonomy_overrides` | tbl_df, tbl and data.frame | Manual taxonomic overrides |
`envClean::try_name_via_gbif()` | function | Attempt to find an unmatched scientific name using GBIF Backbone Taxonomy |