Tune, and evaluate, species distribution models
Usage
tune_sdm(
prep,
out_dir = FALSE,
return_val = "path",
algo = c("all", "maxnet", "envelope", "rf"),
max_corr = list(maxnet = 0.7, envelope = 0.9, rf = 0.99),
fc = "auto_feature",
limit_p = FALSE,
rm = seq(1, 6, 0.5),
trees = c(999),
mtry = TRUE,
limit_spat_mtry = 4,
nodesize = c(1, 2),
keep_model = FALSE,
best_run = FALSE,
metrics_df = envSDM::sdm_metrics,
use_metrics = c("auc_po", "CBI_rescale", "IMAE"),
do_gc = FALSE,
force_new = FALSE,
...
)
Arguments
- prep
Character or named list. If character, the path to an existing prep.rds. Otherwise, the result of a call to prep_sdm() with return_val = "object".
- out_dir
FALSE or character. If FALSE the result of tune_sdm will be saved to a temporary folder. If character, a file 'tune.rds' will be created at the path defined by out_dir.
- return_val
Character: "object" or "path". Both return a named list. In the case of "path" the named list is simply list(tune = out_dir). Will be set to "object" if out_dir is FALSE.
- algo
Character. Name of algorithm to use.
- max_corr
Named list. Names of list elements must match the algorithms being used. For each pair of predictor variables correlated at or above max_corr, one will be dropped using caret::findCorrelation().
- fc
Character. Used to generate the levels of the classes argument to maxnet::maxnet() that are tuned.
- limit_p
TRUE, FALSE or the number of predictor variables above which to limit the use of p in the classes argument used in maxnet::maxnet(). Useful with many predictor variables, when it becomes unwieldy to generate interactions for all predictors.
- rm
Numeric. Used to generate the levels of the regmult argument to maxnet::maxnet() that are tuned.
- trees
Used to generate the levels of the ntree argument to randomForest::randomForest() that are tuned. TRUE (tune with default trees), FALSE (don't tune trees) or numeric (the trees values to tune with).
- mtry
Used to generate the levels of the mtry argument to randomForest::randomForest() that are tuned. TRUE (tune with sensible guesses for mtry), FALSE (only use the default randomForest::randomForest() mtry) or numeric (the mtry values to tune with).
- limit_spat_mtry
Numeric. If mtry is TRUE and spatial cross validation is being used, the values of mtry to tune will be limited to less than or equal to limit_spat_mtry.
- nodesize
Used to generate the levels of the nodesize argument to randomForest::randomForest() that are tuned. TRUE (tune with default nodesize), FALSE (only use the default randomForest::randomForest() nodesize) or numeric (the nodesize values to tune with).
- keep_model
Logical. If TRUE the model results will be appended as a list column in the returned tibble (as column m).
- best_run
Logical. If TRUE this alters the behaviour of tune_sdm() by, well, not tuning. :) All folds are set to the same value, so there is no cross-validation.
- metrics_df
Dataframe. Defines which metrics to use when deciding on 'good' SDMs.
- use_metrics
Character. Vector of values in metrics_df$metric to use when finding the 'best' model.
- do_gc
Logical. Run base::rm(list = ls()) and base::gc() at end of function? Useful when running SDMs for many, many taxa, especially if done in parallel.
- force_new
Logical. If outputs already exist, should they be remade?
- ...
Passed to evaluate_sdm(), e.g. thresholds for use in predicts::pa_evaluate() (as the tr argument, although if used, the values of the thresholds element of the pa_ModelEvaluation object returned by predicts::pa_evaluate() will be limited to the values in tr).
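The tuning arguments above each supply candidate levels for one model parameter. The following is a minimal base R sketch of how such levels could combine into a tuning grid. It assumes the levels are fully crossed, which this page does not state, and it is not the internal code of tune_sdm(). The object names (rf_grid, maxnet_grid, spatial_folds) and the feature-class levels "l" and "lq" are illustrative assumptions.

```r
# Hypothetical candidate levels, named after the arguments documented above
trees           <- 500
mtry            <- 1:6       # stand-in for the guesses generated when mtry = TRUE
nodesize        <- c(1, 3)
limit_spat_mtry <- 4
spatial_folds   <- TRUE      # assumption: spatial cross validation is in use

# Cap mtry at limit_spat_mtry, as described for that argument
if (spatial_folds) mtry <- mtry[mtry <= limit_spat_mtry]

# Assumption: levels are fully crossed to give one row per candidate model
rf_grid <- expand.grid(trees = trees, mtry = mtry, nodesize = nodesize)

# The maxnet analogue: feature classes crossed with regularisation multipliers
maxnet_grid <- expand.grid(fc = c("l", "lq")
                           , rm = seq(1, 6, 0.5)
                           , stringsAsFactors = FALSE
                           )

nrow(rf_grid)      # 1 trees x 4 mtry x 2 nodesize = 8 candidates
nrow(maxnet_grid)  # 2 fc x 11 rm = 22 candidates
```

Under that assumption the number of candidate models grows multiplicatively with the number of levels, so short vectors for rm, trees, mtry and nodesize keep run times manageable when tuning many taxa.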
Value
If return_val is "object", a named list. If return_val is "path",
a path to the saved file. If out_dir is a valid path, the 'full
result' (irrespective of return_val) is also saved to
fs::path(out_dir, "tune.rds"). The 'full result' is a named list with
elements:
Examples
out_dir <- file.path(system.file(package = "envSDM"), "examples")
# setup -------
data <- readRDS(fs::path(out_dir, "data.rds"))
future::plan(future::multisession())
furrr::future_walk(data$out_dir
                   , \(x) tune_sdm(prep = fs::path(x, "prep.rds")
                                   , out_dir = x
                                   , fc = "lq"
                                   , rm = c(1, 2)
                                   , trees = 500
                                   , mtry = c(1:3)
                                   , nodesize = c(1, 3)
                                   , limit_p = 3
                                   , use_metrics = c("auc_po", "CBI_rescale", "IMAE")
                                   #, force_new = TRUE
                                   )
                   )
future::plan(future::sequential())
# which tune args were 'best' using combo?
# BUT, possibly a spurious comparison as, between rows, the models are not
# all built on the same data! Same presences though.
data |>
  dplyr::mutate(tuned = purrr::map_lgl(tune, file.exists)) |>
  dplyr::filter(tuned) |>
  dplyr::mutate(tune = purrr::map(tune, rio::import, trust = TRUE)
                , tune_mean = purrr::map(tune, "tune_mean")
                ) |>
  tidyr::unnest(cols = c(tune_mean)) |>
  dplyr::filter(combo == max(combo), .by = taxa) |> # used 'combo' to determine 'best' as default in tune_sdm
  dplyr::select(taxa, algo, hold_prop, stretch, spatial_folds, tidyselect::where(is.numeric))
#> # A tibble: 2 × 25
#>   taxa  algo  hold_prop stretch spatial_folds tunes  reps    rm trees nodesize
#>   <chr> <chr>     <dbl>   <dbl> <lgl>         <dbl> <dbl> <dbl> <dbl>    <dbl>
#> 1 chg   rf          0.3      10 FALSE            25     5    NA   500        1
#> 2 mjs   rf          0        10 FALSE            25     5    NA   500        1
#> # ℹ 15 more variables: mtry <int>, spatial_tunes <int>,
#> # non_spatial_tunes <dbl>, max_spec_sens <dbl>, no_omission <dbl>,
#> # equal_prevalence <dbl>, equal_sens_spec <dbl>, auc_po <dbl>, ODP <dbl>,
#> # or10 <dbl>, CBI <dbl>, CBI_rescale <dbl>, IMAE <dbl>, auc_po_flexsdm <dbl>,
#> # combo <dbl>