Run random forest, returning only diagnostic values.
Source: R/make_rf_diagnostics.R
Usage
make_rf_diagnostics(
  env_df,
  clust_col = "cluster",
  folds = 3L,
  reps = 5L,
  down_sample = TRUE,
  range_m = as.integer(seq(20000L, 100000L, length.out = reps)),
  set_min = FALSE,
  mlr3_cv_method = "repeated_cv",
  coords = c("long", "lat"),
  crs_df = 4283
)
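The default for range_m in the signature above depends on reps. As a small illustration in base R only (it does not call make_rf_diagnostics()), the default expression yields one evenly spaced block size per repetition:

```r
# Evaluate the default range_m expression from the usage signature
# for the default number of repetitions (reps = 5L).
reps <- 5L
range_m <- as.integer(seq(20000L, 100000L, length.out = reps))
print(range_m)

# The same expression with fewer repetitions still spans 20 km to 100 km.
range_m_3 <- as.integer(seq(20000L, 100000L, length.out = 3L))
print(range_m_3)

# One block size per repetition.
stopifnot(length(range_m) == reps)
```

If a custom range_m is supplied, the argument description suggests it should likewise have one value per repetition.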
Arguments
- env_df
Dataframe with clusters and environmental columns.
- clust_col
Character. Name of column with cluster membership.
- folds
Numeric. How many folds to use in cross-validation?
- reps
Numeric. How many repeats of cross-validation?
- down_sample
Logical. If TRUE, the sample.fraction argument to ranger::ranger() is set to the minimum number of sites in any one cluster divided by the total number of sites.
- range_m
Numeric. The distance in metres (regardless of the unit of the reference system of the input data) for block size(s) if using blockCV::spatialBlock(). If reps > 1, an equivalent number of range_m values is required to ensure the folds differ between repetitions. Relevant when mlr3_cv_method is "repeated_spcv_block".
- set_min
FALSE or numeric. If numeric, classes in clust_col with fewer than set_min cases will be filtered out.
- mlr3_cv_method
Method to use with mlr3::rsmp() (as character, e.g. "repeated_cv" or "repeated_spcv_block").
- coords
Character vector of length 2. Names of columns in env_df with x and y coordinates.
- crs_df
Coordinate reference system for coords. Passed to the crs argument of sf::st_as_sf().
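The set_min and down_sample descriptions can be illustrated with a minimal base R sketch. It assumes a made-up env_df, a hypothetical set_min of 2, and that filtering happens before the fraction is computed; it is not the function's actual implementation, only the arithmetic the argument descriptions imply.

```r
# Toy stand-in for env_df: 10 sites in three clusters (made-up values).
env_df <- data.frame(
  cluster = c(rep("a", 6), rep("b", 3), "c"),
  temp = c(12.1, 13.4, 11.8, 12.9, 13.0, 12.5, 18.2, 17.9, 18.5, 25.0)
)
clust_col <- "cluster"
set_min <- 2  # hypothetical threshold

# set_min: drop classes with fewer than set_min cases.
counts <- table(env_df[[clust_col]])
keep <- names(counts)[counts >= set_min]
env_filtered <- env_df[env_df[[clust_col]] %in% keep, , drop = FALSE]

# down_sample: smallest cluster size divided by the total number of sites.
# (Assumption: computed on the filtered data.)
counts_filtered <- table(env_filtered[[clust_col]])
sample_fraction <- min(counts_filtered) / sum(counts_filtered)
print(sample_fraction)
```

Here cluster "c" has a single site and is dropped, leaving 9 sites; the smallest remaining cluster has 3, so the fraction is 3/9. With down_sample = TRUE, a value computed this way is what would be passed as sample.fraction to ranger::ranger().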