
Run random forest, returning only diagnostic values.

Usage

make_rf_diagnostics(
  env_df,
  clust_col = "cluster",
  folds = 3L,
  reps = 5L,
  down_sample = TRUE,
  range_m = as.integer(seq(20000L, 100000L, length.out = reps)),
  set_min = FALSE,
  mlr3_cv_method = "repeated_cv",
  coords = c("long", "lat"),
  crs_df = 4283
)

Arguments

env_df

Dataframe with clusters and environmental columns.

clust_col

Character. Name of column with cluster membership.

folds

Numeric. How many folds to use in cross-validation?

reps

Numeric. How many repeats of cross-validation?

down_sample

Logical. If TRUE, the sample.fraction argument to ranger::ranger() is set to the minimum number of sites in any one cluster divided by the total number of sites.
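As a rough illustration of the fraction described above, the following base R sketch computes it for a toy data frame. The data frame env_df_toy and its values are made up for illustration, and how the function hands the result to ranger::ranger() internally is not shown here.

```r
# Hypothetical toy data: 10 sites in three clusters of unequal size
env_df_toy <- data.frame(
  cluster = c(rep("a", 5), rep("b", 3), rep("c", 2)),
  temp = seq(10, 19)
)

# Number of sites in each cluster
sites_per_cluster <- table(env_df_toy$cluster)

# Smallest cluster size divided by the total number of sites
sample_fraction <- min(sites_per_cluster) / nrow(env_df_toy)
sample_fraction
```

Here the smallest cluster has 2 of the 10 sites, so the fraction is 0.2.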

range_m

Numeric. The distance in metres (regardless of the unit of the reference system of the input data) for block size(s) if using blockCV::spatialBlock(). If reps > 1, an equivalent number of range_m values is required to ensure the folds differ between repetitions. Relevant when mlr3_cv_method is "repeated_spcv_block".
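The default expression for range_m in the usage above can be evaluated on its own to see what it produces. This sketch only uses base R and the default reps = 5L.

```r
# Default number of repetitions from the function signature
reps <- 5L

# Default range_m: evenly spaced block sizes from 20 km to 100 km, in metres
range_m <- as.integer(seq(20000L, 100000L, length.out = reps))
range_m

# One block size per repetition
length(range_m) == reps
```

With five repetitions this gives block sizes of 20000, 40000, 60000, 80000 and 100000 metres. If range_m is supplied manually, its length should match reps.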

set_min

FALSE or numeric. If numeric, classes in clust_col with fewer than set_min cases will be filtered out.
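The effect of a numeric set_min can be sketched in base R as follows. This is an assumption about the behaviour based on the description above, not the package's actual code; env_df_toy and the threshold of 3 are hypothetical.

```r
# Hypothetical toy data: clusters "a", "b" and "c" with 5, 3 and 2 sites
env_df_toy <- data.frame(
  cluster = c(rep("a", 5), rep("b", 3), rep("c", 2))
)

# Hypothetical threshold
set_min <- 3

# Count cases per class and keep classes with at least set_min cases
counts <- table(env_df_toy$cluster)
keep <- names(counts)[counts >= set_min]

# Drop rows belonging to classes below the threshold
filtered <- env_df_toy[env_df_toy$cluster %in% keep, , drop = FALSE]
table(filtered$cluster)
```

Cluster "c" has only 2 cases and is dropped, while "b" with exactly 3 cases is retained.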

mlr3_cv_method

Method to use with mlr3::rsmp() (as character, e.g. "repeated_cv" or "repeated_spcv_block").
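For orientation, this is roughly what a resampling object built with the default method looks like when created directly with mlr3. It is a sketch that assumes the mlr3 package is installed; how folds and reps are passed on inside make_rf_diagnostics() is an assumption here. The "repeated_spcv_block" method is, as far as I know, registered by the mlr3spatiotempcv package rather than mlr3 itself, so it is not used in this example.

```r
library(mlr3)

# Repeated cross-validation with the default folds and reps from the usage above
cv <- rsmp("repeated_cv", folds = 3L, repeats = 5L)

# Total number of train/test splits: folds * repeats
cv$iters
```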

coords

Character vector of length 2. Names of columns in env_df with x and y coordinates.

crs_df

Coordinate reference system for coords. Passed to the crs argument of sf::st_as_sf().
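The coords and crs_df arguments map onto sf::st_as_sf() roughly as in the sketch below. It assumes the sf package is installed; the data frame pts and its coordinate values are hypothetical. EPSG:4283 is GDA94, a geographic (longitude/latitude) reference system.

```r
library(sf)

# Hypothetical sites with longitude/latitude columns named as in the default coords
pts <- data.frame(
  long = c(138.6, 139.1),
  lat = c(-34.9, -35.2),
  cluster = c("a", "b")
)

# Convert to an sf object: coords names the x and y columns, crs sets the reference system
pts_sf <- st_as_sf(pts, coords = c("long", "lat"), crs = 4283)

# EPSG code attached to the resulting geometry
st_crs(pts_sf)$epsg
```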