Skip to contents

The background sampling includes code based on a Geographic Information Systems stack exchange answer by user Spacedman.

Usage

prep_sdm(
  this_taxa = NULL,
  out_dir = FALSE,
  return_val = "path",
  presence,
  pres_crs = 4326,
  pres_x = "long",
  pres_y = "lat",
  pred_limit = TRUE,
  limit_buffer = 0,
  pred_clip = NULL,
  predictors,
  is_env_pred = TRUE,
  max_cells_in_memory = 3e+07,
  terra_options = NULL,
  cat_preds = NULL,
  num_bg = 10000,
  prop_abs = "abs",
  many_p_prop = 2,
  folds = 5,
  spatial_folds = TRUE,
  min_fold_n = 8,
  stretch_value = 10,
  dens_res = 1000,
  save_pngs = FALSE,
  reduce_env = TRUE,
  thresh = 0.9,
  do_gc = FALSE,
  force_new = FALSE
)

Arguments

this_taxa

Character. Name of taxa. Only used to print some messages. Ignored if NULL

out_dir

FALSE or character. If FALSE the result of prep_sdm will be saved to a temporary folder. If character, a file 'prep.rds' will be created at the path defined by out_dir.

return_val

Character: "object" or "path". Both return a named list. In the case of "path" the named list is simply list(prep = out_dir). Will be set to "object" if out_dir is FALSE.

presence

Dataframe of presences with columns pres_x and pres_y.

pres_crs

Anything that will return a legitimate crs when passed to the crs attribute of sf::st_transform() or sf::st_as_sf().

pres_x, pres_y

Character. Name of the columns in presence that have the x and y coordinates

pred_limit

Limit the background points and predictions? Can be TRUE (use presence to generate a minimum convex polygon to use as a limit. Not recommended as the points in presence have usually been filtered to very accurate spatial reliability and thus may be missing a large number of legitimate records); FALSE (the full extent of the predictors will be used); path to existing .parquet to use; or sf object.

limit_buffer

Numeric. Apply this buffer to pred_limit. Only used if pred_limit is TRUE. Passed to the dist argument of sf::st_buffer().

pred_clip

sf. Optional sf to clip the pred_limit back to (e.g. to prevent prediction into ocean).

predictors

Character. Vector of paths to predictor .tif files.

is_env_pred

Logical. Does the naming of the directory and files in predictors follow the pattern required by envRaster::parse_env_tif()?

max_cells_in_memory

Numeric passed to exactextractr::exactextract() argument with the same name. prep_sdm() will be quicker with larger values, but if running on many cores, memory issues are likely. The default of 30000000 is the default for exactextractr::exactextract() at the time of writing.

terra_options

Passed to terra::terraOptions(). e.g. list(memfrac = 0.6)

cat_preds

Character. Vector of predictor names that are character.

num_bg

Numeric. How many background points?

prop_abs

Character. Is num_bg a proportion (prop) of the number of records in presence or an absolute (abs) number?

many_p_prop

Numeric. Ensure the number of background points is at least many_p_prop * number of presences. e.g. If there are more than 5000 presences and num_bg is set at 10000 and many_p_prop is 2, then num_bg will be increased to many_p_prop * nrow(presences)

folds

Numeric. How many folds to use in cross validation? Will be adjusted downwards if number of presences do not support folds * min_fold_n

spatial_folds

Logical. Use spatial folds? Even if TRUE, can resort to non-spatial cv if presences per fold do not meet min_fold_n or there are not enough presences to support more than one fold.

min_fold_n

Numeric. Sets both minimum number of presences, and, by default, the minimum number of presences required for a model.

stretch_value

Numeric. Stretch the density raster to this value.

dens_res

NULL or numeric. Resolution (in metres) of density raster. Set to NULL to use the same resolution as the predictors.

save_pngs

Logical. Save out a .png of the density raster and spatial blocks

reduce_env

Logical. If TRUE, highly correlated and low importance variables will be removed. In the case of highly correlated pairs of variables, only one is removed.

thresh

Numeric. Threshold used to flag highly correlated and low importance variables

do_gc

Logical. Run base::rm(list = ls) and base::gc() at end of function? Useful when running SDMs for many, many taxa, especially if done in parallel.

force_new

Logical. If outputs already exist, should they be remade?

Value

If return_val is "object" a named list. If return_val is "path" a named list list(prep = out_dir). If out_dir is a valid path, the 'full result' (irrespective of return_val) is also saved to fs::path(out_dir, "prep.rds"). The 'full result' is a named list with elements:

  • log:

    • a log of (rough) timings and other information from the process

  • abandoned:

    • Logical indicating if the sdm was abandoned. If abandoned is TRUE, some list elements may not be present

  • presence_ras:

    • tibble with two columns ('x' and 'y') representing unique cell centroids on the predictors at presences supplied in argument presence

  • predict_boundary:

    • sf used to limit the background points and used by predict_sdm() to generate the 'mask'ed output

  • bg_points:

    • sf of cell centroids representing unique cell centroids for background points

  • blocks

    • data.frame with columns:

      • pa: presence (1) or absence/background (0)

      • x and y: cell centroids for each presence and absence

      • block: the spatial block to which the row belongs

      • a column with values for each of predictors at x and y

  • spatial_folds_used:

    • logical indicating if spatial folds were used. This may differ from the spatial_folds argument provided to prep_sdm() if an attempt to use spatial folds failed to meet desired folds and min_fold_n

  • correlated:

    • list with elements as per envModel::reduce_env(), or, if reduce_env is FALSE, a list with elements remove_env which is empty, and env_var, containing the names of all predictors.

Details

Options for managing memory are terra_options, max_cells_in_memory and do_gc.

Examples


  out_dir <- file.path(system.file(package = "envSDM"), "examples")

  data <- file.path(system.file(package = "predicts"), "ex") |>
    fs::dir_ls(regexp = "\\.csv$") |>
    tibble::enframe(name = NULL, value = "path") |>
    dplyr::mutate(taxa = gsub("\\.csv", "", basename(path))
                  , presence = purrr::map(path, rio::import, setclass = "tibble")
                  , presence = purrr::map(presence
                                          , \(x) x |>
                                            dplyr::filter(!is.na(lat)
                                                          , !is.na(lon)
                                                          )
                                          )
                  , taxa_dir = fs::path(out_dir, taxa)
                  , out_mcp = fs::path(taxa_dir, "mcp.parquet")
                  )

  env_dat <- system.file("ex/bio.tif", package = "predicts")


  # mcps --------

  # make a clip boundary so mcps stay terrestrial
  clip <- terra::as.polygons(terra::rast(env_dat)[[1]] > -Inf) |>
    sf::st_as_sf()

  purrr::pwalk(list(data$presence
                   , data$out_mcp
                   )
              , \(x, y) make_mcp(x, y, pres_x = "lon"
                                 , clip = clip
                                 )
              )


  # prep -----------
  # use the just created mcps (this allows using, say, a different spatial reliability threshold for the mcps)

  purrr::pwalk(list(data$taxa
                    , data$taxa_dir
                    , data$presence
                    , data$out_mcp
                    )
               , function(a, b, c, d) prep_sdm(this_taxa = a
                                               , out_dir = b
                                               , presence = c
                                               , pres_x = "lon"
                                               , pres_y = "lat"
                                               , predictors = env_dat
                                               , is_env_pred = FALSE
                                               , pred_limit = d
                                               , limit_buffer = 10000
                                               , dens_res = 1000 # ignored as decimal degrees preds
                                               #, force_new = T
                                               )
               )
#> Warning: Missing `trust` will be set to FALSE by default for RDS in 2.0.0.
#> Warning: Missing `trust` will be set to FALSE by default for RDS in 2.0.0.

  # example of 'prep'
  prep <- rio::import(fs::path(data$taxa_dir[[2]], "prep.rds"))
#> Warning: Missing `trust` will be set to FALSE by default for RDS in 2.0.0.

  names(prep)
#>  [1] "abandoned"          "finished"           "this_taxa"         
#>  [4] "original"           "presence_ras"       "predict_boundary"  
#>  [7] "bg_points"          "blocks"             "spatial_folds_used"
#> [10] "reduce_env"         "log"               

  # env variables to remove prior to SDM
  prep$reduce_env$remove
#> [1] "bio1"  "bio9"  "block" "cell"  "lat"   "lon"   "pa"    "x"     "y"    

  # Density raster
  dens_ras <- terra::rast(fs::path(data$taxa_dir[[2]], "density.tif")) %>%
    terra::mask(clip) %>%
    terra::classify(matrix(c(0, NA), ncol = 2))

  if(require("tmap")) {

    m <-
      tm_shape(clip) +
      tm_borders() +
      tm_shape(dens_ras) +
      tm_raster(title = "Background point density"
                , breaks = c(0, 2, 4, 6, 8, 10)
                , drop.levels = TRUE
                , colorNA = NULL
                ) +
      tm_legend(outside = TRUE) +
      tm_compass() +
      tm_scale_bar() +
      tm_layout(main.title = paste0("Background point density for ",  prep$this_taxa))

    m

    presences <- prep$blocks %>%
      dplyr::filter(pa == 1) %>%
      sf::st_as_sf(coords = c("x", "y")
                   , crs = 4326
                   )

    m +
      tm_shape(presences) +
        tm_dots(col = "blue")

  }
#> Loading required package: tmap
#> Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
#> remotes::install_github('r-tmap/tmap')
#> Scale bar set for latitude km and will be different at the top and bottom of the map.


  # Background points
  if(require("tmap")) {

    blocks <- prep$blocks %>%
      dplyr::mutate(blocks = factor(block)) %>% # for map
      sf::st_as_sf(coords = c("x", "y")
                   , crs = sf::st_crs(terra::rast(env_dat[[1]]))
                   )


    tm_shape(clip) +
      tm_borders() +
      tm_shape(blocks) +
      tm_dots(title = "Background points\n coloured by block"
              , col = "block"
              ) +
      tm_legend(outside = TRUE) +
      tm_compass() +
      tm_scale_bar() +
      tm_layout(main.title = paste0("Background points for ",  prep$this_taxa))


  }
#> Scale bar set for latitude km and will be different at the top and bottom of the map.