Use a set of (continuous) columns to choose a good set of rows

make_metric_df(
  df,
  mets_df = tibble::tibble(metric = "av_clust_size", high_good = TRUE, clust_sum = TRUE,
    level = "clustering"),
  context = c("method", "groups"),
  mets_col = "summary_mets",
  summarise_method = median,
  scale = FALSE,
  top_thresh = 0.25,
  best_thresh = 5,
  level = c("across", "within")
)

Arguments

df

Dataframe containing the columns over which to find good rows.

mets_df

Dataframe mapping the names of possible metrics to the cases (columns) in which each metric is used.
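A minimal sketch of how such a mapping table might be used to pick metrics. It assumes that the column named by `mets_col` holds logical flags marking which metrics apply in that case. The metric names other than `av_clust_size` and the `select_metrics` helper are hypothetical and are not part of the package's API.

```r
# Hypothetical metric-mapping table; metric names besides av_clust_size are illustrative
mets_df <- data.frame(
  metric       = c("av_clust_size", "n_clusters", "silhouette"),
  high_good    = c(TRUE, FALSE, TRUE),
  summary_mets = c(TRUE, FALSE, TRUE),  # assumed: TRUE = use this metric in this case
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the metrics flagged in the chosen mets_df column
select_metrics <- function(mets_df, mets_col = "summary_mets") {
  mets_df$metric[mets_df[[mets_col]]]
}

select_metrics(mets_df)  # returns c("av_clust_size", "silhouette")
```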

context

Character. Names of the columns in df that define the context.

mets_col

Character. Name of mets_df column to use in this instance.

summarise_method

Function (e.g. median) used to summarise metric values when there is more than one row per context.
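When df has several rows per context, they must be collapsed before rows can be compared. The sketch below shows one way to do this in base R with `aggregate()`, passing the summarising function (here `median`, the default in the usage above) as `FUN`. The column names follow the usage defaults, but the data are made up and this is not the function's own implementation.

```r
# Toy data: two rows for context (A, g1) and one row for context (B, g1)
df <- data.frame(
  method        = c("A", "A", "B"),
  groups        = c("g1", "g1", "g1"),
  av_clust_size = c(10, 14, 7),
  stringsAsFactors = FALSE
)

summarise_method <- median

# Collapse to one row per unique combination of the context columns
summarised <- aggregate(av_clust_size ~ method + groups,
                        data = df, FUN = summarise_method)
summarised
```

Context (A, g1) is summarised to the median of 10 and 14, i.e. 12; context (B, g1) keeps its single value of 7.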

scale

Logical. If TRUE, all metrics are rescaled from 0 ('worst') to 1 ('best').
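One common way to put metrics on a 0 ('worst') to 1 ('best') scale is min-max rescaling, with the direction flipped for metrics where lower values are better (the `high_good` column of mets_df). This is a minimal sketch under that assumption. `rescale_metric` is a hypothetical helper, and the package may rescale differently, for example in how it treats constant columns.

```r
# Hypothetical helper: min-max rescale so that 0 = worst and 1 = best
rescale_metric <- function(x, high_good = TRUE) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # Constant metric: no row beats another, so return a neutral 0.5 (NA stays NA)
    return(ifelse(is.na(x), NA_real_, 0.5))
  }
  scaled <- (x - rng[1]) / (rng[2] - rng[1])
  # For "low is good" metrics, flip so the smallest value scores 1
  if (high_good) scaled else 1 - scaled
}

rescale_metric(c(2, 4, 6))                     # 0.0 0.5 1.0
rescale_metric(c(2, 4, 6), high_good = FALSE)  # 1.0 0.5 0.0
```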

top_thresh

Numeric specifying the proportion of rows considered 'top'.

best_thresh

Numeric specifying the absolute number of rows considered 'best'.
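The two thresholds differ in kind: top_thresh is a proportion of rows, while best_thresh is an absolute count. A hedged sketch of how they might be applied to a combined score (higher is better) follows. Ranking by score, breaking ties by position, and rounding the proportion up with `ceiling()` are all assumptions, and `flag_rows` is a hypothetical helper.

```r
# Hypothetical helper: flag 'top' and 'best' rows from a score (higher = better)
flag_rows <- function(score, top_thresh = 0.25, best_thresh = 5) {
  # Rank 1 = highest score; ties broken by position
  rnk <- rank(-score, ties.method = "first")
  # Convert the proportion into a row count, rounding up
  n_top <- ceiling(top_thresh * length(score))
  data.frame(
    score = score,
    top   = rnk <= n_top,
    best  = rnk <= best_thresh
  )
}

flags <- flag_rows(score = c(0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6),
                   top_thresh = 0.25, best_thresh = 3)
flags
```

With 8 rows, `top_thresh = 0.25` marks 2 rows as 'top', while `best_thresh = 3` marks 3 rows as 'best' however many rows there are.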

level

Either 'across' or 'within'. If 'within', only metrics set up to work within clusters (rather than across clusterings) are used.