Package 'rfPermute'

Title:	Estimate Permutation p-Values for Random Forest Importance Metrics
Description:	Estimate significance of importance metrics for a Random Forest model by permuting the response variable. Produces null distribution of importance metrics for each predictor variable and p-value of observed. Provides summary and visualization functions for 'randomForest' results.
Authors:	Eric Archer [aut, cre]
Maintainer:	Eric Archer <[email protected]>
License:	GPL (>= 2)
Version:	2.5.2
Built:	2025-01-23 05:41:43 UTC
Source:	https://github.com/ericarcher/rfpermute

Help Index

Balanced Sample Size
Case Predictions
Class Priors
Clean Random Forest Input Data
Combine rfPermute objects
Confusion Matrix
Extract rfPermute Importance Scores and p-values.
Percent Correctly Classified
Plot Important Predictor Distribution
Plot Inbag distribution
Plot Random Forest Importance Null Distributions
Plot Predicted Probabilities
Plot Random Forest Proximity Scores
Plot Trace
Plot Vote Distribution
Estimate Permutation p-values for Random Forest Importance Metrics
rfPermute package
Diagnostics of rfPermute or randomForest models.
Symbiodinium type metabolite profiles

Balanced Sample Size

Description

Create a vector of balanced (equal) sample sizes for use in the sampsize argument of rfPermute or randomForest for a classification model. The values are derived from a percentage of the smallest class sample size.

Usage

balancedSampsize(y, pct = 0.5)

Arguments

`y`	character, numeric, or factor vector containing classes of response variable. Values will be treated as unique for computing class frequencies.
`pct`	percent of smallest class frequency for `sampsize` vector.

Value

a named vector of sample sizes as long as the number of classes.
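
The computation described above can be sketched in base R. This is a minimal illustration rather than the package's source: the helper name `balanced_sketch` is hypothetical, and rounding the result up with `ceiling` is an assumption.

```r
# Hypothetical helper illustrating the idea behind balancedSampsize():
# every class gets the same sample size, a percentage of the smallest class.
balanced_sketch <- function(y, pct = 0.5) {
  freq <- table(y)                 # class frequencies
  n <- ceiling(min(freq) * pct)    # assumption: round up to a whole case
  stats::setNames(rep(n, length(freq)), names(freq))
}

# mtcars$am has 19 automatic (0) and 13 manual (1) cars
ss <- balanced_sketch(mtcars$am)
ss
```

Because every class is sampled at the same size, the result can be passed to `sampsize` together with `replace = FALSE`, as in the Examples below.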

Author(s)

Eric Archer [email protected]

Examples

data(mtcars)

# A balanced model with default half of smallest class size
sampsize_0.5 <- balancedSampsize(mtcars$am)
sampsize_0.5

rfPermute(factor(am) ~ ., mtcars, replace = FALSE, sampsize = sampsize_0.5)

# A balanced model with one quarter of smallest class size
sampsize_0.25 <- balancedSampsize(mtcars$am, pct = 0.25)
sampsize_0.25

rfPermute(factor(am) ~ ., mtcars, replace = FALSE, sampsize = sampsize_0.25)


Case Predictions

Description

Construct a data frame of case predictions for training data along with vote distributions.

Usage

casePredictions(x)

Arguments

`x`	a `rfPermute` or `randomForest` model object.

Value

A data frame containing columns of original and predicted cases, whether they were correctly classified, and vote distributions among cases.

Author(s)

Eric Archer [email protected]

Examples

library(randomForest)
data(mtcars)

rf <- randomForest(factor(am) ~ ., mtcars)

cp <- casePredictions(rf)
cp

Class Priors

Description

Compute the classification prior for each class and class-specific binomial p-values for the model, using these priors as null hypotheses.

Usage

classPriors(x, sampsize)

Arguments

`x`	a `rfPermute` or `randomForest` model object.
`sampsize`	the vector of sample sizes used to construct the model. If provided, must have length equal to number of classes. If set to `NULL`, priors will be computed assuming empirical sample sizes.
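
One plausible reading of this computation is sketched below in base R. The counts are hypothetical, and two details are assumptions rather than documented behavior: that a class's prior is its share of the sampled cases, and that the binomial test is one-sided ("greater").

```r
# Hypothetical counts for a two-class model (not taken from a fitted forest)
n.cases   <- c(auto = 19, manual = 13)  # cases per class
n.correct <- c(auto = 17, manual = 10)  # correctly classified cases per class
sampsize  <- NULL                       # NULL: fall back to empirical sizes

# Assumption: the prior for a class is its share of the sampled cases
size <- if (is.null(sampsize)) n.cases else sampsize
prior <- size / sum(size)

# Binomial test of each class's classification rate against its prior
# (assumption: one-sided, testing for better-than-prior classification)
p.value <- sapply(names(prior), function(cl) {
  binom.test(
    n.correct[[cl]], n.cases[[cl]],
    p = prior[[cl]], alternative = "greater"
  )$p.value
})

cbind(prior = prior, p.value = p.value)
```

With a balanced `sampsize` the priors become equal across classes, which is why the same vector used to fit the model should be supplied here.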

Author(s)

Eric Archer [email protected]

Examples

library(randomForest)
data(mtcars)

# random sampling with replacement
rf <- randomForest(factor(am) ~ ., mtcars)
confusionMatrix(rf)
classPriors(rf, NULL)

# balanced design
sampsize <- balancedSampsize(mtcars$am)
rf <- randomForest(factor(am) ~ ., mtcars, replace = FALSE, sampsize = sampsize)
confusionMatrix(rf)
classPriors(rf, sampsize)

Clean Random Forest Input Data

Description

Removes cases with missing data and predictors that are constant from the input data for a Random Forest classification model.

Usage

cleanRFdata(x, y, data, max.levels = 30)

Arguments

`x`	columns used as predictor variables as character or numeric vector.
`y`	column used as response variable as character or numeric.
`data`	data.frame containing `x` and `y` columns.
`max.levels`	maximum number of levels in response variable `y`.

Value

a data.frame containing cleaned data.

Author(s)

Eric Archer [email protected]

Combine rfPermute objects

Description

Combines two or more ensembles of rfPermute objects into one, combining randomForest results, null distributions, and re-calculating p-values.

Usage

combineRP(...)

Arguments

`...`	two or more objects of class `rfPermute`, to be combined into one.

Author(s)

Eric Archer [email protected]

Examples

data(iris)
rp1 <- rfPermute(
  Species ~ ., iris, ntree = 50, norm.votes = FALSE, num.rep = 100, num.cores = 1
)
rp2 <- rfPermute(
  Species ~ ., iris, ntree = 50, norm.votes = FALSE, num.rep = 100, num.cores = 1
)
rp3 <- rfPermute(
  Species ~ ., iris, ntree = 50, norm.votes = FALSE, num.rep = 100, num.cores = 1
)
rp.all <- combineRP(rp1, rp2, rp3)
rp.all

plotNull(rp.all) 

Confusion Matrix

Description

Generate a confusion matrix for Random Forest classification models with error rates translated into percent correctly classified, and columns for confidence intervals added.

Usage

confusionMatrix(x, conf.level = 0.95, threshold = NULL)

plotConfMat(x, title = NULL, plot = TRUE)

Arguments

`x`	a `rfPermute` or `randomForest` model object.
`conf.level`	confidence level for the `binom.test` confidence interval
`threshold`	threshold to test observed classification probability against. Should be a number between 0 and 1. If not `NULL`, the output matrix will have extra columns giving the one-tailed probability that the true correct classification is >= `threshold`.
`title`	a title for the plot.
`plot`	display the plot?
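
The confidence interval described for `conf.level` can be sketched with `binom.test` directly. The counts below are hypothetical, and the element names in the final vector are illustrative only, not necessarily the column names the function returns.

```r
# Hypothetical confusion counts for one class: 17 of 19 cases correct
n.correct <- 17
n.total <- 19

pct.correct <- 100 * n.correct / n.total

# Exact binomial interval from binom.test at two confidence levels
ci.95 <- 100 * binom.test(n.correct, n.total, conf.level = 0.95)$conf.int
ci.75 <- 100 * binom.test(n.correct, n.total, conf.level = 0.75)$conf.int

round(c(pct.correct = pct.correct, lower.95 = ci.95[1], upper.95 = ci.95[2]), 1)
```

Lowering `conf.level` narrows the interval, which is what the `conf.level = 0.75` call in the Examples demonstrates.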

Author(s)

Eric Archer [email protected]

Examples

library(randomForest)
data(mtcars)

rf <- randomForest(factor(am) ~ ., mtcars)
confusionMatrix(rf)

confusionMatrix(rf, conf.level = 0.75)

confusionMatrix(rf, threshold = 0.7)
confusionMatrix(rf, threshold = 0.8)
confusionMatrix(rf, threshold = 0.95)

Extract rfPermute Importance Scores and p-values.

Description

The importance function extracts a matrix of the observed importance scores and p-values from the object produced by a call to rfPermute. plotImportance produces a visualization of importance scores as either a barchart or heatmap.

Usage

## S3 method for class 'rfPermute'
importance(x, scale = TRUE, sort.by = NULL, decreasing = TRUE, ...)

plotImportance(
  x,
  plot.type = c("bar", "heatmap"),
  imp.type = NULL,
  scale = TRUE,
  sig.only = FALSE,
  alpha = 0.05,
  n = NULL,
  ranks = TRUE,
  xlab = NULL,
  ylab = NULL,
  main = NULL,
  size = 3,
  plot = TRUE
)

Arguments

`x`	for `importance`, an object produced by a call to `rfPermute`. For `plotImportance`, either a `rfPermute` or `randomForest` model object. If the latter, it must have been run with `importance = TRUE`.
`scale`	for permutation based measures, should the measures be divided by their "standard errors"?
`sort.by`	character vector giving the importance metric(s) or p-values to sort by. If `NULL`, defaults to `"MeanDecreaseAccuracy"` for classification models and `"%IncMSE"` for regression models.
`decreasing`	logical. Should the sort order be increasing or decreasing?
`...`	arguments to be passed to and from other methods.
`plot.type`	plot importances as a `bar` chart or `heatmap`?
`imp.type`	character vector listing which importance measures to plot. Can be class names (for classification models) or names of overall importance measures (e.g., "MeanDecreaseAccuracy").
`sig.only`	Plot only the significant (<= `alpha`) predictors?
`alpha`	a number specifying the critical alpha for identifying predictors with importance scores significantly different from random. This parameter is only relevant if `x` is a `rfPermute` object with p-values. Importance measures with p-values less than or equal to `alpha` will be denoted in barcharts in red and in the heatmap by a white diamond. If set to `NULL`, significance is not denoted.
`n`	plot `n` most important predictors.
`ranks`	plot ranks instead of actual importance scores?
`xlab`, `ylab`	labels for the x and y axes.
`main`	main title for plot.
`size`	a value specifying the size of the significance diamond in the heatmap if the p-value <= `alpha`.
`plot`	display the plot?

Author(s)

Eric Archer [email protected]

Examples

data(mtcars)

# A classification model classifying cars to manual or automatic transmission 
am.rp <- rfPermute(factor(am) ~ ., mtcars, ntree = 100, num.rep = 50)
  
imp.scaled <- importance(am.rp, scale = TRUE)
imp.scaled

# plot scaled importance scores
plotImportance(am.rp, scale = TRUE)

# plot unscaled and only significant scores
plotImportance(am.rp, scale = FALSE, sig.only = TRUE)

Percent Correctly Classified

Description

For classification models, calculate the percent of individuals correctly classified in a specified percent of trees in the forest.

Usage

pctCorrect(x, pct = c(seq(0.8, 0.95, 0.05), 0.99))

Arguments

`x`	a `rfPermute` or `randomForest` model object.
`pct`	vector of minimum percent of trees voting for each class. Values can be given on a 0 to 1 or a 0 to 100 scale.

Value

a matrix giving the percent of individuals correctly classified in each class and overall for each threshold value specified in pct.
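
The idea can be sketched in base R with a hypothetical set of vote proportions. This assumes a case counts as correct at a given threshold when the share of trees voting for its true class is at least that threshold; the data and object names are illustrative only.

```r
# Hypothetical data: true class of six cases and the proportion of trees
# that voted for that true class
true.class <- c("a", "a", "a", "b", "b", "b")
true.vote <- c(0.95, 0.85, 0.60, 0.99, 0.40, 0.90)
pct <- c(0.8, 0.9)

# For each threshold, percent of cases per class (and overall) whose
# true-class vote share reaches the threshold
pct.correct <- sapply(pct, function(p) {
  hit <- true.vote >= p
  c(tapply(hit, true.class, mean), Overall = mean(hit)) * 100
})
colnames(pct.correct) <- paste0("pct.", pct)
pct.correct
```

Raising the threshold can only lower or hold these percentages, so the columns give a quick view of how confidently each class is classified.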

Author(s)

Eric Archer [email protected]

Examples

library(randomForest)
data(mtcars)

rf <- randomForest(factor(am) ~ ., mtcars, importance = TRUE)
pctCorrect(rf)

Plot Important Predictor Distribution

Description

For classification models, plot the distribution of predictor variables by class, with predictors sorted by order of importance in the model.

Usage

plotImpPreds(
  x,
  df,
  class.col,
  imp.type = NULL,
  max.vars = 16,
  scale = TRUE,
  size = 1,
  point.alpha = 0.2,
  violin.alpha = 0.5,
  plot = TRUE
)

Arguments

`x`	a `rfPermute` or `randomForest` model object.
`df`	data.frame with predictors in the model `x`.
`class.col`	response column name in `df`.
`imp.type`	character string representing importance type to use for sorting predictors.
`max.vars`	number of variables to plot (from most important to least).
`scale`	For permutation based importance measures, should they be divided by their "standard errors"?
`size`, `point.alpha`, `violin.alpha`	controls size of points and alpha values (transparency) for points and violin plots.
`plot`	display the plot?

Value

the ggplot2 object is invisibly returned.

Note

If the model in x is from randomForest and was run with importance = TRUE, then 'MeanDecreaseAccuracy' is used as the default importance measure for sorting. Otherwise, 'MeanDecreaseGini' is used.

Author(s)

Eric Archer [email protected]

Examples

library(randomForest)
data(mtcars)

df <- mtcars
df$am <- factor(df$am)

rf <- randomForest(am ~ ., df, importance = TRUE)
plotImpPreds(rf, df, "am")

Plot Inbag distribution

Description

Plot the distribution of the fraction of trees in which each sample was inbag in the Random Forest model.

Usage

plotInbag(x, bins = 10, replace = TRUE, sampsize = NULL, plot = TRUE)

Arguments

`x`	a `rfPermute` or `randomForest` model object.
`bins`	number of bins in histogram.
`replace`	was sampling done with or without replacement?
`sampsize`	sizes of samples drawn. Either a single value or vector of sample sizes as long as the number of classes.
`plot`	display the plot?

Value

the ggplot2 object is invisibly returned.

Note

Red vertical lines on the plot denote the expected inbag rate(s). These rates are based on the values of replace and sampsize supplied. If not specified, they are set to the randomForest defaults. If this is not the same as the arguments used to run the model, there will be a mismatch in the location of these indicator lines and the inbag frequency distribution.
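
As a rough guide to where those lines should fall, the expected inbag rate can be worked out by hand. The sketch below uses the randomForest defaults (draw n cases with replacement, or `ceiling(0.632 * n)` cases without replacement); that plotInbag uses exactly these formulas is an assumption.

```r
n <- nrow(mtcars)  # 32 cases

# With replacement, drawing n cases per tree: a case is missed by a single
# draw with probability 1 - 1/n, so it is inbag with probability
rate.replace <- 1 - (1 - 1 / n)^n

# Without replacement, the inbag rate is simply sampsize / n.
# randomForest's default sampsize in that case is ceiling(0.632 * n)
sampsize <- ceiling(0.632 * n)
rate.no.replace <- sampsize / n

c(with.replacement = rate.replace, without.replacement = rate.no.replace)
```

If the model was fit with a custom `sampsize` or `replace = FALSE`, passing the same values to plotInbag keeps the indicator lines aligned with the observed distribution.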

Author(s)

Eric Archer [email protected]

Examples

library(randomForest)
data(mtcars)

sampsize = c(5, 5)

rf <- randomForest(factor(am) ~ ., data = mtcars, ntree = 10)
plotInbag(rf)

rf <- randomForest(factor(am) ~ ., data = mtcars, ntree = 1000)
plotInbag(rf)

rf <- randomForest(factor(am) ~ ., data = mtcars, ntree = 10000)
plotInbag(rf)

Plot Random Forest Importance Null Distributions

Description

Plot the null distributions of the Random Forest importance metrics, along with observed values and p-values, for each predictor variable from the object produced by a call to rfPermute.

Usage

plotNull(
  x,
  preds = NULL,
  imp.type = NULL,
  scale = TRUE,
  plot.type = c("density", "hist"),
  plot = TRUE
)

Arguments

`x`	An object produced by a call to `rfPermute`.
`preds`	a character vector of predictors to plot. If `NULL`, then all predictors are plotted.
`imp.type`	A character vector giving the importance metric(s) to plot.
`scale`	Plot importance measures scaled (divided by) standard errors?
`plot.type`	type of plot to produce: `"density"` for smoothed density plot, or `"hist"` for histogram.
`plot`	display the plot?

Details

The function will generate a plot for each predictor, with faceted importance metrics. The vertical red line shows the observed importance score and the _p_-value is given in the facet label.

Value

A named list of the ggplot figures produced is invisibly returned.

Author(s)

Eric Archer [email protected]

Examples

# A regression model using the ozone example
data(airquality)
ozone.rp <- rfPermute(
  Ozone ~ ., data = airquality, ntree = 100, 
  na.action = na.omit, num.rep = 50, num.cores = 1
)
  
# Plot the null distributions and observed values.
plotNull(ozone.rp) 

Plot Predicted Probabilities

Description

Plot histogram of assignment probabilities to predicted class. This is used for determining if the model differentiates between correctly and incorrectly classified samples in terms of how strongly they are classified.

Usage

plotPredictedProbs(x, bins = 30, plot = TRUE)

Arguments

`x`	a `rfPermute` or `randomForest` model object.
`bins`	number of bins in histogram. Defaults to number of samples / 5.
`plot`	display the plot?

Value

the ggplot2 object is invisibly returned.

Author(s)

Eric Archer [email protected]

Examples

library(randomForest)
data(mtcars)

rf <- randomForest(factor(am) ~ ., mtcars)
plotPredictedProbs(rf, bins = 20)

Plot Random Forest Proximity Scores

Description

Create a plot of Random Forest proximity scores using multi-dimensional scaling.

Usage

plotProximity(
  x,
  dim.x = 1,
  dim.y = 2,
  class.cols = NULL,
  legend.type = c("legend", "label", "none"),
  legend.loc = c("top", "bottom", "left", "right"),
  point.size = 2,
  circle.size = 8,
  circle.border = 1,
  group.type = c("ellipse", "hull", "contour", "none"),
  group.alpha = 0.3,
  ellipse.level = 0.95,
  n.contour.grid = 100,
  label.size = 4,
  label.alpha = 0.7,
  plot = TRUE
)

Arguments

`x`	a `rfPermute` or `randomForest` model object.
`dim.x`, `dim.y`	numeric values giving x and y dimensions to plot from multidimensional scaling of proximity scores.
`class.cols`	vector of colors to use for each class.
`legend.type`	type of legend to use to label classes.
`legend.loc`	character keyword specifying location of legend. Can be `"bottom", "top", "left", "right"`.
`point.size`	size of central points. Set to `NULL` for no points.
`circle.size`	size of circles around points indicating classification. Set to NULL for no circles.
`circle.border`	width of circle border.
`group.type`	type of grouping to display. Ignored for regression models.
`group.alpha`	value giving alpha transparency level for group shading. Setting to `0` produces no shading.
`ellipse.level`	the confidence level at which to draw the ellipse.
`n.contour.grid`	number of grid points for contour lines.
`label.size`	size of label if legend.type = `label`.
`label.alpha`	transparency of label background.
`plot`	logical determining whether or not to show plot.

Details

Produces a scatter plot of proximity scores for dim.x and dim.y dimensions from a multidimensional scale (MDS) conversion of proximity scores from a randomForest object. For classification models, points are colored according to original (inner) and predicted (outer) class.
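
The MDS step can be sketched in base R. The proximity matrix here is a stand-in built from Euclidean distances, since a real one comes from a model fit with `proximity = TRUE`; applying classical MDS (`cmdscale`) to one minus the proximity mirrors what randomForest's own `MDSplot` does, and that plotProximity follows the same recipe is an assumption.

```r
# Stand-in proximity matrix (assumption): a real one would be the
# $proximity element of a randomForest fit with proximity = TRUE
d <- as.matrix(dist(iris[, 1:4]))
prox <- 1 - d / max(d)   # similarity in [0, 1], with 1 on the diagonal

# Classical MDS on the dissimilarity 1 - proximity, keeping enough
# dimensions to cover the two that will be plotted
dim.x <- 1
dim.y <- 2
mds <- cmdscale(1 - prox, k = max(dim.x, dim.y))
prox.mds <- mds[, c(dim.x, dim.y)]

head(prox.mds)
```

Each row of `prox.mds` gives the plotting coordinates of one case; cases that often share terminal nodes end up close together.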

Value

a list with:

`prox.mds`	the MDS scores of the selected dimensions
`g`	`ggplot` object

Author(s)

Eric Archer [email protected]

Examples

library(randomForest)
data(symb.metab)

rf <- randomForest(type ~ ., symb.metab, proximity = TRUE)

# With confidence ellipses
plotProximity(rf)

# With convex hulls
plotProximity(rf, group.type = "hull")

# With contours
plotProximity(rf, group.type = "contour")

# Remove the points and just show ellipses
plotProximity(rf, point.size = NULL, circle.size = NULL, group.alpha = 0.5)

# Labels instead of a legend
plotProximity(rf, legend.type = "label", point.size = NULL, circle.size = NULL, group.alpha = 0.5)

Plot Trace

Description

Plot trace of cumulative OOB (classification) or MSE (regression) error rate by number of trees.

Usage

plotTrace(x, pct.correct = TRUE, plot = TRUE)

Arguments

`x`	a `rfPermute` or `randomForest` model object.
`pct.correct`	display y-axis as percent correctly classified (`TRUE`) or OOB error rate (`FALSE`).
`plot`	display the plot?

Value

the ggplot2 object is invisibly returned.

Author(s)

Eric Archer [email protected]

Examples

library(randomForest)
data(mtcars)

rf <- randomForest(factor(am) ~ ., mtcars)
plotTrace(rf)

Plot Vote Distribution

Description

For classification models, plot distribution of votes for each sample in each class.

Usage

plotVotes(x, type = NULL, freq.sep.line = TRUE, plot = TRUE)

Arguments

`x`	a `rfPermute` or `randomForest` model object.
`type`	either `area` for stacked continuous area plot or `bar` for discrete stacked bar chart. The latter is preferred for small numbers of cases. If not specified, a bar chart will be used if all classes have <= 30 cases.
`freq.sep.line`	put frequency of original group on second line in facet label? If `FALSE`, labels are single line. If `NULL` frequencies will not be included in labels.
`plot`	display the plot?

Value

the ggplot2 object is invisibly returned.

Author(s)

Eric Archer [email protected]

Examples

library(randomForest)
data(mtcars)

rf <- randomForest(factor(am) ~ ., mtcars)
plotVotes(rf)

Estimate Permutation p-values for Random Forest Importance Metrics

Description

Estimate significance of importance metrics for a Random Forest model by permuting the response variable. Produces null distribution of importance metrics for each predictor variable and p-value of observed.

Usage

rfPermute(x, ...)

## Default S3 method:
rfPermute(x, y = NULL, ..., num.rep = 100, num.cores = 1)

## S3 method for class 'formula'
rfPermute(
  formula,
  data = NULL,
  ...,
  subset,
  na.action = na.fail,
  num.rep = 100,
  num.cores = 1
)

as.randomForest(x)

## S3 method for class 'rfPermute'
print(x, ...)

## S3 method for class 'rfPermute'
predict(object, ...)

Arguments

`x`, `y`, `formula`, `data`, `subset`, `na.action`, `...`	See `randomForest` for definitions. In `as.randomForest` this is either a `randomForest` or `rfPermute` object to be converted to a `randomForest` object.
`num.rep`	Number of permutation replicates to run to construct null distribution and calculate p-values (default = 100).
`num.cores`	Number of CPUs to distribute permutation results over. The default shown in Usage is 1. If set to `NULL`, one fewer than the number of cores reported by `detectCores` is used.
`object`	an `rfPermute` model to be used for prediction. See `predict.randomForest`.

Details

All other parameters are as defined in randomForest.formula. A Random Forest model is first created as normal to calculate the observed values of variable importance. The response variable is then permuted num.rep times, with a new Random Forest model built for each permutation step.
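
The permutation logic can be sketched without fitting any forests. In the base R illustration below, the absolute correlation between one predictor and the response stands in for an importance metric (rfPermute refits a Random Forest at this step instead), and the p-value convention of counting the observed value as one replicate is an assumption, not a statement of the package's exact formula.

```r
set.seed(1)

# Stand-in importance metric (assumption): absolute correlation between
# a predictor and the response
importance_stat <- function(x, y) abs(cor(x, y))

x <- mtcars$wt
y <- mtcars$mpg
num.rep <- 199

# Observed value from the unpermuted data
observed <- importance_stat(x, y)

# Null distribution: permute the response, recompute the metric each time
null.dist <- replicate(num.rep, importance_stat(x, sample(y)))

# Assumed convention: the observed value counts as one of the replicates
p.value <- (sum(null.dist >= observed) + 1) / (num.rep + 1)
p.value
```

The smallest attainable p-value under this convention is 1 / (num.rep + 1), so `num.rep` sets the resolution of the test as well as its run time.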

Value

An rfPermute object.

Author(s)

Eric Archer [email protected]

Examples

# A regression model predicting ozone levels
data(airquality)
ozone.rp <- rfPermute(Ozone ~ ., data = airquality, na.action = na.omit, ntree = 100, num.rep = 50)
ozone.rp
  
# Plot the scaled importance distributions 
# Significant (p <= 0.05) predictors are in red
plotImportance(ozone.rp, scale = TRUE)

# Plot the importance null distributions and observed values for two of the predictors
plotNull(ozone.rp, preds = c("Solar.R", "Month"))


# A classification model classifying cars to manual or automatic transmission 
data(mtcars)

am.rp <- rfPermute(factor(am) ~ ., mtcars, ntree = 100, num.rep = 50)
summary(am.rp)


plotImportance(am.rp, scale = TRUE, sig.only = TRUE)



`rfPermute` package

Description

Random Forest Predictor Importance Significance and Model Diagnostics.

Usage

rfPermuteTutorial()

Diagnostics of `rfPermute` or `randomForest` models.

Description

Combine plots of error traces and inbag rates.

Usage

## S3 method for class 'randomForest'
summary(object, ...)

## S3 method for class 'rfPermute'
summary(object, ...)

Arguments

`object`	a `rfPermute` or `randomForest` model object to summarize.
`...`	arguments passed to `plotInbag`.

Value

A combination of plots from plotTrace and plotInbag as well as summary confusion matrices (classification) or error rates (regression) from the model.

Author(s)

Eric Archer [email protected]

Examples

# A regression model using the ozone example
data(airquality)
ozone.rp <- rfPermute(
  Ozone ~ ., data = airquality, na.action = na.omit,
  ntree = 100, num.rep = 50, num.cores = 1
)

summary(ozone.rp)
 
Symbiodinium type metabolite profiles

Description

A data.frame of 155 metabolite relative concentrations for 64 samples of four Symbiodinium clade types.

Usage

data(symb.metab)

Format

data.frame

References

Klueter, A.; Crandall, J.B.; Archer, F.I.; Teece, M.A.; Coffroth, M.A. Taxonomic and Environmental Variation of Metabolite Profiles in Marine Dinoflagellates of the Genus Symbiodinium. Metabolites 2015, 5, 74-99.
