It is possible to construct PDPs with pointwise variability estimates (e.g., +/- one standard deviation). This can be done easily with the pred.fun argument of partial(). To illustrate, we’ll fit a random forest to the famous iris data and construct a PDP for the most important variable that displays pointwise standard deviation bands.

In the code chunk below, we use the ranger package (Wright, Wager, and Probst 2018) to fit a random forest to the built-in iris data. Note that in order to use the vip (Greenwell and Boehmke, n.d.) and pdp (Greenwell 2017) packages for classification models, we need to be able to compute the predicted class probabilities. Therefore, we must specify probability = TRUE in the call to ranger().

library(ranger)  # for fitting random forests
set.seed(101)  # for reproducibility
rfo <- ranger(Species ~ ., data = iris, probability = TRUE,
importance = "impurity")

Next, we’ll use the vip package to construct a variable importance plot (VIP) from the fitted random forest. The default for vip() is to construct model-specific VIPs if availble. Since we specified importance = "impurity" in the call to ranger(), vip() will plot the impurity-based variable importance scores common with most decision tree-based algorithms.

library(vip)  # for variable importance plots
#>
#> Attaching package: 'vip'
#> The following object is masked from 'package:utils':
#>
#>     vi
vip(rfo)

It appears from Figure 1 that Petal.Width and Petal.Length are the most important features in predicting Species. Now that we’ve identified these important features, we’ll construct PDPs for each that include pointwise standard deviation bands. To do this, we’ll specify a special prediction function that returns three components: the average prediction, the average prediction minus one standard deviation, and the average prediction plus one standard deviation. This function is defined below; note that this function requires the arguments object and newdata.

pred_wrapper <- function(object, newdata) {
p <- predict(object, data = newdata)\$predictions[, 1L, drop = TRUE]
c("avg" = mean(p), "avg-1sd" = mean(p) - sd(p), "avg+1sd" = mean(p) + sd(p))
}

Next, we just supply this function via the pred.fun argument in the call to partial():

library(pdp)  # for partial dependence plots
pd1 <- partial(rfo, pred.var = "Petal.Width", pred.fun = pred_wrapper)
pd2 <- partial(rfo, pred.var = "Petal.Length", pred.fun = pred_wrapper)

To plot the resulting PDPs, we could’ve just specified plot = TRUE in the previous calls to partial(). However, since PDPs can be computationally expensive to compute (though, not in this example), it is good practice to store the results first, and then manually construct the plot. We’ll use the convenient autoplot() function provided by pdp to construct the plots (which requires that the ggplot2 package (Wickham et al. 2018) be laoded first). The results are displayed in Figure 2.

library(ggplot2)  # for autoplot() generic
pdp1 <- autoplot(pd1) +
theme_light() +
labs(x = "Petal width (mm)", y = "Partial dependence") +
theme(legend.position = "none")
pdp2 <- autoplot(pd2) +
theme_light() +
labs(x = "Petal length (mm)", y = "Partial dependence") +
theme(legend.position = "none")
grid.arrange(pdp1, pdp2, nrow = 1)  # display plots side by side

Greenwell, Brandon, and Brad Boehmke. n.d. Vip: Variable Importance Plots. https://koalaverse.github.io/vip/index.html.

Greenwell, Brandon M. 2017. “Pdp: An R Package for Constructing Partial Dependence Plots.” The R Journal 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, and Kara Woo. 2018. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

Wright, Marvin N., Stefan Wager, and Philipp Probst. 2018. Ranger: A Fast Implementation of Random Forests. https://CRAN.R-project.org/package=ranger.