It is possible to construct partial dependence plots (PDPs) with pointwise variability estimates (e.g., +/- one standard deviation). This can be done easily with the pred.fun argument of partial(). To illustrate, we’ll fit a random forest to the famous iris data and construct PDPs for the most important variables that display pointwise standard deviation bands.
We use the ranger package (Wright, Wager, and Probst 2018) to fit a random forest to the built-in iris data. Note that in order to use the vip (Greenwell and Boehmke, n.d.) and pdp (Greenwell 2017) packages for classification models, we need to be able to compute the predicted class probabilities. Therefore, we must specify probability = TRUE in the call to ranger().
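A minimal sketch of the model-fitting step follows. The object name rfo and the seed are assumptions made for illustration; importance = "impurity" is included because the variable importance plot discussed later relies on it.

```r
library(ranger)  # fast random forest implementation

set.seed(101)  # arbitrary seed (an assumption), used only for reproducibility

# Fit a probability forest so that predict() returns class probabilities;
# importance = "impurity" stores impurity-based importance scores in the fit
rfo <- ranger(Species ~ ., data = iris, probability = TRUE,
              importance = "impurity")
print(rfo)  # brief summary of the fitted forest
```

With probability = TRUE, predict(rfo, data = iris)$predictions is a matrix with one column of predicted probabilities per species.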
Next, we’ll use the vip package to construct a variable importance plot (VIP) from the fitted random forest. The default for vip() is to construct model-specific VIPs if available. Since we specified importance = "impurity" in the call to ranger(), vip() will plot the impurity-based variable importance scores common to most decision tree-based algorithms.
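The VIP step might look like the sketch below. The model is refit here so that the chunk runs on its own; the names rfo and p and the seed are assumptions made for illustration.

```r
library(ranger)
library(vip)

set.seed(101)  # arbitrary seed (an assumption)
rfo <- ranger(Species ~ ., data = iris, probability = TRUE,
              importance = "impurity")

# Model-specific VIP; for a ranger fit this uses the stored impurity scores
p <- vip(rfo)
print(p)
```

If the forest had been fit with ranger's default (importance = "none"), no model-specific scores would be stored for vip() to plot.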
It appears from Figure 1 that Petal.Width and Petal.Length are the most important features in predicting Species. Now that we’ve identified these important features, we’ll construct PDPs for each that include pointwise standard deviation bands. To do this, we’ll specify a special prediction function that returns three components: the average prediction, the average prediction minus one standard deviation, and the average prediction plus one standard deviation. Note that this function requires the arguments object and newdata.
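One way such a prediction function could be written is sketched here. The name pfun, the choice to summarize the predicted probability of the setosa class, and the component names are all assumptions made for illustration; only the argument names object and newdata are fixed by partial().

```r
library(ranger)

# Prediction function for partial(): must take exactly `object` and `newdata`
pfun <- function(object, newdata) {
  # Predicted probability of the setosa class (an assumed choice of class),
  # one value per row of newdata
  p <- predict(object, data = newdata)$predictions[, "setosa"]
  c("avg" = mean(p),
    "avg-1sd" = mean(p) - sd(p),
    "avg+1sd" = mean(p) + sd(p))
}

# Quick check on a freshly fitted forest (refit so the chunk runs on its own)
set.seed(101)  # arbitrary seed (an assumption)
rfo <- ranger(Species ~ ., data = iris, probability = TRUE,
              importance = "impurity")
pfun(rfo, newdata = iris)
```

Returning a named vector is convenient because the names can then be used to label the three resulting curves.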
Next, we just supply this function via the pred.fun argument in the call to partial().
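The calls to partial() might look like the following sketch. The model and prediction function are redefined so that the chunk runs on its own; the names rfo and pfun, the seed, and the choice of the setosa class are assumptions, while pd1 and pd2 match the object names used in the plotting code. Passing train = iris explicitly is a precaution rather than a requirement.

```r
library(ranger)
library(pdp)

set.seed(101)  # arbitrary seed (an assumption)
rfo <- ranger(Species ~ ., data = iris, probability = TRUE,
              importance = "impurity")

# Average predicted probability of setosa (assumed class), +/- one SD
pfun <- function(object, newdata) {
  p <- predict(object, data = newdata)$predictions[, "setosa"]
  c("avg" = mean(p), "avg-1sd" = mean(p) - sd(p), "avg+1sd" = mean(p) + sd(p))
}

# Store the results rather than plotting immediately
pd1 <- partial(rfo, pred.var = "Petal.Width", pred.fun = pfun, train = iris)
pd2 <- partial(rfo, pred.var = "Petal.Length", pred.fun = pfun, train = iris)
head(pd1)
```

Because pfun returns three values per grid point, each result is in long format, with a yhat.id column that identifies which of the three curves a row belongs to.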
To plot the resulting PDPs, we could have specified plot = TRUE in the previous calls to partial(). However, since PDPs can be expensive to compute (though not in this example), it is good practice to store the results first and then construct the plot manually. We’ll use the convenient autoplot() function provided by pdp to construct the plots, which requires that the ggplot2 package (Wickham et al. 2018) be loaded first. The results are displayed in Figure 2.
library(ggplot2)    # for the autoplot() generic
library(gridExtra)  # for grid.arrange()

# PDP with +/- one standard deviation bands for petal width
pdp1 <- autoplot(pd1) +
  theme_light() +
  labs(x = "Petal width (cm)", y = "Partial dependence") +
  theme(legend.position = "none")

# PDP with +/- one standard deviation bands for petal length
pdp2 <- autoplot(pd2) +
  theme_light() +
  labs(x = "Petal length (cm)", y = "Partial dependence") +
  theme(legend.position = "none")

grid.arrange(pdp1, pdp2, nrow = 1)  # display plots side by side
Greenwell, Brandon, and Brad Boehmke. n.d. Vip: Variable Importance Plots. https://koalaverse.github.io/vip/index.html.
Greenwell, Brandon M. 2017. “Pdp: An R Package for Constructing Partial Dependence Plots.” The R Journal 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, and Kara Woo. 2018. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wright, Marvin N., Stefan Wager, and Philipp Probst. 2018. Ranger: A Fast Implementation of Random Forests. https://CRAN.R-project.org/package=ranger.