`vignettes/pdp-se.Rmd.Rmd`

`pdp-se.Rmd.Rmd`

It is possible to construct PDPs with pointwise variability estimates (e.g., +/- one standard deviation). This can be done easily with the `pred.fun`

argument of `partial()`

. To illustrate, we’ll fit a random forest to the famous iris data and construct a PDP for the most important variable that displays pointwise standard deviation bands.

In the code chunk below, we use the **ranger** package (Wright, Wager, and Probst 2018) to fit a random forest to the built-in iris data. Note that in order to use the **vip** (Greenwell and Boehmke, n.d.) and **pdp** (Greenwell 2017) packages for classification models, we need to be able to compute the predicted class probabilities. Therefore, we must specify `probability = TRUE`

in the call to `ranger()`

.

```
library(ranger) # for fitting random forests
set.seed(101) # for reproducibility
rfo <- ranger(Species ~ ., data = iris, probability = TRUE,
importance = "impurity")
```

Next, we’ll use the **vip** package to construct a variable importance plot (VIP) from the fitted random forest. The default for `vip()`

is to construct model-specific VIPs if availble. Since we specified `importance = "impurity"`

in the call to `ranger()`

, `vip()`

will plot the impurity-based variable importance scores common with most decision tree-based algorithms.

```
library(vip) # for variable importance plots
#>
#> Attaching package: 'vip'
#> The following object is masked from 'package:utils':
#>
#> vi
vip(rfo)
```

It appears from **Figure 1** that `Petal.Width`

and `Petal.Length`

are the most important features in predicting `Species`

. Now that we’ve identified these important features, we’ll construct PDPs for each that include pointwise standard deviation bands. To do this, we’ll specify a special prediction function that returns three components: the average prediction, the average prediction minus one standard deviation, and the average prediction plus one standard deviation. This function is defined below; note that this function requires the arguments `object`

and `newdata`

.

```
pred_wrapper <- function(object, newdata) {
p <- predict(object, data = newdata)$predictions[, 1L, drop = TRUE]
c("avg" = mean(p), "avg-1sd" = mean(p) - sd(p), "avg+1sd" = mean(p) + sd(p))
}
```

Next, we just supply this function via the `pred.fun`

argument in the call to `partial()`

:

```
library(pdp) # for partial dependence plots
pd1 <- partial(rfo, pred.var = "Petal.Width", pred.fun = pred_wrapper)
pd2 <- partial(rfo, pred.var = "Petal.Length", pred.fun = pred_wrapper)
```

To plot the resulting PDPs, we could’ve just specified `plot = TRUE`

in the previous calls to `partial()`

. However, since PDPs can be computationally expensive to compute (though, not in this example), it is good practice to store the results first, and then manually construct the plot. We’ll use the convenient `autoplot()`

function provided by **pdp** to construct the plots (which requires that the **ggplot2** package (Wickham et al. 2018) be laoded first). The results are displayed in **Figure 2**.

```
library(ggplot2) # for autoplot() generic
pdp1 <- autoplot(pd1) +
theme_light() +
labs(x = "Petal width (mm)", y = "Partial dependence") +
theme(legend.position = "none")
pdp2 <- autoplot(pd2) +
theme_light() +
labs(x = "Petal length (mm)", y = "Partial dependence") +
theme(legend.position = "none")
grid.arrange(pdp1, pdp2, nrow = 1) # display plots side by side
```

Greenwell, Brandon, and Brad Boehmke. n.d. *Vip: Variable Importance Plots*. https://koalaverse.github.io/vip/index.html.

Greenwell, Brandon M. 2017. “Pdp: An R Package for Constructing Partial Dependence Plots.” *The R Journal* 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, and Kara Woo. 2018. *Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics*. https://CRAN.R-project.org/package=ggplot2.

Wright, Marvin N., Stefan Wager, and Philipp Probst. 2018. *Ranger: A Fast Implementation of Random Forests*. https://CRAN.R-project.org/package=ranger.