`vignettes/pdp-extending.Rmd`

Partial dependence plots (PDPs) are essentially just averaged predictions; this is apparent from step 1. (c) of **Algorithm 1** in Greenwell (2017). Consequently, as Goldstein et al. (2015) point out, strong heterogeneity can conceal the complexity of the modeled relationship between the response and predictors of interest. This was part of the motivation behind the ICE plot procedure of Goldstein et al. (2015).
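To make the averaging in step 1. (c) concrete, here is a minimal base R sketch that computes partial dependence for a single predictor by hand; the linear model and the built-in `mtcars` data are illustrative stand-ins, not part of this vignette's examples:

```
# Minimal, manual partial dependence for one predictor (illustrative sketch)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Grid of values for the predictor of interest
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)

# For each grid value, hold wt fixed for *all* observations, predict, and
# average the predictions (the averaging step 1. (c))
pd <- vapply(wt_grid, function(x) {
  tmp <- mtcars
  tmp$wt <- x
  mean(predict(fit, newdata = tmp))
}, numeric(1))

head(cbind(wt = wt_grid, yhat = pd))
```

Replacing `mean()` in this loop with any other summary is exactly the kind of flexibility discussed below.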

With `partial()`, it is possible to replace the mean in step 1. (c) of **Algorithm 1** with any other function (e.g., the median or trimmed mean), or to obtain PDPs for classification problems on the probability scale^{1}. It is even possible to obtain ICE curves^{2}. This flexibility is due to the new `pred.fun` argument in `partial()` (starting with **pdp** version 0.4.0). This argument accepts an optional prediction function that requires two arguments: `object` and `newdata`. The supplied prediction function must return either a single prediction or a vector of predictions. Returning the mean of all the predictions will result in the traditional PDP. Returning a vector of predictions (i.e., one for each observation) will result in a set of ICE curves. The examples below illustrate. The point to be made is that the `pred.fun` argument gives `partial()` the flexibility to handle all kinds of situations!
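As a quick illustration of the contract, a valid `pred.fun` is any function of `object` and `newdata`; for example, swapping the mean for the median (mentioned above) takes one line. The sketch below assumes `predict(object, newdata)` returns a numeric vector, which holds for many model classes but not all:

```
# A pred.fun that summarizes with the median instead of the mean (sketch)
pred_median <- function(object, newdata) {
  median(predict(object, newdata))
}

# Quick check with a toy linear model
fit <- lm(mpg ~ wt, data = mtcars)
pred_median(fit, newdata = mtcars)  # a single number, as partial() expects
```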

Using the `pred.fun` argument, it is possible to obtain PDPs for classification problems on the probability scale. We just need to write a function that computes the predicted class probability of interest, averaged across all observations.

To illustrate, we consider Edgar Anderson’s iris data from the **datasets** package. The `iris` data frame contains the sepal length, sepal width, petal length, and petal width (in centimeters) for 50 flowers from each of three species of iris: setosa, versicolor, and virginica (i.e., \(K = 3\)). In the code chunk below, we fit a support vector machine (SVM) with a Gaussian radial basis function kernel to the `iris` data using the `ksvm()` function in the **kernlab** package (Karatzoglou, Smola, and Hornik 2018); the tuning parameters were determined using 5-fold cross-validation. **Note that the partial() function has to be able to extract the predicted probabilities for each class**, so it is necessary to set `prob.model = TRUE` in the call to `ksvm()`.

```
# Fit an SVM to Edgar Anderson's iris data
library(kernlab)  # for the ksvm() function
iris_svm <- ksvm(Species ~ ., data = iris, kernel = "rbfdot",
                 kpar = list(sigma = 0.709), C = 0.5, prob.model = TRUE)
```

The function below can be used to extract the average predicted probability of belonging to the `setosa` class.

```
# Function to return the average predicted probability for the "setosa" class
pred_prob <- function(object, newdata) {  # see ?predict.ksvm
  mean(predict(object, newdata, type = "probabilities")[, "setosa", drop = TRUE])
}
```

Next, we simply pass this function via the `pred.fun` argument in the call to `partial()`. The following chunk of code uses `pred_prob` to obtain PDPs for `Petal.Width` and `Petal.Length` on the probability scale. The results are displayed in **Figure 1**.

```
# PDPs for Petal.Width and Petal.Length on the probability scale
pdp1 <- partial(iris_svm, pred.var = "Petal.Width", pred.fun = pred_prob,
                plot = TRUE, train = iris)
pdp2 <- partial(iris_svm, pred.var = "Petal.Length", pred.fun = pred_prob,
                plot = TRUE, train = iris)
pdp3 <- partial(iris_svm, pred.var = c("Petal.Width", "Petal.Length"),
                pred.fun = pred_prob, plot = TRUE, train = iris)

# Figure 1
grid.arrange(pdp1, pdp2, pdp3, ncol = 3)
```

We could also plot the PDP for a single feature and include pointwise standard deviation bands! To do this, we simply augment the user-defined prediction function to return the mean, as well as the mean +/- one standard deviation (see **Figure 2**):

```
# Function to return the average prediction +/- one standard deviation
pfun <- function(object, newdata) {  # see ?predict.ksvm
  prob <- predict(object, newdata, type = "probabilities")[, "setosa"]
  c("mean"    = mean(prob),
    "mean-sd" = mean(prob) - sd(prob),
    "mean+sd" = mean(prob) + sd(prob))
}

# PDP for Petal.Width +/- one standard deviation (Figure 2)
partial(iris_svm, pred.var = "Petal.Width", pred.fun = pfun, plot = TRUE,
        train = iris, rug = TRUE, plot.engine = "ggplot2") +
  theme_light() +
  labs(x = "Petal width (cm)", y = "Partial dependence")
#> Warning: Ignoring unknown parameters: csides
```

For regression problems, the default prediction function is essentially

```
pred_fun <- function(object, newdata) {
  mean(predict(object, newdata), na.rm = TRUE)
}
```

This corresponds to step 1. (c) in **Algorithm 1**. Suppose we would like to manually construct ICE curves instead. To accomplish this, we need to pass a prediction function that returns a vector of predictions, one for each observation in `newdata` (i.e., just remove the call to `mean()` in the prediction function).

For illustration, we’ll use a corrected version of the Boston housing data analyzed in Harrison and Rubinfeld (1978); the data are available in the **pdp** package (see `?pdp::boston` for details). We begin by loading the data and fitting a random forest with default tuning parameters and 500 trees using the **ranger** package (Wright, Wager, and Probst 2018).

```
# Fit a random forest to the Boston housing data
library(ranger)  # for the ranger() function
data(boston, package = "pdp")  # the (corrected) Boston housing data
set.seed(101)  # for reproducibility
boston_rfo <- ranger(cmedv ~ ., data = boston)
```

The model fit is reasonable, with an *out-of-bag* (pseudo) \(R^2\) of 0.879.
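For regression, **ranger** stores this value in the fitted object (e.g., `boston_rfo$r.squared`); it is one minus the out-of-bag mean squared error divided by the variance of the response. A small numeric sketch of that formula, using made-up values rather than the Boston fit:

```
# Pseudo R-squared: 1 - MSE / Var(y) (illustrative values, not the Boston fit)
y    <- c(24.0, 21.6, 34.7, 33.4, 36.2)  # hypothetical responses
yhat <- c(25.1, 22.0, 33.0, 32.5, 35.0)  # hypothetical OOB predictions

mse <- mean((y - yhat)^2)
pseudo_r2 <- 1 - mse / var(y)
round(pseudo_r2, 3)
```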

The code snippet below manually constructs ICE curves for the Boston housing example using the predictor `rm`. The result is displayed in **Figure 3**. Note that when the function supplied to `pred.fun` returns multiple predictions, the data frame returned by `partial()` includes an additional column, `yhat.id`, that indicates which curve a point belongs to; in the following code chunk, there will be one curve for each observation in `boston`.

```
# Use partial() to obtain ICE curves
ranger_ice <- function(object, newdata) {
  predict(object, newdata)$predictions
}
rm_ice <- partial(boston_rfo, pred.var = "rm", pred.fun = ranger_ice)

# Figure 3
autoplot(rm_ice, rug = TRUE, train = boston, alpha = 0.3)
#> Warning: Ignoring unknown parameters: csides
```

The curves in **Figure 3** indicate some heterogeneity in the fitted model (i.e., some of the curves depict the opposite relationship). Such heterogeneity can be easier to spot using c-ICE curves; see Equation (4) on page 49 of Goldstein et al. (2015). Using **dplyr** (Wickham et al. 2018), it is rather straightforward to post-process the output from `partial()` to obtain c-ICE curves (similar to the construction of *raw change scores* (Fitzmaurice, Laird, and Ware 2011, pg. 130) for longitudinal data)^{3}. This is shown below.

```
# Post-process rm_ice to obtain c-ICE curves
rm_ice <- rm_ice %>%
  group_by(yhat.id) %>%  # perform next operation within each yhat.id
  mutate(yhat.centered = yhat - first(yhat))  # so each curve starts at yhat = 0
```
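If **dplyr** is not available, the same centering can be done in base R with `ave()`; here is a sketch using a toy data frame with the same columns as the `partial()` output (the values are made up for illustration):

```
# Base R equivalent of the dplyr centering (sketch, toy data)
ice <- data.frame(
  yhat.id = rep(1:2, each = 3),
  rm      = rep(c(5, 6, 7), times = 2),
  yhat    = c(20, 22, 25, 30, 29, 27)
)

# Subtract each curve's first value so every curve starts at 0
ice$yhat.centered <- ice$yhat -
  ave(ice$yhat, ice$yhat.id, FUN = function(x) x[1])
ice
```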

Since the PDP is just the average of the corresponding ICE curves, it is quite simple to display both on the same plot. This is easily accomplished using the `stat_summary()` function from the **ggplot2** package to average the ICE curves together. The code snippet below plots the ICE curves and c-ICE curves, along with their averages, for the predictor `rm` in the Boston housing example. The results are displayed in **Figure 4**.

```
# ICE curves with their average
p1 <- ggplot(rm_ice, aes(rm, yhat)) +
  geom_line(aes(group = yhat.id), alpha = 0.2) +
  stat_summary(fun = mean, geom = "line", col = "red", size = 1)

# c-ICE curves with their average
p2 <- ggplot(rm_ice, aes(rm, yhat.centered)) +
  geom_line(aes(group = yhat.id), alpha = 0.2) +
  stat_summary(fun = mean, geom = "line", col = "red", size = 1)

# Figure 4
grid.arrange(p1, p2, ncol = 2)
```
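The "PDP is the average of the ICE curves" relationship can also be checked numerically: averaging `yhat` within each grid value recovers the partial dependence estimates. A base R sketch on a toy ICE data frame (made-up values, not the Boston fit):

```
# Averaging ICE curves at each grid value recovers the PDP (sketch, toy data)
ice <- data.frame(
  yhat.id = rep(1:3, each = 2),
  rm      = rep(c(5, 6), times = 3),
  yhat    = c(20, 22, 24, 28, 22, 25)
)

# One averaged value per grid point: the partial dependence estimate
pdp_vals <- aggregate(yhat ~ rm, data = ice, FUN = mean)
pdp_vals
```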

Fitzmaurice, G. M., N. M. Laird, and J. H. Ware. 2011. *Applied Longitudinal Analysis*. Wiley Series in Probability and Statistics. John Wiley & Sons.

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” *Journal of Computational and Graphical Statistics* 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.

Greenwell, Brandon, and Brad Boehmke. n.d. *Vip: Variable Importance Plots*. https://koalaverse.github.io/vip/index.html.

Greenwell, Brandon M. 2017. “Pdp: An R Package for Constructing Partial Dependence Plots.” *The R Journal* 9 (1): 421–36. https://journal.r-project.org/archive/2017/RJ-2017-016/index.html.

Harrison, David, and Daniel L. Rubinfeld. 1978. “Hedonic Housing Prices and the Demand for Clean Air.” *Journal of Environmental Economics and Management* 5 (1): 81–102. https://doi.org/10.1016/0095-0696(78)90006-2.

Karatzoglou, Alexandros, Alex Smola, and Kurt Hornik. 2018. *Kernlab: Kernel-Based Machine Learning Lab*. https://CRAN.R-project.org/package=kernlab.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2018. *Dplyr: A Grammar of Data Manipulation*. https://CRAN.R-project.org/package=dplyr.

Wright, Marvin N., Stefan Wager, and Philipp Probst. 2018. *Ranger: A Fast Implementation of Random Forests*. https://CRAN.R-project.org/package=ranger.