Prerequisites

# Load required packages
library(kernlab)  # for fitting support vector machines
library(pdp)      # for partial dependence plots
library(vip)      # for variable importance plots

Classification problems

Traditionally, for classification problems, partial dependence functions are on a scale similar to the logit; see, for example, (Hastie, Tibshirani, and Friedman 2009, 369–70). Suppose the response is categorical with \(K\) levels, then for each class we compute \[ f_k(x) = \log\left[p_k(x)\right] - \frac{1}{K}\sum_{k = 1}^K\log\left[p_k(x)\right], \quad k = 1, 2, \dots, K, \] where \(p_k(x)\) is the predicted probability for the \(k\)-th class. Plotting \(f_k(x)\) helps us understand how the log-odds for the \(k\)-th class depends on different subsets of the predictor variables. The same goes for plotting individual conditional expectation (ICE) curves (Goldstein et al. 2015).

To illustrate, we consider Edgar Anderson’s iris data from the datasets package. The iris data frame contains the sepal length, sepal width, petal length, and petal width (in centimeters) for 50 flowers from each of three species of iris: setosa, versicolor, and virginica (i.e., \(K = 3\)). In the code chunk below, we fit a support vector machine (SVM) with a Gaussian radial basis function kernel to the iris data using the svm() function in the kernlab package (Karatzoglou, Smola, and Hornik 2018) (the tuning parameters were determined using 5-fold cross-validation). Note that the partial() function has to be able to extract the predicted probabilities for each class, so it is necessary to set probability = TRUE in the call to svm().

# Fit an SVM to the Edgar Anderson's iris data
iris_svm <- ksvm(Species ~ ., data = iris, kernel = "rbfdot", 
                 kpar = list(sigma = 0.709), C = 0.5, prob.model = TRUE)

Before constructing ICE curves or partial dependence plots (PDPs), it is good practice to investigate variable importance. In the code chunk below, we use the vip package (Greenwell and Boehmke, n.d.) to a construct PDP-based variable importance plot (VIP); see ?vip::vi for a reference. The VIP (Figure 1) suggests that Petal.Width and Petal.Length

**Figure 1** Variable importance plot.

Figure 1 Variable importance plot.

Next we’ll construct ICE curves (black curves) and PDPs (red curves) for all four features. The results are displayed in Figure 2 below.

**Figure 2** Partial dependence of `setosa` on all four features (default logit scale).

Figure 2 Partial dependence of setosa on all four features (default logit scale).

It is also possible to obtain PDPs for classification problems on the raw probability scale by setting prob = TRUE in the call to partial(). An example is given below and the results are displayed in Figure 3.

partial(iris_svm, pred.var = c("Petal.Width", "Sepal.Width"), plot = TRUE, 
        chull = TRUE, train = iris, prob = TRUE, plot.engine = "ggplot2",
        palette = "magma")
**Figure 3** Partial dependence of `setosa` on `Petal.Width` and `Petal.Length` (probability scale).

Figure 3 Partial dependence of setosa on Petal.Width and Petal.Length (probability scale).

By default, partial() will use the first class as the focus class. To plot the PDP for a particular class, use the which.class argument. To illustrate, we’ll plot the partial dependence of Species on both Petal.Width and Petal.Length for each of the three classes. The results are displayed in Figure 4 below.

# Compute partial dependence for each of the three classes
pd <- NULL
for (i in 1:3) {
  tmp <- partial(iris_svm, pred.var = c("Petal.Width", "Petal.Length"),
                 which.class = i, grid.resolution = 10, train = iris)
  pd <- rbind(pd, cbind(tmp, Species = levels(iris$Species)[i]))
}

# Figure 3
library(ggplot2)
ggplot(pd, aes(x = Petal.Width, y = Petal.Length, z = yhat, fill = yhat)) +
  geom_tile() +
  geom_contour(color = "white", alpha = 0.5) +
  scale_fill_distiller(name = "Class-centered\nlogit", palette = "Spectral") +
  theme_bw() +
  facet_grid(~ Species)
**Figure 4** Partial dependence of each class on `Petal.Width` and `Petal.Length` (default logit scale).

Figure 4 Partial dependence of each class on Petal.Width and Petal.Length (default logit scale).

References

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65. https://doi.org/10.1080/10618600.2014.907095.

Greenwell, Brandon, and Brad Boehmke. n.d. Vip: Variable Importance Plots. https://koalaverse.github.io/vip/index.html.

Hastie, Trevor, Robert. Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer Series in Statistics. Springer-Verlag.

Karatzoglou, Alexandros, Alex Smola, and Kurt Hornik. 2018. Kernlab: Kernel-Based Machine Learning Lab. https://CRAN.R-project.org/package=kernlab.