For illustration, we’ll use a corrected version of the Boston housing data analyzed in Harrison and Rubinfeld (1978); the data are available in the pdp package (see ?pdp::boston for details). We begin by loading the data and fitting a random forest with default tuning parameters and 500 trees. The model fit is reasonable, with an out-of-bag (pseudo) $$R^2$$ of 0.89. A variable importance plot is displayed in Figure 1 (using the permutation-based definition of variable importance from Breiman (2001)). It appears that rm and lstat are the most important predictors of cmedv. Next, we’d like to visualize the functional relationship between these variables and cmedv using a partial dependence plot (PDP).

data(boston, package = "pdp")  # load the (corrected) Boston housing data
library(randomForest)  # for fitting random forests
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
library(vip)  # for variable importance plots
#>
#> Attaching package: 'vip'
#> The following object is masked from 'package:utils':
#>
#>     vi
set.seed(101)  # for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston, importance = TRUE)
vip(boston.rf, bar = FALSE, horizontal = FALSE, size = 1.5)  # Figure 1

Constructing PDPs can be quite computationally expensive.1 Several strategies are available to ease the computational burden in larger problems. For example, there is no need to compute the partial dependence of cmedv using every unique value of rm in the training data (which would require $$k = 446$$ passes over the data!); very reasonable results can be obtained with a reduced number of points. One option is to use a grid of equally spaced values spanning the range of the variable of interest; the number of points can be controlled via the grid.resolution argument in the call to partial(). Alternatively, a user-specified grid of values (e.g., containing specific quantiles of interest) can be supplied through the pred.grid argument. To demonstrate, the following snippet of code computes the partial dependence of cmedv on rm using each option; grid.arrange() (from the gridExtra package) is used to display all three PDPs side by side. The results are displayed in Figure 2.

library(pdp)  # for partial dependence plots
library(gridExtra)  # for grid.arrange()
grid.arrange(  # Figure 2
  partial(boston.rf, "rm", plot = TRUE),
  partial(boston.rf, "rm", grid.resolution = 30, plot = TRUE),
  partial(boston.rf, "rm", pred.grid = data.frame(rm = 3:9), plot = TRUE),
  ncol = 3
)

For convenience, the partial() function includes a quantiles argument for specifying whether to compute the PDP at specific quantiles of the predictor’s distribution (controlled by the probs argument, which defaults to 1:9/10, the deciles). Below, we plot centered individual conditional expectation (c-ICE) curves for rm using a reduced set of grid points occurring at specific quantiles of rm; the results are displayed in Figure 3. Heterogeneity in the curves indicates the potential presence of interaction effects between rm and other features.
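The chunk below sketches one way to produce such a plot; the argument names (ice, center, quantiles, and probs) follow ?pdp::partial, and boston.rf is the random forest fit earlier:

```r
# c-ICE curves for rm, evaluated at the deciles of its distribution;
# assumes the boston.rf random forest fit earlier in this section
partial(boston.rf, pred.var = "rm", ice = TRUE, center = TRUE,
        quantiles = TRUE, probs = 1:9/10, plot = TRUE)  # Figure 3
```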

The partial() function relies on the plyr package (Wickham 2016), rather than R’s built-in for loops. This makes it easy to request progress bars (e.g., progress = "text") or run partial() in parallel. In fact, partial() can use any of the parallel backends supported by the foreach package. To use this functionality, we must first load and register a supported parallel backend (e.g., doMC (Analytics and Weston 2017) or doParallel (Corporation and Weston 2017)).

To illustrate, we will use the Los Angeles ozone pollution data described in Breiman and Friedman (1985). The data contain daily measurements of ozone concentration (ozone) along with eight meteorological quantities for 330 days in the Los Angeles basin in 1976.2 The following code chunk loads the data into R:
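A sketch of such a chunk is shown below; it assumes the file at the URL given in footnote 2 is comma-separated with a header row (switch to read.table() if it turns out to be whitespace-delimited):

```r
# Read the L.A. ozone data directly from the web (URL from footnote 2)
url <- "http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/ozone.data"
ozone <- read.csv(url, header = TRUE)
head(ozone)  # inspect the first few rows
```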

Next, we use the multivariate adaptive regression splines (MARS) algorithm introduced in Friedman (1991) to model ozone concentration as a nonlinear function of the eight meteorological variables plus day of the year; we allow for up to three-way interactions.
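One way to fit such a model is with the earth package, a widely used MARS implementation in R; the snippet below assumes the ozone data frame loaded above, with ozone as the response:

```r
library(earth)  # a widely used MARS implementation in R

# Model ozone concentration as a nonlinear function of the remaining
# variables, allowing for up to three-way interactions via degree = 3
ozone.mars <- earth(ozone ~ ., data = ozone, degree = 3)
summary(ozone.mars)  # model summary, including the generalized R-squared
```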

The MARS model produced a generalized $$R^2$$ similar to what was reported in Breiman and Friedman (1985). A single three-way interaction was found involving the predictors:

* wind: wind speed (mph) at Los Angeles International Airport (LAX)
* temp: temperature ($$^{\circ}F$$) at Sandburg Air Force Base
* dpg: the pressure gradient (mm Hg) from LAX to Dagget, CA

To understand this interaction, we can use a PDP. However, since computing the partial dependence between three continuous variables can be computationally expensive, we will run partial() in parallel.

Setting up a parallel backend is rather straightforward. To demonstrate, the following snippet of code sets up the partial() function to run in parallel on both Windows and Unix-like systems using the doParallel package.
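A minimal setup along these lines is sketched below (the worker count of four is an arbitrary choice for illustration):

```r
library(doParallel)  # parallel backend for the foreach package

cl <- makeCluster(4)    # set up a cluster with 4 workers
registerDoParallel(cl)  # register the parallel backend with foreach
```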

Now, to run partial() in parallel, all we have to do is set the parallel = TRUE and paropts arguments; the rest is taken care of by the internal call to plyr and the parallel backend we registered3. This is illustrated in the code chunk below, which obtains the partial dependence of ozone on wind, temp, and dpg in parallel. Notice how we also use the chull option to restrict the PDP to the convex hull of wind and temp, which helps reduce the computation time in multivariate PDPs. The last three lines of code add a label to the colorkey; the result is displayed in Figure 4. Note: it is considered good practice to shut down the workers by calling stopCluster() when finished.
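A sketch of the computation, assuming the MARS fit ozone.mars and a registered parallel backend:

```r
# Partial dependence of ozone on wind, temp, and dpg, computed in parallel
# and restricted to the convex hull of the first two variables (chull = TRUE);
# paropts passes the package(s) the workers need, here "earth"
pd <- partial(ozone.mars, pred.var = c("wind", "temp", "dpg"), chull = TRUE,
              parallel = TRUE, paropts = list(.packages = "earth"))
```

When finished, the workers can be shut down with stopCluster(cl).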

plotPartial(pd, palette = "magma")  # Figure 4
lattice::trellis.focus(  # add a label to the colorkey
  "legend", side = "right", clipp.off = TRUE, highlight = FALSE
)
grid::grid.text("ozone", x = 0.2, y = 1.05, hjust = 0.5, vjust = 1)
lattice::trellis.unfocus()

It is important to note that when using more than two predictor variables, plotPartial() produces a trellis display. The first two variables given to pred.var are used for the horizontal and vertical axes, and additional variables define the panels. If the panel variables are continuous, then shingles4 are produced first using the equal count algorithm (see, for example, ?lattice::equal.count). Hence, it will be more effective to use categorical variables to define the panels in higher dimensional displays when possible.

References

Analytics, Revolution, and Steve Weston. 2017. DoMC: Foreach Parallel Adaptor for ’Parallel’. https://CRAN.R-project.org/package=doMC.

Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/A:1010933404324.

Breiman, Leo, and Jerome H. Friedman. 1985. “Estimating Optimal Transformations for Multiple Regression and Correlation.” Journal of the American Statistical Association 80 (391): 580–98. https://doi.org/10.1080/01621459.1985.10478157.

Corporation, Microsoft, and Steve Weston. 2017. DoParallel: Foreach Parallel Adaptor for the ’Parallel’ Package. https://CRAN.R-project.org/package=doParallel.

Friedman, Jerome H. 1991. “Multivariate Adaptive Regression Splines.” The Annals of Statistics 19 (1): 1–67. https://doi.org/10.1214/aos/1176347963.

Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics 29 (5): 1189–1232. https://doi.org/10.1214/aos/1013203451.

Harrison, David, and Daniel L. Rubinfeld. 1978. “Hedonic Housing Prices and the Demand for Clean Air.” Journal of Environmental Economics and Management 5 (1): 81–102. https://doi.org/10.1016/0095-0696(78)90006-2.

Wickham, Hadley. 2016. Plyr: Tools for Splitting, Applying and Combining Data. https://CRAN.R-project.org/package=plyr.

1. The exception is regression trees based on single-variable splits, which can make use of the efficient weighted tree traversal method described in Friedman (2001). However, only the gbm package seems to make use of this approach; consequently, pdp can also exploit this strategy when used with gbm models (see ?partial for details).

2. The data are available from http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/ozone.data. Details, including variable information, are available from http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/ozone.info.

3. Notice we have to pass the names of external packages that the tasks depend on via the paropts argument; in this case, "earth". See ?plyr::adply for details.

4. A shingle is a special Trellis data structure that consists of a numeric vector along with intervals that define the “levels” of the shingle. The intervals may be allowed to overlap.