The generalized boosting package gbm in
R offers the plot.gbm
function which > “selects a grid
of points and uses the weighted tree traversal method described in
Friedman (2001) to do the integration”
The partial dependence package pdp in
R offers the partial
function with the Boolean
recursive
argument which > “indicating whether or not to
use the weighted tree traversal method described in Friedman (2001).
This only applies to objects that inherit from class gbm
.
Default is TRUE which is much faster than the exact brute force approach
used for all other models.”
I believe that the original description of the algorithm in the Friedman paper is incorrect. By following only one child of splits on \(X_0\) one computes conditional PDPs and not interventional PDPs. I do not believe it possible to compute the latter with just one pass through the tree, it requires a full pass of the entire training data. The only speedup that my proposed algorithm offers is that one full pass is sufficient for all values of \(X_0\)! And to achieve that, one has to follow both children of \(X_0\).
Why do we need to pass all data? Because the splits on \(X_C\) can be very different depending on the \(X_0\) condition due to correlations among the features. The following shows a simple extreme example to illustrate this point.
Both in his paper and in the ESLII book, Friedman elaborates on how the conditional \(E[f(x_S,X_C)|X_S=x_s]\) is different from the interventional \(E[f(x_S,X_C)| \mathbf{do}(X_S=x_s)]\).