The generalized boosting package gbm in
R offers the plot.gbm function which > “selects a grid
of points and uses the weighted tree traversal method described in
Friedman (2001) to do the integration”
The partial dependence package pdp in
R offers the partial function with the Boolean
recursive argument which > “indicating whether or not to
use the weighted tree traversal method described in Friedman (2001).
This only applies to objects that inherit from class gbm.
Default is TRUE which is much faster than the exact brute force approach
used for all other models.”
I believe that the original description of the algorithm in the Friedman paper is incorrect. By following only one child of splits on \(X_0\) one computes conditional PDPs and not interventional PDPs. I do not believe it possible to compute the latter with just one pass through the tree, it requires a full pass of the entire training data. The only speedup that my proposed algorithm offers is that one full pass is sufficient for all values of \(X_0\)! And to achieve that, one has to follow both children of \(X_0\).
Why do we need to pass all data? Because the splits on \(X_C\) can be very different depending on the \(X_0\) condition due to correlations among the features. The following shows a simple extreme example to illustrate this point.
Both in his paper and in the ESLII book, Friedman elaborates on how the conditional \(E[f(x_S,X_C)|X_S=x_s]\) is different from the interventional \(E[f(x_S,X_C)| \mathbf{do}(X_S=x_s)]\).