Gradient boosting is a machine learning technique for regression problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. The gradient boosting method can also be used for classification problems by reducing them to regression with a suitable loss function.

The similarity with a basis of functions in an Hilbert space is sometimes difficult to push aside. If we have weak learners $latex \{F_m(x,\gamma)\}$ then an exact learner would be $latex F^*(x) = \Sigma_m F_m(x,\gamma)$

but you immediately face the questions

• what kind of space is F sitting in
• what is the inner-product in this space which articulates the basis
• how would one in general reformulate the analogy within category theory?

An approximation F of the full function $latex F^*$ based on a given sample $latex \{(x_1,y_1), …, (x_n, y_n)\}$ is such that it minimizes the loss function L

$latex F^* = arg min_F\big( \mathbb{E}(L(y,F(x)))\big)$

where “arg min” means the function which minimizes the expectation of the loss. The gradient boosting is then from here on just a traditional steepest descent approach starting with a simple weak learner and use the gradient of the loss to access as quickly as possible the set of weak learners minimizing the loss.

The method was invented by Jerome H. Friedman in 1999 and was published in a series of two papers, the first of which introduced the method, and the second one described an important tweak to the algorithm, which improves its accuracy and performance. Gradient Boosting is a special case of the functional gradient descent view of boosting.

Tags: