Prediction Algorithms & Settings
The prediction algorithms are listed below in order from highest to lowest correlation and accuracy. These algorithms are off-the-shelf, meaning that they are widely available and do not require adjustment. It is not within the scope of this document to explain the details of each; however, general descriptions are provided below. Each prediction algorithm has its own set of advanced algorithm settings. The defaults will generally yield the best results.
| Algorithm | Description |
| --- | --- |
| eXtreme Gradient Boosting | eXtreme Gradient Boosting is an implementation of gradient boosting trees (see below) with improved performance and accuracy, and is the default algorithm. |
| Random Forest | Random Forest is an ensemble learning method for classification and regression that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set, and have been shown to perform very well in many machine learning applications. |
| Gradient Boosting Tree | Gradient boosting tree builds an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion, like other boosting methods, and generalizes them by allowing optimization of an arbitrary differentiable loss function. The idea is that each learning iteration tries to correct the mistakes made in the previous one. |
| Neural Network | A neural network consists of layers of interconnected nodes (neurons). Each node combines its inputs using learned weights, and the weights are adjusted during training to reduce prediction error. |
| Linear Regression | Linear Regression fits a linear model to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation. No algorithm settings are required. |
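For illustration only, the following sketch shows how comparable open-source implementations of these algorithms can be fitted and compared on held-out data. The use of the scikit-learn and xgboost Python packages and the synthetic dataset are assumptions made for the example; they are not a description of the product's internals.

```python
# Minimal sketch (assumption: scikit-learn and xgboost as stand-in
# implementations; a synthetic dataset stands in for real predictors and outcome).
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "eXtreme Gradient Boosting": XGBRegressor(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting Tree": GradientBoostingRegressor(random_state=0),
    "Neural Network": MLPRegressor(max_iter=2000, random_state=0),
    "Linear Regression": LinearRegression(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # R^2 on held-out data is one simple proxy for accuracy.
    print(f"{name}: R^2 = {model.score(X_test, y_test):.3f}")
```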
When fitting a model to training data, you have to strike a balance between underfitting and overfitting.
An overfit model performs extraordinarily well on the training data but does not generalize well to new data. An underfit model does not give accurate or useful predictions for any data set.
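As a rough illustration (assuming scikit-learn; not part of the product), a large gap between the training score and the test score is a typical sign of overfitting, while low scores on both suggest underfitting.

```python
# Minimal sketch: compare training and test scores to spot overfitting.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorize the training data.
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
print("training R^2:", round(model.score(X_train, y_train), 3))  # typically close to 1.0
print("test R^2:    ", round(model.score(X_test, y_test), 3))    # noticeably lower
```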
The following list includes all advanced algorithm settings. In general, the defaults will yield optimal results. However, if you have experience with these algorithms, the settings are available for you to adjust. The effect of adjusting a setting value is difficult to predict in advance, as each algorithm responds differently; we therefore advise only users who have experience with these algorithms to adjust the settings.
eXtreme Gradient Boosting Tree

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Number of trees | 50 - 2000 | 100 | Number of gradient boosted trees. Equivalent to the number of boosting rounds. |
| Maximum tree depth | 5 - 20 | 6 | Maximum tree depth for base learners. |
| Learning rate | 0.001 - 0.5 | 0.3 | Boosting learning rate. |
| Gamma | 0 - 0.01 | 0 | Minimum loss reduction required to make a further partition on a leaf node of the tree. |
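For illustration, these settings map naturally onto the constructor parameters of the open-source xgboost Python package; the package choice and parameter names are assumptions for the example only.

```python
# Minimal sketch (assumption: the xgboost package exposes these settings as
# n_estimators, max_depth, learning_rate, and gamma).
from xgboost import XGBRegressor

model = XGBRegressor(
    n_estimators=100,   # Number of trees (default 100, range 50 - 2000)
    max_depth=6,        # Maximum tree depth (default 6, range 5 - 20)
    learning_rate=0.3,  # Learning rate (default 0.3, range 0.001 - 0.5)
    gamma=0,            # Minimum loss reduction to split a leaf node (default 0)
)
```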
Random Forest

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Number of trees | 50 - 2000 | 100 | The number of trees in the forest. |
| Maximum tree depth | 5 - 20 | Null | The maximum depth of the tree. If Null, nodes are expanded until all leaves are pure or until all leaves contain fewer than the minimum samples per node. |
| Minimum samples per node | 2 - 20 | 2 | The minimum number of samples required to split an internal node. |
| Minimum samples per leaf | 1 - 20 | 1 | The minimum number of samples required to be at a leaf node. A split point at any depth is only considered if it leaves at least the minimum samples per leaf in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. |
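As an illustration, the same settings correspond to the following parameters of scikit-learn's RandomForestRegressor; the mapping is an assumption for the example, not a statement about the product's implementation.

```python
# Minimal sketch (assumption: scikit-learn's RandomForestRegressor).
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=100,     # Number of trees (default 100, range 50 - 2000)
    max_depth=None,       # Maximum tree depth (default Null / None)
    min_samples_split=2,  # Minimum samples per node (default 2, range 2 - 20)
    min_samples_leaf=1,   # Minimum samples per leaf (default 1, range 1 - 20)
)
```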
Gradient Boosting Tree

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Learning rate | 0.001 - 0.5 | 0.1 | Learning rate shrinks the contribution of each tree. |
| Number of trees | 50 - 2000 | 100 | The number of boosting stages to perform. Gradient boosting is fairly robust to overfitting, so a large number usually results in better performance. |
| Subsample | 0.05 - 1 | 1 | The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0, this results in Stochastic Gradient Boosting. Choosing a subsample less than 1.0 leads to a reduction of variance and an increase in bias. |
| Minimum samples per node | 2 - 20 | 2 | The minimum number of samples required to split an internal node. |
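For illustration, a comparable configuration with scikit-learn's GradientBoostingRegressor is sketched below; the parameter names are assumptions for the example only.

```python
# Minimal sketch (assumption: scikit-learn's GradientBoostingRegressor).
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    learning_rate=0.1,    # Learning rate (default 0.1, range 0.001 - 0.5)
    n_estimators=100,     # Number of trees / boosting stages (default 100)
    subsample=1.0,        # Subsample fraction (default 1; < 1.0 gives stochastic boosting)
    min_samples_split=2,  # Minimum samples per node (default 2, range 2 - 20)
)
```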
Neural Network

| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| Node1 | 10 - 1000 | 100 | The number of neurons in the first hidden layer. |
| Node2 | 10 - 1000 | 100 | The number of neurons in the second hidden layer. |
| Alpha | 0.0001 - 10 | 0.0001 | L2 penalty (regularization term) parameter. |
| MaxIterations | 100 - 2000 | 200 | The number of iterations the model runs until it can no longer improve its performance metric, or until the set number of iterations is reached. |
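For illustration, a two-hidden-layer network with these settings can be sketched with scikit-learn's MLPRegressor; the estimator choice is an assumption for the example, not a description of the product's internals.

```python
# Minimal sketch (assumption: scikit-learn's MLPRegressor with two hidden layers).
from sklearn.neural_network import MLPRegressor

model = MLPRegressor(
    hidden_layer_sizes=(100, 100),  # Node1 and Node2 (defaults 100, range 10 - 1000)
    alpha=0.0001,                   # L2 penalty (default 0.0001, range 0.0001 - 10)
    max_iter=200,                   # MaxIterations (default 200, range 100 - 2000)
)
```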
Linear Regression applies default settings. No adjustments are available. |
Fitting a model
Fitting a model means that the algorithm learns the relationship between the predictors and the outcome so that it can predict future values of the outcome. The best-fitting model has the specific set of parameters that best describes the problem at hand.
| Term | Description |
| --- | --- |
| Overfitting | Occurs when a statistical model or machine learning algorithm captures the noise of the data. In other words, overfitting occurs when the model or the algorithm fits the data too well. |
| Underfitting | Occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. In other words, underfitting occurs when the model or the algorithm does not fit the data well enough. |
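As a rough illustration (assuming scikit-learn; the model and dataset are hypothetical), sweeping a model's complexity shows the progression from underfitting to overfitting: a very shallow tree scores poorly on both training and test data, while an unconstrained tree scores near-perfectly on training data but worse on test data.

```python
# Minimal sketch: vary tree depth to illustrate underfitting vs. overfitting.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 5, None):  # too simple, moderate, unconstrained
    model = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train R^2={model.score(X_train, y_train):.2f}, "
          f"test R^2={model.score(X_test, y_test):.2f}")
```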