Random Forest is an ensemble learning method based on bagging: it builds many decision trees at training time and outputs the mode of the individual trees' predicted classes (classification) or their mean prediction (regression).
How It Works
- Create multiple decision trees using bootstrap samples (with replacement) of the training data.
- For each split in each tree, consider only a random subset of features.
- For classification, take the majority vote across trees; for regression, average their predictions (see the sketch after this list).
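A minimal from-scratch sketch of these three steps, assuming scikit-learn and a toy dataset; names like `n_trees` and the hyperparameter values are illustrative, not prescribed:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

n_trees = 25
rng = np.random.default_rng(0)
trees = []
for i in range(n_trees):
    # Step 1: bootstrap sample -- draw n rows with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: max_features="sqrt" makes each split consider
    # only a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 3: classification -- majority vote across trees.
votes = np.stack([t.predict(X) for t in trees])   # shape: (n_trees, n_samples)
y_pred = np.array([np.bincount(col).argmax() for col in votes.T])
print("training accuracy:", (y_pred == y).mean())
```

In practice you would use `sklearn.ensemble.RandomForestClassifier`, which wraps exactly this recipe; the loop above just makes the bootstrap and feature-subset steps explicit.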
Advantages
- Reduces overfitting compared to individual Decision Trees.
- Handles high-dimensional data well, because each split considers only a random subset of features.
Disadvantages
- Less interpretable than a single Decision Tree.
- Computationally more intensive than a single Decision Tree (the comparison below illustrates both points).
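A hedged comparison of a single tree against a forest on held-out data; the dataset and hyperparameters are assumptions chosen only to make the contrast visible:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# An unpruned tree typically scores near 1.0 on training data but lower on
# test data (overfitting); the forest usually generalizes better, at the
# cost of fitting and querying 200 trees instead of one.
print("tree   train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest train/test:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```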
Prerequisites for Good Performance
- Presence of actual signal in the features.
- Low correlation between the predictions (and errors) of the individual trees. A single Decision Tree is a high-variance model, so bootstrap sampling and random feature selection help decorrelate the trees; the sketch below shows one way to measure this.
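One way to check this prerequisite empirically is to compute the pairwise correlation of the individual trees' error vectors on held-out data. A sketch, assuming scikit-learn; the dataset and forest settings are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Per-tree error indicators on the test set: True where a tree is wrong.
errors = np.stack([t.predict(X_te) != y_te for t in forest.estimators_])

# Average pairwise correlation of the error vectors; the lower it is,
# the more of the individual trees' mistakes the ensemble averages away.
corr = np.corrcoef(errors.astype(float))
off_diag = corr[~np.eye(len(corr), dtype=bool)]
print("mean pairwise error correlation:", off_diag.mean())
```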