Random Forest is an ensemble learning method based on bagging. It builds many decision trees at training time and predicts either the mode of the trees' predicted classes (classification) or the mean of their predictions (regression).

How It Works

  • Create multiple decision trees using bootstrap samples (with replacement) of the training data.
  • For each split in each tree, consider only a random subset of features.
  • For classification, use majority voting of trees; for regression, use the average prediction.
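The three steps above can be sketched in plain Python for the classification case. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (gini, build_tree, predict_tree, fit_forest, predict_forest) are hypothetical, Gini impurity is assumed as the split criterion, and the square root of the feature count is assumed as the size of the random feature subset.

```python
import random
from collections import Counter


def gini(labels):
    # Gini impurity of a list of class labels (assumed split criterion).
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, n_sub_features, rng, depth=0, max_depth=5):
    # Leaf: the node is pure or the (assumed) depth limit is reached.
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]
    # Step 2: consider only a random subset of features at this split.
    features = rng.sample(range(len(X[0])), n_sub_features)
    best = None
    for f in features:
        for threshold in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= threshold]
            right = [i for i, row in enumerate(X) if row[f] > threshold]
            if not left or not right:
                continue
            # Weighted impurity of the two children; lower is better.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        # No usable split among the sampled features: fall back to a leaf.
        return Counter(y).most_common(1)[0][0]
    _, f, threshold, left, right = best
    return (
        f,
        threshold,
        build_tree([X[i] for i in left], [y[i] for i in left],
                   n_sub_features, rng, depth + 1, max_depth),
        build_tree([X[i] for i in right], [y[i] for i in right],
                   n_sub_features, rng, depth + 1, max_depth),
    )


def predict_tree(node, row):
    # Internal nodes are tuples; leaves are class labels (assumed non-tuple).
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    # Assumption: sqrt(number of features) candidates per split.
    n_sub_features = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample, drawn with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_sub_features, rng))
    return forest


def predict_forest(forest, row):
    # Step 3: majority vote over the individual trees.
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy data: the class depends on the first feature; the second mirrors it.
X = [[x, x + 0.5] for x in range(10)]
y = [0 if x < 5 else 1 for x in range(10)]
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0.5]), predict_forest(forest, [9, 9.5]))
```

For regression, the same structure applies with a variance-based split criterion and the mean of the tree outputs in place of the vote. In practice a library implementation such as scikit-learn's RandomForestClassifier would normally be used instead of hand-written trees.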

Advantages

  • Reduces overfitting compared to individual Decision Trees.
  • Handles high-dimensional data well, because each split considers only a random subset of the features.

Disadvantages

Prerequisites for Good Performance

  • Presence of actual signal in the features
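  • Low correlation between predictions (and errors) of individual trees. A single fully grown Decision Tree tends to have high variance, and averaging many trees reduces that variance only to the extent that their errors are weakly correlated. Bootstrap sampling and random feature subsets help decorrelate the trees, but they do not guarantee it.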
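
Why low correlation matters can be made concrete with the standard result for an average of identically distributed predictors: if each tree's prediction has variance sigma2 and every pair of trees has correlation rho, the variance of the average of n_trees predictions is rho * sigma2 + (1 - rho) * sigma2 / n_trees. The sketch below simply evaluates that formula; the function name ensemble_variance and the chosen values are illustrative assumptions, and real trees only approximately satisfy the equal-variance, equal-correlation assumption.

```python
def ensemble_variance(sigma2, rho, n_trees):
    # Variance of the mean of n_trees predictions, each with variance sigma2
    # and pairwise correlation rho (idealized, identically distributed case).
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


# Adding trees shrinks only the second term; the rho * sigma2 term remains.
for rho in (0.0, 0.5, 1.0):
    print(rho, ensemble_variance(sigma2=1.0, rho=rho, n_trees=100))
```

With rho = 1 the ensemble is no better than a single tree, while with rho = 0 the variance falls in proportion to the number of trees. This is why the per-split feature subsampling matters in addition to bagging.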