Random Forest is an ensemble learning method based on bagging: it builds many decision trees at training time and outputs the mode of the individual trees' predicted classes (classification) or their mean prediction (regression).
How It Works
- Create multiple decision trees using bootstrap samples (with replacement) of the training data.
- For each split in each tree, consider only a random subset of features.
- For classification, take the majority vote across trees; for regression, average their predictions (see the sketch after this list).
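A minimal from-scratch sketch of these three steps, assuming scikit-learn and a toy dataset; names like `n_trees` and the hyperparameter values are illustrative, not prescribed:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

n_trees = 25
rng = np.random.default_rng(0)
trees = []
for i in range(n_trees):
    # Step 1: bootstrap sample -- draw n rows with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: max_features="sqrt" makes each split consider
    # only a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 3: classification -- majority vote across trees.
votes = np.stack([t.predict(X) for t in trees])   # shape: (n_trees, n_samples)
y_pred = np.array([np.bincount(col).argmax() for col in votes.T])
print("training accuracy:", (y_pred == y).mean())
```

In practice you would use `sklearn.ensemble.RandomForestClassifier`, which wraps exactly this recipe; the loop above just makes the bootstrap and feature-subset steps explicit.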
Advantages
- Reduces overfitting compared to individual Decision Trees.
- Handles high-dimensional data well, because each split considers only a random subset of features.
Disadvantages
- Less interpretable than a single Decision Tree.
- Computationally more intensive than a single Decision Tree (the comparison below illustrates both points).
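A hedged comparison of a single tree against a forest on held-out data; the dataset and hyperparameters are assumptions chosen only to make the contrast visible:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# An unpruned tree typically scores near 1.0 on training data but lower on
# test data (overfitting); the forest usually generalizes better, at the
# cost of fitting and querying 200 trees instead of one.
print("tree   train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest train/test:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```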
Prerequisites for Good Performance
- Presence of actual signal in the features.
- Low correlation between the predictions (and errors) of the individual trees. A single Decision Tree is a high-variance model, so bootstrap sampling and random feature selection help decorrelate the trees; the sketch below shows one way to measure this.
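One way to check this prerequisite empirically is to compute the pairwise correlation of the individual trees' error vectors on held-out data. A sketch, assuming scikit-learn; the dataset and forest settings are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Per-tree error indicators on the test set: True where a tree is wrong.
errors = np.stack([t.predict(X_te) != y_te for t in forest.estimators_])

# Average pairwise correlation of the error vectors; the lower it is,
# the more of the individual trees' mistakes the ensemble averages away.
corr = np.corrcoef(errors.astype(float))
off_diag = corr[~np.eye(len(corr), dtype=bool)]
print("mean pairwise error correlation:", off_diag.mean())
```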