A Decision Tree is a supervised learning algorithm used for both classification and regression tasks. It creates a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
Structure
- Root Node: The topmost node representing the entire dataset.
- Internal Nodes: Nodes that represent the dataset’s features and are used to make decisions.
- Branches: Connections between nodes, representing decision rules.
- Leaf Nodes: Terminal nodes that provide the final output: a class label (the mode of the training instances in the node) for classification, or a value (their mean) for regression.
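The structure above can be sketched as a small data class; this is an illustrative layout, not any particular library's implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One node of a binary decision tree (minimal, illustrative layout)."""
    feature: Optional[int] = None      # index of the feature used to split (internal nodes only)
    threshold: Optional[float] = None  # split point for that feature
    left: Optional["Node"] = None      # child for feature value < threshold
    right: Optional["Node"] = None     # child for feature value >= threshold
    value: Optional[float] = None      # prediction stored at a leaf

    def is_leaf(self) -> bool:
        return self.value is not None

# A depth-1 tree: one root decision, two leaves.
root = Node(feature=0, threshold=2.5, left=Node(value=0.0), right=Node(value=1.0))
```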
Training Process
- Feature Selection: Choose the best attribute to split the data.
- Split Point Decision: Determine the best split point for the chosen feature.
- Splitting: Divide the dataset based on the chosen feature and split point: the left child node contains the data points whose feature value is less than the split point, and the right child node contains those whose value is greater than or equal to it.
- Recursion: Repeat steps 1-3 for each child node until stopping criteria are met.
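The four steps above can be sketched as a short recursive trainer. This is a pure-Python illustration for a classification tree with Gini impurity; all names (`best_split`, `build`, etc.) are made up for the sketch:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(X, y):
    """Steps 1-2: pick the feature and split point with the lowest weighted impurity."""
    best = None  # (score, feature, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] < t]
            right = [lab for row, lab in zip(X, y) if row[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build(X, y, depth=0, max_depth=3):
    """Steps 3-4: split the data and recurse until a stopping criterion is met."""
    if depth == max_depth or len(set(y)) == 1 or best_split(X, y) is None:
        return {"leaf": max(set(y), key=y.count)}  # majority class at this node
    _, f, t = best_split(X, y)
    L = [(row, lab) for row, lab in zip(X, y) if row[f] < t]
    R = [(row, lab) for row, lab in zip(X, y) if row[f] >= t]
    return {"feature": f, "threshold": t,
            "left": build([r for r, _ in L], [l for _, l in L], depth + 1, max_depth),
            "right": build([r for r, _ in R], [l for _, l in R], depth + 1, max_depth)}
```

On a toy 1-D dataset such as `X = [[1.0], [2.0], [8.0], [9.0]]`, `y = [0, 0, 1, 1]`, the root splits at 8.0 and both children are pure leaves.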
Splitting Criteria
- Classification Trees:
- Gini Impurity: Gini = 1 − Σ p_i^2, where p_i is the proportion of samples of class i in the node
- Entropy: H = −Σ p_i log2(p_i)
- Regression Trees:
- Variance (equivalent to MSE): Var = (1/n) Σ (y_i − ȳ)^2, where ȳ is the mean target value in the node
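The three criteria can be written as small helper functions (illustrative, not library code):

```python
import math

def gini(p):
    """Gini impurity from a list of class probabilities p_i."""
    return 1.0 - sum(pi ** 2 for pi in p)

def entropy(p):
    """Shannon entropy (bits) from a list of class probabilities p_i."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def variance(y):
    """Variance of target values around the node mean (equivalent to MSE)."""
    m = sum(y) / len(y)
    return sum((yi - m) ** 2 for yi in y) / len(y)
```

For a perfectly mixed two-class node (p = [0.5, 0.5]), Gini is 0.5 and entropy is 1.0 bit; both reach 0 for a pure node.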
Stopping Criteria
- Maximum depth reached
- Minimum number of samples in a node
- Minimum decrease in impurity
- All samples in a node belong to the same class
Ensemble Methods Using Decision Trees
- Random Forest: trains many trees on bootstrap samples of the data (bagging), each considering a random subset of features at every split, and averages or votes their predictions
- Gradient Boosting (e.g., XGBoost, LightGBM): builds shallow trees sequentially, each one fitting the residual errors of the current ensemble
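One way to try both families is via scikit-learn (shown here on the Iris dataset purely as a sketch; hyperparameters are left mostly at their defaults):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Bagging: 100 trees on bootstrap samples, predictions aggregated by voting.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Boosting: trees built sequentially, each correcting the current ensemble.
gb = GradientBoostingClassifier(random_state=0).fit(X, y)

print(rf.score(X, y), gb.score(X, y))  # accuracy on the training set
```

Note that training-set accuracy overstates real performance; in practice these scores would be measured on held-out data.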
Advantages
- Easy to understand and interpret
- Requires little data preprocessing
- Can handle both numerical and categorical data
- Can be visualized easily
Disadvantages
- Can create overly complex trees that do not generalize well (overfitting)
- Can be unstable; small variations in the data can result in a completely different tree
- Biased towards features with more levels (in categorical variables)
- Cannot predict beyond the range of the training data (for regression tasks)
Regularization
- Pruning: Removing branches that provide little predictive power
- Setting a minimum number of samples required at a leaf node
- Setting a maximum depth of the tree
- Setting a maximum number of features to consider for splitting
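In scikit-learn these regularizers map directly onto constructor parameters (`max_depth`, `min_samples_leaf`, and `ccp_alpha` for cost-complexity pruning); a minimal sketch comparing an unconstrained and a regularized tree:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: grows until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Regularized tree: depth cap, minimum leaf size, and cost-complexity pruning.
pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                ccp_alpha=0.01, random_state=0).fit(X, y)

print(full.get_depth(), pruned.get_depth())  # the regularized tree is shallower
```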
Feature Importance
Decision Trees provide built-in feature importance: a feature's importance is calculated from how much it decreases the weighted impurity, summed over all splits that use it.
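In scikit-learn, a fitted tree exposes these scores as `feature_importances_` (normalized to sum to 1); a quick sketch on the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Each entry is the feature's share of the total impurity decrease.
for name, score in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```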
Example of entropy calculation:
A box holds 20 balls: 9 blue and 11 yellow, so p(blue) = 0.45 and p(yellow) = 0.55:
H = −(0.45 log2 0.45 + 0.55 log2 0.55) ≈ 0.993 bits.
Now consider a group of 13 balls: 8 blue and 5 yellow, so p(blue) = 8/13 and p(yellow) = 5/13:
H = −(8/13 log2(8/13) + 5/13 log2(5/13)) ≈ 0.961 bits.
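The entropy example above can be checked numerically with a few lines of Python (the `entropy` helper here is ad hoc, taking raw class counts):

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a node with the given class counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

print(round(entropy([9, 11]), 3))  # all 20 balls (9 blue, 11 yellow)
print(round(entropy([8, 5]), 3))   # the group of 13 balls (8 blue, 5 yellow)
```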