Logistic regression is a statistical method for binary classification problems, modeling the probability that an instance belongs to a particular class. Common variations of logistic regression:
Binary (two possible outcomes)
Multinomial (more than two unordered outcomes)
Ordinal (more than two ordered outcomes)
Logistic regression is similar to linear regression but applies a logistic (sigmoid) function on top of the linear model's output.
p(y=1|x) = 1 / (1 + e^(−z))
where z=β0+β1x1+β2x2+...+βnxn
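As a quick illustration, the sigmoid and the resulting probability model can be sketched in plain Python (function and variable names here are illustrative, not from the text):

```python
import math

def sigmoid(z):
    """Map any real number z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta):
    """P(y=1 | x) for feature vector x and coefficients beta.

    beta[0] is the intercept β0; beta[1:] pairs with the features in x.
    """
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return sigmoid(z)
```

For example, `predict_proba([1.0, 2.0], [0.5, -0.3, 0.8])` computes z = 0.5 − 0.3·1.0 + 0.8·2.0 and passes it through the sigmoid, so the result is always a valid probability.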
The logit is the logarithm of the odds. Logits transform probabilities (from 0 to 1) to real numbers. The sigmoid function is the inverse of the logit function. It maps any real number to a probability between 0 and 1.
Odds represent the ratio of the probability of an event occurring to the probability of it not occurring.
If p = 0.75, odds = 0.75 / (1 − 0.75) = 3. This means the event is 3 times more likely to occur than not to occur.
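The probability-to-odds conversion and its inverse can be written as a pair of one-liners (a minimal sketch; the names are illustrative):

```python
def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1 - p)

def odds_to_prob(o):
    """Inverse conversion: recover the probability from the odds."""
    return o / (1 + o)
```

`odds(0.75)` gives 3.0, matching the worked example above, and `odds_to_prob(3.0)` recovers 0.75.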
Assumptions
Independence of observations
Little or no multicollinearity among independent variables
Linearity in the logit for continuous variables
Maximum Likelihood Estimation (MLE)
Logistic regression parameters are typically estimated using MLE.
Likelihood Function
L(β) = ∏i p(xi)^yi * (1 − p(xi))^(1−yi)
Log-Likelihood
ll(β)=Σi[yilog(p(xi))+(1−yi)log(1−p(xi))]
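The log-likelihood sum above translates directly into code, given the observed labels and the model's predicted probabilities (a small sketch; names are illustrative, and probabilities must lie strictly between 0 and 1 to avoid log(0)):

```python
import math

def log_likelihood(y, p):
    """Sum of yi*log(pi) + (1 - yi)*log(1 - pi) over all observations.

    y: list of 0/1 labels; p: list of predicted probabilities in (0, 1).
    """
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))
```

MLE chooses β to make this quantity as large (least negative) as possible.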
Gradient Descent for Logistic Regression
In practice, gradient descent is often used to find the optimal parameters.
Update rule: β=β−α∗∇J(β)
Where α is the learning rate and ∇J(β) is the gradient of the cost function.
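For the cross-entropy cost, each component of the gradient works out to ∇J(β)j = Σi (p(xi) − yi) xij. A minimal sketch of the update rule on toy data follows (function names, the intercept-as-first-column convention, and the hyperparameter values are all illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, alpha=0.1, epochs=1000):
    """Batch gradient descent on the logistic regression cost.

    X: list of feature rows, each with a leading 1.0 for the intercept.
    y: list of 0/1 labels. Returns the fitted coefficient list beta.
    """
    n_features = len(X[0])
    beta = [0.0] * n_features
    for _ in range(epochs):
        # Gradient component j: sum over i of (p_i - y_i) * x_ij
        grad = [0.0] * n_features
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * xj for b, xj in zip(beta, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        # Update rule: beta = beta - alpha * grad
        beta = [b - alpha * g for b, g in zip(beta, grad)]
    return beta
```

On a tiny one-feature dataset where y switches from 0 to 1 as x grows, the fitted model places the decision boundary between the two groups.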
Interpretation of Coefficients
βi: Log-odds increase for a one-unit increase in xi, holding other variables constant
exp(βi): Odds ratio for a one-unit increase in xi. If exp(β) = 1.2, a one-unit increase in x multiplies the odds by 1.2 (20% increase).
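The odds-ratio interpretation can be checked numerically: since odds = e^z, increasing xi by one unit multiplies the odds by exactly exp(βi). A small sketch (the coefficient values are made up for illustration):

```python
import math

def odds_from_logit(z):
    """odds = p / (1 - p) = e^z for the logistic model."""
    return math.exp(z)

beta0, beta1 = -1.0, 0.18   # illustrative coefficients, not fitted
x = 2.0
odds_before = odds_from_logit(beta0 + beta1 * x)
odds_after = odds_from_logit(beta0 + beta1 * (x + 1))
ratio = odds_after / odds_before   # equals exp(beta1), regardless of x
```

The ratio does not depend on the starting value of x, which is why a single odds ratio summarizes the effect of a variable.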
Advantages
Simple and interpretable
Provides probability outputs
Less prone to overfitting compared to more complex models
Limitations
Assumes linearity in log-odds
Cannot capture complex, non-linear relationships without feature engineering