Logistic regression is a statistical method for binary classification problems: it models the probability that an instance belongs to a particular class. Common variants of logistic regression are binary (two possible outcomes), multinomial (more than two unordered outcomes), and ordinal (more than two ordered outcomes).
Logistic regression is similar to linear regression but applies the logistic (sigmoid) function on top of the linear combination of the inputs:

P(y = 1 | x) = σ(z) = 1 / (1 + e^(−z))

where

z = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ

Logit function:

logit(p) = ln(p / (1 − p)) = β₀ + β₁x₁ + … + βₙxₙ
The logit is the logarithm of the odds. Logits transform probabilities (from 0 to 1) to real numbers. The sigmoid function is the inverse of the logit function. It maps any real number to a probability between 0 and 1.

Odds represent the ratio of the probability of an event occurring to the probability of it not occurring.
If p = 0.75, odds = 0.75 / (1 − 0.75) = 3. This means the event is 3 times more likely to occur than not to occur.
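The sigmoid/logit relationship and the odds computation above can be sketched in a few lines of Python (the p = 0.75 value matches the example in the text):

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of p; the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

p = 0.75
odds = p / (1 - p)       # 3.0: the event is 3x more likely to occur than not
z = logit(p)             # log-odds, a real number
p_back = sigmoid(z)      # applying the sigmoid to the logit recovers p
```

Note that `logit(0.5)` is 0, so probabilities above 0.5 map to positive log-odds and probabilities below 0.5 to negative log-odds.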

Assumptions

  1. Independence of observations
  2. Little or no multicollinearity among independent variables
  3. Linearity in the logit for continuous variables

Maximum Likelihood Estimation (MLE)

Logistic regression parameters are typically estimated using MLE.

Likelihood Function

L(β) = ∏ᵢ p(xᵢ)^yᵢ · (1 − p(xᵢ))^(1 − yᵢ)

where p(xᵢ) is the predicted probability for observation i.

Log-Likelihood

ℓ(β) = Σᵢ [ yᵢ ln p(xᵢ) + (1 − yᵢ) ln(1 − p(xᵢ)) ]
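As a minimal NumPy sketch, the log-likelihood of a coefficient vector can be computed directly; the toy data below is hypothetical, with an intercept column and one feature:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Sum over observations of y*ln(p) + (1-y)*ln(1-p)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Hypothetical toy data: first column is the intercept term
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

ll_zero = log_likelihood(np.zeros(2), X, y)          # beta = 0 predicts p = 0.5 everywhere
ll_sep = log_likelihood(np.array([-3.0, 2.0]), X, y)  # a beta that separates the classes
```

MLE searches for the beta that maximizes this quantity; here `ll_sep` is larger (closer to 0) than `ll_zero`, as expected.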

Gradient Descent for Logistic Regression

In practice, gradient descent is often used to find the optimal parameters.

Update rule:

β := β − α ∇J(β)

where α is the learning rate and ∇J(β) is the gradient of the cost function (the average negative log-likelihood). For logistic regression the gradient has the simple form ∇J(β) = (1/n) Xᵀ(σ(Xβ) − y).
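A minimal gradient-descent sketch in NumPy; the toy data, learning rate, and iteration count are illustrative choices, not tuned values:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Gradient descent on the average negative log-likelihood."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # predicted probabilities
        grad = X.T @ (p - y) / len(y)          # gradient of the cost
        beta -= lr * grad                      # update rule: beta := beta - alpha * grad
    return beta

# Hypothetical 1-D data: class 1 for the larger x values
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
beta = fit_logistic(X, y)
boundary = -beta[0] / beta[1]  # x where the predicted probability crosses 0.5
```

By the symmetry of this toy data, the fitted decision boundary sits near x = 2.5.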

Interpretation of Coefficients

  • βⱼ: log-odds increase for a one-unit increase in xⱼ, holding other variables constant
  • e^βⱼ: odds ratio for a one-unit increase in xⱼ. If e^βⱼ = 1.2, a one-unit increase in xⱼ multiplies the odds by 1.2 (a 20% increase).
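The odds-ratio interpretation is a one-line exponential; the coefficient value below is hypothetical, chosen to reproduce the 1.2 odds ratio mentioned above:

```python
import math

beta_j = 0.182                   # hypothetical fitted coefficient for feature x_j
odds_ratio = math.exp(beta_j)    # ~1.2: each one-unit increase in x_j
                                 # multiplies the odds by ~1.2 (a ~20% increase)
```

Going the other way, ln(1.2) ≈ 0.182 recovers the coefficient from the odds ratio.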

Advantages

  • Simple and interpretable
  • Provides probability outputs
  • Less prone to overfitting compared to more complex models

Limitations

  • Assumes linearity in log-odds
  • Cannot capture complex, non-linear relationships without feature engineering
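A sketch of the last point, using scikit-learn on hypothetical data where the true rule (y = 1 when |x| > 1.5) is non-linear in x: no linear boundary in x fits it, but adding x² as an engineered feature makes the relationship linear in the logit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: the true decision rule depends on |x|, not x itself
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 500)
y = (np.abs(x) > 1.5).astype(int)

X_raw = x.reshape(-1, 1)                 # raw feature only
X_eng = np.column_stack([x, x ** 2])     # engineered feature: x squared

raw_acc = LogisticRegression().fit(X_raw, y).score(X_raw, y)  # poor
eng_acc = LogisticRegression().fit(X_eng, y).score(X_eng, y)  # near-perfect
```

The same idea extends to interaction terms, splines, and other basis expansions.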