Logistic regression is a statistical method for binary classification problems, modeling the probability that an instance belongs to a particular class. Common variations of logistic regression:
Binary (two possible outcomes)
Multinomial (more than two unordered outcomes)
Ordinal (more than two ordered outcomes)
Logistic regression is similar to linear regression but applies a logistic (sigmoid) function on top of the linear model's output.
p(y=1|x) = 1 / (1 + e^(−z))
where z=β0+β1x1+β2x2+...+βnxn
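As a quick illustration, the sigmoid and the resulting probability model can be sketched in plain Python (function and variable names here are illustrative, not from the text):

```python
import math

def sigmoid(z):
    """Map any real number z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta):
    """P(y=1 | x) for feature vector x and coefficients beta.

    beta[0] is the intercept β0; beta[1:] pairs with the features in x.
    """
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return sigmoid(z)
```

For example, `predict_proba([1.0, 2.0], [0.5, -0.3, 0.8])` computes z = 0.5 − 0.3·1.0 + 0.8·2.0 and passes it through the sigmoid, so the result is always a valid probability.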
The logit is the logarithm of the odds. Logits transform probabilities (from 0 to 1) to real numbers. The sigmoid function is the inverse of the logit function. It maps any real number to a probability between 0 and 1.
Odds represent the ratio of the probability of an event occurring to the probability of it not occurring.
If p = 0.75, odds = 0.75 / (1 − 0.75) = 3. This means the event is 3 times more likely to occur than not to occur.
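The probability-to-odds conversion and its inverse can be written as a pair of one-liners (a minimal sketch; the names are illustrative):

```python
def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1 - p)

def odds_to_prob(o):
    """Inverse conversion: recover the probability from the odds."""
    return o / (1 + o)
```

`odds(0.75)` gives 3.0, matching the worked example above, and `odds_to_prob(3.0)` recovers 0.75.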
Assumptions
Independence of observations
Little or no multicollinearity among independent variables
Linearity in the logit for continuous variables
Maximum Likelihood Estimation (MLE)
Logistic regression parameters are typically estimated using MLE.
Likelihood Function
L(β) = ∏i p(xi)^yi * (1 − p(xi))^(1−yi)
Log-Likelihood
ll(β)=Σi[yilog(p(xi))+(1−yi)log(1−p(xi))]
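The log-likelihood sum above translates directly into code, given the observed labels and the model's predicted probabilities (a small sketch; names are illustrative, and probabilities must lie strictly between 0 and 1 to avoid log(0)):

```python
import math

def log_likelihood(y, p):
    """Sum of yi*log(pi) + (1 - yi)*log(1 - pi) over all observations.

    y: list of 0/1 labels; p: list of predicted probabilities in (0, 1).
    """
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))
```

MLE chooses β to make this quantity as large (least negative) as possible.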
Gradient Descent for Logistic Regression
In practice, gradient descent is often used to find the optimal parameters.
Update rule: β=β−α∗∇J(β)
Where α is the learning rate and ∇J(β) is the gradient of the cost function.
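For the cross-entropy cost, each component of the gradient works out to ∇J(β)j = Σi (p(xi) − yi) xij. A minimal sketch of the update rule on toy data follows (function names, the intercept-as-first-column convention, and the hyperparameter values are all illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, alpha=0.1, epochs=1000):
    """Batch gradient descent on the logistic regression cost.

    X: list of feature rows, each with a leading 1.0 for the intercept.
    y: list of 0/1 labels. Returns the fitted coefficient list beta.
    """
    n_features = len(X[0])
    beta = [0.0] * n_features
    for _ in range(epochs):
        # Gradient component j: sum over i of (p_i - y_i) * x_ij
        grad = [0.0] * n_features
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * xj for b, xj in zip(beta, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        # Update rule: beta = beta - alpha * grad
        beta = [b - alpha * g for b, g in zip(beta, grad)]
    return beta
```

On a tiny one-feature dataset where y switches from 0 to 1 as x grows, the fitted model places the decision boundary between the two groups.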
Interpretation of Coefficients
βi: Log-odds increase for a one-unit increase in xi, holding other variables constant
exp(βi): Odds ratio for a one-unit increase in xi. If exp(β) = 1.2, a one-unit increase in x multiplies the odds by 1.2 (20% increase).
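The odds-ratio interpretation can be checked numerically: since odds = e^z, increasing xi by one unit multiplies the odds by exactly exp(βi). A small sketch (the coefficient values are made up for illustration):

```python
import math

def odds_from_logit(z):
    """odds = p / (1 - p) = e^z for the logistic model."""
    return math.exp(z)

beta0, beta1 = -1.0, 0.18   # illustrative coefficients, not fitted
x = 2.0
odds_before = odds_from_logit(beta0 + beta1 * x)
odds_after = odds_from_logit(beta0 + beta1 * (x + 1))
ratio = odds_after / odds_before   # equals exp(beta1), regardless of x
```

The ratio does not depend on the starting value of x, which is why a single odds ratio summarizes the effect of a variable.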
Advantages
Simple and interpretable
Provides probability outputs
Less prone to overfitting compared to more complex models
Limitations
Assumes linearity in log-odds
Cannot capture complex, non-linear relationships without feature engineering