Logit and Probit Models
Definition:
o Logit Model: A regression model for binary outcomes using the cumulative logistic distribution function.
o Probit Model: A regression model for binary outcomes using the cumulative normal distribution function.
Key Concept: Both models deal with binary dependent variables, but differ in how they map
the linear predictor to probabilities.
o OLS is unsuitable for binary outcomes because predicted probabilities can fall
outside the range [0, 1].
Advantages:
o Logit and Probit ensure probabilities are constrained within [0, 1].
a) Logit Model
Interpretation:
o Coefficients represent changes in the log-odds of the outcome for a unit change in
the predictor.
b) Probit Model
Comparison:
o Logit: Uses the logistic function; coefficients are interpreted on the log-odds scale.
o Probit: Uses the cumulative normal distribution function, making it slightly more complex but useful in certain scenarios.
Data Requirements:
o Probit: When modeling latent variables or assuming a normal error distribution (e.g.,
psychological or behavioral studies).
a) Data Preparation
b) Model Estimation
Logit: Use Maximum Likelihood Estimation (MLE) to fit the logistic regression.
Probit: Also estimated by MLE, with the normal CDF as the link function.
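A minimal estimation sketch with statsmodels (the data are simulated purely for illustration; variable names and coefficient values are assumptions, not from these notes):

```python
# Sketch: fit Logit and Probit by maximum likelihood with statsmodels.
# All data below are simulated for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 70, n)
income = rng.uniform(20, 120, n)            # e.g., in thousands
z = -4 + 0.03 * age + 0.04 * income         # assumed linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-z)))   # binary outcome

X = sm.add_constant(np.column_stack([age, income]))  # add intercept
logit_fit = sm.Logit(y, X).fit(disp=0)      # logistic link, MLE
probit_fit = sm.Probit(y, X).fit(disp=0)    # normal-CDF link, MLE
print(logit_fit.params)
print(probit_fit.params)
```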
c) Model Evaluation
Goodness-of-Fit:
o Log-likelihood values.
Classification Accuracy:
o Share of outcomes correctly predicted at a chosen cutoff (commonly 0.5).
Validation:
o Train-test split or cross-validation.
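One possible validation sketch using scikit-learn, again on simulated data (nothing here is prescribed by the notes; it simply illustrates a train-test split with accuracy at a 0.5 cutoff):

```python
# Sketch: train-test validation of a logistic classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))                 # two predictors
y = (X @ np.array([1.0, -0.5]) + rng.logistic(size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```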
6. Practical Applications
Logit:
Probit:
7. Key Takeaways
Similarities: Both models are for binary dependent variables and rely on MLE for estimation.
Differences:
8. Example
Logit Example:
Predicting whether a customer will purchase a product based on age and income:
P(\text{Purchase} = 1) = \frac{e^{\beta_0 + \beta_1 \text{Age} + \beta_2 \text{Income}}}{1 + e^{\beta_0 + \beta_1 \text{Age} + \beta_2 \text{Income}}}
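As a quick numeric illustration of this formula (all coefficient values below are assumed, not estimated):

```python
# Illustrative only: hypothetical coefficients for the purchase example.
import math

b0, b_age, b_income = -6.0, 0.05, 0.04   # assumed values
age, income = 40, 80                     # one hypothetical customer
z = b0 + b_age * age + b_income * income
p = math.exp(z) / (1 + math.exp(z))      # logistic formula above
print(f"P(Purchase = 1) = {p:.3f}")      # ~0.31
```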
Probit Example:
Predicting the same purchase decision with the probit link, where \Phi is the standard normal CDF:
P(\text{Purchase} = 1) = \Phi(\beta_0 + \beta_1 \text{Age} + \beta_2 \text{Income})

Linear Probability Model (LPM)
Model Equation:
P(Y = 1 \mid X) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k
Here, P(Y = 1 \mid X) represents the probability of Y = 1, and \beta_i are the regression coefficients.
Outcome:
The predicted probabilities are linear functions of the predictors X.
Advantages:
1. Simple to Implement: The LPM uses OLS, which is widely understood and easy to compute.
2. Baseline Model: It serves as a starting point for binary outcome modeling before moving to more complex models like Logit or Probit.
Despite its simplicity, the LPM has several limitations that make it less suitable in practice:
a) Predicted Probabilities Outside [0, 1]
LPM assumes linearity, which can produce predicted probabilities less than 0 or greater than 1, making them invalid as probabilities.
b) Heteroscedasticity
The variance of the error term depends on the predictors, violating the assumption of constant variance in OLS.
c) Non-Normal Errors
The error term is not normally distributed because the dependent variable is binary.
d) Nonlinearity
LPM cannot capture the nonlinear nature of probabilities (which should flatten near 0 and 1). This leads to inaccurate estimates for predictors with extreme values.
e) Lack of Robustness
The LPM is sensitive to specification errors and performs poorly compared to Logit and
Probit models.
Aspect                  | LPM            | Logit                     | Probit
Nonlinear relationships | Cannot capture | Captures                  | Captures
Interpretation          | Easy           | In terms of odds/log-odds | In terms of z-scores
Small Datasets: In some cases, LPM may serve as a useful baseline model if the dataset is
small and computational resources are limited.
6. Example
LPM Equation:
P(\text{Purchase} = 1) = \beta_0 + 0.05\,\text{Income} - 0.02\,\text{Age}
Interpretation:
o For every additional unit increase in income, the probability of purchasing increases
by 5%.
o For every additional year of age, the probability of purchasing decreases by 2%.
Limitation: If Income = 100, the predicted probability can exceed 1, which is invalid.
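A tiny sketch of this limitation (the intercept value is assumed for illustration; the slopes follow the interpretation above):

```python
# Sketch: LPM fitted values can leave [0, 1].
b0, b_income, b_age = 0.5, 0.05, -0.02    # b0 assumed for illustration
for income in (2, 20, 100):
    p = b0 + b_income * income + b_age * 30   # fix age at 30
    print(f"income={income:3d}: 'probability' = {p:.2f}")
# income=100 gives 4.90 -- far above 1, invalid as a probability.
```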
7. Key Takeaways
LPM is a simple, easy-to-interpret model for binary outcomes but has significant limitations
due to its linearity and inability to handle probabilities properly.
It is rarely used in practice, with Logit and Probit models being preferred for their robustness
and realistic probability predictions.
The Logit Model uses an exponential form to express the relationship between the predictors (X) and the probability of an outcome (P) because probabilities are bounded between 0 and 1. Here's a simple explanation:
In a linear model (P = \beta_0 + \beta_1 X), probabilities can exceed the valid range of [0, 1], which doesn't make sense.
A nonlinear transformation like the exponential form ensures probabilities remain within
the valid range.
The logistic function
P = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
maps any value of X (from -\infty to +\infty) into probabilities between 0 and 1.
Log-Odds Transformation: The logit model works with the natural log of the odds (log-odds):
\text{Logit}(P) = \ln\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 X
The exponential function reverses the log transformation to return to odds.
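A short sketch of this mapping (generic; it simply evaluates the logistic function at a few points):

```python
# Sketch: the logistic transform squashes any real number into (0, 1).
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))   # inverse of the logit transform

for x in (-10, -1, 0, 1, 10):
    print(x, round(logistic(x), 4)) # always strictly between 0 and 1
```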
4. Key Intuition
Linear in Log-Odds: The relationship between predictors (X) and log-odds is linear.
Nonlinear in Probability: The exponential form converts linear log-odds into valid probabilities.
Interpretation: The coefficient \beta_1 reflects how a unit change in X multiplies the odds by e^{\beta_1}.
5. Simple Example
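A minimal numeric sketch (coefficient values assumed for illustration): let \beta_0 = -3 and \beta_1 = 0.5. At X = 4, the log-odds are z = -3 + 0.5 \times 4 = -1, so P = 1/(1 + e^{1}) \approx 0.27. Increasing X by one unit multiplies the odds by e^{0.5} \approx 1.65.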
Summary
The exponential form is used in the logit model to ensure valid probabilities while modeling
a linear relationship in log-odds.
In the logit model (and logistic regression), the terms -z and +z appear because of the nature of the logistic function used to model probabilities. Let's break this down step by step:
1. The Logistic Function
The logistic function for the probability P of an event happening (Y = 1) is:
P(Y = 1) = \frac{1}{1 + e^{-z}} = \frac{e^{z}}{1 + e^{z}}
Here, z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k, which is the linear combination of predictors.
2. Why -z?
The expression for P(Y = 0) can be rewritten using -z instead of +z:
P(Y = 0) = 1 - P(Y = 1) = \frac{1}{1 + e^{z}} = \frac{e^{-z}}{1 + e^{-z}}
So, instead of using z, we use -z to simplify the expression for P(Y = 0).
-z represents the log-odds of Y = 0 (the event not happening).
For positive z (+z), P(Y = 1) > 0.5, meaning the event is more likely.
For negative z (-z), P(Y = 1) < 0.5, meaning the event is less likely.
This symmetry ensures that probabilities stay between 0 and 1 regardless of whether z is positive or negative.
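A quick numerical check of this symmetry (a generic sketch, not tied to any particular dataset):

```python
# Check: sigmoid(z) + sigmoid(-z) = 1, i.e., P(Y=1) + P(Y=0) = 1.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    total = sigmoid(z) + sigmoid(-z)
    assert abs(total - 1) < 1e-12   # holds for every z
    print(f"z={z:+.1f}  P(Y=1)={sigmoid(z):.4f}")
```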
5. Key Takeaways
-z and +z appear due to the mathematical properties of the logistic function, ensuring valid probabilities.
+z: used for the probability of the event happening (Y = 1).
-z: used for the probability of the event not happening (Y = 0).
Let's break down the steps leading to the equation \frac{P_i}{1 - P_i} = e^{Z_i} and understand how it is derived.
1. The Logistic Function
P_i = \frac{e^{Z_i}}{1 + e^{Z_i}}
where Z_i is the linear predictor (e.g., Z_i = \beta_0 + \beta_1 X_1 + \dots).
2. The Complement
1 - P_i = 1 - \frac{e^{Z_i}}{1 + e^{Z_i}} = \frac{1}{1 + e^{Z_i}}
3. The Odds
The odds of owning a house (the ratio of P_i to 1 - P_i) are:
\frac{P_i}{1 - P_i} = \frac{e^{Z_i}/(1 + e^{Z_i})}{1/(1 + e^{Z_i})}
4. Final Result
\frac{P_i}{1 - P_i} = e^{Z_i}
This equation shows that the odds increase exponentially with Z_i.
Key Takeaway
The key step is recognizing that P_i and 1 - P_i share the same denominator (1 + e^{Z_i}), which cancels out when you compute their ratio.
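The identity is easy to verify numerically; a generic sketch:

```python
# Check: P / (1 - P) equals exp(Z) when P = exp(Z) / (1 + exp(Z)).
import math

for Z in (-2.0, 0.0, 1.5):
    P = math.exp(Z) / (1 + math.exp(Z))
    assert abs(P / (1 - P) - math.exp(Z)) < 1e-9
    print(f"Z={Z:+.1f}  odds={P / (1 - P):.4f}  exp(Z)={math.exp(Z):.4f}")
```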
What Does Odds Ratio Mean?
The odds ratio (OR) is a way to compare the likelihood of an event happening to the likelihood of it
not happening. It provides a measure of association between a predictor (e.g., income, education)
and the probability of an outcome (e.g., owning a house).
Definition
Odds are the ratio of the probability of an event happening (P) to the probability of it not happening (1 - P):
\text{Odds} = \frac{P}{1 - P}
o The odds ratio tells us how much more likely (or less likely) an event is to happen
when a predictor changes.
o For example, if the odds ratio is 2, it means the event is twice as likely under the
given condition.
2. Interpretable Impact:
3. Nonlinear Relationships:
o Probabilities are bounded between 0 and 1, but odds and odds ratios can take any
positive value, making them more suitable for modeling relationships with
predictors.
Key Intuition
Odds: "If the probability of owning a house is 0.8, the odds are 0.80.2=4\frac{0.8}{0.2} =
40.20.8=4." This means owning a house is 4 times more likely than not owning one.
o Odds of owning a house for Group A: 4 (owning 4 times more likely than not).
e^{\beta_1}: The odds ratio, showing how the odds change for a one-unit increase in X.
Example:
If \beta_1 = 0.693, then e^{\beta_1} \approx 2.
Interpretation: A one-unit increase in X doubles the odds of the event occurring.
In cases where probabilities are too small or too large to compare directly, odds ratios help
scale the impact.
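In practice, odds ratios come from exponentiating fitted logit coefficients; a minimal sketch with assumed coefficient values:

```python
# Sketch: converting logit coefficients into odds ratios.
import math

coefs = {"age": 0.03, "income": 0.04}   # hypothetical logit estimates
for name, b in coefs.items():
    print(f"{name}: odds ratio = {math.exp(b):.3f} per one-unit increase")
```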
Summary
Odds Ratio: A measure of how much the odds of an event change with a predictor.
Why Use It: It simplifies interpretation in logistic regression and provides a consistent scale
for comparing effects across different predictors.