Chapter 12

Chapter 12 discusses simple linear regression, focusing on establishing relationships between variables and forecasting new observations. It outlines the steps to build a regression model, including correlation analysis and fitting a linear model, while also addressing assumptions about error terms. Additionally, it highlights evaluation metrics for regression models, such as the coefficient of determination and error measures, emphasizing the importance of visual tools for assumption validation.


Chapter 12.

Simple regression Nguyen Thi Thu Van - August 3, 2023


Two main objectives of a linear regression model:
 Establish whether there is a linear relationship between two variables by fitting a linear equation to observed data. More precisely, establish whether there is a statistically significant relationship between the two variables.
 Forecast new observations: can we use what we know about the relationship to forecast unobserved values?
Variable roles: one variable, x, is the independent (explanatory, predictor) variable; the other, y, is the dependent (response) variable.
Two basic steps to build a linear regression model: (1) Before attempting to fit a linear model to observed data, we should first determine whether or not there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other
(for example, higher SAT scores do not cause higher college grades), but that there is some significant association between the two variables [Correlation analysis] (2) Fit a linear model to observed data [Regression models].
Correlation Analysis

[1] A scatter plot can be a helpful tool in determining the strength of the relationship between two variables.

Sample correlation coefficient (also called the Pearson correlation coefficient):
r = Σ(xi − x̄)(yi − ȳ) / [√Σ(xi − x̄)² · √Σ(yi − ȳ)²]

If r = 0 then there is seemingly no linear correlation between the two variables x and y.

Regression models

Linear model: y = β₀ + β₁x + ε. The random error ε is used in the model because other unspecified variables may also affect Y, and there may be measurement error in Y. Even though the error term cannot be observed, we assume that:
(A1) The errors are normally distributed.
(A2) The errors have constant variance σ².
(A3) The errors are independent of each other.

Simple regression equation: E(Y|x) = β₀ + β₁x. Estimated regression model: ŷ = b₀ + b₁x.

We measure the model's fit by comparing the variance we can explain relative to the variance we cannot explain:
Var(y) = Var(β₀ + β₁x + ε) = Var(β₁x) + Var(ε)

Evaluation metrics that can be used for regression:
- Maximum error: E∞ = max(1≤i≤n) |ŷi − yi|
- Average error: E₁ = (1/n) Σ |ŷi − yi|
- Root-mean-square error: E₂ = √[(1/n) Σ(yi − ŷi)²] = √[(1/n) Σ(yi − b₀ − b₁xi)²]

Ordinary least squares method (OLS) [Carl F. Gauss, 1809]: E₂ is minimized when
b₁ = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²  and  b₀ = ȳ − b₁x̄,
where x̄ = Σxi/n and ȳ = Σyi/n.
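The correlation and OLS formulas above can be checked numerically. A minimal Python sketch (the data points are invented purely for illustration):

```python
# Sketch: computing the Pearson correlation and the OLS fit by hand.
# The (x, y) data below are made up for illustration only.
import math

x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.1, 5.2, 6.8, 9.1, 10.9]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)   # sample correlation coefficient
b1 = sxy / sxx                   # OLS slope
b0 = ybar - b1 * xbar            # OLS intercept

print(f"r = {r:.4f}, b1 = {b1:.4f}, b0 = {b0:.4f}")
```

For these invented data the slope works out to about 0.975 and the intercept to about 1.17, with r very close to 1.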

[2] Test for significant correlation using Student's t
Hypotheses: H₀: ρ = 0; H₁: ρ ≠ 0
Test statistic: t_calc = r √[(n − 2)/(1 − r²)]; compare with t_crit using d.f. = n − 2.

[3] Critical value for the correlation coefficient
r_crit = t_crit / √(t_crit² + n − 2) with d.f. = n − 2. Compare r to r_crit: if r is not between the negative and positive critical values, then the correlation coefficient is significant. If r is significant, then you may want to use a line for prediction.

Sources of variation in y. In a regression, we seek to explain the variation in the dependent variable around its mean:
- Total variation about the mean: SST = Σ(yi − ȳ)²
- Variation explained by regression: SSR = Σ(ŷi − ȳ)²
- Unexplained or error variation: SSE = Σ(yi − ŷi)²
so that SST = SSR + SSE.

Another evaluation metric for regression is the coefficient of determination:
R² = SSR/SST = 1 − SSE/SST
This number represents the percent of variation explained. If the fit is good, SSE is relatively small compared to SST. A measure of overall fit is the standard error s_e = √[SSE/(n − 2)].

Tests for significance:
- Slope: H₀: β₁ = 0 vs H₁: β₁ ≠ 0. Test statistic t_calc = (estimated slope − 0)/(standard error of slope) = b₁/s_b1, where s_b1 = s_e / √Σ(xi − x̄)². Confidence interval: b₁ ± t_(α/2) × s_b1.
- Intercept: H₀: β₀ = 0 vs H₁: β₀ ≠ 0. Test statistic t_calc = (estimated intercept − 0)/(standard error of intercept) = b₀/s_b0, where s_b0 = s_e √[1/n + x̄²/Σ(xi − x̄)²]. Confidence interval: b₀ ± t_(α/2) × s_b0.

ANOVA note: in a simple regression, the F test always yields the same p-value as a two-tailed t test for zero slope, which in turn always gives the same p-value as a two-tailed test for zero correlation. The relationship between the test statistics is F_calc = t_calc².
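Continuing the same hand computation, a sketch of R², the standard error, and the t statistic for zero slope (same invented data as before):

```python
# Sketch: R^2, standard error, and the t test for zero slope, continuing
# the hand OLS fit. The data are invented for illustration only.
import math

x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.1, 5.2, 6.8, 9.1, 10.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * xi for xi in x]
sst = sum((yi - ybar) ** 2 for yi in y)                 # total variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))    # unexplained
ssr = sst - sse                                         # explained by regression

r2 = ssr / sst                    # coefficient of determination
se = math.sqrt(sse / (n - 2))     # standard error of the estimate
sb1 = se / math.sqrt(sxx)         # standard error of the slope
t_calc = b1 / sb1                 # compare with t_crit at d.f. = n - 2

print(f"R^2 = {r2:.4f}, s_e = {se:.4f}, t = {t_calc:.2f}")
```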

Caveat: In large samples, small correlations may be significant even if the scatter plot shows little evidence of linearity. Thus, a significant correlation may lack practical importance.

A few assumptions about the random error term ε are made when we use linear regression to fit a line to data [1: The errors are normally distributed. 2: The errors have constant variance. 3: The errors are independent.] Because we cannot observe the error, we must rely on the residuals e₁ = y₁ − ŷ₁, e₂ = y₂ − ŷ₂, …, eₙ = yₙ − ŷₙ from the estimated regression for clues about possible violations of these assumptions.

While formal tests exist for identifying assumption violations, many analysts rely on simple visual tools to help them determine when an assumption has not been met and how serious the violation is. For more details, please consult the textbook.
Chapter 11: Analysis of Variance (ANOVA) Nguyen Thi Thu Van - November 25, 2023
One-way analysis of variance One-way ANOVA is a technique used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.
For example, to allocate resources and fixed costs correctly, hospital management needs to test whether a
patient’s length of stay (LOS) depends on the diagnostic-related group (DRG) code. Consider the case of a
bone fracture. LOS is a numerical response variable (measured in hours). The hospital organizes the data by
using five diagnostic codes for type of fracture (facial, radius or ulna, hip or femur, other lower extremity, all
other). Type of fracture is a categorical variable.

Group means: ȳj = Σi yij / nj

Overall mean: ȳ = (1/n) Σj Σi yij = (1/n) Σj nj × (Σi yij / nj) = (1/n) Σj nj ȳj

Sum of squares partition:
Σj Σi (yij − ȳ)² = Σj nj (ȳj − ȳ)² + Σj Σi (yij − ȳj)²
Total sum of squares = sum of squares between groups (explained by treatments) + sum of squares within groups (unexplained random error)

Treatments     T₁                T₂                …   Tc
               y₁₁               y₁₂               …   y₁c
               y₂₁               y₂₂               …   y₂c
               ⋮                 ⋮                     ⋮
Group size     n₁ observations   n₂ observations   …   nc observations
Mean           ȳ₁                ȳ₂                …   ȳc

If the treatment means do not differ greatly from the grand mean, SSB will be relatively small and SSE will be relatively large (and conversely). The sums SSB and SSE may be used to test the hypothesis that the treatment means differ from the grand mean. However, to adjust for group sizes, we first divide each sum of squares by its degrees of freedom.

ANOVA assumptions:
• Observations on Y are independent.
• Populations being sampled are normal.
• Populations being sampled have equal variances.
In general, ANOVAs are considered fairly robust against violations of the equal-variances assumption as long as each group has the same sample size. However, if the sample sizes are not the same and this assumption is severely violated, you could instead run a Kruskal-Wallis test, which is the non-parametric version of the one-way ANOVA.

F-test method [Ronald A. Fisher in the 1930s]:
Step 1: State the hypotheses: H₀: μ₁ = μ₂ = ⋯ = μc vs H₁: not all the means are equal.
Step 2: Specify the decision rule:
• Numerator: df₁ = c − 1
• Denominator: df₂ = n − c
• Find the critical value F_crit = F.INV.RT(α, df₁, df₂)
Step 3: Calculate F = MSB/MSE (≡ MSA/MSE), use the ANOVA table of interest, and make the decision. Otherwise, find the p-value = F.DIST.RT(F, df₁, df₂).
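The F-test steps can be sketched directly from the sums-of-squares partition. A Python sketch with three made-up treatment groups:

```python
# Sketch: one-way ANOVA computed from the sums-of-squares partition.
# The three treatment groups below are invented for illustration.
groups = [
    [20.0, 22.0, 19.0, 21.0],   # treatment 1
    [25.0, 27.0, 26.0, 24.0],   # treatment 2
    [18.0, 17.0, 19.0, 18.0],   # treatment 3
]
c = len(groups)
n = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n          # overall mean

means = [sum(g) / len(g) for g in groups]         # group means
ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))  # between
sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)    # within

msb = ssb / (c - 1)      # mean square between, df1 = c - 1
mse = sse / (n - c)      # mean square within,  df2 = n - c
f_calc = msb / mse       # compare with F.INV.RT(alpha, c - 1, n - c)

print(f"F = {f_calc:.2f} with df1 = {c - 1}, df2 = {n - c}")
```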
Post hoc tests with ANOVA: To determine exactly which group means are different, we can perform a Tukey post hoc test [John W. Tukey]. It is similar to a two-sample t-test except that it pools the variances for all c samples.
H₀: μj = μk vs H₁: μj ≠ μk
- The T statistic: T_calc = |ȳj − ȳk| / √[MSE (1/nj + 1/nk)]; T_crit = T(c, n−c) [c = number of groups; n = overall sample size; d.f. for the numerator is c, for the denominator n − c].
- We reject H₀ if T_calc > T_crit (Table T).

Test for homogeneity of variances: ANOVA assumes that observations on the response variable are from normally distributed populations that have the same variance. However, few populations meet these requirements perfectly, and unless the sample is quite large, a test for normality is impractical. But we can easily test the assumption of homogeneous (equal) variances.
Hartley's test checks for unequal variances across c groups: H₀: σ₁² = σ₂² = ⋯ = σc² vs H₁: not all the variances are equal.
If c = 2, use an F test to compare variances [in fact, it is a two-tailed t test]. Otherwise, use Hartley's statistic H = s²max / s²min with df₁ = c and df₂ = n/c − 1 [df₂ is rounded down to the next lower integer if it is not an integer]. We reject H₀ if H_calc > H_crit (Table H).
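Hartley's statistic is easy to compute by hand. A sketch using three invented equal-sized groups:

```python
# Sketch: Hartley's H statistic for checking equal variances across c
# equal-sized groups. The data are invented for illustration.
import statistics

groups = [
    [20.0, 22.0, 19.0, 21.0],
    [25.0, 27.0, 26.0, 24.0],
    [18.0, 17.0, 19.0, 18.0],
]
c = len(groups)
n = sum(len(g) for g in groups)

variances = [statistics.variance(g) for g in groups]  # sample variances
h_calc = max(variances) / min(variances)              # Hartley's H

df1 = c
df2 = n // c - 1   # truncated to the next lower integer if n/c is not whole

print(f"H = {h_calc:.2f}, df1 = {df1}, df2 = {df2}")
```

Compare h_calc with the critical value from Table H; reject equal variances if it is larger.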
Two-way analysis of variance Two-way ANOVA is a technique used to determine the effect of two nominal predictor variables on a continuous outcome variable.
For example, a numerical response variable (paint viscosity) may vary both by temperature (Factor A) and by
paint supplier (Factor B). Three different temperature settings (A1, A2, A3) were tested on shipments from three
different suppliers (B1, B2, B3).
Chapter 10. Two-sample hypothesis tests Nguyen Thi Thu Van, November 26, 2022
Two-sample tests compare two sample estimates with each other, whereas one-sample tests compare a sample estimate with a non-sample benchmark or target (a claim or prior belief about a population parameter).
For example: A new bumper is installed on selected vehicles in a corporate fleet. During a 1-year test period, 12 vehicles with the new bumper were involved in accidents, incurring mean damage of $1,101 with a standard deviation of $696. During the same year, 9 vehicles
with the old bumpers were involved in accidents, incurring mean damage of $1,766 with a standard deviation of $838. Did the new bumper significantly reduce damage? Did it significantly reduce variation?
Basis of Two-Sample Tests. The logic of two-sample tests is based on the fact that two samples drawn from the same population may yield different
estimates of a parameter due to chance. For example, exhaust emission tests could yield different results for two vehicles of the same type. Only if the
two sample statistics differ by more than the amount attributable to chance can we conclude that the samples came from populations with different
parameter values, as illustrated in the adjacent picture.

Two-sample tests are especially useful because they possess a built-in point of comparison. You can think of many situations where two groups are to
be compared: Before versus after; Old versus new; or Experimental versus control. Sometimes we don’t really care about the actual value of the
population parameter, but only whether the parameter is the same for both populations.
Comparing two means

Variances known (σ₁², σ₂²):
z_calc = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)
To find z_crit, look up in Table C.

Variances unknown, assumed equal:
Pooled variance: s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)
t_calc = (x̄₁ − x̄₂) / √(s_p²/n₁ + s_p²/n₂) with d.f. = n₁ + n₂ − 2
Confidence interval: (x̄₁ − x̄₂) ± t_(α/2) √[s_p² (1/n₁ + 1/n₂)]

Variances unknown, unequal (we do not pool the variances):
If n₁, n₂ ≥ 30: z_calc = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
If n₁, n₂ < 30: t_calc = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂) with
d.f. = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]
or Welch's adjusted d.f. = min{n₁ − 1, n₂ − 1}.
Confidence interval: (x̄₁ − x̄₂) ± t_(α/2) √(s₁²/n₁ + s₂²/n₂)
To find t_crit, look up in Table D or use an Excel function.

Comparing two proportions

p₁ = x₁/n₁, p₂ = x₂/n₂; pooled proportion p_c = (x₁ + x₂)/(n₁ + n₂)
z_calc = (p₁ − p₂) / √[p_c(1 − p_c)(1/n₁ + 1/n₂)]
Confidence interval: (p₁ − p₂) ± z_(α/2) √[p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂]

Comparing two variances

Assuming the populations are normal, the test statistic follows the F distribution, named for Ronald A. Fisher in the 1930s:
F_calc = s₁²/s₂² with df₁ = n₁ − 1, df₂ = n₂ − 1
Critical values: F_R = F(df₁, df₂) ≡ F.INV.RT(α/2, df₁, df₂); F_L = 1/F(df₂, df₁) ≡ 1 / F.INV.RT(α/2, df₂, df₁)
To find F_crit, look up in Table F or use an Excel function.
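The bumper example above can be worked as a pooled-variance two-sample t test (variances unknown, assumed equal). A Python sketch:

```python
# Sketch: the bumper example as a pooled-variance two-sample t test.
# Summary statistics are taken from the example in the text.
import math

n1, x1bar, s1 = 12, 1101.0, 696.0   # new bumper
n2, x2bar, s2 = 9, 1766.0, 838.0    # old bumper

# Pooled variance and t statistic
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t_calc = (x1bar - x2bar) / math.sqrt(sp2 / n1 + sp2 / n2)
df = n1 + n2 - 2   # compare t_calc with the left-tail t_crit at these d.f.

print(f"s_p^2 = {sp2:.1f}, t = {t_calc:.3f}, d.f. = {df}")
```

The negative t statistic points in the direction of reduced damage; whether it is significant depends on the chosen α and the one-tailed t_crit at 19 degrees of freedom.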
Paired t-test [Student's t distribution]: When sample data consist of n matched pairs, a different approach is required. If the same individuals are observed twice but under different circumstances, we have a paired comparison. For example, weekly sales of Snapple at 12 Walmart stores are compared before and after installing a new eye-catching display. Did the new display increase sales?
d̄ = Σ di / n;  s_d = √[Σ(di − d̄)² / (n − 1)];  t = (d̄ − μ_d) / (s_d / √n)
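A sketch of the paired t computation (the before/after numbers are invented; the Snapple example would use 12 store pairs):

```python
# Sketch: paired t statistic from before/after observations on the same
# units. The sales figures below are invented for illustration.
import math

before = [10.0, 12.0, 9.0, 11.0, 13.0]
after  = [12.0, 13.0, 11.0, 12.0, 16.0]

d = [a - b for a, b in zip(after, before)]   # paired differences
n = len(d)
dbar = sum(d) / n
sd = math.sqrt(sum((di - dbar) ** 2 for di in d) / (n - 1))
t_calc = dbar / (sd / math.sqrt(n))          # H0: mu_d = 0, d.f. = n - 1

print(f"dbar = {dbar}, t = {t_calc:.3f}")
```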
It is worth bearing in mind three questions when you are comparing two samples:
(1) Are the populations skewed? Are there outliers? This question refers to the assumption of normal populations, upon which the tests are based.
(2) Are the sample sizes large (n ≥ 30)? The t test is robust to non-normality as long as the samples are not too small and the populations are not too skewed, thanks to the Central Limit Theorem.
(3) Is the difference important as well as significant? A small difference in means or proportions could be statistically significant if the sample size is large, because the standard error gets smaller as the sample size gets larger.
Chapter 9. One-sample hypothesis tests Nguyen Thi Thu Van - August 15, 2023
Data are used in business every day to support marketing claims, help managers make decisions, and measure business improvement.
 Has a ski resort decreased its average response time to accidents?
 Did the proportion of satisfied car repair customers increase after providing more training for the employees?
Savvy businesspeople use data and many statistical tools to answer these types of questions. Apart from the tools you have learned so far, hypothesis testing is one of the most widely used. Hypothesis testing is used in science and business to test assumptions and theories and to guide managers when facing decisions.

Hypothesis = a premise or claim that we want to test.

Steps in hypothesis testing.
Step 1: State the hypotheses to be tested. H₀: Null hypothesis = a currently accepted value for a parameter, constructed to be that of the status quo, versus H₁: Alternative hypothesis = the research hypothesis, which involves the claim to be tested. The two statements are hypotheses because the truth is unknown. Efforts will be made to reject the null hypothesis.

Decision outcomes:
- Reject H₀ when H₀ is true → Type I error: rejecting a true hypothesis; α = P(reject H₀ | H₀ is true); also called a false positive.
- Fail to reject H₀ when H₀ is false → Type II error: failing to reject a false hypothesis; β = P(fail to reject H₀ | H₀ is false); also called a false negative.
- Power: correctly rejecting a false hypothesis; 1 − β = P(reject H₀ | H₀ is false); also called sensitivity.

Step 2: Specify what level of inconsistency with the data will lead to rejection of the hypothesis. This is called a decision rule [Jerzy Neyman in the 1930s]. To specify our decision rule, we define an "extreme" outcome: the critical value is the boundary between the two regions (reject H₀ and do not reject H₀). The area under the sampling distribution curve that defines an extreme outcome is called the rejection region.

Step 3: Collect data and calculate the necessary statistics to test the hypothesis, then make a decision. Should the hypothesis be rejected or not?
 Collect data and calculate a test statistic, which measures the difference between the sample statistic and the hypothesized parameter. A test statistic that falls in the rejection region will cause rejection of H₀; otherwise we fail to reject H₀.

Step 4: Take action based on the decision. This last step, taking action, requires experience and expertise on the part of the decision maker. Because resources are always scarce, we cannot do everything, and therefore a decision requires understanding not only statistical significance but also the practical importance of the potential improvement: the magnitude of the effect and its implications for product durability, customer satisfaction, budgets, cash flow, and staffing.

In addition, we can use the p-value method [Ronald A. Fisher in the 1930s]: a p-value less than α will cause rejection of H₀. In this case, we say the conclusion is statistically significant in terms of supporting the theory being investigated, not due to chance. The area of the nonrejection region is 1 − α.

Methods used in one-sample hypothesis tests

One-sample hypothesis tests for a mean:
- Population variance known: z_calc = (x̄ − μ₀) / (σ/√n)
- Population variance unknown, n ≥ 30: z_calc = (x̄ − μ₀) / (s/√n) [or also t_calc = (x̄ − μ₀)/(s/√n)]
- Population variance unknown, n < 30: t_calc = (x̄ − μ₀) / (s/√n)

One-sample hypothesis tests for a proportion (normal approximation valid when nπ₀ ≥ 10 and n(1 − π₀) ≥ 10):
z_calc = (p − π₀) / σ_p = (p − π₀) / √[π₀(1 − π₀)/n]
Confidence interval: p ± z_(α/2) √[π₀(1 − π₀)/n]

Two-tailed test (H₀: μ = μ₀ vs H₁: μ ≠ μ₀):
- Critical values: z_(α/2) ≡ ±NORM.S.INV(α/2) or Table C; t_(α/2) ≡ ±T.INV.2T(α, df) ≡ ±T.INV(α/2, df) or Table D with df = n − 1
- p-value: for z tests, p = 2 × P(Z > |z_calc|) = 2 × [1 − NORM.S.DIST(|z_calc|, 1)] = 2 × NORM.S.DIST(−|z_calc|, 1); for the t test, p = 2 × T.DIST(−|t_calc|, df)

Left-tailed test (H₀: μ ≥ μ₀ vs H₁: μ < μ₀):
- Critical values: z_α ≡ z_crit = −|NORM.S.INV(α)|; t_α ≡ −|T.INV(α, df)|
- p-value: for z tests, p = P(Z < −|z_calc|) = NORM.S.DIST(−|z_calc|, 1); for the t test, p = T.DIST(−|t_calc|, df)

Right-tailed test (H₀: μ ≤ μ₀ vs H₁: μ > μ₀):
- Critical values: z_α ≡ z_crit = |NORM.S.INV(α)|; t_α ≡ |T.INV(α, df)|
- p-value: for z tests, p = P(Z > |z_calc|) = 1 − NORM.S.DIST(|z_calc|, 1); for the t test, p = 1 − T.DIST(|t_calc|, df)

The same z critical values and p-value formulas apply to the one-sample proportion test.
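A sketch of a two-tailed one-sample z test, using math.erf to stand in for Excel's NORM.S.DIST (the sample figures are invented):

```python
# Sketch: two-tailed one-sample z test for a mean with known sigma.
# The hypothesized mean, sigma, and sample figures are invented.
import math

def norm_s_dist(z):
    """Standard normal CDF, the analogue of NORM.S.DIST(z, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu0, sigma = 100.0, 15.0   # hypothesized mean, known population sigma
xbar, n = 104.5, 36        # observed sample mean and sample size

z_calc = (xbar - mu0) / (sigma / math.sqrt(n))
p_two_tailed = 2 * norm_s_dist(-abs(z_calc))   # 2 * P(Z < -|z_calc|)

print(f"z = {z_calc:.2f}, two-tailed p-value = {p_two_tailed:.4f}")
```

At α = 0.05 this p-value would not lead to rejection; at α = 0.10 it would, which illustrates why the decision rule must be fixed before looking at the data.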
Cautions. The critical value method described above requires that you specify your rejection criterion in terms of the test statistic before you take a sample. The p-value method is a more flexible approach that is often preferred by statisticians over the critical value method. It requires that you express the strength of your evidence (i.e., your sample) against the null hypothesis in terms of a probability. The p-value is a direct measure of the likelihood of the observed sample under H₀. The p-value answers the following question: if the null hypothesis is true, what is the probability that we would observe our particular sample mean (or one even farther away from μ₀)? However, be cautious in usage: the p-value is not the probability of the null hypothesis being true! Misreading it that way can make p-values by themselves misleading.
Topic: Bias and Random Error Nguyễn Thị Thu Vân, September 24, 2023
Population: about N = 230,000 workers in various manufacturers in HCMC in 2022. What was their yearly average savings μ = ?
Sample: n = 100 workers, x₁, x₂, …, x₁₀₀.

An estimator X̄ is a function X̄ = θ used to estimate μ. For example:
1) x̄ = θ(x₁, …, x₁₀₀) ≡ Σ(i=1..100) xi / n
2) x̄ = θ(x₁, …, x₁₀₀) ≡ Σ(i=1..100) wi xi, s.t. Σ wi = 1 and wi ≥ 0
3) and so on.
The value of θ corresponding to a particular sample is called an estimate for μ.

In statistics, there are many sources of error in collecting and sampling data. Error can be described as random or systematic.
Random error, or sampling error, is the difference between an estimate and the corresponding population parameter, x̄ − μ. Sampling error is an inevitable risk in statistical sampling, but it is a random error: the impact of random error/imprecision can be minimized with large sample sizes.

Systematic error, or bias, is the difference between the expected value of the estimator and the population parameter, E(X̄) − μ. The simplest example occurs with a measuring device that is improperly calibrated, so that it consistently overestimates (or underestimates) the measurements by some amount.
Precision and accuracy are two ways that scientists think about error:

 Accuracy refers to how close a measurement is to the true or accepted value.


 Precision refers to how close measurements of the same item are to each other.

Below is a diagram that will attempt to differentiate between imprecision and inaccuracy.

Fig 1. Fig 2. Fig 3. Fig 4.


Let's look at the figures again. In each figure, pay attention to the 5 directional arrows: how tightly they are clustered together [which may disclose the precision/imprecision of the sample] and how they are clustered around the bull's eye [the accuracy/inaccuracy of the sample]!

Fig 1. Accuracy and precision. Fig 2. Accuracy and imprecision. Fig 3. Inaccuracy and precision. Fig 4. Inaccuracy and imprecision.

As you have seen, accuracy and precision are independent of each other. Clearly, error due to imprecision can be minimized with large sample sizes, but bias cannot be improved or adjusted away even with a large sample size.
Properties of "good" estimators:
- Unbiased estimators. The expected value is equal to the population parameter. For example: if x₁, x₂, …, xₙ is a random sample from a normal population N(μ, σ² = 1), then X̄ = (1/n) Σ xi² is an unbiased estimator of μ² + 1. [I'll leave it as homework for you!]
- Efficient estimators. The more efficient estimator has lower sampling variability than a competing estimator. Among all unbiased estimators, we prefer the minimum variance estimator.
- Consistent estimators. With more observations, we get closer to the population parameter.
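The contrast between random error and bias can be seen in a small simulation (all numbers are invented): the sample mean's spread shrinks as n grows, while a constant-offset (biased) estimator stays off target no matter how large the sample:

```python
# Sketch: simulating random error vs bias. The spread of the sample mean
# shrinks with n; a constant offset (bias) does not shrink.
import random
import statistics

random.seed(42)
mu = 50.0   # true population mean (invented)

def sample_mean(n):
    return statistics.fmean(random.gauss(mu, 10.0) for _ in range(n))

small = [sample_mean(10) for _ in range(500)]
large = [sample_mean(1000) for _ in range(500)]
biased = [sample_mean(1000) + 3.0 for _ in range(500)]  # +3 = systematic error

# Random error (spread) shrinks with n; the bias of 3 persists.
print("sd of estimates, n = 10:  ", statistics.stdev(small))
print("sd of estimates, n = 1000:", statistics.stdev(large))
print("mean of biased estimator: ", statistics.fmean(biased))
```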

In conclusion, the bottom line so far is that you must be able to distinguish between systematic error and random error. Furthermore, as you have seen, each sampling is likely to come out with a different estimate. It is very useful to know a range in which the population parameter/true value will lie with a high probability. This range will be called the confidence interval.
Chapter 7. Continuous probability distribution Nguyen Thi Thu Van - July 30, 2023
P(a ≤ X ≤ b) is the area of the region bounded by the probability density function f(x) over [a, b]. That is,
P(a ≤ X ≤ b) = ∫[a,b] f(x) dx
and P(X = c) = 0 for any c ∈ [a, b]. As a result, P(a ≤ X ≤ b) = P(a < X < b).

For a continuous random variable, the PDF is an equation that shows the height of the curve f(x) at each possible value of X.
Expected value of a continuous variable: E(X) ≡ μ = ∫[−∞,∞] x f(x) dx
Variance: σ² = ∫[−∞,∞] (x − μ)² f(x) dx = E(X²) − [E(X)]²
Standard deviation: σ = √σ²

Uniform distribution
Description: The continuous uniform distribution U(a, b) is similar to the discrete uniform distribution when the x-values cover a wide range [a, b].
Example: Assume the weight of a randomly chosen American passenger car is a uniformly distributed random variable ranging from 2,500 pounds to 4,500 pounds. What is the probability that a vehicle will weigh less than 3,000 pounds? X = the weight of a randomly chosen car, so X can take on fractional values on [2,500, 4,500].
Parameters: a = lower limit, b = upper limit.
PDF: f(x) = 1/(b − a) over [a, b]
CDF: P(X < x) = (x − a)/(b − a) for a ≤ x ≤ b
Mean: μ = (a + b)/2; SD: σ = √[(b − a)²/12]
Shape: symmetric with no mode.

Normal distribution
Description: The normal or Gaussian distribution N(μ, σ) [Karl Gauss (1777–1855)] is symmetric about the mean; data values near the mean are more frequent in occurrence than data values far from the mean.
Example: Assume that the number of calories in a McDonald's Egg McMuffin is a normally distributed random variable with a mean of 290 calories and a standard deviation of 14 calories. What is the probability that a particular serving contains fewer than 300 calories? X = the number of calories of a randomly chosen McMuffin, so X can take on fractional values around the mean.
Parameters: μ, σ.
PDF: f(x) = [1/(σ√(2π))] e^(−(1/2)((x−μ)/σ)²) — NORM.DIST(x, μ, σ, 0)
CDF: P(X < x) = ∫[−∞,x] [1/(σ√(2π))] e^(−(1/2)((u−μ)/σ)²) du — P = NORM.DIST(x, μ, σ, 1); inverse x = NORM.INV(P, μ, σ)
Mean: μ; SD: σ.
Shape: symmetric, mesokurtic, bell-shaped.

Standard normal distribution
When X is normally distributed N(μ, σ), the standardized variable Z = (X − μ)/σ has a standard normal distribution N(0, 1).
PDF: f(z) = [1/√(2π)] e^(−z²/2) — NORM.S.DIST(z, 0)
CDF: P(Z < z) = ∫[−∞,z] [1/√(2π)] e^(−u²/2) du — P = NORM.S.DIST(z, 1); inverse z = NORM.S.INV(P); Tables C1, C2
Mean: μ = 0; SD: σ = 1.

Normal approximation
- To the binomial: Binomial probabilities may be difficult to compute when n is large. Instead, we can use a normal approximation if nπ ≥ 10 and n(1 − π) ≥ 10, with μ = nπ and σ = √[nπ(1 − π)]. Continuity correction: use half-way cutoff points.
Example: In a certain store, there is a .03 probability that the scanned price in the bar code scanner will not match the advertised price. The cashier scans 800 items. What is the probability of at least 20 mismatches?
- To the Poisson: The normal approximation for the Poisson distribution works best when λ is fairly large, say λ ≥ 10, with μ = λ and σ = √λ.
Example: On average, 28 patients per hour arrive at the Foxboro 24-Hour Walk-in Clinic on Friday between 6 p.m. and midnight. What is the approximate probability of more than 35 arrivals?

Exponential distribution
Description: Consider the process of customers arriving at a restaurant. If the count of customer arrivals in a randomly selected minute has a Poisson distribution, then the time between two customer arrivals has an exponential distribution.
Example: Between 2 p.m. and 4 p.m., patient insurance inquiries arrive at Blue Choice insurance at a mean rate of 2.2 calls per minute. What is the probability of waiting less than 30 seconds for the next call? Let X be the waiting time for the next call; X ∈ [0, ∞).
Parameter: λ = mean arrival rate per unit of time or space.
PDF: f(x) = λe^(−λx) over [0, ∞) — EXPON.DIST(x, λ, 0)
CDF: P(X < x) = 1 − e^(−λx) for x ≥ 0 — P = EXPON.DIST(x, λ, 1)
Mean: μ = 1/λ; SD: σ = 1/λ.
Shape: always right-skewed.
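Two of the examples above can be computed directly from the CDFs. A Python sketch (the analogues of the uniform CDF and EXPON.DIST):

```python
# Sketch: the uniform car-weight and exponential call-waiting examples,
# computed from the CDF formulas given above.
import math

# Uniform U(2500, 4500): P(X < 3000) = (x - a) / (b - a)
a, b = 2500.0, 4500.0
p_uniform = (3000.0 - a) / (b - a)

# Exponential with lambda = 2.2 calls/min: P(X < 0.5 min) = 1 - e^(-lambda*x)
lam = 2.2
p_expon = 1.0 - math.exp(-lam * 0.5)

print(f"P(weight < 3000 lb) = {p_uniform:.2f}")
print(f"P(wait < 30 s)      = {p_expon:.4f}")
```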


Chapter 6. Discrete probability distribution Nguyen Thi Thu Van - July 30, 2023
A random variable is a rule that assigns a numerical value to each outcome in the sample space of a random experiment, in accordance with the following regulation: more than one outcome can be assigned to one numerical value, but two different numerical values cannot be represented for the same outcome.
 Discrete random variables have a countable number of distinct values. For example, tossing a coin three times, the number of times that the coin comes up heads is a random variable X = {0, 1, 2, 3}.
 Continuous random variables can have any value within an interval and can take on an infinite number of possible values (say, X = the average height of a random group of students).

Probability distribution for three coin flips:
Possible events     X   PDF P(X = x)   CDF P(X ≤ x)
TTT                 0   0.125          0.125
HTT, THT, TTH       1   0.375          0.500
HHT, THH, HTH       2   0.375          0.875
HHH                 3   0.125          1.000

A discrete probability distribution assigns a probability to each value of a discrete variable and can be described either by
 a probability density/mass function (PDF/PMF) that shows the probability of each X-value, or
 a cumulative distribution function (CDF) that shows the cumulative sum of probabilities, adding from the smallest to the largest X-value.

Expected value of a discrete random variable: E(X) ≡ μ = Σ(i=1..N) xi P(xi)
Variance: Var(X) ≡ σ² = Σ(i=1..N) (xi − μ)² P(xi) = E(X²) − [E(X)]²
Standard deviation: σ = √σ²
Uniform distribution Binomial distribution Poisson distribution Hypergeometric distribution Geometric distribution
Descriptions Uniform distribution describes a random Binomial distribution describes the number of successes Poisson distribution [Siméon-Denis Poisson (1781– Hypergeometric distribution is similar Geometric distribution is related to the
variable with a finite number of equally in 𝑛 independent trials in which each trial is a Bernoulli 1840)] describes the number of events occurring in a to the binomial except that sampling is binomial. It describes the number of
likely consecutive integer values experiment [Jakob Bernoulli (1654–1705)] i.e., the fixed interval of time or space. without replacement. Bernoulli trials until the first success is
from 𝑎 to 𝑏. experiment has only two outcomes: success or fail. For the Poisson distribution to apply, the events must observed.
The success is merely meant to the event of interest. occur randomly and independently over a continuum of
time or space.
Discrete uniform distribution
 Example: The daily 3-digit lottery has a uniform distribution with 1,000 equally likely outcomes ranging from 000 through 999: X = {000, …, 999}.
 Parameters: a = lower limit, b = upper limit.
 PMF: P(X = x) = 1/(b − a + 1)
 CDF: P(X ≤ x) = (x − a + 1)/(b − a + 1), a ≤ x ≤ b
 Mean: μ = (a + b)/2
 Standard deviation: σ = √{([(b − a) + 1]² − 1)/12}
 Shape: Symmetric with no mode.
 Characteristics: The uniform model is useful as a benchmark and also to generate random integers for sampling, or in simulation models. For example, lotteries are frequently studied to make sure they are truly random.

Binomial distribution
 Example: On average, 20 percent of the emergency room patients at Greenwood General Hospital lack health insurance. In a random sample of four patients, what is the probability that two will be uninsured? X = the number of uninsured patients, X = {0, 1, 2, 3, 4}.
 Parameters: n = number of trials, π = probability of success.
 PMF: P(X = x) = [n!/(x!(n − x)!)] π^x (1 − π)^(n−x); Excel: BINOM.DIST(x, n, π, 0) or Table A
 CDF: BINOM.DIST(x, n, π, 1)
 Mean: μ = nπ
 Standard deviation: σ = √(nπ(1 − π))
 Shape: Skewed right if π < 0.5, skewed left if π > 0.5, and symmetric if π = 0.5.
 Characteristics: The trials are independent of each other. There are only two outcomes for each trial: success or failure. The probability of success π remains constant for each trial.

Poisson distribution
 Example: At an outpatient mental health clinic, appointment cancellations occur at a mean rate of 1.5 per day on a typical Wednesday. What is the probability that no cancellations will occur on a particular Wednesday? X = the number of cancellations on a particular Wednesday, X = {0, 1, 2, …}.
 Parameters: λ = mean arrivals per unit of time or space.
 PMF: P(X = x) = λ^x e^(−λ)/x!; Excel: POISSON.DIST(x, λ, 0) or Table B
 CDF: POISSON.DIST(x, λ, 1)
 Mean: μ = λ
 Standard deviation: σ = √λ
 Shape: Always right-skewed, but less so for larger λ.
 Characteristics: An event of interest occurs randomly over time or space. The average arrival rate λ remains constant. The arrivals are independent of each other.

Hypergeometric distribution
 Example: A statistics textbook chapter contains 60 exercises, 6 of which are essay questions. A student is assigned 10 problems. What is the probability that none of the questions are essay? X = the number of essay questions the student receives, X = {0, 1, 2, 3, 4, 5, 6}.
 Parameters: N = population size, n = sample size, s = number of successes in the population, x = number of successes in the sample.
 PMF: P(X = x) = [sCx × (N−s)C(n−x)]/NCn; Excel: HYPGEOM.DIST(x, n, s, N, 0)
 CDF: HYPGEOM.DIST(x, n, s, N, 1)
 Mean: μ = nπ, where π = s/N
 Standard deviation: σ = √(nπ(1 − π)) × √((N − n)/(N − 1))
 Shape: Symmetric if s/N = 0.5.
 Characteristics: The trials are not independent. The probability of success is not constant from trial to trial.

Geometric distribution
 Example: At Faber University, 15 percent of the alumni (the historical percentage) make a donation or pledge during the annual telefund. What is the probability that the first donation will come within the first 5 calls? X = the number of calls made until the first success is achieved, X = {1, 2, 3, 4, …}.
 Parameters: π = probability of success.
 PMF: P(X = x) = π(1 − π)^(x−1)
 CDF: P(X ≤ x) = 1 − (1 − π)^x, x ≥ 1. (Excel has no dedicated geometric function; NEGBINOM.DIST(x − 1, 1, π, cumulative) gives the geometric as a special case of the negative binomial.)
 Mean: μ = 1/π
 Standard deviation: σ = √((1 − π)/π²)
 Shape: Highly skewed.
 Characteristics: There is at least one trial to obtain the first success, but the number of trials is not fixed.
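The table's worked examples can be checked numerically. Below is a minimal sketch using only the Python standard library; the functions mirror the PMF/CDF rows above, and the argument values come straight from the example column:

```python
from math import comb, exp, factorial

def binom_pmf(x, n, pi):
    """Binomial P(X = x) = nCx * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

def poisson_pmf(x, lam):
    """Poisson P(X = x) = lam^x * e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

def hypergeom_pmf(x, n, s, N):
    """Hypergeometric P(X = x) = sCx * (N-s)C(n-x) / NCn."""
    return comb(s, x) * comb(N - s, n - x) / comb(N, n)

def geometric_cdf(x, pi):
    """Geometric P(X <= x) = 1 - (1 - pi)^x."""
    return 1 - (1 - pi)**x

print(round(binom_pmf(2, 4, 0.20), 4))      # two of four ER patients uninsured -> 0.1536
print(round(poisson_pmf(0, 1.5), 4))        # no cancellations on a Wednesday  -> 0.2231
print(round(hypergeom_pmf(0, 10, 6, 60), 4))  # none of the 10 problems are essay
print(round(geometric_cdf(5, 0.15), 4))     # first donation within 5 calls
```

These values agree with the Excel functions listed in the table (e.g., BINOM.DIST(2, 4, 0.2, 0)).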
Chapter 5. Probability Nguyen Thi Thu Van - October 22, 2022
Descriptive statistics is used to summarize and describe data collected from, for example, scientific experiments or business processes. Inferential statistics uses the theory of probability:
• to generalize the information of a sample to the wider population the sample comes from;
• to test hypotheses;
• to make predictions about the future.

A random experiment is an observational process whose results cannot be known in advance. For example, when a customer enters a Lexus dealership, will the customer buy a car or not? How much will the customer spend?
The set of all possible outcomes, denoted S, is the sample space for the experiment. An event is any subset of outcomes in the sample space. A simple event, or elementary event, is a single outcome. A compound event consists of two or more simple events.
The probability of an event is a number that measures the relative likelihood that the event will occur. The probability of an event A, denoted P(A), must lie within the interval from 0 to 1: 0 ≤ P(A) ≤ 1.
• A discrete sample space S consists of all the simple events, denoted E1, E2, …, En: S = {E1, …, En}. Then P(S) = P(E1) + P(E2) + ⋯ + P(En) = 1, with 0 ≤ P(Ei) ≤ 1.
• If the outcome of the experiment is a continuous measurement, the sample space cannot be listed, but it can be described by a rule. For example, the sample space for the length of a randomly chosen cell phone call would be S = {X | X ≥ 0}; probabilities are then assigned by a density function f(x) ≥ 0 with ∫S f(x) dx = 1.
Three distinct approaches to assigning probability:
• Empirical approach: the probability is based on relative frequencies obtained through observations or experiments: P(an event) = (number of occurrences of the event)/(number of observations). For example, there is a 2 percent chance of twins in a randomly chosen birth. The Law of Large Numbers is an important probability theorem: as the number of trials increases, any empirical probability approaches its theoretical limit.
• Classical (a priori) approach: the probability is based on logic or theory, not on experience. Such calculations are rarely possible in business situations. For example, there is a 50 percent chance of heads on a coin flip.
• Subjective approach: the probability is based on personal judgment or expert opinion. However, such a judgment is not random, because it is based on experience with similar events and knowledge of the underlying causal process. Thus, subjective probabilities have something in common with empirical probabilities. For example, there is an 80 percent chance that Vietnam will bid for the 2024 Winter Olympics.
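The Law of Large Numbers is easy to see in a simulation. This sketch (a hypothetical fair-coin experiment, with a fixed seed so the run is reproducible) shows the empirical probability of heads drifting toward the theoretical limit of 0.5 as the number of flips grows:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def empirical_p_heads(trials):
    """Relative frequency of heads in `trials` simulated fair-coin flips."""
    heads = sum(random.random() < 0.5 for _ in range(trials))
    return heads / trials

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} flips: empirical P(heads) = {empirical_p_heads(n):.4f}")
```

With 100 flips the empirical proportion can easily be off by several percentage points; with a million flips it is typically within a fraction of a percent of 0.5.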
Rules of probability (key terms, descriptions, and formulas):
• Complement: the complement of an event A, denoted A′, consists of every outcome except the event. P(A) = 1 − P(A′).
• Union of two events A and B: A ∪ B is all outcomes in either or both. General law of addition: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
• Intersection of two events A and B: A ∩ B is only those outcomes in both. The probability of the intersection of two events is called the joint probability. General law of multiplication: P(A ∩ B) = P(A|B) × P(B).
• A and B are mutually exclusive (or disjoint) if their intersection is the empty set, i.e., one event precludes the other from occurring. Then P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B).
• Events are collectively exhaustive if their union is the entire sample space S (i.e., all the events that can possibly occur). Two mutually exclusive, collectively exhaustive events are binary (or dichotomous) events. For example, a car repair is either covered by the warranty or not covered by the warranty. Remember that there can be more than two mutually exclusive, collectively exhaustive events. For example, a Walmart customer can pay by credit card, debit card, check, or cash.
• Odds: the ratio of an event's probability to the probability of its complement. Odds for event A: P(A)/(1 − P(A)). Odds against event A: (1 − P(A))/P(A). If the odds against event A are quoted as b to a, then P(A) = a/(a + b).
Conditional probability: The probability of event A given that event B has occurred is a conditional probability, denoted P(A|B), which is read "the probability of A given B": P(A|B) = P(A ∩ B)/P(B).
Event A is independent of event B if and only if P(A) = P(A|B). Two events are independent when knowing that one event has occurred does not affect the probability that the other event will occur; otherwise they are called dependent. If A and B are two independent events, then P(A ∩ B) = P(A) × P(B). In general, if A1, …, An are independent, then P(A1 ∩ … ∩ An) = P(A1) × … × P(An). This law can be applied to system reliability. For example, suppose that a website has two independent file servers. If each has 99 percent reliability, what is the total reliability? The principle of redundancy: when individual components have low reliability, high reliability can still be achieved with massive redundancy.
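The two-server reliability question above follows directly from the independence rule: the site fails only if both servers fail. A short sketch with the 99 percent figure from the example:

```python
p_up = 0.99                            # each server works 99% of the time
p_both_down = (1 - p_up) ** 2          # independence: multiply failure probabilities
system_reliability = 1 - p_both_down   # the site is up if at least one server is up

print(round(system_reliability, 4))    # -> 0.9999
```

Redundancy pays off quickly: adding a third independent 99 percent server would raise reliability to 1 − 0.01³ = 0.999999.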
Contingency table: A contingency table is a cross-tabulation of frequencies into r rows and c columns and is called an r × c table. The intersection of each row and column is a cell that shows a frequency. A contingency table is like a frequency distribution for a single variable, except it has two variables (rows and columns). Contingency tables often are used to report the results of a survey. The marginal probability of an event is a relative frequency found by dividing a row or column total by the total sample size. Each cell of the r × c table is used to calculate a joint probability representing the intersection of two events. Conditional probabilities may be found by restricting ourselves to a single row or column.
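All three kinds of probability can be read off a contingency table programmatically. A sketch with a hypothetical 2 × 2 survey table (the counts below are invented for illustration):

```python
# Hypothetical survey counts: rows = {female, male}, columns = {credit card, cash}
table = [[30, 20],
         [25, 25]]

total = sum(sum(row) for row in table)                     # grand total = 100
joint = [[cell / total for cell in row] for row in table]  # joint probabilities
marginal_row = [sum(row) / total for row in table]         # row total / grand total
p_card_given_female = table[0][0] / sum(table[0])          # condition: restrict to row 1

print(joint[0][0], marginal_row[0], p_card_given_female)   # 0.3 0.5 0.6
```

Here 0.3 is a joint probability (female AND credit card), 0.5 is a marginal probability (female), and 0.6 is a conditional probability (credit card given female).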
Tree diagrams: Events and probabilities can be displayed in the form of a tree diagram, or decision tree, to help visualize all possible outcomes without using complicated formulas. A probability tree has two main parts: branches and ends (also called leaves). The junction points between branches and leaves are called nodes. The probability of each branch is generally written on the branch, while the outcomes are written on the ends. The tree diagram is a common business planning activity.

Bayes' theorem: Bayes' theorem [Thomas Bayes (1702–1761)] provides a method of revising probabilities to reflect new information. The prior (unconditional) probability of an event B is revised after event A has occurred to yield a posterior (conditional) probability. The theorem is a simple mathematical formula for calculating conditional probabilities:
P(B|A) = P(A|B)P(B) / [P(A ∩ B) + P(A ∩ B′)] = P(A|B)P(B) / [P(A|B)P(B) + P(A|B′)P(B′)],
where B and B′ are mutually exclusive and collectively exhaustive.
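The revision step can be sketched in a few lines. The screening-test numbers below (prior P(B) = 0.02, P(A|B) = 0.95, P(A|B′) = 0.10) are hypothetical values chosen for illustration, not from the text:

```python
def bayes_posterior(p_b, p_a_given_b, p_a_given_not_b):
    """P(B|A) via Bayes' theorem, for binary, collectively exhaustive B and B'."""
    numerator = p_a_given_b * p_b
    denominator = numerator + p_a_given_not_b * (1 - p_b)
    return numerator / denominator

# Observing A (a positive test) revises P(B) from the 2% prior to about 16%.
print(round(bayes_posterior(0.02, 0.95, 0.10), 4))  # -> 0.1624
```

Note how strongly the small prior pulls the posterior down: even with a 95 percent hit rate, most positives come from the large B′ group.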
Counting rules:
• Permutation: the number of arrangements of sampled items drawn from a population when order is important: nPr = n!/(n − r)!.
• Combination: the number of arrangements of sampled items drawn from a population when order does not matter: nCr = n!/[(n − r)! r!].
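Both counting rules are available directly in Python's standard library. The horse-race numbers below are a hypothetical illustration:

```python
from math import comb, perm

# Ways to fill the top 3 places among 8 horses (order matters)
print(perm(8, 3))  # 8!/(8-3)! = 336

# Ways to pick any 3 of the 8 horses for a photo (order does not matter)
print(comb(8, 3))  # 8!/(5! * 3!) = 56
```

Dividing the two gives 3! = 6, the number of orderings of each chosen group, which is exactly the relationship between the two formulas.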
Chapter 3 and 4: Descriptive statistics Nguyen Thi Thu Van - June 6, 2023
Apart from the type of data and the sampling method, these are the characteristics we can ask about in the data:
• Center: Where are the data values concentrated? What seem to be typical or middle data values? Central tendency is a statistical measure that determines a single score defining the center of a distribution, the score most typical or most representative of the entire group.
• Variability: How much dispersion is there in the data? How spread out are the data values? Are there unusual values? Variability provides a quantitative measure of the differences between scores in a distribution and describes the degree to which the scores are spread out or clustered together. The smaller the standard deviation, the more closely the data cluster about the mean.
• Shape: Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal? Technically, the shape of a distribution is defined by an equation that prescribes the exact relationship between each X and Y value on the graph. However, we will rely on a few less precise terms that serve to describe the shape of most distributions. Nearly all distributions can be classified as either symmetrical or skewed.
• Correlation: Is there an association between two variables?
Descriptive statistics refers to the collection, organization, presentation, and summary of data either using charts and graphs [kinds of visual descriptions] or using a numerical summary [kind of numerical descriptions].
Visual descriptions
The type of graph we use to display our data depends on the type of data we have. Some charts are better suited for numerical data (e.g., stem-and-leaf displays, dot plots, histograms), while others are better for displaying categorical data (e.g., bar charts, pie charts, Pareto charts).
• Stem-and-leaf displays: A stem-and-leaf display is basically a frequency tally, except that we use digits instead of tally marks, where each data value is split into a leaf (the last digit) and a stem (the other digits). A stem-and-leaf display can reveal central tendency (say, at the stem with the biggest frequency), dispersion (say, the data range), as well as the shape of the distribution. Its disadvantage is that a stem-and-leaf graph works well for small samples of integer data with a limited range but becomes awkward when you have decimal data (e.g., $60.39) or multidigit data (e.g., $3,857). In such cases, it is necessary to round the data to make the display work.
• Dot plots: A dot plot is a graphical display of n individual values of numerical data on a horizontal axis. The basic steps in making a dot plot: (1) sort the data in ascending order; (2) make a scale that covers the data range; (3) mark axis demarcations and label them; (4) plot each data value as a dot above the scale at its approximate location. If more than one data value lies at approximately the same X-axis location, the dots are piled up vertically. A stacked dot plot can be used to compare two or more groups of data. A dot plot can reveal the central tendency (say, where the data values tend to cluster and where the midpoint is), dispersion, and shape of the distribution when the dataset is large enough. The disadvantage is that dot plots don't reveal very much about the dataset's shape when the sample is small, and they become awkward when the sample is large.
• Histograms: Histograms are used to describe datasets of any size visually. The basic steps to construct a histogram: (1) sort the data in ascending order; (2) choose the number of bins (the number of bins is a matter of judgment; if there is no other specific requirement, we can use Sturges' rule); (3) set the bin limits; (4) put the data values in the appropriate bins; (5) create the table; (6) sketch a bar chart whose Y-axis shows the number of data values (or a percentage) within each bin and whose X-axis ticks show the end points of each bin.
• Effective Excel charts: line charts, column charts, bar charts, pie charts, scatter plots, Pareto charts, tables, pivot tables. A Pareto chart is commonly used in quality management to display the frequency of defects or errors of different types; most quality problems can usually be traced to only a few sources or causes, and sorting the categories in descending order helps managers focus on the vital few causes of problems rather than the trivial many. A scatter plot displays pairs of observations on the xy-plane to show relationships between variables (say, income and age). A pivot table is a powerful tool to calculate, summarize, and analyze data.

Numerical descriptions
We can assess a dataset in a general way from a dot plot or histogram. However, to describe datasets more precisely and convincingly, we need numerical statistics.
• Center: six measures of center: mean, median, mode, midrange, geometric mean, trimmed mean. The empirical relationship between mean, median, and mode: mean − mode = 3(mean − median).
• Variability: five measures of variability: range, sample variance, sample standard deviation, coefficient of variation, mean absolute deviation.
• Shape: Instead of relying on histograms, we can use statistical measures like skewness and kurtosis to gain more precise inferences about the shape of the population being sampled. Skewness: skewed left (skewness < 0), skewed right (skewness > 0), symmetric (skewness = 0). Kurtosis (Ku): mesokurtic (Ku = 0), platykurtic (Ku < 0), leptokurtic (Ku > 0).
• Correlation/covariance: Instead of relying on scatter plots, we can use a statistical measure called the correlation coefficient to assess the degree of linear correlation between two variables (−1 ≤ r < 0: negative correlation; 0 < r ≤ 1: positive correlation; r = 0: no correlation). The covariance measures the degree to which two variables move together (σXY > 0: X and Y move in the same direction; σXY < 0: they move in opposite directions; σXY = 0: they are unrelated).
• z-score: A z-score, or standardized score, is used to (1) identify and describe the exact location of each score, that is, how far from the mean the score lies in a distribution, using the mean and the standard deviation, the key measures for describing an entire distribution of data. For example, assume you received a score of 76 on a statistics exam. How did you do if μ = 70? Your score is 6 points above the mean, but you still do not know exactly where it is located: you may have one of the highest scores in the class or you may not, because 6 points may be a relatively big distance or a relatively small one. At this stage you need the standard deviation to judge the relative distance of the score from the mean. (2) standardize an entire distribution. For example, there are many tests for measuring IQ, and the tests are usually standardized so that they have a mean of 100 and a standard deviation of 15. This helps us understand and compare IQ scores even though they come from different tests.
• Empirical Rule / Chebyshev's theorem: Given a mean and a standard deviation, you might need to know the proportion of values that lie within, say, plus and minus two standard deviations of the mean. If your data follow the normal distribution, that's easy using the Empirical Rule. However, if you don't know the distribution of your dataset, or know that it doesn't follow the normal distribution, use Chebyshev's theorem.
• Percentiles: To indicate the percentage of data values that fall below a particular value, we can use percentiles. A dataset can be divided into 100 groups (percentiles), 10 groups (deciles), 5 groups (quintiles), or 4 groups (quartiles). Percentiles within a percentile rank can be determined directly from the frequency distribution table; intermediate values not reported in the table can be found by a procedure called interpolation.
• Grouping data plays a significant role when we have to deal with large datasets. Data formed by arranging individual observations of a variable into groups, so that a frequency distribution table of these groups provides a convenient way of summarizing or analyzing the data, are termed grouped data.
• Boxplot: A boxplot is a useful tool in exploratory data analysis for graphically depicting groups of numerical data through their quartiles and for detecting outliers (data values that differ greatly from the majority).
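Most of the center and variability measures above, plus z-scores and the Sturges bin count k = 1 + log2(n), can be computed with the standard library alone. The sample values here are invented for illustration:

```python
import math
import statistics as st

data = [12, 14, 14, 17, 21, 26, 28, 35, 40, 53]  # hypothetical sample, n = 10

mean, median, mode = st.mean(data), st.median(data), st.mode(data)
midrange = (min(data) + max(data)) / 2
s = st.stdev(data)                           # sample standard deviation
cv = 100 * s / mean                          # coefficient of variation, in percent
z_scores = [(x - mean) / s for x in data]    # standardized scores
bins = math.ceil(1 + math.log2(len(data)))   # Sturges' rule for a histogram

print(mean, median, mode, midrange, bins)    # 26 23.5 14 32.5 5
```

Note how the mean (26) exceeds the median (23.5), which exceeds the mode (14): the ordering the empirical relationship predicts for a right-skewed sample.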
Chapter 2: Data collection Nguyen Thi Thu Van - September 19, 2022
Data terminology
An observation is a single member of a collection of items that we want to study, such as a person, firm, or region. A variable is a
characteristic of the subject or individual, such as name, age, income. A dataset consists of all the values of the observations we have
chosen to observe. Data usually are entered into a spreadsheet or database as an 𝑛 × 𝑚 matrix: univariate datasets (one variable);
bivariate datasets (two variables); and multivariate datasets (more than two variables). The questions that can be explored and the
data analytical techniques that can be used will depend upon the data type and the number of variables.
Data type
Categorical / Qualitative data: Categorical data have values that are described by words rather than numbers (e.g., gender, eye color, hair color). Because categorical variables have nonnumerical values, the values of a categorical variable may on occasion be represented using numbers; this is called coding. But coding a category as a number does not make the data numerical, and the number does not typically imply a rank.

Numerical / Quantitative data: Numerical data arise from counting, measuring something, or some kind of mathematical operation. Numerical data can be broken down into two types: discrete (variables with a countable number of values, like the number of credits or the number of passengers on a flight) and continuous (variables with values within an interval, like height, weight, time, or income).
Time-series data is a sequence of data points collected over time intervals, allowing us to track changes over time: over milliseconds, days, or even years. For time-series data, we are interested in trends, or the pattern over time.

Cross-sectional data is collected from many units (people, companies, countries, etc.) in a single time period. For cross-sectional data, we are interested in variation among observations or in relationships.
4 levels of measurement for data
• Nominal scale/measurement: A nominal scale describes a variable with categories that do not have a natural order or ranking. Example: if you're measuring the academic majors of a group of college students, the categories would be arts, business, chemistry, and so on, and each student would be classified in one category according to his or her major.
• Ordinal scale/measurement: An ordinal scale is one where the order matters but not the difference between values. Example: often, an ordinal scale consists of a series of ranks (first, second, third, and so on), like the order of finish in a horse race.
• Interval scale/measurement: An interval scale is one where there is order and the difference between two values is meaningful, but ratios are not meaningful, because the zero point of the scale does not mean the absence of the quantity being measured. Example: you know that a measurement of 80° Fahrenheit is exactly 20° higher than 60°F, but we can't say that 60°F is twice as warm as 30°F, or that 30°F is 50 percent warmer than 20°F, because the zero point of this scale is not meaningful.
• Ratio scale/measurement: A ratio scale is an interval scale where ratios are meaningful. The zero point of this scale is meaningful: it indicates the absence of the quantity being measured. Example: a gas tank with 10 gallons has twice as much gas as a tank with only 5 gallons.

The Likert scale is widely used in social work research. It is usually treated as an interval scale, but strictly speaking it is an ordinal scale, where arithmetic operations cannot be conducted. The coarseness of a Likert scale refers to its number of scale points.
Sampling methods
Two main categories of sampling methods: random sampling (e.g., simple random sample, systematic sample, stratified sample, cluster sample) and non-random sampling (e.g., judgment sample, convenience sample, focus group). Sampling without replacement means that once an item has been selected for the sample, it cannot be considered for the sample again. Sampling with replacement means that the same random number could show up more than once; sampling with replacement does not lead to bias in our sample results. Note that when the population is finite and the sample size is close to the population size, we should not use sampling without replacement. When the sample is less than 5 percent of the population, the population is effectively infinite. A census is an examination of all items in the population, while a sample involves looking only at some items selected from the population.
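The two sampling schemes map directly onto two standard-library calls. A small sketch (the population and seed are arbitrary):

```python
import random

random.seed(1)
population = list(range(1, 21))  # a small finite population, N = 20

without_repl = random.sample(population, k=5)   # without replacement: no duplicates
with_repl = random.choices(population, k=5)     # with replacement: duplicates possible

print(without_repl)
print(with_repl)
```

`random.sample` can never return the same item twice, while `random.choices` draws each item independently, so repeats become likely as k grows relative to N.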
Survey
Most survey research follows the same basic steps:
Step 1: State the goals of the research.
Step 2: Develop the budget (time, money, staff, …).
Step 3: Create a research design (target population, frame, sample size).
Step 4: Choose a survey type and method of administration.
Step 5: Design a data collection instrument (questionnaire).
Step 6: Pretest the survey instrument and revise as needed.
Step 7: Administer the survey (follow up if needed).
Step 8: Code the data and analyze it.
Note that these steps may overlap in time.
Chapter 1 – Statistics Nguyen Thi Thu Van – September 19, 2022
Managers need reliable and timely information in order to analyze market trends and adjust to changing market conditions: company’s internal operations (e.g., sales, production, inventory
levels, warranty claims) and competitive positions (e.g., market share, customer satisfaction, repeat sales). Companies increasingly are using business analytics to support decision making,
to recognize anomalies that require tactical action, or to gain strategic insight to align business processes with business objectives. Statistics and statistical analysis permit data-based
decision making and reduce managers' need to rely on guesswork. Businesses that combine managerial judgment with statistical analysis tend to be more successful.
What's statistics?
Statistics helps convert unstructured raw data that have been collected into useful information. Data mining encompasses all the technologies for collecting, storing, accessing, and analyzing data on the company's operations to make better business decisions. Some experts prefer to call statistics data science, a trilogy of tasks involving data modeling, analysis, and decision making.

Why study statistics?
Knowing statistics will make you a better consumer of other people's data and data analyses. You should know enough to handle everyday data problems, to feel confident that others cannot deceive you with spurious arguments, and to know when you've reached the limits of your expertise. Some reasons for anyone to study statistics: communication, computer skills, information management, and technical literacy.

Statistics in business
Some of the ways statistics are used in business: auditing, marketing, health care, quality improvement, purchasing, medicine, operations management, product warranty, and process improvement.

Statistical challenges
Common challenges facing business professionals using statistics: imperfect data and practical constraints, business ethics, upholding ethical standards, and using consultants.

Critical thinking
Some common logical pitfalls abound in both the data process and the reasoning process: conclusions from small samples, conclusions from nonrandom samples, conclusions from rare events, and poor survey methods. Therefore, statistics is an essential part of critical thinking. It allows us to test an idea against empirical evidence. We use statistical tools to compare our prior ideas with empirical data (data collected through observations and experiments). If the data do not support our theory, we can reject or revise our theory.
There are two primary kinds of statistics:
• Descriptive statistics refers to the collection, organization, presentation, and summary of data (either using charts and graphs or using a numerical summary).
• Inferential statistics refers to generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions.