OSTA-WS2024-Lecture 03


Statistics

OSTA – WS 2024
Dr. Omer CAYIRLI

Lecture 3
Outline

❖ Descriptive Statistics II
➢ Numerical Measures
✓ Measures of location and variability
✓ Measures of distribution shape, relative location
✓ Outliers
✓ Measures of association between two variables
Descriptive Statistics: Numerical Measures

❖ Numerical Measures
➢ If the measures are computed for data from a sample, they are
called sample statistics.
➢ If the measures are computed for data from a population, they
are called population parameters.
➢ A sample statistic is referred to as the point estimator of the
corresponding population parameter.
Descriptive Statistics: Numerical Measures

❖ Measures of Location
➢ Mean
➢ Median
➢ Mode
➢ Weighted Mean
➢ Geometric Mean
➢ Percentiles
➢ Quartiles
Descriptive Statistics: Numerical Measures

❖ Mean
➢ Provides a measure of central location.
➢ The mean of a data set is the average of all the data values.
➢ The sample mean x̄ is the point estimator of the population mean, μ.

❖ Sample Mean x̄
➢ The monthly rents for 70 randomly sampled apartments.
Descriptive Statistics: Numerical Measures

❖ Median
➢ The value in the middle when the data items are arranged in ascending order.
➢ The preferred measure of central location when a data set has extreme values.
✓ A few extremely large observations can inflate the mean.
➢ 7 observations: 26, 18, 27, 12, 14, 27, and 19.
Rewritten in ascending order: 12, 14, 18, 19, 26, 27, and 27.
The median is the middle value in this list, so the median = 19.
➢ 8 observations: 26, 18, 27, 12, 14, 27, 19, and 30.
Rewritten in ascending order: 12, 14, 18, 19, 26, 27, 27, and 30.
The median is the average of the two middle values in this list,
The median = (19 + 26)/2 = 22.5.
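
The two median examples above can be checked with a short sketch. It uses Python's standard `statistics.median`; the variable names are illustrative only.

```python
from statistics import median

# The two data sets from the slide (unsorted, as given).
odd_sample = [26, 18, 27, 12, 14, 27, 19]
even_sample = [26, 18, 27, 12, 14, 27, 19, 30]

# median() sorts internally: it returns the middle value for an odd count
# and the average of the two middle values for an even count.
median_odd = median(odd_sample)
median_even = median(even_sample)

print(median_odd)   # 19
print(median_even)  # 22.5
```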
Descriptive Statistics: Numerical Measures

❖ Mode
➢ The value that occurs with greatest frequency.
➢ The greatest frequency can occur at two or more different
values.
✓ If the data have exactly two modes, the data are bimodal.
✓ If the data have more than two modes, the data are multimodal.
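
As a small illustration of a bimodal data set, the sketch below uses `statistics.multimode` from the Python standard library. The data values are hypothetical and not taken from the lecture.

```python
from statistics import multimode

# Hypothetical data: the values 2 and 3 both occur twice.
data = [1, 2, 2, 3, 3, 4]

# multimode() returns every value that shares the greatest frequency,
# in the order the values were first encountered.
modes = multimode(data)

print(modes)  # [2, 3]
if len(modes) == 2:
    print("bimodal")
elif len(modes) > 2:
    print("multimodal")
```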
Descriptive Statistics: Numerical Measures

❖ Weighted Mean
➢ Computed by giving each observation a weight that reflects its
relative importance.
➢ The choice of weights depends on the application.
✓ GPA, Portfolio return, DXY Index, IMF’s SDR
✓ The weights might be the number of credit hours earned for each grade, as in
GPA.
Worker        Wage ($/hr)   Total Hours
Carpenter     21.60         520
Electrician   28.72         230
Laborer       11.80         410
Painter       19.75         270
Plumber       24.16         160
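
The computation for this table did not survive in the slide text. A minimal sketch, assuming the total hours serve as the weights for each hourly wage (weighted mean = Σ wᵢxᵢ / Σ wᵢ):

```python
# Wage ($/hr) and total hours per worker type, taken from the table above.
wages = {
    "Carpenter": (21.60, 520),
    "Electrician": (28.72, 230),
    "Laborer": (11.80, 410),
    "Painter": (19.75, 270),
    "Plumber": (24.16, 160),
}

# Assumption: hours are the weights, so each wage counts in proportion
# to the number of hours worked at that wage.
total_pay = sum(wage * hours for wage, hours in wages.values())
total_hours = sum(hours for _, hours in wages.values())
weighted_mean_wage = total_pay / total_hours

print(total_hours)                   # 1590
print(round(weighted_mean_wage, 4))  # 20.0464
```

The unweighted average of the five wages would ignore that carpenters and laborers account for most of the hours.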
Descriptive Statistics: Numerical Measures

❖ Geometric Mean
➢ Calculated by finding the nth root of the product of n values.
➢ Often used in analyzing growth rates in financial data where using the arithmetic mean may
provide misleading results.
➢ Should be applied anytime the mean rate of change over several successive periods is needed.
✓ Changes in populations of species, crop yields, pollution levels, and birth and death rates.
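
A minimal sketch of the nth-root-of-the-product calculation. The three growth factors are hypothetical (returns of +10%, -5%, and +20%), chosen only to show how the geometric and arithmetic means differ.

```python
import math

# Hypothetical growth factors for three successive periods:
# +10%, -5%, +20% correspond to 1.10, 0.95, 1.20.
growth_factors = [1.10, 0.95, 1.20]
n = len(growth_factors)

# Geometric mean: nth root of the product of the n values.
geometric_mean = math.prod(growth_factors) ** (1 / n)

# Arithmetic mean of the same factors, for comparison.
arithmetic_mean = sum(growth_factors) / n

print(f"mean growth rate (geometric):  {geometric_mean - 1:.4%}")
print(f"mean growth rate (arithmetic): {arithmetic_mean - 1:.4%}")
```

Compounding the geometric mean over the three periods reproduces the actual cumulative growth, which the arithmetic mean overstates.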
Descriptive Statistics: Numerical Measures
❖ Percentiles
➢ A percentile provides information about how the data are spread over the interval from the smallest value to the
largest value.
➢ Admission test scores for colleges and universities are frequently reported in terms of percentiles.
➢ The 𝑝th percentile of a data set is a value such that at least p percent of the items take on this value or less and at
least (100 – 𝑝) percent of the items take on this value or more.
➢ Arrange the data in ascending order.
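➢ Compute $L_p$, the location of the $p$th percentile: $L_p = \frac{p}{100}(n + 1)$.
➢ For the 70 apartment rents, $L_{80} = \frac{80}{100}(70 + 1) = 56.8$, so the 80th percentile is the 56th value plus 0.8 times the difference between the 57th and 56th values.
➢ So the 80th percentile = 635 + 0.8(649 – 635) = 646.2.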
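
The location-and-interpolation procedure can be sketched as a small function. It assumes the location rule L_p = (p/100)(n + 1), which is consistent with the worked numbers above (56.8 for p = 80 and n = 70). The function name and the clamping at the ends of the data are assumptions of this sketch, and the 8 observations from the median example are reused because the rent data are not listed here.

```python
def percentile(values, p):
    """p-th percentile using the location L_p = (p / 100) * (n + 1)."""
    data = sorted(values)
    n = len(data)
    location = (p / 100) * (n + 1)
    position = int(location)          # integer part: 1-based position
    fraction = location - position    # decimal part: interpolation weight

    # Assumption: clamp to the smallest/largest value when the location
    # falls outside positions 1..n.
    if position < 1:
        return data[0]
    if position >= n:
        return data[-1]

    lower = data[position - 1]        # value at the 1-based position
    upper = data[position]            # next value in sorted order
    return lower + fraction * (upper - lower)


observations = [26, 18, 27, 12, 14, 27, 19, 30]
print(percentile(observations, 25))  # 15.0
print(percentile(observations, 50))  # 22.5
print(percentile(observations, 75))  # 27.0
```

The 50th percentile matches the median of 22.5 found earlier for the same 8 observations.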
Descriptive Statistics: Numerical Measures

❖ Quartiles
➢ Quartiles are specific percentiles.
➢ First Quartile = 25th Percentile
➢ Second Quartile = 50th Percentile = Median
➢ Third Quartile = 75th Percentile

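➢ For the 70 apartment rents, the location of the 75th percentile is $\frac{75}{100}(70 + 1) = 53.25$, so the 75th percentile is the 53rd value plus 0.25 times the difference between the 54th and 53rd values.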
➢ The 75th percentile = third quartile = 625 + 0.25(625 – 625) = 625.
Descriptive Statistics: Numerical Measures

❖ Measures of Variability
➢ Common measures of variability are:
✓ Range
✓ Interquartile Range
✓ Variance
✓ Standard Deviation
✓ Coefficient of Variation
Descriptive Statistics: Numerical Measures

❖ Range
➢ The difference between the largest and smallest data value.
➢ The simplest measure of variability.
➢ Very sensitive to the smallest and largest data values.

➢ Range = largest value – smallest value = 715 – 525 = 190


Descriptive Statistics: Numerical Measures

❖ Interquartile Range (IQR)


➢ The difference between the third quartile and the first quartile.
➢ The range for the middle 50% of the data.
➢ Overcomes the sensitivity to extreme data values.

➢ 3rd Quartile (Q3) = 625


➢ 1st Quartile (Q1) = 545
➢ IQR = 625 – 545 = 80
Descriptive Statistics: Numerical Measures

❖ Variance
➢ A measure of variability that utilizes all the data.
➢ Based on the difference between the value of each observation (xi) and the mean
(x̄ for a sample, μ for a population).
➢ Useful in comparing the variability of two or more variables.
➢ It is the average of the squared deviations between each data value and the
mean.
➢ The variance of a sample is: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

➢ The variance of a population is: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$


Descriptive Statistics: Numerical Measures

❖ Standard Deviation
➢ The positive square root of the variance.
➢ Measured in the same units as the data, so it is more easily interpreted than the variance.

➢ The standard deviation of a sample is: $s = \sqrt{s^2}$

➢ The standard deviation of a population is: $\sigma = \sqrt{\sigma^2}$


Descriptive Statistics: Numerical Measures

$E(r_i) = p_1 r_1 + p_2 r_2 + \cdots + p_n r_n$

$\sigma^2(r_i) = p_1 (r_1 - E(r_i))^2 + p_2 (r_2 - E(r_i))^2 + \cdots + p_n (r_n - E(r_i))^2$

$\sigma(r_i) = \sqrt{\sigma^2(r_i)}$
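
A minimal sketch of these three formulas for a discrete set of outcomes. The probabilities and returns are hypothetical, and the probabilities are assumed to sum to 1.

```python
import math

# Hypothetical scenarios: probability p_i and return r_i for each outcome.
probabilities = [0.25, 0.50, 0.25]
returns = [-0.10, 0.05, 0.20]

# Expected return: probability-weighted sum of the returns.
expected_return = sum(p * r for p, r in zip(probabilities, returns))

# Variance: probability-weighted squared deviations from the expected return.
variance = sum(
    p * (r - expected_return) ** 2 for p, r in zip(probabilities, returns)
)

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

print(f"E(r)      = {expected_return:.4f}")
print(f"sigma^2(r) = {variance:.5f}")
print(f"sigma(r)   = {std_dev:.4f}")
```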
Descriptive Statistics: Numerical Measures

❖ Coefficient of Variation
➢ Indicates how large the standard deviation is in relation to the mean.

➢ The coefficient of variation of a sample is: $\left(\frac{s}{\bar{x}} \times 100\right)\%$

➢ The coefficient of variation of a population is: $\left(\frac{\sigma}{\mu} \times 100\right)\%$

❖ Sample Variance, Standard Deviation, and Coefficient of Variation


➢ Example: The variance, standard deviation, and coefficient of variation for apartment
rents.
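
The apartment rent data are not reproduced here, so the sketch below uses a small hypothetical sample to show the three calculations together. It relies on `statistics.variance` and `statistics.stdev`, which use the sample (n – 1) divisor.

```python
from statistics import mean, stdev, variance

# Hypothetical sample of five observations (not the apartment rents).
sample = [46, 54, 42, 46, 32]

x_bar = mean(sample)       # sample mean
s2 = variance(sample)      # sample variance, divisor n - 1
s = stdev(sample)          # sample standard deviation, sqrt of s2
cv = s / x_bar * 100       # coefficient of variation, in percent

print(x_bar)               # 44
print(s2)                  # 64
print(round(cv, 2))        # 18.18
```

Here the squared deviations sum to 256, and dividing by n – 1 = 4 gives the variance of 64, so the standard deviation is 8, or about 18% of the mean.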
Descriptive Statistics: Numerical Measures

❖ Measures of Distribution Shape, Relative Location


➢ Distribution Shape
➢ z-Scores
➢ Chebyshev’s Theorem
➢ Empirical Rule
Descriptive Statistics: Numerical Measures

❖ Distribution Shape: Skewness


➢ A numerical measure of the shape of a distribution is called skewness.
➢ The formula for the skewness of sample data: $\text{Skewness} = \frac{n}{(n - 1)(n - 2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$

➢ Moderately Skewed Left: skewness is negative. The mean will usually be less than the median.
➢ Symmetric (not skewed): skewness is zero. The mean and median are equal.
➢ Highly Skewed Right: skewness is positive. The mean will usually be more than the median.
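
The sign behaviour described above can be checked with a short sketch. It assumes the adjusted sample skewness formula n / ((n – 1)(n – 2)) · Σ((xᵢ – x̄)/s)³, which is the form commonly used in spreadsheet software and is assumed here to be the one the slide intended. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted sample skewness; needs at least 3 observations."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness requires at least 3 observations")
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 divisor)
    cubed_z = sum(((x - x_bar) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_z


symmetric = [1, 2, 3, 4, 5]        # hypothetical, mirror image around 3
skewed_right = [1, 1, 1, 1, 10]    # hypothetical, long right tail
skewed_left = [1, 10, 10, 10, 10]  # hypothetical, long left tail

print(sample_skewness(symmetric))     # approximately 0
print(sample_skewness(skewed_right))  # positive
print(sample_skewness(skewed_left))   # negative
```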
Descriptive Statistics: Numerical Measures

❖ z-Scores
➢ Often called the standardized value.
➢ It denotes the number of standard deviations a data value $x_i$ is from the mean: $z_i = \frac{x_i - \bar{x}}{s}$.
➢ An observation’s z-score is a measure of the relative location of the observation
in a data set.
➢ A data value less than the sample mean will have a z-score less than zero.
➢ A data value greater than the sample mean will have a z-score greater than zero.
➢ A data value equal to the sample mean will have a z-score of zero.
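
A minimal sketch of the z-score calculation, z = (x – x̄)/s, on a small hypothetical sample (the standardized apartment rents are not reproduced here).

```python
from statistics import mean, stdev

# Hypothetical sample with mean 44 and sample standard deviation 8.
sample = [46, 54, 42, 46, 32]

x_bar = mean(sample)
s = stdev(sample)

# Number of standard deviations each value lies from the sample mean.
z_scores = [(x - x_bar) / s for x in sample]

for x, z in zip(sample, z_scores):
    print(f"x = {x:>3}  z = {z:+.2f}")
```

Values above the mean of 44 get positive z-scores and values below it get negative ones; 32 lies 1.5 standard deviations below the mean.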
Descriptive Statistics: Numerical Measures

❖ Chebyshev’s Theorem
➢ At least (1 – 1/z²) of the data values must be within z standard deviations of the mean, where z is any value greater than 1.
➢ Chebyshev’s theorem requires z > 1, but z need not be an integer.
➢ At least 75% of the data values must be within z = 2 standard deviations of the mean.
➢ At least 89% of the data values must be within z = 3 standard deviations of the mean.
➢ At least 94% of the data values must be within z = 4 standard deviations of the mean.
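
The three percentages follow directly from the bound 1 – 1/z². A short sketch (the function name is illustrative):

```python
def chebyshev_lower_bound(z):
    """Minimum share of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2


for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.2%}")

# z need not be an integer.
print(f"z = 1.5: at least {chebyshev_lower_bound(1.5):.2%}")
```

The bounds for z = 3 and z = 4 are 88.89% and 93.75%, which the slide rounds to 89% and 94%.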
Descriptive Statistics: Numerical Measures

❖ Empirical Rule
➢ When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
➢ For data having a bell-shaped distribution:
✓ Approximately 68% of the data values will be within one standard deviation of
the mean.
✓ Approximately 95% of the data values will be within two standard deviations of
the mean.
✓ Almost all of the data values will be within three standard deviations of the
mean.
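
The percentages in the empirical rule can be reproduced under the assumption that the bell-shaped distribution is exactly normal. The sketch uses `statistics.NormalDist` from the Python standard library; the helper name is illustrative.

```python
from statistics import NormalDist

# Assumption: "bell-shaped" is modelled as a standard normal distribution.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

Unlike Chebyshev's bound, which holds for any distribution, these shares apply only when the data are approximately bell-shaped.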
Descriptive Statistics: Numerical Measures

❖ Detecting Outliers
➢ An outlier is an unusually small or unusually large value in a data set.
➢ A data value with a z-score less than –3 or greater than +3 might be
considered an outlier.
➢ It might be:
✓ an incorrectly recorded data value
✓ a data value that was incorrectly included in the data set
✓ a correctly recorded data value that belongs in the data set
Standardized Values for Apartment Rents
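
The table of standardized rents is not reproduced here. As a sketch of the z-score screening rule, the code below flags values with a z-score below –3 or above +3 in a hypothetical data set that contains one extreme value.

```python
from statistics import mean, stdev

# Hypothetical data: twenty ordinary values plus one extreme value.
data = [500, 510, 520, 530, 540, 550, 560, 570, 580, 590] * 2 + [2000]

x_bar = mean(data)
s = stdev(data)

# Flag values more than 3 standard deviations from the mean.
flagged = [x for x in data if abs((x - x_bar) / s) > 3]

print(flagged)  # [2000]
```

A flagged value is only a candidate outlier: it still has to be reviewed to decide whether it was recorded incorrectly, included by mistake, or is a legitimate observation.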
Descriptive Statistics: Numerical Measures

❖ Five-Number Summaries and Box Plots


➢ Summary statistics and easy-to-draw graphs can be used to quickly summarize large
quantities of data.
➢ Two tools that accomplish this are five-number summaries and box plots.
✓ Smallest Value
✓ First Quartile
✓ Median
✓ Third Quartile
✓ Largest Value

➢ Apartment rents: Min = 525, First Quartile = 545, Median = 575, Third Quartile = 625, Max = 715
Descriptive Statistics: Numerical Measures
❖ Box Plot
➢ A graphical display of data that is based on a five-number summary.
➢ Provides another way to identify outliers.
➢ A box is drawn with its ends located at the first and third quartiles.
➢ A vertical line is drawn in the box at the location of the median (second quartile).
➢ Limits are located (not drawn) using the interquartile range (IQR).
➢ Data outside these limits are considered outliers.
➢ The location of each outlier is shown with the symbol *.
Descriptive Statistics: Numerical Measures
❖ Apartment Rents
❖ The lower limit is located 1.5(IQR) below Q1.
❖ Lower Limit: Q1 – 1.5(IQR) = 545 – 1.5(80) = 425
❖ The upper limit is located 1.5(IQR) above Q3.
❖ Upper Limit: Q3 + 1.5(IQR) = 625 + 1.5(80) = 745
❖ There are no outliers (values less than 425 or greater than 745) in the apartment rent data.
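
The limit calculation for the apartment rents can be written as a small helper. The function name is illustrative; the quartiles, minimum, and maximum are the values quoted in the slides.

```python
def box_plot_limits(q1, q3):
    """Lower and upper limits located 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Five-number summary values for the apartment rents.
smallest, q1, q3, largest = 525, 545, 625, 715

lower_limit, upper_limit = box_plot_limits(q1, q3)
print(lower_limit, upper_limit)  # 425.0 745.0

# No outliers if the smallest and largest values lie inside the limits.
has_outliers = smallest < lower_limit or largest > upper_limit
print(has_outliers)  # False
```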
Descriptive Statistics: Numerical Measures

❖ Measures of Association Between Two Variables


➢ Two descriptive measures of the relationship between two variables
are covariance and correlation coefficient.
➢ Covariance
✓ A measure of the linear association between two variables.
✓ Computed as follows:

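For samples: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

For populations: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$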
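
A minimal sketch of the sample covariance, computed as the sum of the products of deviations divided by n – 1. The paired data are hypothetical (driving distance in yards and 18-hole score) and are not the lecture's data set.

```python
from statistics import mean

# Hypothetical paired observations.
distance = [250, 260, 270, 280, 290]   # x: driving distance
score = [74, 73, 71, 70, 72]           # y: 18-hole score

n = len(distance)
x_bar = mean(distance)
y_bar = mean(score)

# Sample covariance: sum of (x - x_bar)(y - y_bar), divided by n - 1.
sample_covariance = sum(
    (x - x_bar) * (y - y_bar) for x, y in zip(distance, score)
) / (n - 1)

print(sample_covariance)  # -17.5
```

The negative sign indicates that, in this hypothetical sample, longer drives tend to go with lower scores. The magnitude depends on the units of measurement, which is why the correlation coefficient is usually reported alongside it.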
Descriptive Statistics: Numerical Measures

❖ Measures of Association Between Two Variables


➢ Correlation Coefficient
✓ A measure of linear association and not necessarily causation.
✓ High correlation does not mean that one variable is the cause of the other.
✓ Can take on values between –1 and +1.
✓ The correlation coefficient is computed as follows:

For samples: $r_{xy} = \frac{s_{xy}}{s_x s_y}$

For populations: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
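
A minimal sketch of the sample correlation coefficient, computed as the sample covariance divided by the product of the two sample standard deviations. The paired data are hypothetical and the function name is illustrative.

```python
from statistics import mean, stdev


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)
    ) / (n - 1)
    return covariance / (stdev(xs) * stdev(ys))


# Hypothetical paired observations: driving distance and 18-hole score.
distance = [250, 260, 270, 280, 290]
score = [74, 73, 71, 70, 72]

r = sample_correlation(distance, score)
print(round(r, 4))  # -0.7
```

Dividing by the standard deviations removes the units, so the result always lies between –1 and +1. A value of –0.7 indicates a fairly strong negative linear association in this hypothetical sample, not that longer drives cause lower scores.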
Descriptive Statistics: Numerical Measures

❖ A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.
What is next?

❖ Introduction to Probability I
➢ Random Experiments, Counting Rules, and Assigning
Probabilities
➢ Events and Their Probabilities
➢ Some Basic Relationships of Probability
➢ Reading(s):
✓ SBE Ch. 4.1 → 4.3