OSTA-WS2024-Lecture 03
OSTA – WS 2024
Dr. Omer CAYIRLI
Lecture 3
Outline
❖Descriptive Statistics II
➢ Numerical Measures
✓ Measures of location and variability
✓ Measures of distribution shape, relative location
✓ Outliers
✓ Measures of association between two variables
Descriptive Statistics: Numerical Measures
❖ Numerical Measures
➢ If the measures are computed for data from a sample, they are
called sample statistics.
➢ If the measures are computed for data from a population, they
are called population parameters.
➢ A sample statistic is referred to as the point estimator of the
corresponding population parameter.
❖ Measures of Location
➢ Mean
➢ Median
➢ Mode
➢ Weighted Mean
➢ Geometric Mean
➢ Percentiles
➢ Quartiles
❖ Mean
➢ Provides a measure of central location.
➢ The mean of a data set is the average of all the data values.
➢ The sample mean x̄ is the point estimator of the population mean, 𝜇.
❖ Sample Mean x̄
➢ x̄ = (Σ xi) / n, where n is the number of observations in the sample.
➢ Example: the monthly rents for 70 randomly sampled apartments (n = 70).
❖ Median
➢ The value in the middle when the data items are arranged in ascending order.
➢ The preferred measure of central location when a data set has extreme values.
✓ A few extremely large observations can inflate the mean.
➢ 7 observations: 26, 18, 27, 12, 14, 27, and 19.
Rewritten in ascending order: 12, 14, 18, 19, 26, 27, and 27.
The median is the middle value in this list, so the median = 19.
➢ 8 observations: 26, 18, 27, 12, 14, 27, 19, and 30.
Rewritten in ascending order: 12, 14, 18, 19, 26, 27, 27, and 30.
The median is the average of the two middle values in this list:
median = (19 + 26)/2 = 22.5.
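The two median examples, and the point that extreme values inflate the mean but not the median, can be checked with a minimal Python sketch using the standard library's statistics module. The value 270 is a hypothetical extreme observation introduced only for illustration; it is not part of the lecture data.

```python
from statistics import mean, median

# The 7 and 8 observations from the median examples.
seven = [26, 18, 27, 12, 14, 27, 19]
eight = seven + [30]

median_seven = median(seven)   # odd n: the single middle value
median_eight = median(eight)   # even n: average of the two middle values

# Hypothetical extreme value: one of the 27s is replaced by 270.
skewed = [26, 18, 270, 12, 14, 27, 19]
mean_before = mean(seven)
mean_after = mean(skewed)
median_after = median(skewed)

print("medians:", median_seven, median_eight)
print("mean before/after the extreme value:", mean_before, mean_after)
print("median after the extreme value:", median_after)
```

The mean moves substantially once the extreme value is introduced, while the median stays at the middle observation.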
❖ Mode
➢ The value that occurs with greatest frequency.
➢ The greatest frequency can occur at two or more different
values.
✓ If the data have exactly two modes, the data are bimodal.
✓ If the data have more than two modes, the data are multimodal.
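A short sketch of how the mode(s) could be found in Python, assuming Python 3.8 or later for statistics.multimode. The helper name describe_modes, the label "unimodal", and the two small lists are hypothetical and serve only to illustrate the bimodal and multimodal cases.

```python
from statistics import multimode

def describe_modes(data):
    """Return the mode(s) and a label based on how many modes there are."""
    modes = multimode(data)  # every value tied for the greatest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

observations = [26, 18, 27, 12, 14, 27, 19]  # from the median example
two_peaks = [1, 1, 2, 2, 3]                  # hypothetical data
three_peaks = [1, 1, 2, 2, 3, 3, 4]          # hypothetical data

for sample in (observations, two_peaks, three_peaks):
    print(sample, describe_modes(sample))
```

Note that when every value occurs exactly once, multimode returns all of them, so the label is only meaningful when some value repeats.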
❖ Weighted Mean
➢ Computed by giving each observation a weight that reflects its
relative importance.
➢ The choice of weights depends on the application.
✓ Examples: GPA, portfolio return, the DXY Index, the IMF’s SDR.
✓ For GPA, the weights might be the number of credit hours earned for each grade.
Worker         Wage ($/hr)    Total Hours
Carpenter      21.60          520
Electrician    28.72          230
Laborer        11.80          410
Painter        19.75          270
Plumber        24.16          160
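The wage table can be turned into a weighted mean with the total hours as weights, using the usual formula (Σ wi·xi) / (Σ wi). The sketch below is one way to do this in Python; the variable names are illustrative, and the unweighted mean is shown only for comparison.

```python
# Wage ($/hr) and total hours per worker type, taken from the table.
wage_table = {
    "Carpenter": (21.60, 520),
    "Electrician": (28.72, 230),
    "Laborer": (11.80, 410),
    "Painter": (19.75, 270),
    "Plumber": (24.16, 160),
}

total_hours = sum(hours for _, hours in wage_table.values())
total_cost = sum(wage * hours for wage, hours in wage_table.values())

# Weighted mean: each wage counts in proportion to the hours worked at it.
weighted_mean_wage = total_cost / total_hours

# Unweighted mean of the five wage rates, for comparison.
simple_mean_wage = sum(wage for wage, _ in wage_table.values()) / len(wage_table)

print(f"total hours: {total_hours}")
print(f"weighted mean wage: {weighted_mean_wage:.2f} $/hr")
print(f"simple mean wage:   {simple_mean_wage:.2f} $/hr")
```

The weighted mean is lower than the simple mean here because the lowest-paid category (Laborer) accounts for a large share of the hours.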
❖ Geometric Mean
➢ Calculated by finding the nth root of the product of n values.
➢ Often used in analyzing growth rates in financial data where using the arithmetic mean may
provide misleading results.
➢ Should be applied anytime the mean rate of change over several successive periods is needed.
✓ Changes in populations of species, crop yields, pollution levels, and birth and death rates.
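To see why the arithmetic mean can mislead for growth rates, consider a hypothetical investment that gains 10% in one period and loses 10% in the next. The growth factors 1.10 and 0.90 and the starting value of 100 are assumptions made up for this sketch; statistics.geometric_mean requires Python 3.8 or later.

```python
import math
from statistics import geometric_mean, mean

# Hypothetical growth factors: +10% in period 1, -10% in period 2.
growth_factors = [1.10, 0.90]
n = len(growth_factors)

# nth root of the product of the n values.
manual_geo = math.prod(growth_factors) ** (1 / n)
library_geo = geometric_mean(growth_factors)
arith = mean(growth_factors)

start = 100.0
actual_end = start * math.prod(growth_factors)  # what really happens
end_via_geo = start * library_geo ** n          # reproduces the actual end value
end_via_arith = start * arith ** n              # overstates the end value

print(f"geometric mean factor:  {library_geo:.5f}")
print(f"arithmetic mean factor: {arith:.5f}")
print(f"actual end value: {actual_end:.2f}")
print(f"implied by geometric mean:  {end_via_geo:.2f}")
print(f"implied by arithmetic mean: {end_via_arith:.2f}")
```

The arithmetic mean of the factors suggests no change on average, yet the investment ends below its starting value; compounding the geometric mean recovers the true ending value.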
❖ Percentiles
➢ A percentile provides information about how the data are spread over the interval from the smallest value to the
largest value.
➢ Admission test scores for colleges and universities are frequently reported in terms of percentiles.
➢ The 𝑝th percentile of a data set is a value such that at least p percent of the items take on this value or less and at
least (100 – 𝑝) percent of the items take on this value or more.
➢ Arrange the data in ascending order.
➢ Compute 𝐿𝑝, the location of the 𝑝th percentile: 𝐿𝑝 = (𝑝/100)(n + 1).
➢ With n = 70 observations, 𝐿80 = (80/100)(70 + 1) = 56.8, so the 80th percentile is the 56th value plus 0.8 times the difference between the 57th and 56th values.
➢ So the 80th percentile = 635 + 0.8(649 – 635) = 646.2.
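The location in the worked example (56.8 for the 80th percentile of 70 values) is consistent with the rule L_p = (p/100)(n + 1) followed by linear interpolation between neighbouring values. A minimal sketch of that procedure is given below; the function name percentile and the clamping at the ends of the data are choices made for this illustration, and the 8 observations from the median example are reused as test data.

```python
def percentile(data, p):
    """p-th percentile using the location L_p = (p/100)(n + 1)."""
    values = sorted(data)                 # step 1: ascending order
    n = len(values)
    location = (p / 100) * (n + 1)        # step 2: 1-based location L_p
    lower = int(location)                 # integer part of L_p
    fraction = location - lower           # fractional part of L_p
    if lower < 1:                         # location falls before the first value
        return values[0]
    if lower >= n:                        # location falls at or after the last value
        return values[-1]
    # Interpolate between the lower-th and (lower + 1)-th ordered values.
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])

eight = [26, 18, 27, 12, 14, 27, 19, 30]
for p in (25, 50, 75):
    print(p, percentile(eight, p))
```

For p = 50 the function returns the same value as the median computed earlier for these 8 observations. Other textbooks and software packages use different location rules, so results may differ slightly between tools.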
❖ Quartiles
➢ Quartiles are specific percentiles.
➢ First Quartile = 25th Percentile
➢ Second Quartile = 50th Percentile = Median
➢ Third Quartile = 75th Percentile
➢ With n = 70 observations, 𝐿75 = (75/100)(70 + 1) = 53.25, so the 75th percentile is the 53rd value plus 0.25 times the difference between the 54th and 53rd values.
➢ The 75th percentile = third quartile = 625 + 0.25(625 – 625) = 625.
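In Python, the three quartiles can be obtained with statistics.quantiles (Python 3.8 or later). Its default method, "exclusive", places cut points using n + 1 positions, which matches the location rule implied by the example above; this correspondence is an observation about the standard library, not something stated in the lecture. The 8 observations from the median example serve as data, and the interquartile range is computed as Q3 minus Q1.

```python
from statistics import median, quantiles

eight = [26, 18, 27, 12, 14, 27, 19, 30]

# n=4 splits the data into four groups and returns the three cut points.
q1, q2, q3 = quantiles(eight, n=4)  # default method="exclusive"

# Interquartile range: the spread of the middle 50% of the data.
iqr = q3 - q1

print("Q1, Q2, Q3:", q1, q2, q3)
print("median:", median(eight))
print("IQR:", iqr)
```

The second quartile coincides with the median, as the list of quartiles states.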
❖ Measures of Variability
➢ Common measures of variability are:
✓ Range
✓ Interquartile Range
✓ Variance
✓ Standard Deviation
✓ Coefficient of Variation
❖ Range
➢ The difference between the largest and smallest data values.
➢ The simplest measure of variability.
➢ Very sensitive to the smallest and largest data values.
❖ Variance
➢ A measure of variability that utilizes all the data.
➢ Based on the difference between the value of each observation (xi) and the mean (x̄ for a sample, μ for a population).
➢ Useful in comparing the variability of two or more variables.
➢ It is the average of the squared deviations between each data value and the mean.
➢ The variance of a sample is: s² = Σ(xi – x̄)² / (n – 1)
❖ Standard Deviation
➢ The positive square root of the variance.
➢ Measured in the same units as the data, so it is easier to interpret than the variance.
❖ Coefficient of Variation
➢ Indicates how large the standard deviation is in relation to the mean, usually expressed as a percentage: (standard deviation / mean × 100)%.
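The measures of variability above can be computed together in a few lines. The five values below are a hypothetical sample chosen so that the arithmetic is easy to follow by hand; statistics.variance and statistics.stdev use the sample (n – 1) divisor.

```python
from statistics import mean, stdev, variance

data = [46, 54, 42, 46, 32]  # hypothetical sample

data_range = max(data) - min(data)   # largest minus smallest value
x_bar = mean(data)                   # sample mean
s2 = variance(data)                  # sample variance, divides by n - 1
s = stdev(data)                      # sample standard deviation
cv = s / x_bar * 100                 # coefficient of variation, in percent

# The same variance written out from its definition.
manual_s2 = sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

print(f"range = {data_range}")
print(f"mean = {x_bar}, variance = {s2}, std dev = {s}")
print(f"coefficient of variation = {cv:.1f}%")
```

For population data, statistics.pvariance and statistics.pstdev divide by N instead of n – 1.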
❖ z-Scores
➢ Often called the standardized value.
➢ It denotes the number of standard deviations a data value xi is from the mean: zi = (xi – x̄) / s.
➢ An observation’s z-score is a measure of the relative location of the observation
in a data set.
➢ A data value less than the sample mean will have a z-score less than zero.
➢ A data value greater than the sample mean will have a z-score greater than zero.
➢ A data value equal to the sample mean will have a z-score of zero.
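A minimal sketch of computing z-scores as (value minus sample mean) divided by the sample standard deviation. The data are a hypothetical sample, and the function name z_scores is illustrative only.

```python
from statistics import mean, stdev

def z_scores(data):
    """Standardize each value: (x - sample mean) / sample standard deviation."""
    x_bar = mean(data)
    s = stdev(data)
    return [(x - x_bar) / s for x in data]

data = [46, 54, 42, 46, 32]  # hypothetical sample
scores = z_scores(data)

for x, z in zip(data, scores):
    position = "below" if z < 0 else "above" if z > 0 else "at"
    print(f"x = {x}: z = {z:+.2f} ({position} the mean)")
```

Values below the mean receive negative z-scores and values above it receive positive ones, in line with the three statements above.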
❖ Empirical Rule
➢ When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
➢ For data having a bell-shaped distribution:
✓ Approximately 68% of the data values will be within one standard deviation of
the mean.
✓ Approximately 95% of the data values will be within two standard deviations of
the mean.
✓ Almost all of the data values will be within three standard deviations of the
mean.
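The percentages in the empirical rule can be reproduced from the normal distribution, which is the idealized bell-shaped curve. The sketch below assumes an exactly normal distribution, which real data only approximate, and uses statistics.NormalDist (Python 3.8 or later); the helper name share_within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.2%}")
```

The results are close to 68%, 95%, and nearly 100%, matching the three statements of the rule.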
❖ Detecting Outliers
➢ An outlier is an unusually small or unusually large value in a data set.
➢ A data value with a z-score less than –3 or greater than +3 might be
considered an outlier.
➢ An outlier might be:
✓ an incorrectly recorded data value
✓ a data value that was incorrectly included in the data set
✓ a correctly recorded data value that belongs in the data set
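The z-score rule for flagging potential outliers could be sketched as follows. The data set is hypothetical (fourteen evenly spaced values plus one much larger value), and the threshold of 3 follows the guideline above. A flagged value is only a candidate for review, since it may still be a correctly recorded value that belongs in the data set.

```python
from statistics import mean, stdev

def flag_outliers(data, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(data)
    s = stdev(data)
    return [x for x in data if abs((x - x_bar) / s) > threshold]

# Hypothetical data: 10, 11, ..., 23 plus one unusually large value.
data = list(range(10, 24)) + [100]
outliers = flag_outliers(data)

print("flagged as potential outliers:", outliers)
```

With very small samples no value can reach a z-score of 3 in magnitude, so the rule is mainly useful for data sets of at least moderate size.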
Standardized Values for Apartment Rents
Descriptive Statistics: Numerical Measures
For samples:
For populations:
Descriptive Statistics: Numerical Measures
For samples:
For populations:
Descriptive Statistics: Numerical Measures
❖Introduction to Probability I
➢ Random Experiments, Counting Rules, and Assigning
Probabilities
➢ Events and Their Probabilities
➢ Some Basic Relationships of Probability
➢ Reading(s):
✓ SBE Ch. 4.1 → 4.3