Stat Handout

Statistics involves collecting, organizing, summarizing, and interpreting data. Descriptive statistics describes data, while inferential statistics draws conclusions from it. There are three measures of central tendency: mean, median, and mode. The mean is the average value, found by summing all values and dividing by the total count. The median is the middle value of data arranged from lowest to highest. The mode is the most frequent value. Measures of central tendency do not reflect how spread out the data are. Measures of dispersion such as the range and the standard deviation describe the spread of a data set. The range is the difference between the highest and lowest values, while the standard deviation describes how much values deviate from the mean and is less sensitive to extreme values than the range.

Uploaded by

Jenrick Dimayuga
Copyright
© All Rights Reserved

DEFINING STATISTICS

Statistics is a branch of mathematics that involves the collection, organization,
summarization, presentation, and interpretation of data. The branch of statistics that involves the
collection, organization, summarization, and presentation of data is called descriptive statistics,
while the branch that interprets and draws conclusions from the data is called inferential
statistics.

MEASURES OF CENTRAL TENDENCY


One of the most basic statistical concepts involves finding measures of central tendency
of a set of numerical data. It is often helpful to find numerical values that locate, in some sense,
the “center” of a set of data. We will consider three measures of central tendency, namely the
arithmetic mean, the median, and the mode.
The arithmetic mean, or simply the mean, is the most commonly used measure of
central tendency. To find the mean of a set of data, find the sum of the data values and divide the
sum by the number of data values. This formula can be written in symbols as
\[ \text{mean} = \frac{\sum x}{n} \]
Statisticians often collect data from small portions of a large group in order to determine
information about the group. In such situations, the entire group under consideration is known as
the population, and any subset of the population is called a sample. It is traditional to denote the
mean of a sample by x̄ (read as “x bar”) and to denote the mean of a population by the Greek
letter μ (lowercase mu).

Example: Six friends in a biology class of 20 students received test grades of 92, 84, 65, 76,
88, and 90. Find the mean of these test scores.
Solution: The six friends are a sample of the population of 20 students. Use x̄ to represent
the mean.
\[ \bar{x} = \frac{\sum x}{n} = \frac{92 + 84 + 65 + 76 + 88 + 90}{6} = \frac{495}{6} = 82.5 \]
Therefore, the mean of the test scores is 82.5.
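The same computation can be sketched in Python. This is only an illustration of the formula above; the function name `mean` and the variable names are assumptions chosen for this example and are not part of the handout.

```python
# Test grades of the six friends, taken from the example above.
scores = [92, 84, 65, 76, 88, 90]


def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


sample_mean = mean(scores)
print(sample_mean)  # 82.5
```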
Another measure of central tendency is the median, which is the middle number (or the
mean of the two middle numbers) in a list of numbers that have been arranged in numerical order
from least to greatest or vice versa. Any list of numbers that is arranged as such is known as a
ranked list.
The median of a ranked list of n numbers is the middle number if n is odd and the mean
of the two middle numbers if n is even.

Example: Find the median of the data in the following lists.


a. 4, 8, 1, 14, 9, 21, 12
b. 46, 23, 92, 89, 77, 108
Solution:
a. The list 4, 8, 1, 14, 9, 21, 12 contains 7 numbers. The median of a list with an odd
number of entries is found by ranking the numbers and finding the middle
number. Ranking the numbers from least to greatest yields 1, 4, 8, 9, 12, 14, 21,
and we can see that the middle number is 9. Therefore, the median is 9.
b. The list 46, 23, 92, 89, 77, 108 contains 6 numbers. The median of a list of data
with an even number of entries is found by ranking the numbers and computing
the mean of the two middle numbers. Ranking the numbers from least to greatest
yields 23, 46, 77, 89, 92, 108, and the two middle numbers are 77 and 89.
Therefore, the median is equal to the mean of these two numbers, which is 83.
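The rule for odd and even list lengths can be sketched as a short Python function. This is a minimal illustration, assuming a non-empty list of numbers; the name `median` is chosen for this example only.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ranked = sorted(values)  # build the ranked list, least to greatest
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of entries: the single middle number.
        return ranked[middle]
    # Even number of entries: the mean of the two middle numbers.
    return (ranked[middle - 1] + ranked[middle]) / 2


median_a = median([4, 8, 1, 14, 9, 21, 12])
median_b = median([46, 23, 92, 89, 77, 108])
print(median_a)  # 9
print(median_b)  # 83.0
```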

The third measure of central tendency is the mode, which is the number that occurs most
frequently within a list of numbers. However, it is possible for some lists of numbers to not have
a mode. For instance, in the list 1, 6, 8, 10, 32, 15, 49, each number occurs exactly once. Because
no number occurs more often than the other numbers in the list, there is no mode.
On the other hand, it is also possible for a list of numerical data to have more than one
mode. For instance, in the list 4, 2, 6, 2, 7, 9, 2, 4, 9, 8, 9, 7, the numbers 2 and 9 each occur three
times. Each of the other numbers occurs fewer than three times. Thus 2 and 9 are both modes for
the data.

Example: Find the mode of the data in the following lists.


a. 18, 15, 21, 16, 15, 14, 15, 21
b. 2, 5, 8, 9, 11, 4, 7, 23
Solution:
a. In the list 18, 15, 21, 16, 15, 14, 15, 21, the number 15 occurs more often than the
other numbers. Thus 15 is the mode.
b. Each number in the list 2, 5, 8, 9, 11, 4, 7, 23 occurs only once. Because no
number occurs more often than the others, there is no mode.
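The three cases described above (one mode, several modes, no mode) can be sketched in Python. The function name `modes` and the choice to return an empty list for "no mode" are assumptions made for this illustration; they follow the handout's convention that a list has no mode when no number occurs more often than the others.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of modes, or an empty list when there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, none occurs more
    # often than the others, so the list has no mode.
    if len(counts) > 1 and all(count == highest for count in counts.values()):
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([18, 15, 21, 16, 15, 14, 15, 21]))        # [15]
print(modes([2, 5, 8, 9, 11, 4, 7, 23]))              # []
print(modes([4, 2, 6, 2, 7, 9, 2, 4, 9, 8, 9, 7]))    # [2, 9]
```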

The mean, the median, and the mode are all acceptable measures of central tendency.
However, they are generally not equal. The mean of a set of data is the most sensitive among the
three. A change in any of the numbers changes the mean, and the mean can be changed
drastically by changing an extreme value. In contrast, the median and the mode of a set of data
are usually not changed by changing an extreme value.
When a data set has one or more outliers, that is, extreme values that are very different
from the majority of the data values, the mean will not necessarily be a good indicator of an
average value. To see why, let us compare the mean, median, and mode for the salaries of 5
employees of a small company.

Salaries: ₱370,000 ₱60,000 ₱36,000 ₱20,000 ₱20,000

The sum of the 5 salaries is ₱506,000, and hence the mean is
\[ \frac{506{,}000}{5} = 101{,}200 \]
or ₱101,200. Meanwhile, the median is the middle number, which is ₱36,000, and because the
₱20,000 salary occurs most frequently, the mode is ₱20,000. The data set contains one outlier,
which makes the mean considerably larger than the median. Most of the employees of this
company would probably agree that the median of ₱36,000 better represents the average of the
salaries than does either the mean or the mode.
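The effect of the outlier can be checked with Python's standard `statistics` module. The first list holds the salaries from the example; the second list is a hypothetical variation that is not in the handout, in which the ₱370,000 salary is replaced by ₱70,000 to show that the mean moves while the median and the mode do not.

```python
import statistics

# Salaries from the example above (in pesos).
salaries = [370_000, 60_000, 36_000, 20_000, 20_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
mode_salary = statistics.mode(salaries)
print(mean_salary, median_salary, mode_salary)  # 101200 36000 20000

# Hypothetical variation (an assumption for illustration only):
# the extreme salary is lowered to 70,000.
adjusted = [70_000, 60_000, 36_000, 20_000, 20_000]

adjusted_mean = statistics.mean(adjusted)
adjusted_median = statistics.median(adjusted)
adjusted_mode = statistics.mode(adjusted)
print(adjusted_mean, adjusted_median, adjusted_mode)  # 41200 36000 20000
```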

The Weighted Mean


A value called the weighted mean is often used when some data values are more
important than others. For instance, many professors determine a student’s course grade from the
student’s tests and the final examination. Consider the situation in which a professor counts the
final examination score as 2 test scores. To find the weighted mean of the student’s scores, the
professor first assigns a weight to each score. In this case, the professor could assign each of the
test scores a weight of 1 and the final exam score a weight of 2. A student with test scores of 65,
70, and 75 and a final examination score of 90 has a weighted mean of

\[ \frac{(65 \times 1) + (70 \times 1) + (75 \times 1) + (90 \times 2)}{5} = \frac{390}{5} = 78 \]
Note that the numerator of the weighted mean above is the sum of the products of each test score
and its corresponding weight. The number 5 in the denominator is the sum of all the weights
(1 + 1 + 1 + 2 = 5).
Generally, the weighted mean of the n numbers $x_1, x_2, x_3, \ldots, x_n$ with the respective
assigned weights $w_1, w_2, w_3, \ldots, w_n$ is
\[ \text{weighted mean} = \frac{\sum (x \cdot w)}{\sum w} \]
where ∑(x ∙ w) is the sum of the products formed by multiplying each number by its assigned
weight, and ∑w is the sum of all the weights.
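The weighted mean formula can be sketched in Python as follows. The function name `weighted_mean` and its error handling are assumptions made for this illustration; the scores and weights are those of the course grade example.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    weighted_sum = sum(x * w for x, w in zip(values, weights))
    return weighted_sum / total_weight


# Three tests with weight 1 each and a final examination with weight 2.
scores = [65, 70, 75, 90]
weights = [1, 1, 1, 2]

course_grade = weighted_mean(scores, weights)
print(course_grade)  # 78.0
```

When every weight is 1, the result reduces to the ordinary arithmetic mean.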

MEASURES OF DISPERSION
In the preceding section we introduced three measures of central tendency—the mean,
the median, and the mode. However, some characteristics of a set of data may not be evident
from an examination of these quantities. For instance, consider a soft-drink dispensing machine
that should dispense 8 oz of your selection into a cup. The table below shows data for two of
these machines.
Machine 1 (oz)    Machine 2 (oz)
     9.52              8.01
     6.41              7.99
    10.07              7.95
     5.85              8.03
     8.15              8.02
   x̄ = 8.0           x̄ = 8.0
The mean data value for each machine is 8 oz. However, the quantity of soda dispensed
by Machine 1 is highly inconsistent: in some cases the soda overflows the cup, and in other
cases too little soda is dispensed. The machine obviously needs adjustment. Machine 2, on the
other hand, is working just fine. The quantity dispensed is very consistent, with little variation.
This example shows that average values do not reflect the spread or dispersion of data. To
measure dispersion, we must introduce statistical values known as the range and the standard
deviation.
The range of a set of data values is the difference between the greatest data value and the
least data value. In the above example, the greatest quantity dispensed by Machine 1 is 10.07 oz
and the least quantity is 5.85 oz. Thus, the range of the number of ounces dispensed is
10.07 − 5.85 = 4.22 oz.
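As a quick check, the range can be computed in Python for both machines using the data from the table above. The handout works out the range only for Machine 1; the Machine 2 value below is computed from the same table for comparison. The function name `data_range` is an assumption for this sketch.

```python
def data_range(values):
    """Return the difference between the greatest and the least data value."""
    if not values:
        raise ValueError("the range of an empty list is undefined")
    return max(values) - min(values)


# Ounces dispensed, taken from the table above.
machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]

range_1 = data_range(machine_1)
range_2 = data_range(machine_2)

# Format to two decimals, since floating-point subtraction is not exact.
print(f"Machine 1 range: {range_1:.2f} oz")  # Machine 1 range: 4.22 oz
print(f"Machine 2 range: {range_2:.2f} oz")  # Machine 2 range: 0.08 oz
```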
The range of a set of data is easy to calculate, but it can be deceiving. The range is a
measure that depends only on the two most extreme values, and as such it is very sensitive. A
measure of dispersion that is less sensitive to extreme values is the standard deviation. The
standard deviation of a set of numerical data makes use of the amount by which each individual
data value deviates from the mean. These deviations, represented by (x − x̄), are positive when the
data value x is greater than the mean x̄ and are negative when x is less than x̄.
The sum of all the deviations (x − x̄) is 0 for all sets of data. Because of this, we cannot
use the sum of the deviations as a measure of dispersion for a set of data. Instead, the standard
deviation uses the sum of the squares of the deviations.
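A short derivation shows why the deviations from the mean always sum to 0. It uses only the definition of the mean of n data values, so it holds for any data set; the notation follows the rest of this handout.

```latex
\[
\sum (x - \bar{x})
  = \sum x - n\bar{x}
  = \sum x - n \cdot \frac{\sum x}{n}
  = \sum x - \sum x
  = 0
\]
```

For the data 2, 4, 7, 12, 15 with mean 8, for instance, the deviations are −6, −4, −1, 4, and 7, which add up to 0.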
If $x_1, x_2, x_3, \ldots, x_n$ is a population of n numbers with a mean of μ, then the standard
deviation of the population is
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} \]
If $x_1, x_2, x_3, \ldots, x_n$ is a sample of n numbers with a mean of x̄, then the standard deviation of the
sample is
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
Most statistical applications involve a sample rather than a population, which is the complete set
of data values. Sample standard deviations are designated by the lowercase letter s. In those
cases in which we do work with a population, we designate the standard deviation of the
population by σ, which is the lowercase Greek letter sigma. We can use the following procedure
to calculate the standard deviation of n numbers.
1. Determine the mean of the n numbers.
2. For each number, calculate the deviation (difference) between the number and the mean
of the numbers.
3. Calculate the square of each deviation and find the sum of these squared deviations.
4. If the data is a population, divide the sum by n. If the data is a sample, divide the sum by
n−1.
5. Find the square root of the quotient in Step 4.

Example: The following numbers were obtained by sampling a population: 2, 4, 7, 12, 15.
Find the standard deviation of the sample.
Solution:
1. The mean of the numbers is
\[ \bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = \frac{40}{5} = 8 \]
2. For each number, calculate the deviation between the number and the mean.
x     x − x̄
2     2 − 8 = −6
4     4 − 8 = −4
7     7 − 8 = −1
12    12 − 8 = 4
15    15 − 8 = 7
3. Calculate the square of each deviation in Step 2, and find the sum of those
squared deviations.
x     x − x̄          (x − x̄)²
2     2 − 8 = −6     (−6)² = 36
4     4 − 8 = −4     (−4)² = 16
7     7 − 8 = −1     (−1)² = 1
12    12 − 8 = 4     4² = 16
15    15 − 8 = 7     7² = 49
                     ∑ = 118
4. Because we have a sample of n = 5 values, divide the sum 118 by n − 1, which is 4.
\[ \frac{118}{4} = 29.5 \]
5. The standard deviation of the sample is s = √29.5. To the nearest hundredth, the
standard deviation is s ≈ 5.43.
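The five-step procedure can be sketched in Python as follows. The function name `standard_deviation` and the `is_sample` parameter are assumptions made for this illustration; the data are the five sampled numbers from the example.

```python
import math


def standard_deviation(values, is_sample=True):
    """Follow the five-step procedure for a sample (n - 1) or a population (n)."""
    n = len(values)
    if n == 0 or (is_sample and n < 2):
        raise ValueError("not enough data values")
    mean = sum(values) / n                             # Step 1
    deviations = [x - mean for x in values]            # Step 2
    sum_of_squares = sum(d ** 2 for d in deviations)   # Step 3
    divisor = n - 1 if is_sample else n                # Step 4
    return math.sqrt(sum_of_squares / divisor)         # Step 5


data = [2, 4, 7, 12, 15]

s = standard_deviation(data)
print(round(s, 2))  # 5.43

# Treating the same numbers as a whole population divides by n instead.
sigma = standard_deviation(data, is_sample=False)
```

Python's standard `statistics` module offers `statistics.stdev` for samples and `statistics.pstdev` for populations, which can be used to cross-check a hand calculation.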
