Statistical Formula Sheet 1
Ranged table median formula: $\text{Median} = L + \dfrac{\frac{n}{2} - CF}{f} \cdot IL$
L is the beginning of the interval for the range that holds the median value. n is the total
observations for the table (sum of frequencies). CF is the cumulative frequency of the ranges prior
to the range of the median value. f is the frequency of the range that holds the median value. IL is
the interval/range length (IL = range high - range low + 1).
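As a rough illustration, the ranged table median formula can be written as a small Python function. The table used here (intervals 10-19, 20-29, 30-39 with frequencies 5, 8, 7) is made up for the example and is not from this sheet.

```python
def grouped_median(L, n, CF, f, IL):
    """Median of a ranged table: L + ((n/2 - CF) / f) * IL."""
    return L + ((n / 2 - CF) / f) * IL

# Hypothetical table: 10-19 (f=5), 20-29 (f=8), 30-39 (f=7), so n = 20.
# The middle observation falls in the 20-29 range, so L = 20, CF = 5, f = 8.
interval_length = 29 - 20 + 1  # range high - range low + 1
median = grouped_median(L=20, n=20, CF=5, f=8, IL=interval_length)
print(median)
```
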
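Formulas for the sample standard deviation and variance:
$s^2 = \dfrac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} = \dfrac{\sum X^2 - n\bar{X}^2}{n - 1}$
$s = \sqrt{\dfrac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} = \sqrt{\dfrac{\sum X^2 - n\bar{X}^2}{n - 1}}$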
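A minimal Python sketch of the sample variance formula above, using the sum of X and the sum of X squared. The data list is invented for illustration, and the standard library function `statistics.variance` is only used as a cross-check.

```python
import math
import statistics

def sample_variance(xs):
    """s^2 = (sum(X^2) - (sum(X))^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
s2 = sample_variance(data)       # variance
s = math.sqrt(s2)                # standard deviation
print(s2, s, statistics.variance(data))
```
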
Formulas for the sample standard deviation and variance from a range table: $s = \sqrt{\dfrac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}}$
Binomial formula: $P(r) = {}_{n}C_{r} \cdot p^{r} \cdot q^{n-r}$     Poisson formula: $P(X) = \dfrac{\mu^{X}}{X! \cdot e^{\mu}}$
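The binomial and Poisson formulas can be sketched in Python with the standard `math` module. The function names and the sample inputs are assumptions chosen for this example.

```python
import math

def binomial_pmf(r, n, p):
    """P(r) = nCr * p^r * q^(n - r), where q = 1 - p."""
    q = 1 - p
    return math.comb(n, r) * p ** r * q ** (n - r)

def poisson_pmf(x, mu):
    """P(x) = mu^x / (x! * e^mu)."""
    return mu ** x / (math.factorial(x) * math.exp(mu))

# Hypothetical inputs: 2 successes in 4 trials with p = 0.5,
# and 0 events when the mean number of events is 2.
print(binomial_pmf(2, 4, 0.5))
print(poisson_pmf(0, 2.0))
```
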
Formulas for the probability standard deviation and variance: $\mu = \sum [X \cdot P(X)]$     $\sigma^2 = \sum [(X - \mu)^2 \cdot P(X)]$
Standard probability formulas: $p + q = 1$     $\mu = np$     $\sigma^2 = n \cdot p \cdot (1 - p)$
$Z = \dfrac{X - \mu}{\sigma}$     $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$     $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$
Http://www.zen.home.att.net
Troy E. O'Brien Page 1
Confidence interval formulas: $\bar{X} \pm Z \dfrac{s}{\sqrt{n}}$     $p \pm Z \sqrt{\dfrac{p(1 - p)}{n}}$
Confidence interval formulas for small samples (n-1 degrees of freedom): $\bar{X} \pm t \dfrac{s}{\sqrt{n}}$     $p \pm t \sqrt{\dfrac{p(1 - p)}{n}}$
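A possible Python sketch of the Z-based confidence intervals. It takes Z from the standard library's `statistics.NormalDist` rather than from a table. The function names and sample values are assumptions. For small samples, a t value from a table with n - 1 degrees of freedom would take the place of z.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided Z value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def mean_ci(x_bar, s, n, confidence=0.95):
    """X-bar +/- Z * s / sqrt(n)."""
    margin = z_for_confidence(confidence) * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def proportion_ci(p, n, confidence=0.95):
    """p +/- Z * sqrt(p(1 - p) / n)."""
    margin = z_for_confidence(confidence) * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical sample: mean 50, standard deviation 10, 100 observations.
low, high = mean_ci(x_bar=50.0, s=10.0, n=100)
p_low, p_high = proportion_ci(p=0.5, n=400)
print(low, high, p_low, p_high)
```
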
Formulas for determining the size of a sample: $n = \left(\dfrac{Z \cdot s}{E}\right)^2$     $n = p \cdot (1 - p) \cdot \left(\dfrac{Z}{E}\right)^2$
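The sample size formulas might be coded as below. Rounding up to the next whole observation is a common convention that this sheet does not state, so treat it as an assumption. The inputs are made up.

```python
import math

def sample_size_mean(z, s, E):
    """n = (Z * s / E)^2, rounded up (rounding is an assumption)."""
    return math.ceil((z * s / E) ** 2)

def sample_size_proportion(z, p, E):
    """n = p * (1 - p) * (Z / E)^2, rounded up (rounding is an assumption)."""
    return math.ceil(p * (1 - p) * (z / E) ** 2)

# Hypothetical: 95% confidence (Z = 1.96), s = 15, allowed error E = 3.
n_mean = sample_size_mean(z=1.96, s=15, E=3)
# Hypothetical: 95% confidence, p = 0.5, allowed error E = 0.05.
n_prop = sample_size_proportion(z=1.96, p=0.5, E=0.05)
print(n_mean, n_prop)
```
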
Set theory formulas:
P(A or B) = P(A) + P(B) (A and B mutually exclusive)     P(A or B) = P(A) + P(B) - P(A and B) (general case)
P(A) = 1 - P(~A)     P(A and B) = P(A)P(B) (A and B independent)     P(A and B) = P(A)P(B|A) (general case)
Correlations ($y = a + bx$): $b = \dfrac{\sum xy - \bar{x} \sum y}{\sum x^2 - n\bar{x}^2}$     $a = \bar{y} - b\bar{x}$     $r = \dfrac{\sum xy - \bar{x} \sum y}{\sqrt{(\sum x^2 - n\bar{x}^2)(\sum y^2 - n\bar{y}^2)}}$
$s_e = \sqrt{\dfrac{\sum y^2 - a \sum y - b \sum xy}{n - 2}}$ --- note: n - 2 degrees of freedom
$s_b = \dfrac{s_e}{\sqrt{\sum x^2 - n\bar{x}^2}}$
These are used on the slope of the correlation: $b \pm t \cdot s_b$ and $t = \dfrac{b - \text{hypothetical value}}{s_b}$
$t = \dfrac{b}{s_b}$ - used to test if a correlation exists: see *
For a given x value: $s_{a+bx} = s_e \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum x^2 - n\bar{x}^2}}$     $(a + bx) \pm t \cdot s_{a+bx}$
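The correlation formulas for a, b, r, se, and sb can be worked through in one short Python function. This is only a sketch: the function name and the five data points are invented, and the t value for any interval would still come from a table with n - 2 degrees of freedom.

```python
import math

def linear_fit(xs, ys):
    """Return a, b, r, se, sb for y = a + bx using the sums-based formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sxx = sum_x2 - n * x_bar ** 2      # sum(x^2) - n * x-bar^2
    syy = sum_y2 - n * y_bar ** 2      # sum(y^2) - n * y-bar^2
    sxy = sum_xy - x_bar * sum_y       # sum(xy) - x-bar * sum(y)
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)
    se = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))
    sb = se / math.sqrt(sxx)
    return a, b, r, se, sb

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b, r, se, sb = linear_fit(xs, ys)
print(a, b, r, se, sb)
```
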
Test of dependence/independence:
$t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \sqrt{\dfrac{r^2 (n - 2)}{1 - r^2}}$ --- see * (the second form gives the size of t; t takes the sign of r)
* Ho: No Correlation or No Dependence
If t > t crit then positive dependence/correlation
If t < -t crit then negative dependence/correlation
If t > t crit or t < -t crit then dependent/correlated
Note - n -2 degrees of freedom
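The test of dependence can be sketched as follows. The helper names are assumptions. The critical value still has to be looked up in a t table. The 2.120 used here is the two-tailed table value at the .05 level of significance for 16 degrees of freedom.

```python
import math

def correlation_t(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def decide(t, t_crit):
    """Compare t with the critical value from a table (two tailed)."""
    if t > t_crit:
        return "positive dependence/correlation"
    if t < -t_crit:
        return "negative dependence/correlation"
    return "no dependence/correlation shown"

# Hypothetical sample: r = 0.6 from n = 18 pairs, so 16 degrees of freedom.
t = correlation_t(r=0.6, n=18)
print(t, decide(t, t_crit=2.120))
```
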
The F score is used when comparing the variances of two samples. This score is also used with the ANOVA
test on multiple groups to see if the mean is the same for all groups.
$F = \dfrac{s_1^2}{s_2^2}$     With n1 - 1 degrees of freedom in the numerator and n2 - 1 in the denominator
General Note: Crit, or critical, scores are values that come from tables.
Important Notes:
t-scores use n - 1 degrees of freedom unless two samples are involved; then use n1 + n2 - 2.
Probabilities are always in the range 0 ≤ p ≤ 1. Negative probabilities or probabilities greater than 1 have no meaning.
If a range formula gives you a negative or greater-than-one probability, then stop and represent it as zero (for a
negative) or one (for anything greater than one).
p, in probability terms, stands for the probability of success and is calculated by dividing the number of ways to succeed by
the total number of ways.
Important to Know Definitions:
Mean - Arithmetic average of numbers.
Mode - The number that comes up the most often.
Median - The number in the middle when all the numbers are arranged from low to high. If between
two numbers, take the average of the numbers.
Expected Value - The mean value or the average value that is the expected outcome.
Standard Deviation - Measurement of the spread of a group of numbers.
Variance - The measurement of the average squared distance of the numbers in a sample from the mean of
the sample (with n - 1 as the divisor). It is also the standard deviation squared.
Sample - A portion of a population taken to represent a population.
Population - All data that represents a group, e.g. Davenport has complete data representing its
school population, and the Secretary of State has data on the population of
valid Michigan drivers.
Statistic - A single measurement or calculated value from a group of numbers.
Statistics - Measurements or calculated values used to interpret a problem, sample, or population.
Combination - Selecting a group when order does not matter, e.g. ABC is the same as CAB.
Permutation - Selecting a group when order matters, e.g. ABC is not the same as BAC.
Tree Diagram - A diagram used to show multiple events by using a hierarchy just like the diagram used for
tree factoring.
Venn Diagram - A diagram made of circles to show the relationships of sets and subsets. Helps
organize data to eliminate repetition.
Survey - Taking a random sample of the population to measure some property of the population.
Bias - Using statistics to prove your point without regard to what the statistics actually show. This is also caused by
poor wording of questions or leading people into giving an answer. Meaningless statistics are
created by bias. Do not be biased with your statistics.
Fun - Something statistics can be if you come to class well prepared.
Dumb Question - No such thing.
Recursion - See recursion.
Puns - Type of joke often told by your instructor and usually followed by groans and/or moans.
Word processor - What you should use to type up your paper.
Binomial - Event with only two possible outcomes, success and failure. When an event is made up of several
binomial sub-events, the binomial formula may be used.
Midpoint - The value calculated in a range table by adding the high and low of each interval and
dividing that result by two. It is also used as the x value for calculating the mean, mode,
and standard deviation of a range table.
Discrete Numbers - These are numbers that fall on a number line and include data like salaries, ages,
distance, etc.
Non-Discrete Numbers - These are categorical values that are turned into percentages. They
include data like gender, ethnic background, yes/no survey questions, etc.
Histogram - A bar chart that shows the frequency (dependent variable) of each value or interval (independent variable).
Central Limit Theorem - For almost all populations, the sampling distribution of the sample mean X̄ is approximately normal
when the simple random sample size is sufficiently large.
Logic And Sets Handout
p   ~p
T   F
F   T

p   q   p ∧ q   p ∨ q   p → q   p ↔ q
T   T   T       T       T       T
T   F   F       T       F       F
F   T   F       T       T       F
F   F   F       F       T       T
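The truth table above can be regenerated with a few lines of Python, which is a handy way to check a row. The function name and the column labels in the dictionary are just choices made for this sketch.

```python
from itertools import product

def truth_row(p, q):
    """Values of and, or, implies, and if-and-only-if for one row."""
    return {
        "p and q": p and q,
        "p or q": p or q,
        "p -> q": (not p) or q,   # false only when p is true and q is false
        "p <-> q": p == q,        # true when p and q match
    }

# Rows come out in the same order as the table: TT, TF, FT, FF.
for p, q in product([True, False], repeat=2):
    print(p, q, truth_row(p, q))
```
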
Hints:
Make Into
Where:
A={1,2,4,5} B={2,3,4,6}
C={4,5,6,7} U={1,2,3,4,5,6,7,8}
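Using the sets A, B, C, and U given above, Python's built-in set type can show a few basic operations. Which operations to compute is a choice made for this sketch; the variable names are illustrative.

```python
A = {1, 2, 4, 5}
B = {2, 3, 4, 6}
C = {4, 5, 6, 7}
U = {1, 2, 3, 4, 5, 6, 7, 8}

a_or_b = A | B          # union: in A or in B
a_and_b = A & B         # intersection: in both A and B
not_a = U - A           # complement of A within U
all_three = A & B & C   # in A, B, and C at once

print(a_or_b, a_and_b, not_a, all_three)
```
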
Variable
Sample   Population   Meaning
n        N            Number of observations
s        σ            Standard deviation
p̄        p            Probability mean
s²       σ²           Variance
Subscripts are used with variables/symbols to show which belong to which sample. X1, s1, and n1 are all
from the same sample.
Also remember that "YES" is a dirty word when doing tests of hypothesis. You do not want to make absolute
statements because your level of significance is how often you may be wrong. In addition, "less than" and
"more than" are the key words for a one-tail test. "Different" or "not the same" is your clue for a two-tail test.
The Bell Curve
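[Figure: bell curve captioned "Statistics uses the area under this curve". The area is split into four regions labeled A, B, C, D from left to right. The horizontal axis runs from negative Z-scores (or t-scores) through 0 at the center to positive Z-scores (or t-scores).]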
Z-scores and t-scores measure the distance from the center of the curve to the point on the line
where the desired amount of area has been accumulated. In the above example, there are four divisions
under the curve. Please note that A + B + C + D = 1 and A + B = .5 and C + D = .5. In a one tailed test,
either A or D will be equal to the level of significance. In a two tailed test, A and D will both be equal to half
of the level of significance. Remember: Level of Confidence + Level of Significance = 1
In testing of hypothesis, Ho is always equality and Ha is either <, >, or ≠. To check for ≤, you use > in Ha. To
check for ≥, you use < in Ha.
2) Determine Ho and Ha. Knowing the variables is needed for this step.
4) Draw a bell curve and show the tail(s). Also determine the Z-score(s) or t-score(s) for the reject
criteria.
5) Use the proper formula to determine the Z-score or t-score for the test. This can easily be determined
from step 1 where you listed out the variables of the problem.
6) Show the result on the graph you drew. This will let you know whether to accept Ho or Ha.
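As one way to picture these steps, here is a small Python sketch of a Z test on a sample mean. It assumes the test score is Z = (X̄ - μ) / (s / √n), takes the reject criteria from `statistics.NormalDist` instead of a table, and uses made-up numbers. The function name and the `tail` argument are assumptions for the example.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu0, s, n, alpha=0.05, tail="two"):
    """Return the test Z-score, the critical Z-score, and whether Ho is rejected."""
    z = (x_bar - mu0) / (s / math.sqrt(n))
    normal = NormalDist()
    if tail == "two":        # Ha uses "not equal": alpha is split between both tails
        z_crit = normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_crit
    elif tail == "right":    # Ha uses ">"
        z_crit = normal.inv_cdf(1 - alpha)
        reject = z > z_crit
    else:                    # "left": Ha uses "<", critical score is negative
        z_crit = normal.inv_cdf(alpha)
        reject = z < z_crit
    return z, z_crit, reject

# Hypothetical problem: Ho: mu = 50, Ha: mu is different from 50.
z, z_crit, reject = z_test(x_bar=52.0, mu0=50.0, s=8.0, n=64)
print(z, z_crit, reject)
```
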
ANOVA Help Sheet
Notes | Group 1 (x, x²) | Group 2 (x, x²) | Group 3 (x, x²) | Group 4 (x, x²) | Grand Totals
sum of squares treatment: $sst = \sum \left(\dfrac{T_c^2}{n_c}\right) - \dfrac{(\sum x)^2}{N}$
sum of squares error: $sse = \sum (x^2) - \sum \left(\dfrac{T_c^2}{n_c}\right)$
$F = \dfrac{sst / (k - 1)}{sse / (N - k)}$     For F score from table: k - 1 degrees of freedom in the numerator, N - k degrees of freedom in the denominator
From above:
$sst = \left(\dfrac{92^2}{6} + \dfrac{129^2}{8} + \dfrac{137^2}{9} + \dfrac{150^2}{10}\right) - \dfrac{508^2}{33} = 7826.236111111 - 7820.121212121 = 6.1148989899$
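The sst arithmetic above can be checked with a short Python sketch that uses the same group totals (92, 129, 137, 150) and group sizes (6, 8, 9, 10). The function names are assumptions. Because the individual x values are not listed here, sse and F are shown only as a function and are not evaluated for this data.

```python
def sum_squares_treatment(totals, sizes):
    """sst = sum(Tc^2 / nc) - (sum of all x)^2 / N."""
    N = sum(sizes)
    grand_total = sum(totals)
    return sum(T ** 2 / n for T, n in zip(totals, sizes)) - grand_total ** 2 / N

def f_score(sst, sse, k, N):
    """F = (sst / (k - 1)) / (sse / (N - k))."""
    return (sst / (k - 1)) / (sse / (N - k))

totals = [92, 129, 137, 150]  # group totals Tc from the worked example
sizes = [6, 8, 9, 10]         # group sizes nc from the worked example
sst = sum_squares_treatment(totals, sizes)
print(sst)
```
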
Chi aka χ² - Help sheet
$\chi^2 = \sum \left[\dfrac{(f_o - f_e)^2}{f_e}\right]$
$f_e = \dfrac{n}{\text{number of groups}}$
example
Group     fo    fe    fo - fe   (fo - fe)²   (fo - fe)²/fe
Peaches   23    30    -7        49           1.6333333
Apples    31    30     1         1           0.0333333
Mangos    25    30    -5        25           0.8333333
Grapes    29    30    -1         1           0.0333333
Lemon     37    30     7        49           1.6333333
Lime      35    30     5        25           0.8333333
Totals   180   180     0       150           5
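The example table can be reproduced in Python as a quick check of the chi-square total. The observed counts are the ones in the table. The variable names are only illustrative.

```python
observed = {
    "Peaches": 23,
    "Apples": 31,
    "Mangos": 25,
    "Grapes": 29,
    "Lemon": 37,
    "Lime": 35,
}

n = sum(observed.values())
fe = n / len(observed)  # expected frequency: n / number of groups

# One (fo - fe)^2 / fe term per group, then the total.
terms = {name: (fo - fe) ** 2 / fe for name, fo in observed.items()}
chi_square = sum(terms.values())
print(fe, chi_square)
```
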
for the following example table:
we get the expected value for each cell by multiplying the corresponding column and
row totals, then dividing the product by the grand total (all values come from the above table to get
this):
finally we take the difference of the observed and expected values, square it, and divide
by the expected value for each cell - not forgetting to get the column/row totals:
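The two steps just described (expected value per cell, then the chi-square terms) can be sketched in Python. The example tables did not survive in this copy of the sheet, so the 2 by 2 table below is hypothetical, and the function names are assumptions.

```python
def expected_counts(table):
    """Expected value per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[10, 20], [30, 40]]  # hypothetical observed counts
expected = expected_counts(observed)
print(expected, chi_square(observed))
```
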
Sign Test
Hypothesis:
Ho: p = .5
Ha: p <> .5
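One common way to carry out this test, shown here only as a sketch, is to count the plus and minus signs and use the binomial formula with p = .5 to find how likely a split at least this lopsided would be. The sheet itself gives only the hypotheses, so the method, the function name, and the counts below are assumptions.

```python
import math

def sign_test_p_value(plus, minus):
    """Two tailed probability of a split at least this uneven when p = .5."""
    n = plus + minus
    k = min(plus, minus)
    # Probability of k or fewer of the less common sign, using nCr * .5^n.
    one_tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

# Hypothetical result: 8 plus signs and 2 minus signs (ties dropped).
p_value = sign_test_p_value(plus=8, minus=2)
print(p_value)
```

If the printed probability is below the level of significance, Ho: p = .5 would be rejected.
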