Biostatistics (Midterm)

The document discusses key concepts in biostatistics including populations, variables, and statistical analyses. It defines a population as the group of interest being studied. Variables are characteristics of the population that can be measured, and are classified as independent, dependent, confounding, or background. Statistical analyses examine relationships between variables to test differences between populations or evaluate programs and their outcomes.

MODULE 1
INTRODUCTION TO BIOSTATISTICS

Biostatistics
 Analysis and interpretation of data are done using biostatistics.
 The word "statistics" comes from the Italian word "statista", meaning "statesman", or the German word "Statistik", which means a political state.
 The science of statistics is said to have developed from the registration of heads of families in ancient Egypt to the Roman census on military strength, births and deaths, etc., and found its application gradually in the field of health and medicine.
 John Graunt (1620 – 1674), who was neither a physician nor a mathematician, is considered the father of health statistics.
 Statistics is the science of compiling, classifying, and tabulating numerical data and expressing the results in a mathematical or graphical form.
 Biostatistics is that branch of statistics concerned with mathematical facts and data related to biological events.

Uses:
1. To test whether the difference between two populations is real or a chance occurrence
2. To evaluate the correlation between attributes in the same population
3. To evaluate the efficacy of vaccines, sera, etc.
4. To measure mortality and morbidity
5. To evaluate achievements of public health programs
6. To fix priorities in public health programs
7. To help promote health legislation and create administrative standards for oral health

Basis for Statistical Analysis
Statistical analyses are based on three primary entities:
1. The population (U) that is of interest
2. The set of characteristics (variables) of the units of this population (U)
3. The probability distribution (p) of these characteristics in the population

The Population (U)
The population (U) is a collection of units of observation that are of interest and is the target of the investigation.
Ex.: In determining the effectiveness of a particular drug for a disease, the population (U) would consist of all possible patients with this disease.
It is essential, in any research study, to identify the population (U) clearly and precisely. The success of the investigation will depend to a large extent on the identification of the population (U) of interest.

The Variable (V)
A variable is a state, condition, concept or event whose value is free to vary within the population.
Once the population is identified, clearly define what characteristics of the units of this population (the subjects of the study) are to be investigated.
Ex.: In the case of a particular drug, one needs to define the disease and what other characteristics of the people (e.g. age, sex, education, etc.) one intends to study.
Clear and precise definitions and methods for measuring these characteristics (a simple observation, a laboratory measurement, or tests using a questionnaire) are essential for the success of the research study.

Classifications of Variables

1. Independent Variables
Variables that are manipulated or treated in a study in order to see what effect differences in them will have on those variables proposed as being dependent on them.
Synonyms: cause, input, predisposing factor, antecedent, risk factor, characteristic, attribute, determinant

2. Dependent Variables
Variables in which changes are results of the level or amount of the independent variable or variables.
Synonyms: effect, outcome, consequence, result, condition, disease
Ex.:
Dependent variables – DMFT, dfT, presence or absence of carious lesions
Independent variable – exposure to a preventive program

3. Confounding or Intervening Variables
Variables that should be studied because they may influence or "confound" the effect of the independent variable(s) on the dependent variable(s).
Ex.: In a study of the effect of tobacco (independent variable) on oral cancer (dependent variable), the nutritional status of the individual may play an intervening role.

4. Background Variables
Variables that are so often of relevance in investigations of groups or populations that they should be considered for possible inclusion in the study.
Ex.: age, sex, ethnic origin, education, marital status, social status
Synonyms: demographic profile

VARIABLES
Another way of classifying variables is by form. Understanding this classification is essential to selecting the appropriate statistical test to analyze data. Broadly, variables can be classified into categorical and continuous variables.
Categorical variables can be further divided into nominal and ordinal variables. In a nominal scale, discrete categories do not have a quantitative relationship with each other. A nominal scale, for example, records eye color as blue/green/brown/hazel or answers to a question as yes/no. As implied, ordinal variables consist of ordered categories; however, the difference between the categories is not specified. The use of A, B, and C letter grades is an example of an ordinal scale. Unless specific numeric quantities are assigned, the difference between a C and a B is not necessarily the same as the difference between a B and an A.
 Continuous variables represent measured quantities (e.g., blood pressure and temperature).
 Continuous variables may be divided into interval and ratio variables.
 The points of an interval scale are equally spaced, and the difference between two points is meaningful (e.g., the difference between 30 degrees Celsius and 31 degrees Celsius is the same as between 89 and 90 degrees Celsius). However, 100 degrees Celsius is not twice as hot as 50 degrees Celsius. As you may guess, the ratio between points on a ratio scale has meaning.
 Age is an example of a ratio scale. Therefore, David, who is 18, is twice as old as his brother Michael, who is 9. It should be noted that continuous variables, whether interval or ratio, are analyzed in the same way.
 DMFT and dfT outcomes are continuous, interval-scale variables, whereas presence or absence of carious lesions is a nominal categorical variable.
SAMPLING FROM A POPULATION

 In statistical language, the subset of the population of interest in a study is called a sample.
 Usually, we wish to draw conclusions about some numeric aspect of the population.
 In statistical terms, a parameter is a numeric characteristic of the population. A parameter has a set value, but we usually do not know the value.
 A statistic is a numeric characteristic of the sample. We can know the value of a statistic in our sample, but the value will change from sample to sample.
 It is important that the sample be representative of the population of interest from which it was drawn because statements (or inferences) about the whole population may be made from the measurements taken on the sample.
 If a sample is not representative of the population of interest, it is a biased sample. For example, in caries prevalence measures, schoolchildren living in a fluoridated community would be a biased sample of all children because, as a group, they would have a lower prevalence than the entire population of interest.
 The best way to ensure a representative, unbiased sample is to perform simple random sampling. A simple random sample is one in which every item or person in the population has an equal and independent chance of being selected. A simple random sample is an example of a probability sample.
 Probability samples are those drawn when you are able to identify and have access to all members of the population of interest.
 A stratified random sample, another type of probability sample, is a variant of the simple random sample. This sampling scheme is random sampling carried out in subgroups of a population to ensure that selections will be made from each level of the subgroup.
 For example, you may take steps to ensure that every age, sex, race, or social stratum subgroup is represented in sufficient numbers in the sample.
 At times, a probability sample may not be possible or warranted. For example, you may not have access to the entire population of interest, or variability may be low enough that the effort and cost of probability sampling outweighs the risk of drawing a biased sample.
 There are several subtypes of non-probability samples.
 Cluster sampling divides the population into small groups (clusters), draws a simple random sample of clusters, and assesses every subject in the sampled clusters. This may be a good approach when the cost and time to travel between randomly selected subjects would be prohibitive.
 A quota sample is drawn by selecting items or people in a block of predetermined size. For example, you may select the first ten women, without regard for the pool they may represent. Finally, a convenience sample, as its name suggests, is selected on the basis of convenience to the researcher, with little concern for representativeness. The types of samples are summarized as follows:

Probability Sample: A sampling from a population that you can identify and to which you have access to all members.
Simple Random Sample: Each item or person in the population of interest has an equal and independent chance of being selected.
Stratified Random Sample: Random sampling carried out in subgroups of a population to ensure that selections will be made from each level of the subgroup.
Nonprobability Sample: A sampling when you cannot identify or do not have access to the entire population of interest.
Cluster Sample: Drawing a simple random sample of small groups (clusters) of the population and assessing each subject in the sampled cluster.
Quota Sample: Sampling items or people in a block of predetermined size.
Convenience Sample: A sampling scheme in which the subjects are selected, partly or entirely, at the convenience of the researcher.
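As a rough sketch of the difference between a simple random sample and a stratified random sample summarized above, the following Python snippet draws both from a hypothetical roster; the sampling frame, the sample sizes, and the use of sex as the stratum are made-up illustrations, not data from this module.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical sampling frame: 100 schoolchildren identified by ID and sex stratum.
population = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(100)]

# Simple random sample: every child has an equal, independent chance of selection.
simple_random = random.sample(population, k=10)

# Stratified random sample: sample separately within each sex stratum
# so that both levels of the subgroup are represented.
stratified = []
for stratum in ("F", "M"):
    members = [p for p in population if p["sex"] == stratum]
    stratified.extend(random.sample(members, k=5))

print([p["id"] for p in simple_random])
print([p["id"] for p in stratified])
```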
 The Probability Distribution (P)
The probability distribution (P) is the most crucial link between the population and its characteristics; the ability to draw inferences about the population based on sample observations depends on this probability distribution.

How do we use a probability distribution?
Probability distributions indicate the likelihood of an event or outcome. Statisticians use the following notation to describe probabilities: p(x) = the likelihood that the random variable takes a specific value of x. The sum of all probabilities for all possible values must equal 1.

What does a probability distribution (p) indicate?
A probability distribution (p) indicates the possible outcomes of a random experiment and the probability that each of those outcomes will occur.

Why do we use probability distributions?
Probability distributions are a fundamental concept in statistics. They are used both on a theoretical level and a practical level. Some practical uses of probability distributions are: to calculate confidence intervals for parameters and to calculate critical regions for hypothesis tests.

What is an example of a probability distribution?
The probability distribution of a discrete random variable can always be represented by a table. For example, suppose you flip a coin two times. The probability of getting 0 heads is 0.25; 1 head, 0.50; and 2 heads, 0.25. Thus, the table is an example of a probability distribution for a discrete random variable.
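A minimal sketch of that two-coin-flip distribution in Python; enumerating the four equally likely outcomes is my own way of building the table described above.

```python
from itertools import product
from collections import Counter

# Enumerate the four equally likely outcomes of two coin flips
# and count how many heads each outcome contains.
outcomes = list(product(["H", "T"], repeat=2))
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# p(x) = likelihood that the random variable (number of heads) equals x.
distribution = {x: count / len(outcomes) for x, count in sorted(heads_counts.items())}

print(distribution)                 # {0: 0.25, 1: 0.5, 2: 0.25}
print(sum(distribution.values()))   # all probabilities must sum to 1
```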
What is the difference between discrete and continuous probability distributions?
A probability distribution may be either discrete or continuous. A discrete distribution means that X can assume one of a countable (usually finite) number of values, while a continuous distribution means that X can assume one of an infinite (uncountable) number of different values.

DATA ANALYSIS
There are 2 steps in data analysis:
1st Step: Calculate the descriptive statistics. These describe the characteristics of the data found within the sample of individuals in whom the study was conducted.
2nd Step: Calculate the inferential statistics. The purpose of generating inferential statistics is to determine whether the results found in the sample may be a result of chance or, assuming no other threats to validity, whether we can generalize our results to the general population of interest.
Module 2
MEASURES OF CENTRAL TENDENCY

LEARNING OUTCOMES:
At the end of the topic, students are able to:
1. Define central tendency, mean, median, and mode.
2. Differentiate mean, mode, and median.
3. Determine the mean, mode, and median of a given set of data.

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. Measures of central tendency attempt to identify the middle of a distribution to provide one sample statistic that describes the character of an entire data set.

Three Measures of Central Tendency
1. Mean
2. Mode
3. Median

MEAN
 The sample mean of a data set is the arithmetic average, which is the sum of observations divided by the number of observations.
  o Ex.: If you measured decayed, missing, or filled primary tooth surfaces (dmfs) among five (5) first grade schoolchildren and obtained the following data: 0, 3, 1, 2, and 4
  o The mean, or average, dmfs would be: (0 + 3 + 1 + 2 + 4) / 5 = 10/5 = 2.0
 By substituting symbols for these numbers, we can represent the general formula for the mean. Each symbol, X1, X2, X3, ..., Xn, represents an individual observation, where n is the total number of observations. The mean of the sample is represented by the symbol Ẋ (x-bar). Thus, the formula for the mean would be:

  Ẋ = (X1 + X2 + X3 + ... + Xn) / n

 The mean is the most commonly used measure of central tendency.
 The mean only makes sense in the context of continuous variables; however, in practice, the mean is also frequently calculated for ordinal variables with many levels, for example, age in years.

MEDIAN
 The median of a sample is the middle item (midpoint) of a data set, which will divide a data set arranged in order in half.
 To find the median, the data must first be arranged in order of increasing value. Continuing the example from the mean, this results in 0, 1, 2, 3, 4. In this case, the median is 2. This example is straightforward because there was an odd number of observations, with a single observation in the middle to serve as the median. If there is an even number of observations, then the median is the mean of the middle pair of observations. To illustrate:
 If you had collected dmfs data on one additional child, you would have the following observations: 0, 1, 2, 3, 3, and 4.
 The middle pair of observations is formed by 2 and 3, and the mean of this pair is (2 + 3) / 2 = 5/2 = 2.5.
 Thus, the median in this case is 2.5. Note that 3 observations fall below 2.5 and 3 observations fall above 2.5.
 Because it would be very difficult to visually judge the location of the middle point in a large data set, out of n ordered observations, the ((n + 1) / 2)th observation is the median. Using this technique to identify the median in an odd number of observations is straightforward, so we will illustrate this technique on the data set with an even number of observations: the (6 + 1) / 2 = 3.5th observation should be the median.
 The 3.5th observation is midway between 2 and 3 (0, 1, 2, 3, 3, 4), or 2.5, agreeing with what was concluded above.
 The median is based only on the order information of the data (i.e., how many observations are above and below a given point). Therefore the median is useful for describing the central tendency of ordinal categorical variables, as well as continuous variables. The median is not influenced by and does not convey the actual numeric values of the observations.

MODE
 The mode is the most frequently occurring value in a set of observations. Again, it is convenient to arrange the observations in increasing order to judge how often a value occurs.
  o Ex.: The mode of the 0, 1, 2, 3, 3, 4 data set is 3 because this value occurs twice while all other values occur only once.
 If our data set were 0, 1, 1, 2, 3, 3, 4, there would be two modes, 1 and 3, and this data set would be called bimodal.
 When all values occur with the same frequency, the data set is said to have no mode.
 The chief advantage of the mode is that it is the only measure of central tendency that makes sense for nominal categorical variables, such as eye color. It would not make sense to place eye color in ascending order to identify the median, nor would it make sense to identify the average eye color. It would, however, be perfectly sensible to say that the most frequent eye color in a given sample is brown. Otherwise, the mode is not often used, as it records only the most frequent value, which may be far from the center of the distribution of values.
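A minimal sketch of the three measures in Python's standard statistics module, using the dmfs values from the examples above; multimode is used so the bimodal set is reported correctly.

```python
import statistics

dmfs_five = [0, 3, 1, 2, 4]        # five first-graders
dmfs_six = [0, 1, 2, 3, 3, 4]      # one additional child

print(statistics.mean(dmfs_five))    # 2.0 -> (0+3+1+2+4)/5
print(statistics.median(dmfs_five))  # 2   -> middle of 0,1,2,3,4
print(statistics.median(dmfs_six))   # 2.5 -> mean of the middle pair (2+3)/2
print(statistics.mode(dmfs_six))     # 3   -> most frequent value
print(statistics.multimode([0, 1, 1, 2, 3, 3, 4]))  # [1, 3] -> bimodal set
```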
MEAN, MODE, and MEDIAN
 The relationship between mean, median, and mode may be graphically appreciated through the frequency curve of a distribution. The frequency curve is simply a smooth version of the histogram. The mode is the highest point of the curve; the median is the value that divides the area under the curve in half. The position of the mean is slightly more difficult to conceptualize. If you think of the curve as a solid object, the mean would be the point at which the shape would balance.
 The mean, median, and mode coincide on a symmetrical frequency curve. If, however, the distribution is skewed, the mean is drawn toward the long tail of the distribution, again demonstrating how sensitive the mean is to extreme values.

Graphical Relationships between Mean, Median, and Mode
 If a frequency distribution graph has a symmetrical frequency curve, then the mean, median, and mode will be equal.
 In the case of a positively skewed frequency distribution, the mean is always greater than the median, and the median is always greater than the mode.

MODULE 3
MEASURES OF DISPERSION:
RANGE, MEAN DEVIATION, AND STANDARD DEVIATION

MEASURES OF VARIABILITY
 In statistics, variability (also called dispersion, scatter, or spread) is the extent to which a distribution is stretched or squeezed.
 It describes the characteristics of a set of samples in a given population.
 When the measure of variability or dispersion is small, the group is more or less homogeneous, but when the measure of variability is large, the group is more or less heterogeneous.
  o Common examples of measures of variability are the:
    1. Range
    2. Standard Deviation
    3. Variance

RANGE
 In statistics, a range shows how spread out a set of data is. The bigger the range, the more spread out the data. If the range is small, the data are closer together or more consistent. The range of a set of numbers is the largest value minus the smallest value.
 The range is the difference between the lowest and highest values.
Example:
  o Formula: Range = Largest Value (L) – Smallest Value
 In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9. So the range is 9 − 3 = 6.
 The range is useful for showing the spread within a dataset and for comparing the spread between similar datasets.
 The prime advantage of this measure of dispersion is that it is easy to calculate. On the other hand, it has a lot of disadvantages. It is very sensitive to outliers and does not use all the observations in a data set.
 Outlier - an observation that lies an abnormal distance from other values in a random sample from a population.
  Ex.: 71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 89 ….. 300 is an outlier

Merits of Range:
 It is simple to understand and easy to calculate.
 It is less time consuming.

Demerits of Range:
 It is not based on each and every item of the distribution.
 It is very much affected by the extreme values.
 We define the range in such a way as to eliminate the outliers and extreme points in the data set.
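A quick sketch of the range calculation in Python, using the example set above.

```python
data = [4, 6, 9, 3, 7]

# Range = largest value - smallest value
data_range = max(data) - min(data)
print(data_range)  # 6, because 9 - 3 = 6
```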
The Variance
 It is the square of the standard deviation and is also known as the mean square.

Formula:
  SD² = ∑ (X − Ẋ)² / (N − 1)
Where:
  o SD² = variance
  o ∑ (X − Ẋ)² = sum of squares of deviations about the mean
  o N = number of cases

Standard Deviation
 In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
 Standard deviation tells you how spread out the data are. It is a measure of how far each observed value is from the mean. In a normal distribution, about 95% of values will be within 2 standard deviations of the mean.
 The standard deviation measures the spread of the data about the mean value. For example, the mean of the following two sets is the same: 15, 15, 15, 14, 16 and 2, 7, 14, 22, 30. However, the second is clearly more spread out. If a set has a low standard deviation, the values are not spread out too much.

Formula:
  SD = √ [ ∑ (X − Ẋ)² / (N − 1) ]
Where:
  o SD = standard deviation
  o X = score
  o Ẋ = arithmetic mean
  o (X − Ẋ) = deviation of X from the mean
  o N = total number of cases

Procedures:
1. Find the arithmetic mean.
2. Determine the deviation of each score (X) from the mean, which equals X − Ẋ.
3. Square each deviation to get (X − Ẋ)².
4. Sum the squared deviations to get ∑ (X − Ẋ)².
5. Divide the sum by N − 1 to find ∑ (X − Ẋ)² / (N − 1).
6. Extract the square root of the result.

Computation of Standard Deviation from Ungrouped Data
  SD = √ [ ∑ (X − Ẋ)² / (N − 1) ]
  SD = √ (832.50 / (10 − 1))
  SD = √ 92.5
  SD = 9.62
Midpoint Method Computation of Standard Deviation from Grouped Data
  SD = √ [ (N ∑fM² − (∑fM)²) / (N² − N) ]
  SD = √ [ (50 (237,234.50) − (3,388.5)²) / (50² − 50) ]
  SD = √ [ (11,861,725.00 − 11,481,932.25) / (2,500 − 50) ]
  SD = √ (379,792.75 / 2,450)
  SD = √ 155.01744898
  SD = 12.45

Illustration of Variance and Standard Deviation
 You want to find the variance and SD of the five observations: 2, 7, 5, 3, and 10.
 To do so, you would follow 4 steps:
1. Compute the mean of the observations: Ẋ = (2 + 7 + 5 + 3 + 10) / 5 = 27 / 5 = 5.4
2. Determine the squared difference (deviation) between each observation X and the mean.
3. Calculate the variance by determining the mean squared deviation:
  Variance = (sum of squared deviations) / (number of observations) = 41.20 / 5 = 8.24
4. Determine the SD by taking the square root of the variance:
  SD = √ variance = √ 8.24 = 2.87

 You can interpret the SD as a type of average deviation of the observations from the mean.
 If the observations are close together, the SD is small.
 In the extreme case, in which all observations have the same value, the SD will be zero.
 As the observations become more spread out, the SD increases.
 Like the mean, the SD may be strongly influenced by unusually high or low values (outliers).
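A short sketch of those four steps in Python. It follows the illustration and divides by n (population variance); dividing by n − 1 instead would give the sample version used in the SD formula earlier in this module.

```python
import math

observations = [2, 7, 5, 3, 10]
n = len(observations)

# Step 1: compute the mean.
mean = sum(observations) / n                 # 5.4

# Step 2: squared deviation of each observation from the mean.
squared_devs = [(x - mean) ** 2 for x in observations]

# Step 3: variance = mean squared deviation (divide by n, as in the illustration).
variance = sum(squared_devs) / n             # 41.20 / 5 = 8.24

# Step 4: SD = square root of the variance.
sd = math.sqrt(variance)                     # about 2.87

print(round(variance, 2), round(sd, 2))
```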
Module 4
THE STANDARD NORMAL CURVE

STANDARD NORMAL CURVE SYNONYMS:
 normal distribution
 Gaussian distribution
 standard normal distribution
 Z distribution
 bell curve

THE STANDARD NORMAL CURVE/NORMAL DISTRIBUTION
 The normal distribution is a probability distribution. It is also called the Gaussian distribution because it was first discovered by Carl Friedrich Gauss. The normal distribution is a continuous probability distribution that is very important in many fields of science.
 Many values follow a normal distribution. This is because of the central limit theorem, which says that if an event is the sum of identical but random events, it will be normally distributed. Some examples include:
   Height
   Test scores
   Light intensity (so-called Gaussian beams, as in laser light)
   Intelligence is probably normally distributed. There is a problem with accurately defining or measuring it, though.
   Insurance companies use normal distributions to model certain average cases.

What is a standard normal curve in statistics?
 A normal distribution with a mean of 0 and a standard deviation of 1 is called a standard normal distribution. Since the distribution has a mean of 0 and a standard deviation of 1, the Z column is equal to the number of standard deviations below (or above) the mean.
 For the standard normal distribution, 68% of the observations lie within 1 standard deviation of the mean; 95% lie within 2 standard deviations of the mean; and 99.7% lie within 3 standard deviations of the mean.

What does a normal distribution curve tell us?
 The normal distribution is a probability function that describes how the values of a variable are distributed. It is a symmetric distribution where most of the observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions.

What does a normal curve show?
 In statistics, it is the theoretical curve that shows how often an experiment will produce a particular result. The curve is symmetrical and bell shaped, showing that trials will usually give a result near the average, but will occasionally deviate by large amounts.

How do you convert a normal distribution into a standard normal curve?
 Any point (x) from a normal distribution can be converted to the standard normal distribution (z) with the formula z = (x − mean) / standard deviation.
 The z for any particular x value shows how many standard deviations x is away from the mean for all x values.
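A minimal sketch of that conversion in Python, using the memory-test numbers (score 27, mean 19, standard deviation 4) that appear again in the hypothesis-testing example later in these notes.

```python
def z_score(x, mean, sd):
    """Convert a raw value x to the number of standard deviations it lies from the mean."""
    return (x - mean) / sd

z = z_score(27, mean=19, sd=4)
print(z)  # 2.0 -> the score lies two standard deviations above the mean
```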
The Empirical Rule
 If X is a random variable and has a normal distribution with mean µ and standard deviation σ, then the Empirical Rule says the following:
   About 68% of the x values lie between µ – σ and µ + σ (within one standard deviation of the mean).
   About 95% of the x values lie between µ – 2σ and µ + 2σ (within two standard deviations of the mean).
   About 99.7% of the x values lie between µ – 3σ and µ + 3σ (within three standard deviations of the mean).
 Notice that almost all the x-values/data lie within three standard deviations of the mean.
 The empirical rule is also known as the 68-95-99.7 rule.

Why is the normal curve important?
 The normal distribution is the most important probability distribution in statistics because it fits many natural phenomena.
 For example, heights, blood pressure, measurement error, and IQ scores follow the normal distribution.
What is the difference between a normal curve and a normal distribution?
 Both are normal distributions, but the standard normal distribution is a particular case of the normal distribution in which the mean is 0 and the standard deviation (or variance) is 1.
 If X is a normally distributed variable and Y is X converted with the z formula, this makes Y a "standard" distribution.
 So basically X is normally distributed and Y is "standard" normally distributed.

What are 3 characteristics of a normal curve?
 Properties of a normal distribution:
1. The mean, mode and median are all equal.
2. The curve is symmetric at the center (i.e. around the mean, μ). Exactly half of the values are to the left of center and exactly half the values are to the right.
3. The total area under the curve is 1.

How do I determine whether my data are normal?
1. Look at a histogram with the normal curve superimposed. A histogram provides a useful graphical representation of the data.
 Skewness involves the symmetry of the distribution. Skewness that is normal involves a perfectly symmetric distribution. A positively skewed distribution has scores clustered to the left, with the tail extending to the right. A negatively skewed distribution has scores clustered to the right, with the tail extending to the left. Skewness is 0 in a normal distribution, so the farther away from 0, the more non-normal the distribution.
2. The histogram above for variable1 represents perfect symmetry (skewness) and perfect peakedness (kurtosis), and the descriptive statistics below for variable1 parallel this information by reporting "0" for both skewness and kurtosis. The histogram above for variable2 represents positive skewness (tail extending to the right), and the descriptive statistics below for variable2 parallel this information.
3. Look at established tests for normality that take into account both skewness and kurtosis simultaneously. The Kolmogorov-Smirnov (K-S) test and Shapiro-Wilk (S-W) test are designed to test normality by comparing your data to a normal distribution with the same mean and standard deviation as your sample. If the test is NOT significant, then the data are normal, so any value above .05 indicates normality. If the test is significant (less than .05), then the data are non-normal. See the data below, which indicate variable1 is normal and variable2 is non-normal. Also, keep in mind one limitation of the normality tests: the larger the sample size, the more likely it is to get significant results. Thus, you may get significant results with only slight deviations from normality when sample sizes are large.
4. Look at normality plots of the data. A "Normal Q-Q Plot" provides a graphical way to determine the level of normality. The black line indicates the values your sample should adhere to if the distribution were normal. The dots are your actual data. If the dots fall exactly on the black line, then your data are normal. If they deviate from the black line, your data are non-normal. Notice how the data for variable1 fall along the line, whereas the data for variable2 deviate from the line.
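As a hedged sketch of step 3 (plus the skewness and kurtosis checks) using SciPy, with two made-up samples standing in for variable1 and variable2; these are not the variables referred to in the notes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
variable1 = rng.normal(loc=50, scale=10, size=200)   # roughly normal
variable2 = rng.exponential(scale=10, size=200)      # positively skewed

for name, data in [("variable1", variable1), ("variable2", variable2)]:
    skew = stats.skew(data)
    kurt = stats.kurtosis(data)            # excess kurtosis: 0 for a normal curve
    w_stat, p_value = stats.shapiro(data)  # Shapiro-Wilk normality test
    verdict = "normal" if p_value > 0.05 else "non-normal"
    print(f"{name}: skew={skew:.2f}, kurtosis={kurt:.2f}, "
          f"Shapiro-Wilk p={p_value:.3f} ({verdict})")
```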
THE NORMAL CURVE/NORMAL DISTRIBUTION

What Is a Probability Distribution?
 A probability distribution is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. This range will be bounded between the minimum and maximum possible values, but precisely where the possible value is likely to be plotted on the probability distribution depends on a number of factors. These factors include the distribution's mean (average), standard deviation, skewness, and kurtosis.
 A random variable is a variable whose value is unknown, or a function that assigns values to each of an experiment's outcomes. Random variables are often used in econometric or regression analysis to determine statistical relationships among one another.
 EX.: A typical example of a random variable is the outcome of a coin toss. Consider a probability distribution in which the outcomes of a random event are not equally likely to happen. If the random variable, Y, is the number of heads we get from tossing two coins, then Y could be 0, 1, or 2.

 A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis.
 Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
 Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers. A uniform distribution would be the extreme case.
 The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set. A histogram is an approximate representation of the distribution of numerical data. It was first introduced by Karl Pearson. It is a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval.
How do you interpret skewness?
Interpreting skewness:
1. If skewness is less than −1 or greater than +1, the distribution is highly skewed.
2. If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.
3. If skewness is between −½ and +½, the distribution is approximately symmetric.

 Skewness refers to a distortion or asymmetry that deviates from the symmetrical bell curve, or normal distribution, in a set of data. A normal distribution has a skew of zero, while a lognormal distribution, for example, would exhibit some degree of right-skew.
 Kurtosis - the sharpness of the peak of a frequency-distribution curve.

You can interpret the values as follows: "Skewness assesses the extent to which a variable's distribution is symmetrical. ... For kurtosis, the general guideline is that if the number is greater than +1, the distribution is too peaked. Likewise, a kurtosis of less than –1 indicates a distribution that is too flat. Distributions exhibiting skewness and/or kurtosis that exceed these guidelines are considered nonnormal." (Hair et al., 2017)

What does the kurtosis value tell us?
 Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers.

What is acceptable skewness and kurtosis?
 The values for asymmetry and kurtosis between -2 and +2 are considered acceptable in order to prove normal univariate distribution (George & Mallery, 2010). George and Mallery (2010) and Byrne (2010) argued that data are considered to be normal if skewness is between −2 and +2 and kurtosis is between −7 and +7.

What is the importance of skewness and kurtosis?
 Skewness essentially measures the symmetry of the distribution, while kurtosis determines the heaviness of the distribution tails. Understanding the shape of the data is a crucial action. It helps to understand where most of the information is lying and to analyze the outliers in a given data set. In an asymmetrical distribution, a negative skew indicates that the tail on the left side is longer than on the right side (left-skewed); conversely, a positive skew indicates that the tail on the right side is longer than on the left (right-skewed).

What is the use of kurtosis?
 Kurtosis is a statistical measure used to describe the degree to which scores cluster in the tails or the peak of a frequency distribution. The peak is the tallest part of the distribution, and the tails are the ends of the distribution. There are three types of kurtosis: mesokurtic, leptokurtic, and platykurtic.
 Mesokurtic distributions have an excess kurtosis of zero, meaning that the probability of extreme, rare, or outlier data is zero or close to zero. Mesokurtic distributions are known to match the normal distribution, or normal curve, also known as a bell curve. In contrast, a leptokurtic distribution has fatter tails.
 Leptokurtic distributions are statistical distributions with kurtosis greater than three. They can be described as having a narrower, more sharply peaked shape with fatter tails, resulting in a greater chance of extreme positive or negative events. It is one of the three major categories found in kurtosis analysis.
 The term "platykurtic" refers to a statistical distribution in which the excess kurtosis value is negative. For this reason, a platykurtic distribution will have thinner tails than a normal distribution will, resulting in fewer extreme positive or negative events.
MEASURES OF CORRELATION OR RELATIONSHIP

Measures of Correlation or Relationship
 This is used to find the amount and degree of relationship, or the absence of relationship, between two sets of values, characteristics, or variables.
 Correlation is a measure of the degree of relationship between paired data; however, it does not determine a cause-and-effect relationship, but rather focuses on the strength of the relationship between paired data.
 The coefficient of correlation, which represents the correlation value, shows the extent to which two variables are related and to what extent variations in one group of data go with the variations in the other.
 The coefficient of correlation can vary from a value of 1.00, which means perfect positive correlation, through 0, which means no correlation at all, to -1.00, which means perfect negative correlation.
 Based on the xy coordinate plane:
A. Positive r: y increases as x increases
B. r near zero: little or no linear relationship between x and y
C. Negative r: y decreases as x increases
D. r = 1: a perfect positive linear relationship between x and y
E. r = -1: a perfect negative linear relationship between x and y

 Interpreting the Coefficient of Correlation
±1                Perfect positive/negative correlation
±0.75 - ±0.99     Very high positive/negative correlation
±0.50 - ±0.74     High positive/negative correlation
±0.25 - ±0.49     Moderately small positive/negative correlation
±0.01 - ±0.24     Very small positive/negative correlation
0.00              No correlation

Guide for the Interpretation of r:
1. The relationship of 2 variables does not necessarily mean that one is the cause or effect of the other variable. It does not imply a cause-effect relationship.
2. When the computed r is high, it does not necessarily mean that one factor is strongly dependent on the other.
3. If there is a reason to believe that the 2 variables are related and the computed r is high, these 2 variables are really meant as associated. On the other hand, if the variables correlated are low (though theoretically related), other factors might be responsible for such small associations.
4. The meaning of the correlation coefficient simply informs us that when 2 variables change, there may be a strong or weak relationship taking place.

Pearson Product Moment Correlation (rxy)
 This is a linear correlation necessary to find the degree of association of 2 sets of variables, x and y.
 This is the most common measure of correlation to determine the relationship between 2 sets of variables quantitatively.

Obtaining r from ungrouped data:

  rxy = [N ∑xy – (∑x)(∑y)] / √[(N ∑x² – (∑x)²) (N ∑y² – (∑y)²)]

Where: rxy = correlation between x and y
∑x = sum of x
∑y = sum of y
∑xy = sum of the products of x and y
N = number of cases
∑x² = sum of squared x
∑y² = sum of squared y

Steps in computing the Pearson Product Moment Correlation (rxy)
1. Find the sum of x and y.
2. Square all x and y values.
3. Sum x² and y².
4. Multiply x and y.
5. Get the sum of the products xy.
6. Apply the formula.

Ex.: What is the relationship between the age and dental caries status of elementary school children in Barangay X?
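A hedged sketch of those six steps in Python. The age and dmft values below are made-up illustrative numbers; the module's Barangay X example does not reproduce its data table.

```python
import math

# Hypothetical paired observations: age (x) and dmft score (y) for 5 children.
x = [6, 7, 8, 9, 10]
y = [1, 2, 2, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                                   # step 1
sum_x2, sum_y2 = sum(v * v for v in x), sum(v * v for v in y)   # steps 2-3
sum_xy = sum(a * b for a, b in zip(x, y))                       # steps 4-5

# Step 6: r = [N Σxy - (Σx)(Σy)] / sqrt[(N Σx² - (Σx)²)(N Σy² - (Σy)²)]
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 3))  # close to +1 here, i.e., a very high positive correlation
```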
Spearman Rank Correlation Coefficient or Spearman Rho (rs)
 This is a statistic used to measure the relationship of paired ranks assigned to individual scores on two variables.
 This correlation estimates the degree of association of 2 sets of variables in at least an ordinal scale (first, second, third, and so on), so that the subjects under study may be ranked in two ordered series.
 It is the most widely used of the ranked correlation methods.
 It is much easier, and therefore faster, to compute.
 This is for 30 cases or fewer only.

Formula:
  rs = 1 – [6 (∑D²) / (N³ – N)]
Where:
  rs = Spearman rank correlation
  ∑D² = sum of the squared differences between ranks
  N = number of cases

Procedure:
1. Rank the values from highest to lowest in the first set of variables (x) and mark them Rx. The highest value is given the rank 1, the second, 2, and so on.
2. Rank the second set of values (y) in the same manner as in step 1 and mark them Ry.
3. Determine the difference in rank for every pair of ranks.
4. Square each difference to get D².
5. Sum the squared differences to find ∑D².
6. Compute Spearman Rho (rs) by applying the formula.
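A small sketch of that procedure in Python, assuming made-up scores for five subjects on two variables; ties are ignored to keep the sketch close to the formula above.

```python
def spearman_rho(x, y):
    """Spearman rho from raw scores, giving the highest value rank 1 (no ties assumed)."""
    n = len(x)

    def ranks(values):
        order = sorted(values, reverse=True)    # highest value gets rank 1
        return [order.index(v) + 1 for v in values]

    rx, ry = ranks(x), ranks(y)                          # steps 1-2
    d_squared = [(a - b) ** 2 for a, b in zip(rx, ry)]   # steps 3-4
    return 1 - (6 * sum(d_squared)) / (n ** 3 - n)       # steps 5-6

# Hypothetical scores of 5 subjects on two variables.
print(spearman_rho([88, 75, 95, 60, 70], [82, 70, 90, 65, 72]))  # 0.9 -> very high positive
```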
Correlation Coefficient for Grouped Data
Scatter Diagram or Scattergram
 This is used to measure the relationship of two sets of variables x and y when the total number of cases N is equal to or greater than 30. This is applicable to a larger number of cases.

Formula (the product-moment formula applied to the class-interval frequencies defined below):

  r = [N ∑fdxdy – (∑fdx)(∑fdy)] / √[(N ∑fdx² – (∑fdx)²) (N ∑fdy² – (∑fdy)²)]

Where: ∑fx = cell frequencies of the columns
∑fy = cell frequencies of the rows
fx = frequency of the columns
fy = frequency of the rows
dx = deviation of the columns
dy = deviation of the rows
fdx = sum of the products of f and d in the columns
fdy = sum of the products of f and d in the rows
∑fdx² = sum of the products of f and dx² in the columns
∑fdy² = sum of the products of f and dy² in the rows
∑fdxfdy (column) = sum of fdxfdy in the columns
∑fdxfdy (row) = sum of fdxfdy in the rows

Procedure:
1. Add the cell frequencies of the columns and rows (∑fx, ∑fy).
2. Choose a class in the column and in the row and mark off the deviations (dx, dy).
3. Find the sums of the products of f and d (∑fdx, ∑fdy).
4. Find the sums of the products of f and dx² and of f and dy² (∑fdx², ∑fdy²).
5. Find the product of the deviations for each cell by multiplying dx by dy corresponding to each frequency in the cell, and write the product on the lower right hand corner of the cell.
6. Find the sum of ∑fdxdy in the column and in the row; ∑fdxdy in the column should equal ∑fdxdy in the row.
7. Substitute the values obtained into the product-moment formula.
CHI-SQUARE DISTRIBUTION
 The chi-square distribution was discovered by Karl Pearson.
 The distribution was introduced to determine whether or not discrepancies between observed and theoretical counts were significant.
 The test used to find out how well an observed frequency distribution conforms to, or fits, some theoretical frequency distribution is referred to as a "goodness of fit test".
 Also, the chi-square distribution can be used to test the normality of any distribution.
 Tables representing rows and columns, often called contingency tables, are used with the chi-square distribution.
 The value of chi-square varies for each number of degrees of freedom, one of the assumptions that apply for a contingency table.

TESTING GOODNESS OF FIT
 Testing goodness of fit can be used to test how well an observed frequency distribution fits some theoretical frequency distribution.
 For example, we want to test the claim that fatal accidents do not occur at different rates on roads of different widths.

Ho: X1 = X2 = X3 = X4
Ha: X1 ≠ X2 ≠ X3 ≠ X4

Width of the Road

  Χ² = ∑ [(OF − EF)² / EF]
Where: OF = observed frequency
EF = expected frequency
Χ² = chi-square

df = degrees of freedom = k − 1, where k = number of categories (4 − 1 = 3)
Χ² at α = 0.05, df = 3: 7.815

Conclusion:
Since the computed Χ² (3.73) is less (<) than the tabular Χ² (7.815), the null hypothesis is accepted: accidents do not occur at different rates on roads of different widths.

Contingency Table
 In contingency tables, we intend to test whether the row variable is independent of the column variable. Computation of the expected frequency for the contingency table is different from the one in the goodness of fit test. For variables with 2 or more groups, the expected frequency is obtained with this formula:

  EF = (row total × column total) / grand total
  df = (r − 1)(c − 1)

Example:
Teenagers and young adults have their own styles of studying. Some prefer to study with music, others do not. A group of psychologists conducted a study to determine the particular ages of the students who like studying with music. At the 0.01 level of significance, test the claim that style of studying is independent of the listed age groups. The table below summarizes the information.

Age Group

Interpretation:
The obtained Χ² (0.05) is less than the tabular Χ² (11.343) at the 0.01 level of significance. Therefore, there is sufficient evidence to accept the null hypothesis. The style of studying is independent of the listed age groups.
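A sketch of a goodness-of-fit computation in Python with SciPy. The accident counts below are hypothetical stand-ins, since the Width of the Road table itself is not reproduced in these notes.

```python
from scipy.stats import chisquare, chi2

# Hypothetical observed fatal-accident counts for four road-width categories.
observed = [20, 25, 22, 33]
expected = [sum(observed) / 4] * 4        # H0: accidents equally likely in each category

stat, p_value = chisquare(observed, f_exp=expected)
critical = chi2.ppf(0.95, df=len(observed) - 1)   # tabular chi-square at alpha 0.05, df = k - 1

print(round(stat, 2), round(p_value, 3), round(critical, 3))
# Accept H0 if the computed statistic is below the critical value (7.815 for df = 3).
```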
Degrees of Freedom for the Chi-square
 The degrees of freedom involved in the one-variable chi-square are determined by this formula:
  df = K – 1    Where: df = degrees of freedom; K = number of categories
On the other hand, the degrees of freedom to use in the two-variable chi-square are determined by the formula:
  df = (c – 1)(r – 1)    Where: df = degrees of freedom; c = the number of columns; r = the number of rows

CHI-SQUARE DISTRIBUTION
USES OF CHI-SQUARE
1. It is used in descriptive research if the researcher wants to determine the significant difference between the observed frequencies and the expected (theoretical) frequencies from independent variables.
2. It is used to test the goodness of fit, where a theoretical distribution is fitted to some data.
3. It is used to test the hypothesis that the variance of a normal population is equal to a given value.
4. It is used for the construction of confidence intervals for variances.
5. It is used to compare two uncorrelated and correlated proportions.

Hypothesis testing

When interpreting research findings, researchers need to assess whether these findings may have occurred by chance. Hypothesis testing is a systematic procedure for deciding whether the results of a research study support a particular theory which applies to a population.
Hypothesis testing uses sample data to evaluate a hypothesis about a population. A hypothesis test assesses how unusual the result is, whether it is reasonable chance variation or whether the result is too extreme to be considered chance variation.

Basic concepts

Null and research hypotheses
To carry out statistical hypothesis testing, research and null hypotheses are employed:

 Research hypothesis: this is the hypothesis that you propose, also known as the alternative hypothesis HA. For example:
HA: There is a relationship between intelligence and academic results.
HA: First year university students obtain higher grades after an intensive Statistics course.
HA: Males and females differ in their levels of stress.

The null hypothesis (Ho) is the opposite of the research hypothesis and expresses that there is no relationship between variables, or no differences between groups; for example:
Ho: There is no relationship between intelligence and academic results.
Ho: First year university students do not obtain higher grades after an intensive Statistics course.
Ho: Males and females will not differ in their levels of stress.

The purpose of hypothesis testing is to test whether the null hypothesis (there is no difference, no effect) can be rejected or approved. If the null hypothesis is rejected, then the research hypothesis can be accepted. If the null hypothesis is accepted, then the research hypothesis is rejected.

In hypothesis testing, a value is set to assess whether the null hypothesis is accepted or rejected and whether the result is statistically significant:
 A critical value is the score the sample would need to decide against the null hypothesis.
 If the computed value is greater than (>) the critical value, reject the null hypothesis and accept the alternative hypothesis.
 If the computed value is less than (<) the critical value, accept the null hypothesis.
 A probability value is used to assess the significance of the statistical test. If the null hypothesis is rejected, then the alternative to the null hypothesis is accepted.
 If the probability value is less than (<) the level of significance (α), reject the null hypothesis and accept the alternative hypothesis.
 If the probability value is greater than (>) the level of significance (α), accept the null hypothesis.

The hypothesis testing process

The hypothesis testing process can be divided into five steps:
1. Restate the research question as a research hypothesis and a null hypothesis about the populations.
2. Determine the characteristics of the comparison distribution.
3. Determine the cut-off sample score on the comparison distribution at which the null hypothesis should be rejected.
4. Determine your sample's score on the comparison distribution.
5. Decide whether to reject the null hypothesis.

This example illustrates how these five steps can be applied to test a hypothesis:
 Let's say that you conduct an experiment to investigate whether students' ability to memorize words improves after they have consumed caffeine.
 The experiment involves two groups of students: the first group consumes caffeine; the second group drinks water.
 Both groups complete a memory test.
 A randomly selected individual in the experimental condition (i.e. the group that consumes caffeine) has a score of 27 on the memory test. The scores of people in general on this memory measure are normally distributed with a mean of 19 and a standard deviation of 4.
 The researcher predicts an effect (differences in memory for these groups) but does not predict a particular direction of effect (i.e. which group will have higher scores on the memory test). Using the 5% significance level, what should you conclude?
Step 1: There are two populations of interest.
Population 1: People who go through the experimental procedure (drink coffee).
Population 2: People who do not go through the experimental procedure (drink water).
 Research hypothesis: Population 1 will score differently from Population 2.
 Null hypothesis: There will be no difference between the two populations.
Step 2: We know that the characteristics of the comparison distribution (student population) are:
Population M = 19, Population SD = 4, normally distributed. These are the mean and standard deviation of the distribution of scores on the memory test for the general student population.
Step 3: For a two-tailed test (the direction of the effect is not specified) at the 5% level (2.5% at each tail), the cut-off sample scores are +1.96 and -1.96.
Step 4: Your sample score of 27 needs to be converted into a Z value. To calculate: Z = (27 - 19)/4 = 2 (check the Converting into Z scores section if you need to review how to do this process).
Step 5: A Z score of 2 is more extreme than the cut-off Z of +1.96 (see figure above). The result is significant and, thus, the null hypothesis is rejected.
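A compact sketch of steps 3 to 5 in Python; SciPy is assumed to be available only so the 1.96 cut-off is derived rather than hard-coded.

```python
from scipy.stats import norm

population_mean, population_sd = 19, 4
sample_score = 27
alpha = 0.05

# Step 3: two-tailed cut-off score on the standard normal distribution.
cutoff = norm.ppf(1 - alpha / 2)                       # about 1.96

# Step 4: convert the sample score to a Z value.
z = (sample_score - population_mean) / population_sd   # (27 - 19) / 4 = 2.0

# Step 5: reject the null hypothesis if |z| is more extreme than the cut-off.
print(z, round(cutoff, 2), "reject H0" if abs(z) > cutoff else "accept H0")
```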
CRITICAL VALUES OF CHI SQUARE TEST (0.01 & 0.05 LEVEL OF SIGNIFICANCE)

LINK T-TEST

Independent Samples t-Test

The independent samples t-test, sometimes called the simple t-test, tests the null hypothesis that there is no difference between two independent samples. In other words, if the t-test is statistically significant, we would conclude that the populations from which the samples were drawn have different population means. The terminology that we would use is that the groups are significantly different from one another.

To compute the independent samples t-test, we start by setting up two columns, one for each group. In our example, we have labeled the columns Group 1 and Group 2, but we could also have labeled them X1 and X2, respectively.

Note that we do not label them as X and Y, as we did in computing correlations, because these are the same measures being taken in each group. Therefore, we would use the same letter to indicate this fact and use a subscript to indicate which group we are talking about. This notation is used in the formula for the t-test.

We put the scores in the two columns. Note that we do not necessarily have the same number of participants in each group, so the number of scores in the two columns may be different. Then, for each column, you compute the following values.
1. The sum of the column.
2. The sum of the squared values of the column (square each score and then sum it).
3. The sample size for the column.
4. The mean for the column (sum divided by the sample size).
5. The sum of squares (SS) for the column. (The formula for the SS is listed under the computational procedures for the variance.)

Compute the value of t using the equation below. All of the notation should be familiar at this point (mean, SS, N). The only difference in the notation is that the means, SSs, and Ns are subscripted to show which group they represent.

  t = (Ẋ1 − Ẋ2) / √[ ((SS1 + SS2) / (N1 + N2 − 2)) × (1/N1 + 1/N2) ]

LINK STUDENT'S T-TABLE

To evaluate the results, you compare the computed t to the critical value of t. The critical value of t (obtained from the Student's t Table) is 2.365 (alpha = 0.05 and df = N1 + N2 - 2 = 7). Because the computed value of t (4.52) exceeds the critical value (2.365), we reject the null hypothesis and conclude that the two groups are significantly different from one another.
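A brief sketch of the same test in Python with SciPy; the two score columns are invented illustrative data, not the values from the handout's worked table. With the group sizes chosen here, the degrees of freedom and critical value match the 2.365 quoted above.

```python
from scipy import stats

# Hypothetical scores for two independent groups (unequal sizes are allowed).
group1 = [27, 25, 30, 29, 28]
group2 = [19, 22, 20, 21]

t_stat, p_value = stats.ttest_ind(group1, group2)   # pooled-variance (equal-variance) t-test
df = len(group1) + len(group2) - 2                  # N1 + N2 - 2 = 7
critical = stats.t.ppf(0.975, df)                   # two-tailed critical t at alpha = 0.05

print(round(t_stat, 2), round(p_value, 4), round(critical, 3))
# Reject the null hypothesis when |t| exceeds the critical value.
```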
Z-TEST & TABLE OF CRITICAL VALUES Why do we use t-test and z-test?
Z-TEST
A z-test is a statistical test used to determine whether two
population means are different when the variances are known and the
sample size is large.
What is difference between z-test and t test?

Z Test is the statistical hypothesis which is used in order to


determine that whether the two samples means calculated are different
in case the standard deviation is available and sample is large whereas
the T test is used in order to determine how averages of different
Z-tests are statistical calculations that can be used to compare data sets differs from each other in case...
population means to a sample. T-tests are calculations used to test a
hypothesis, but they are most useful when we need to determine if
there is a statistically significant difference between two independent Why do we use t instead of z?
sample groups.
Normally, you use the t-table when the sample size is small
(n<30) and the population standard deviation σ is unknown. Z-scores
are based on your knowledge about the population’s standard
deviation and mean. T-scores are used when the conversion is made
without knowledge of the population standard deviation and mean.
How do you perform a two-sample z test? Z-test - Definition, Formula, Examples, Uses, Z-Test vs T-Test
DEFINITION
Z-test is a statistical tool used for the comparison or
determination of the significance of several statistical measures,
particularly the mean in a sample from a normally distributed
population or between two independent samples.
 Like t-tests, z-tests are also based on normal probability
distribution.
 Z-test is the most commonly used statistical tool in research
methodology, with it being used for studies where the sample size
is large (n&gt;30).
 In the case of the z-test, the variance is usually known.
 Z-test is more convenient than t-test as the critical value at each
significance level in the confidence interval is the sample for all
sample sizes.
Procedure to execute Two Sample Proportion Hypothesis Test
 A z-score is a number indicating how many standard deviations
1. State the null hypothesis and alternative hypothesis. above or below the mean of the population is.
2. State alpha, in other words determine the significance level.
3. Compute the test statistic.
4. Determine the critical value (from critical value table). FORMULA
5. Define the rejection criteria.
For the normal population with one sample:
6. Finally, interpret the result.

Where x̄1 and x̄2 are the means of two samples, σ is the standard


deviation of the samples, and n1 and n2 are the numbers of
observations of two samples.
One sample z-test (one-tailed z-test)
 A one sample z-test is used to determine whether a particular population parameter, which is mostly the mean, is significantly different from an assumed value.
 It helps to estimate the relationship between the mean of the sample and the assumed mean.
 In this case, the standard normal distribution is used to calculate the critical value of the test.
 If the z-value of the sample being tested falls into the criteria for the one-sided test, the alternative hypothesis will be accepted instead of the null hypothesis.
 A one-tailed test would be used when the study has to test whether the population parameter being tested is either lower than or higher than some hypothesized value.
 A one-sample z-test assumes that the data are a random sample collected from a normally distributed population in which all values have the same mean and same variance.
 This hypothesis implies that the data are continuous and the distribution is symmetric.
 Based on the alternative hypothesis set for a study, a one-sided z-test can be either a left-sided z-test or a right-sided z-test.
 For instance, if our H0: µ = µ0 and Ha: µ < µ0, such a test would be a one-sided test, or more precisely, a left-tailed test, and there is one rejection area only, on the left tail of the distribution.
 However, if H0: µ = µ0 and Ha: µ > µ0, this is also a one-tailed test (right tail), and the rejection region is present on the right tail of the curve.

Two sample z-test (two-tailed z-test)
 In the case of the two sample z-test, two normally distributed independent samples are required.
 A two-tailed z-test is performed to determine the relationship between the population parameters of the two samples.
 In the case of the two-tailed z-test, the alternative hypothesis is accepted as long as the population parameter is not equal to the assumed value.
 The two-tailed test is appropriate when we have H0: µ = µ0 and Ha: µ ≠ µ0, which may mean µ > µ0 or µ < µ0.
 Thus, in a two-tailed test, there are two rejection regions, one on each tail of the curve.

EXAMPLE
If a sample of 400 male workers has a mean height of 67.47 inches, is it reasonable to regard the sample as a sample from a large population with a mean height of 67.39 inches and a standard deviation of 1.30 inches at a 5% level of significance?

Taking the null hypothesis that the mean height of the population is equal to 67.39 inches, we can write:
H0: µ = 67.39"
Ha: µ ≠ 67.39"
x̄ = 67.47", σ = 1.30", n = 400

Assuming the population to be normal, we can work out the test statistic z as under:

  z = (x̄ − µ) / (σ / √n) = (67.47 − 67.39) / (1.30 / √400) = 0.08 / 0.065 = 1.231

As Ha is two-sided in the given question, we shall be applying a two-tailed test for determining the rejection regions at a 5% level of significance, which, using the normal curve area table, comes to:
R: | z | > 1.96
The observed value of z is 1.231, which falls in the acceptance region because it does not satisfy R: | z | > 1.96; thus, H0 is accepted.
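A short check of that example in Python; only the standard library is needed, and 1.96 is the two-tailed 5% critical value quoted above.

```python
import math

sample_mean, mu0 = 67.47, 67.39
sigma, n = 1.30, 400

# One-sample z statistic: z = (x-bar - mu0) / (sigma / sqrt(n))
z = (sample_mean - mu0) / (sigma / math.sqrt(n))
print(round(z, 3))  # 1.231

critical = 1.96     # two-tailed cut-off at the 5% level
print("reject H0" if abs(z) > critical else "accept H0")   # accept H0
```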
APPLICATION
 The z-test is performed in studies where the sample size is large and the variance is known.
 It is also used to determine if there is a significant difference between the means of two independent samples.
 The z-test can also be used to compare the population proportion to an assumed proportion, or to determine the difference between the population proportions of two samples.

DIFFERENCES OF Z-TEST & T-TEST

Definition
- T-test: a test in statistics that is used for testing hypotheses regarding the mean of a small sample taken from a population when the standard deviation of the population is not known.
- Z-test: a statistical tool used for the comparison or determination of the significance of several statistical measures, particularly the mean in a sample from a normally distributed population or between two independent samples.

Sample size
- T-test: usually performed on samples of a smaller size (n ≤ 30).
- Z-test: generally performed on samples of a larger size (n > 30).

Type of distribution
- T-test: performed on samples distributed on the basis of the t-distribution.
- Z-test: performed on samples that are normally distributed.

Assumptions
- T-test: not based on the assumption that all key points on the sample are independent.
- Z-test: based on the assumption that all key points on the sample are independent.

Variance or standard deviation
- T-test: variance or standard deviation is not known.
- Z-test: variance or standard deviation is known.

Distribution
- T-test: the sample values are to be recorded or calculated by the researcher.
- Z-test: in a normal distribution, the average is considered 0 and the variance as 1.

Population parameters
- T-test: in addition to the mean, the t-test can also be used to compare partial or simple correlations among two samples.
- Z-test: in addition to the mean, the z-test can also be used to compare the population proportion.

Convenience
- T-test: less convenient, as t-tests have separate critical values for different sample sizes.
- Z-test: more convenient, as it has the same critical value for different sample sizes.
REFERENCE:
https://microbenotes.com/z-test/

CRITICAL VALUES OF Z
