Biostatistics (Midterm)
Variables that should be studied because they may influence or “confound” the effect of the independent variable(s) on the dependent variable(s).
Ex.
In a study of the effect of tobacco (independent variable) on oral cancer (dependent variable), the nutritional status of the individual may play an intervening role.

4. Background Variables
Variables that are so often of relevance in investigations of groups or populations that they should be considered for possible inclusion in the study.

Continuous variables represent measured quantities (e.g., blood pressure and temperature).
Continuous variables may be divided into interval and ratio variables.
The points of an interval scale are equally spaced, and the difference between two points is meaningful (e.g., the difference between 30 degrees Celsius and 31 degrees Celsius is the same as that between 89 and 90 degrees Celsius). However, 100 degrees Celsius is not twice as hot as 50 degrees Celsius. As you may guess, the ratio between points on a ratio scale has meaning.
Age is an example of a ratio scale. Therefore, David, who is 18, is twice as old as his brother Michael, who is 9. It should be noted that continuous variables, whether interval or ratio, are analyzed in the same way.
DMFT and dfT outcomes are continuous, interval-scale variables, whereas presence or absence of carious lesions is a nominal categorical variable.
SAMPLING FROM A POPULATION
In statistical language, the subset of the population of interest in a study is called a sample.
Usually, we wish to draw conclusions about some numeric aspect of the population.
In statistical terms, a parameter is a numeric characteristic of the population. A parameter has a set value, but we usually do not know the value.
A statistic is a numeric characteristic of the sample. We can know the value of a statistic in our sample, but the value will change from sample to sample.
It is important that the sample be representative of the population of interest from which it was drawn because the statements (or inferences) about the whole population may be made from the measurements taken on the sample.
If a sample is not representative of the population of interest, it is a biased sample. For example, in caries prevalence measures, schoolchildren living in a fluoridated community would be a biased sample of all children because, as a group, they would have a lower prevalence than the entire population of interest.
The best way to ensure a representative, unbiased sample is to perform simple random sampling. A simple random sample is one in which every item or person in the population has an equal and independent chance of being selected. A simple random sample is an example of a probability sample.
Probability samples are those drawn when you are able to identify and have access to all members of the population of interest.
A stratified random sample, another type of probability sample, is a variant of the simple random sample. This sampling scheme is random sampling carried out in subgroups of a population to ensure that selections will be made from each level of the subgroup.
For example, you may take steps to ensure that every age, sex, race, or social stratum subgroup is represented in sufficient numbers in the sample.
At times, a probability sample may not be possible or warranted. For example, you may not have access to the entire population of interest, or variability may be low enough that the effort and cost of probability sampling outweigh the risk of drawing a biased sample.
There are several subtypes of non-probability samples.
Cluster sampling divides the population into small groups (clusters), draws a simple random sample of clusters, and assesses every subject in the sampled clusters. This may be a good approach when cost and time to travel between randomly selected subjects would be prohibitive.
A quota sample is drawn by selecting items or people in a block of predetermined size. For example, you may select the first ten women, without regard for the pool they may represent. Finally, a convenience sample, as its name suggests, is selected on the basis of convenience to the researcher, with little concern for representativeness. The types of samples are summarized as follows:

Probability Sample: A sampling from a population that you can identify and to which you have access to all members.
Simple Random Sample: Each item or person in the population of interest has an equal and independent chance of being selected.
Stratified Random Sample: Random sampling carried out in subgroups of a population to ensure that selections will be made from each level of the subgroup.
Nonprobability Sample: A sampling when you cannot identify or do not have access to the entire population of interest.
Cluster Sample: Drawing a simple random sample of small groups (clusters) of the population and assessing each subject in the sampled cluster.
Quota Sample: Sampling items or people in a block of predetermined size.
Convenience Sample: A sampling scheme in which the subjects are selected, partly or entirely, at the convenience of the researcher.

The Probability Distribution (P)
The most crucial link between the population and its characteristics, which allows us to draw inferences on the population based on sample observations, depends on this probability distribution.
How to use probability distribution?
Probability distributions indicate the likelihood of an event or outcome. Statisticians use the following notation to describe probabilities: p(x) = the likelihood that a random variable takes a specific value of x. The sum of all probabilities for all possible values must equal 1.
What does a probability distribution (p) indicate?
A probability distribution (p) indicates the possible outcomes of a random experiment and the probability that each of those outcomes will occur.
Why do we use probability distributions?
Probability distributions are a fundamental concept in statistics. They are used both on a theoretical level and a practical level. Some practical uses of probability distributions are: to calculate confidence intervals for parameters and to calculate critical regions for hypothesis tests.
What is an example of probability distribution?
The probability distribution of a discrete random variable can always be represented by a table.
For example, suppose you flip a coin two times. ... The probability of getting 0 heads is 0.25; 1 head, 0.50; and 2 heads, 0.25.
Number of heads    Probability
0                  0.25
1                  0.50
2                  0.25
Thus, the table is an example of a probability distribution for a discrete random variable.
What is the difference between discrete and continuous probability distributions?
A probability distribution may be either discrete or continuous. A discrete distribution means that X can assume one of a countable (usually finite) number of values, while a continuous distribution means that X can assume one of an infinite (uncountable) number of different values.

There are 2 steps in data analysis:
1st Step: Calculate the descriptive statistics. These are the characteristics of the data found within the sample of individuals in whom the study was conducted.
2nd Step: Calculate the inferential statistics. The purpose of generating inferential statistics is to determine whether the results found in the sample may be a result of chance or, assuming no other threats to validity, whether we can generalize our results to the general population of interest.
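
As a small illustration of the two-coin example of a discrete probability distribution, the sketch below enumerates every equally likely outcome of the flips and tabulates the probability of each number of heads. It assumes a fair coin; the function name heads_distribution is illustrative and not part of the notes.

```python
from collections import Counter
from itertools import product


def heads_distribution(n_flips=2):
    """Return {number of heads: probability} for n_flips of a fair coin."""
    # Every sequence of H/T is equally likely for a fair coin (assumption).
    outcomes = list(product("HT", repeat=n_flips))
    counts = Counter(seq.count("H") for seq in outcomes)
    total = len(outcomes)
    return {heads: counts[heads] / total for heads in sorted(counts)}


dist = heads_distribution(2)
for heads, prob in dist.items():
    print(f"{heads} heads: p = {prob:.2f}")

# The probabilities of all possible values must add up to 1.
print("sum of probabilities:", sum(dist.values()))
```

The printed table should match the probabilities quoted for 0, 1, and 2 heads, and the final line checks the rule that all probabilities sum to 1.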
DATA ANALYSIS
Module 2
MEASURES OF CENTRAL TENDENCY
LEARNING OUTCOMES:
At the end of the topic, students are able to:
1. Define central tendency, mean, median, and mode.
2. Differentiate mean, mode, and median.
3. Determine the mean, mode, and median of a given data set.
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. Measures of central tendency attempt to identify the middle of a distribution to provide one sample statistic that describes the character of an entire data set.

Three Measures of Central Tendency
1. Mean
2. Mode
3. Median

MEAN
The sample mean of a data set is the arithmetic average, which is the sum of observations divided by the number of observations.
o Ex.: If you measured decayed, missing, or filled primary tooth surfaces (dmfs) among five (5) first grade schoolchildren and obtained the following data: 0, 3, 1, 2, and 4
o The mean, or average, dmfs would be:
( 0 + 3 + 1 + 2 + 4 ) / 5 = 10/5 = 2.0
By substituting symbols for these numbers, we can represent the general formula for the mean. Each symbol, X1, X2, X3, ..., Xn, represents an individual observation, where n is the total number of observations. The mean of the sample is represented by the symbol Ẋ (x-bar). Thus, the formula for the mean would be:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$$
The mean is the most commonly used measure of central tendency.
The mean only makes sense in the context of continuous variables; however, in practice, the mean is also frequently calculated for ordinal variables with many levels, for example, age in years.

MEDIAN
The median of a sample is the middle item (midpoint) of a data set, which will divide a data set arranged in order in half.
To find the median, the data must first be arranged in order of increasing value. Continuing the example from the mean, this results in 0, 1, 2, 3, 4. In this case, the median is 2. This example is straightforward because there was an odd number of observations, with a single observation in the middle to serve as the median. If there is an even number of observations, then the median is the mean of the middle pair of observations. To illustrate:
If you had collected dmfs data on one additional child, you would have the following observations: 0, 1, 2, 3, 3, and 4.
The middle pair of observations is formed by 2 and 3, and the mean of this pair is ( 2 + 3 ) / 2 = 5 / 2 = 2.5.
Thus, the median in this case is 2.5. Note that 3 observations fall below 2.5 and 3 observations fall above 2.5.
Because it would be very difficult to visually judge the location of the middle point in a large data set, out of n ordered observations, the (( n + 1 ) / 2)th observation is the median. Using this technique to identify the median in an odd number of observations is straightforward, so we will illustrate it on the data set with an even number of observations: the ( 6 + 1 ) / 2 = 3.5th observation should be the median.
The 3.5th observation is midway between 2 and 3 (0, 1, 2, ... 3, 3, 4), or 2.5, agreeing with what was concluded above.
The median is based only on the order information of the data (i.e., how many observations are above and below a given point). Therefore the median is useful for describing the central tendency of ordinal categorical variables, as well as continuous variables.
The median is not influenced by and does not convey the actual numeric values of the observations.

MODE
The mode is the most frequently occurring value in a set of observations. Again, it is convenient to arrange the observations in increasing order to judge how often a value occurs.
o Ex.: The mode of the 0, 1, 2, 3, 3, 4 data set is 3 because this value occurs twice while all other values occur only once.
If our data set were 0, 1, 1, 2, 3, 3, 4 there would be two modes, 1 and 3, and this data set would be called bimodal.
When all values occur with the same frequency, the data set is said to have no mode.
The chief advantage of the mode is that it is the only measure of central tendency that makes sense for nominal categorical variables, such as eye color. It would not make sense to place eye color in ascending order to identify the median, nor would it make sense to identify the average eye color. It would, however, be perfectly sensible to say that the most frequent eye color in a given sample is brown.
Otherwise, the mode is not often used, as it records only the most frequent value, which may be far from the center of the distribution of values.

MEAN, MODE, and MEDIAN
The relationship between mean, median, and mode may be graphically appreciated through the frequency curve of a distribution. The frequency curve is simply a smooth version of the histogram. The mode is the highest point of the curve; the median is the value that divides the area under the curve in half. The position of the mean is slightly more difficult to conceptualize. If you think of the curve as a solid object, the mean would be the point at which the shape would balance.
The mean, median, and mode coincide on a symmetrical frequency curve. If, however, the distribution is skewed, the mean is drawn toward the long tail of the distribution, again demonstrating how sensitive the mean is to extreme values.

Graphical Relationships between Mean, Median, and Mode
If a frequency distribution graph has a symmetrical frequency curve, then the mean, median, and mode will be equal.
In case of a positively skewed frequency distribution, the mean is always greater than the median and the median is always greater than the mode.

MODULE 3
MEASURES OF DISPERSION:
RANGE, MEAN DEVIATION, AND STANDARD DEVIATION

MEASURES OF VARIABILITY
In statistics, variability (also called dispersion, scatter, or spread) is the extent to which a distribution is stretched or squeezed.
It describes the characteristics of a set of samples in a given population.
When the measure of variability or dispersion is small, the group is more or less homogeneous, but when the measure of variability is large, the group is more or less heterogeneous.
o Common examples of measures of variability are the:
1. Range
2. Standard Deviation
3. Variance

RANGE
In statistics, a range shows how spread out a set of data is. The bigger the range, the more spread out the data. If the range is small, the data is closer together or more consistent. The range of a set of numbers is the largest value minus the smallest value.
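
The worked dmfs examples for the mean, median, and mode, together with the range as just defined, can be checked with a short Python sketch. It relies only on the standard statistics module; the variable names are illustrative, the seven-value bimodal list is chosen here so that 1 and 3 each occur twice, and data_range is a hypothetical helper rather than a library function.

```python
import statistics

# dmfs data for five children (odd count) and six children (even count).
dmfs_odd = [0, 3, 1, 2, 4]
dmfs_even = [0, 1, 2, 3, 3, 4]

mean_odd = statistics.mean(dmfs_odd)        # sum of values / number of values
median_odd = statistics.median(dmfs_odd)    # single middle value after sorting
median_even = statistics.median(dmfs_even)  # mean of the middle pair

# multimode returns every value tied for the highest frequency.
modes_even = statistics.multimode(dmfs_even)
modes_bimodal = statistics.multimode([0, 1, 1, 2, 3, 3, 4])


def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


scores = [71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 89]
range_with_outlier = data_range(scores)
range_without_outlier = data_range([s for s in scores if s != 300])

print("mean:", mean_odd, "median (odd):", median_odd, "median (even):", median_even)
print("modes:", modes_even, "bimodal:", modes_bimodal)
print("range of {4, 6, 9, 3, 7}:", data_range([4, 6, 9, 3, 7]))
print("range with / without 300:", range_with_outlier, range_without_outlier)
```

Comparing the last two values shows how strongly a single extreme observation can inflate the range.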
The Range is the difference between the lowest and highest values.
o Formula: Range = Largest value (L) – Smallest Value
Example:
In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9. So the range is 9 − 3 = 6.
The range is useful for showing the spread within a dataset and for comparing the spread between similar datasets.
The prime advantage of this measure of dispersion is that it is easy to calculate. On the other hand, it has a lot of disadvantages. It is very sensitive to outliers and does not use all the observations in a data set.

Outlier - an observation that lies an abnormal distance from other values in a random sample from a population.
Ex.: 71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 89 ….. 300 is an outlier

Merits of Range:
It is simple to understand and easy to calculate.
It is less time consuming.

Demerits of Range:
It is not based on each and every item of the distribution.
It is very much affected by the extreme values.
We define range in such a way so as to eliminate the outliers and extreme points in the data set.

The Variance
It is the square of the standard deviation and is also known as the mean square.
Formula:
$$SD^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$
Where:
o SD² = variance
o ∑ (X − Ẋ)² = sum of squares of deviations about the mean
o N = number of cases

Standard Deviation
In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
Standard deviation tells you how spread out the data is. It is a measure of how far each observed value is from the mean. In a normal distribution, about 95% of values will be within 2 standard deviations of the mean.
The standard deviation measures the spread of the data about the mean value. ... For example, the mean of the following two is the same: 15, 15, 15, 14, 16 and 2, 7, 14, 22, 30. However, the second is clearly more spread out. If a set has a low standard deviation, the values are not spread out too much.
Formula:
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$
Where:
o SD = Standard Deviation
o X = Score
o Ẋ = Arithmetic Mean
o ∑ (X − Ẋ)² = Sum of the squared deviations of X from the mean
o N = Total number of cases
Procedures:
1. Find the arithmetic mean.
2. Determine the deviation of each score (X) from the mean, which equals X − Ẋ.
3. Square each deviation to get (X − Ẋ)².
4. Sum the squared deviations to get ∑ (X − Ẋ)².
5. Divide the sum by N – 1 to find ∑ (X − Ẋ)² / (N − 1).
6. Extract the square root of the result.
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$
$$SD = \sqrt{\frac{832.50}{10 - 1}}$$
SD = 9.62

$$SD = \sqrt{\frac{N \sum fM^2 - (\sum fM)^2}{N^2 - N}}$$
$$SD = \sqrt{\frac{50(237,234.50) - 3,388.5^2}{50^2 - 50}}$$
$$SD = \sqrt{\frac{11,861,725.00 - 11,481,932.25}{2,450}}$$
$$SD = \sqrt{\frac{379,792.75}{2,450}}$$
$$SD = \sqrt{155.01744898}$$
SD = 12.45

Illustration of Variance and Standard Deviation
You want to find the variance and SD of the five observations: 2,
2. Determine the squared difference (deviation) between each observation x and the mean.
3. Calculate the variance by determining the mean squared deviation:
Variance = sum of squared deviations / number of observations = 41.20 / 5 = 8.24
4. Determine the SD by taking the square root of the variance:
SD = √ variance = √ 8.24 = 2.87

... because it was first discovered by Carl Friedrich Gauss. The normal distribution is a continuous probability distribution that is very important in many fields of science.
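
Returning to the standard deviation procedure, the sketch below follows the listed steps (mean, deviations, squared deviations, sum, divide by N − 1, square root) for the two data sets that share a mean of 15, and then re-checks the grouped-data computation from its quoted totals. The function names are illustrative only, and the N − 1 divisor is the one given in the formula; dividing by N instead, as in the "mean squared deviation" illustration, gives a slightly smaller value.

```python
import math


def sample_variance(values):
    """Sum of squared deviations about the mean, divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return sum_sq_dev / (n - 1)


def sample_sd(values):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))


def grouped_sd(n, sum_fm, sum_fm2):
    """SD from grouped totals: sqrt((N*sum(fM^2) - (sum(fM))^2) / (N^2 - N))."""
    return math.sqrt((n * sum_fm2 - sum_fm ** 2) / (n ** 2 - n))


tight = [15, 15, 15, 14, 16]
wide = [2, 7, 14, 22, 30]

print("SD of tight set:", round(sample_sd(tight), 2))
print("SD of wide set:", round(sample_sd(wide), 2))
print("grouped SD:", round(grouped_sd(50, 3388.5, 237234.50), 2))
print("SD from 832.50 / (10 - 1):", round(math.sqrt(832.50 / (10 - 1)), 2))
```

Both sets have the same mean, yet the second has a far larger standard deviation, which is the point of the comparison in the text.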
How do you convert a normal distribution into a standard normal curve?
Any point (x) from a normal distribution can be converted to the standard normal distribution (z) with the formula z = (x − mean) / standard deviation.

The Empirical Rule
If X is a random variable and has a normal distribution with mean µ and standard deviation σ, then the Empirical Rule says the following:
About 68% of the x values lie between µ – σ and µ + σ (within one standard deviation of the mean).
About 95% of the x values lie between µ – 2σ and µ + 2σ (within two standard deviations of the mean).
About 99.7% of the x values lie between µ – 3σ and µ + 3σ (within three standard deviations of the mean).
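
The conversion to z and the Empirical Rule percentages can be reproduced with Python's standard statistics.NormalDist class. This is a minimal sketch: the helper names and the example values (x = 27, mean 19, standard deviation 4) are chosen purely for illustration.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Convert a raw value to the standard normal scale: z = (x - mean) / sd."""
    return (x - mean) / sd


def coverage_within(k):
    """Proportion of a normal distribution lying within k standard deviations."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print("z for x = 27, mean 19, sd 4:", z_score(27, 19, 4))
for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.1%}")
```

The three printed proportions are the exact values that the rule rounds to about 68%, 95%, and 99.7%.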
2. The histogram above for variable1 represents perfect symmetry (skewness) and perfect peakedness (kurtosis); and the descriptive statistics below for variable1 parallel this information by reporting "0" for both skewness and kurtosis. The histogram above for variable2 represents
3. Look at established tests for normality that take into account both Skewness and Kurtosis simultaneously. The Kolmogorov-Smirnov
4. Look at normality plots of the data. “Normal Q-Q Plot” provides a graphical way to determine the level of normality. The black line indicates the values your sample should adhere to if the distribution was normal. The dots are your actual data. If the dots fall exactly on the black line, then your data are normal. If they deviate from the black line, your data are non-normal. Notice how the data for
THE NORMAL CURVE/NORMAL DISTRIBUTION
What Is a Probability Distribution?
A probability distribution is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. This range will be bounded between the minimum and maximum possible values, but precisely where the possible value is likely to be plotted on the probability distribution depends on a number of factors. These factors include the distribution's mean (average), standard deviation, skewness, and kurtosis.
A random variable is a variable whose value is unknown or a function that assigns values to each of an experiment's outcomes. ... Random variables are often used in econometric or regression analysis to determine statistical relationships among one another.
EX.: A typical example of a random variable is the outcome of a coin toss. Consider a probability distribution in which the outcomes of a random event are not equally likely to happen. If the random variable, Y, is the number of heads we get from tossing two coins, then Y could be 0, 1, or 2.

A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis.
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers. A uniform distribution would be the extreme case.
The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set. A histogram is an approximate representation of the distribution of numerical data. It was first introduced by Karl Pearson. It is a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval.
How do you interpret skewness?
Interpreting
1. If skewness is less than −1 or greater than +1, the distribution is highly skewed.
2. If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.
3. If skewness is between −½ and +½, the distribution is approximately symmetric.
Skewness refers to a distortion or asymmetry that deviates from the symmetrical bell curve, or normal distribution, in a set of data. ... A normal distribution has a skew of zero, while a lognormal distribution, for example, would exhibit some degree of right-skew.
Kurtosis - the sharpness of the peak of a frequency-distribution curve.
What does the kurtosis value tell us?
For kurtosis, the general guideline is that if the number is greater than +1, the distribution is too peaked. Likewise, a kurtosis of less than –1 indicates a distribution that is too flat. Distributions exhibiting skewness and/or kurtosis that exceed these guidelines are considered nonnormal." (Hair et al., 2017, p.
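
The skewness and kurtosis guidelines can be written as two small lookup functions. This is only a sketch of the rules of thumb quoted in these notes: the function names and returned labels are illustrative, and values falling exactly on a boundary (±½ or ±1) are assigned to the less extreme category, which is an assumption because the guidelines do not say where boundary values belong.

```python
def classify_skewness(skewness):
    """Apply the three skewness rules of thumb to a computed skewness value."""
    magnitude = abs(skewness)
    if magnitude > 1:
        return "highly skewed"
    if magnitude > 0.5:
        return "moderately skewed"
    return "approximately symmetric"


def classify_kurtosis(kurtosis):
    """Apply the +1 / -1 kurtosis guideline to a computed kurtosis value."""
    if kurtosis > 1:
        return "too peaked"
    if kurtosis < -1:
        return "too flat"
    return "within the guideline"


# Hypothetical values, as they might be reported by a statistics package.
for value in (1.2, -0.7, 0.2):
    print("skewness", value, "->", classify_skewness(value))
for value in (1.5, -1.5, 0.3):
    print("kurtosis", value, "->", classify_kurtosis(value))
```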
Χ² (α = 0.05, df = 3) = 7.815
Conclusion:
Since the computed Χ² (3.73) is less than (<) the tabular Χ² (7.815), the null hypothesis is accepted: the occurrence of accidents does not differ across the different widths of the road.

Contingency Table
In contingency tables, we intend to test that the row variable is independent of the column variable. Computation for expected frequency for the contingency table is different from the one in the goodness of fit.
Obtaining the expected frequency for variables with 2 or more groups, the following formula is applied:
E = (row total × column total) / grand total
The degrees of freedom are:
df = (r − 1)(c − 1)
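
To make the contingency-table computation concrete, the sketch below derives expected frequencies and degrees of freedom for a small table. It assumes the usual textbook definitions, which are not spelled out in these notes: each expected frequency is row total × column total / grand total, and the statistic is the sum of (observed − expected)² / expected. The 2 × 2 counts are invented purely for illustration.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """df = (r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical observed counts (rows = groups, columns = categories).
table = [[10, 20], [30, 40]]
print("expected:", expected_frequencies(table))
print("chi-square:", round(chi_square_statistic(table), 4))
print("df:", degrees_of_freedom(table))
```

The computed statistic would then be compared with the tabular value for the resulting degrees of freedom at the chosen level of significance.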
Example:
Teenagers and young adults have their own styles of studying. Some prefer to study with music, others do not. A group of psychologists conducted a study to determine the particular age of the students who like studying with music.
Interpretation:
The obtained Χ² (0.05) is less than the tabular Χ² (11.343) at the 0.01 level of significance. Therefore, there is sufficient evidence to accept the null hypothesis. The style of studying is independent of the listed age groups.

Degrees of Freedom for the Chi-square
The degrees of freedom involved in the one-variable chi-square are determined by this formula:
df = K – 1
Where: df = degrees of freedom
K = number of categories
On the other hand, the degrees of freedom to use in the two-variable chi-square are determined by the formula:
df = (c – 1)(r – 1)
Where: df = degrees of freedom
c = the number of columns
r = the number of rows

CHI-SQUARE DISTRIBUTION
USES OF CHI-SQUARE
1. It is used in descriptive research if the researcher wants to determine the significant difference between the observed frequency and expected (theoretical) frequencies from independent variables.
2. It is used to test the goodness of fit where a theoretical distribution is fitted to some data.
3. It is used to test the hypothesis that the variance of a normal population is equal to a given value.
4. It is used for the construction of confidence intervals for variances.
5. It is used to compare two uncorrelated and correlated proportions.

Hypothesis testing
When interpreting research findings, researchers need to assess whether these findings may have occurred by chance. Hypothesis testing is a systematic procedure for deciding whether the results of a research study support a particular theory which applies to a population.
Hypothesis testing uses sample data to evaluate a hypothesis about a population. A hypothesis test assesses how unusual the result is, whether it is reasonable chance variation or whether the result is too extreme to be considered chance variation.

Basic concepts
Null and research hypotheses
To carry out statistical hypothesis testing, research and null hypotheses are employed:
Research hypothesis: this is the hypothesis that you propose, also known as the alternative hypothesis HA. For example:
HA: There is a relationship between intelligence and academic results.
HA: First year university students obtain higher grades after an intensive Statistics course.
HA: Males and females differ in their levels of stress.
The null hypothesis (Ho) is the opposite of the research hypothesis and expresses that there is no relationship between variables, or no differences between groups; for example:
Ho: There is no relationship between intelligence and academic results.
Ho: First year university students do not obtain higher grades after an intensive Statistics course.
Ho: Males and females will not differ in their levels of stress.
The purpose of hypothesis testing is to test whether the null hypothesis (there is no difference, no effect) can be rejected or accepted. If the null hypothesis is rejected, then the research hypothesis can be accepted. If the null hypothesis is accepted, then the research hypothesis is rejected.
In hypothesis testing, a value is set to assess whether the null hypothesis is accepted or rejected and whether the result is statistically significant:
A critical value is the score the sample would need to decide against the null hypothesis.
If the computed value is greater than (>) the critical value, reject the null hypothesis and accept the alternative hypothesis.
If the computed value is less than (<) the critical value, accept the null hypothesis.
A probability value is used to assess the significance of the statistical test. If the null hypothesis is rejected, then the alternative to the null hypothesis is accepted.
If the probability value is less than (<) the level of significance (α), reject the null hypothesis and accept the alternative hypothesis.
If the probability value is greater than (>) the level of significance (α), accept the null hypothesis.

The hypothesis testing process
The hypothesis testing process can be divided into five steps:
1. Restate the research question as a research hypothesis and a null hypothesis about the populations.
2. Determine the characteristics of the comparison distribution.
3. Determine the cut off sample score on the comparison distribution at which the null hypothesis should be rejected.
4. Determine your sample’s score on the comparison distribution.
5. Decide whether to reject the null hypothesis.

This example illustrates how these five steps can be applied to test a hypothesis:
Let’s say that you conduct an experiment to investigate whether students’ ability to memorize words improves after they have consumed caffeine.
The experiment involves two groups of students: the first group consumes caffeine; the second group drinks water.
Both groups complete a memory test.
A randomly selected individual in the experimental condition (i.e. the group that consumes caffeine) has a score of 27 on the memory test. The scores of people in general on this memory measure are normally distributed with a mean of 19 and a standard deviation of 4.
The researcher predicts an effect (differences in memory for these groups) but does not predict a particular direction of effect (i.e. which group will have higher scores on the memory test). Using the 5% significance level, what should you conclude?

Step 1: There are two populations of interest.
Population 1: People who go through the experimental procedure (drink coffee).
Population 2: People who do not go through the experimental procedure (drink water).
Research hypothesis: Population 1 will score differently from Population 2.
Null hypothesis: There will be no difference between the two populations.
Step 2: We know that the characteristics of the comparison distribution (student population) are:
Population M = 19, Population SD = 4, normally distributed. These are the mean and standard deviation of the distribution of scores on the memory test for the general student population.
Step 3: For a two-tailed test (the direction of the effect is not specified) at the 5% level (2.5% at each tail), the cut off sample scores are +1.96 and -1.96.
Step 4: Your sample score of 27 needs to be converted into a Z value. To calculate Z = (27-19)/4 = 2 (check the Converting into Z scores section if you need to review how to do this process).
Step 5: A ‘Z’ score of 2 is more extreme than the cut off Z of +1.96 (see figure above). The result is significant and, thus, the null hypothesis is rejected.

CRITICAL VALUES OF CHI SQUARE TEST (0.01 & 0.05 LEVEL OF SIGNIFICANCE)

... independent samples. In other words, if the t-test is statistically significant, we would conclude the populations from which the samples were drawn had different population means.
The terminology that we would use is that the groups are significantly different from one another.
To compute the independent samples t-test, we start by setting up two columns, one for each group. In our example, we have labeled the columns Group 1 and Group 2, but we could also have labeled them X1 and X2, respectively.
Note that we do not label them as X and Y, as we did in computing correlations, because these are the same measures being taken in each group. Therefore, we would use the same letter to indicate this fact and use a subscript to indicate which group we are talking about. This notation is used in the formula for the t-test.
We put the scores in the two columns. Note that we do not necessarily have the same number of participants in each group, so the number of scores in the two columns may be different. Then, for each column, you compute the following values.
1. The sum of the column.
2. The sum of the squared values of the column (square each score and then sum it).
3. The sample size for the column.
4. The mean for the column (sum divided by the sample size).
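
The four per-column values listed for the independent samples t-test can be sketched as follows. The scores in the two groups are hypothetical, and column_summary is an illustrative helper name; the sketch stops at the column summaries and does not attempt the t formula itself, which is not given in this part of the notes.

```python
def column_summary(scores):
    """Return the four values computed for each group's column of scores."""
    total = sum(scores)                        # 1. sum of the column
    sum_of_squares = sum(x * x for x in scores)  # 2. square each score, then sum
    n = len(scores)                            # 3. sample size
    mean = total / n                           # 4. sum divided by sample size
    return {"sum": total, "sum_of_squares": sum_of_squares, "n": n, "mean": mean}


# Hypothetical scores; the groups need not be the same size.
group_1 = [5, 7, 9]
group_2 = [3, 4, 6, 7]

summary_1 = column_summary(group_1)
summary_2 = column_summary(group_2)
print("Group 1 (X1):", summary_1)
print("Group 2 (X2):", summary_2)
```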
REFERENCE:
https://microbenotes.com/z-test/
CRITICAL VALUES OF Z
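
As a sketch of where critical z values come from, the two-tailed cut-offs can be computed from the standard normal distribution with Python's statistics.NormalDist and then applied to the caffeine example (sample score 27, population mean 19, population SD 4, 5% significance level). The function names are illustrative, and a two-tailed test is assumed, as in that example.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha):
    """Positive cut-off z that leaves alpha / 2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_test_decision(sample_score, pop_mean, pop_sd, alpha=0.05):
    """Return (z, critical z, reject?) for a single score, two-tailed."""
    z = (sample_score - pop_mean) / pop_sd
    critical = two_tailed_critical_z(alpha)
    # Reject the null hypothesis when z is more extreme than either cut-off.
    return z, critical, abs(z) > critical


z, critical, reject = z_test_decision(27, 19, 4)
print(f"z = {z:.2f}, cut-offs = ±{critical:.2f}, reject null: {reject}")
print("critical z at the 1% level:", round(two_tailed_critical_z(0.01), 3))
```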