Chapter 5
DATA ANALYSIS TECHNIQUES
5. Introduction:
Data analysis is the method of assessing data with the use of varied logical
reasoning tools and analytical techniques, in order to examine each component
of the provided data. It is the discipline of probing raw data with the aim of
extracting conclusions from that information. Different statistical tests were
used to analyze the data collected from the sample respondents.
The data analysis was done using a software package for statistical analysis.
One of the most popular software packages for performing statistical analysis
on survey data is the Statistical Package for the Social Sciences (SPSS). Its
first version was released in 1968, and it has come a long way since then. It is
used by a wide range of entities, viz. researchers, educational institutes,
research organizations, governments, marketing firms, etc.
The data of the 325 sample respondents was entered into this software, then
properly coded and edited. Duly filled questionnaires were considered for
final data analysis, and incomplete questionnaires were rejected. The following
techniques were employed to analyze the collected data:
5.1 Cronbach's Alpha
Cronbach's alpha is a measure of internal consistency, commonly used to
describe the reliability of factors extracted from dichotomous and/or
multi-point formatted questionnaires or scales. The higher the score, the more
reliable the generated scale is. It is an indicator of reliability related to
the variation accounted for.
Reliability Statistics
Cronbach's Alpha    N of Items
.934                35
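For illustration, Cronbach's alpha can be computed directly from a
respondents-by-items score matrix. The following Python sketch uses randomly
generated placeholder data (the study itself used SPSS, and the function name
cronbach_alpha is our own):

import numpy as np

def cronbach_alpha(items):
    # items: 2-D array, rows = respondents, columns = scale items
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical example: 325 respondents, 35 five-point Likert items
rng = np.random.default_rng(42)
scores = rng.integers(1, 6, size=(325, 35))
print(round(cronbach_alpha(scores), 3))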
5.2 Frequency Distribution
Frequency distributions are visual displays that organise and present frequency
counts so that the data can be interpreted more easily. A frequency distribution
is a mathematical distribution whose objective is to obtain a count of the
number of responses associated with different values of one variable and to
express the counts in percentage terms. It helps to determine the extent of item
non-response and also the extent of illegitimate responses. A frequency
distribution can be exhibited in tabular or graphical form; common methods
include frequency tables, histograms, pie charts and bar charts.
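For illustration, such a frequency table (counts plus percentages, with
non-responses kept visible) can be produced with pandas; the responses below
are hypothetical:

import pandas as pd

# Hypothetical Likert-type responses; None marks an item non-response
responses = pd.Series(
    ["Agree", "Agree", "Neutral", "Disagree", "Agree", "Strongly Agree", None]
)

counts = responses.value_counts(dropna=False)                     # raw counts
percents = responses.value_counts(dropna=False, normalize=True) * 100

print(pd.DataFrame({"Count": counts, "Percent": percents.round(1)}))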
5.3 Factor Analysis
Factor analysis is a technique used to reduce a large number of variables to a
smaller number of factors. It is also used to create a set of variables for
similar items in the set (these sets of variables are called dimensions). It can
be a very useful tool for complex sets of data involving psychological studies,
socioeconomic status and other involved concepts. A “factor” is a set of
observed variables that have similar response patterns; they are associated with
a hidden variable (called a latent variable) that isn’t directly measured.
Factors are listed according to factor loadings, or how much variation in the
data they can explain. There are two types: exploratory and confirmatory.
• Exploratory factor analysis is used if you don’t have any idea about the
structure of your data or how many dimensions are in a set of variables.
• Confirmatory factor analysis is used for verification when you have a
specific idea about the structure of your data or how many dimensions are in
a set of variables.
Factor Loadings: Not all factors are created equal; some factors have more
weight than others.
[Figure: example table of factor loadings. Image: USGS.gov]
The factors that are shown in bold have the highest factor loadings and the
maximum impact on the question. Factor loadings are very much like correlation
coefficients: they range from -1 to 1. The closer a loading is to -1 or 1, the
greater its impact on the variable; a factor loading of zero means no impact.
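As a sketch of how loadings are obtained in practice (the study itself used
SPSS), scikit-learn's FactorAnalysis can extract factors and report their
loadings; the data below are randomly generated placeholders:

import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical data: 325 respondents answering 6 survey items
rng = np.random.default_rng(0)
X = rng.normal(size=(325, 6))

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(X)

loadings = fa.components_.T   # rows = items, columns = factors
print(np.round(loadings, 2))  # loadings near -1 or 1 dominate; near 0, no impact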
Confirmatory Factor Analysis:
Confirmatory Factor Analysis (CFA) is used to verify a hypothesized factor
structure. It is typically used in the social sciences; however, it is
technically applicable to any discipline.
Software is usually required to perform confirmatory factor analysis. SAS can
be used to perform CFA. At the time of writing, SPSS is limited to EFA only.
• SAS CFA procedure.
• AMOS instructions (document available from East Carolina University).
• AMOS, LISREL and Mplus procedures.
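Outside these packages, one open-source route is Python's semopy library; the
sketch below specifies a hypothetical two-factor measurement model (the factor
names, item names q1-q6 and file name are all placeholders):

import pandas as pd
from semopy import Model

# Hypothetical measurement model: two factors with three items each
desc = """
ServiceQuality =~ q1 + q2 + q3
Satisfaction   =~ q4 + q5 + q6
"""

data = pd.read_csv("survey_items.csv")   # hypothetical file of item responses

model = Model(desc)
model.fit(data)
print(model.inspect())   # estimated loadings, variances and covariances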
Exploratory Factor Analysis:
Exploratory Factor Analysis (EFA) is used to find the underlying structure of a
large set of variables. It reduces data to a much smaller set of summary
variables. EFA is almost identical to Confirmatory Factor Analysis (CFA); both
techniques can (perhaps surprisingly) be used to confirm or explore.
There are, however, some differences, mostly concerning how factors are treated
and used. EFA is basically a data-driven approach, allowing all items to load on
all factors, while with CFA you must specify which items load on which factors.
EFA is a good choice if you don’t have any idea about what common factors might
exist. EFA can generate a large number of possible models for your data,
something that may not be feasible if a researcher has to specify the factors.
If you do have an idea about what the models look like, and you want to test
your hypotheses about the data structure, CFA is a better approach.
5.4 ANOVA
An ANOVA (Analysis of Variance) test is a way to find out whether survey or
experiment results are significant, i.e., whether differences between group
means are real or due to chance. For example:
• A manufacturer has two different processes to make light bulbs. They want
to know if one process is better than the other.
• Students from different colleges take the same exam. You want to see if one
college outperforms the other.
One-way or two-way refers to the number of independent variables in the test:
• One-way has one independent variable (with two or more levels). For
example: brand of cereal.
• Two-way has two independent variables (each can have multiple levels). For
example: brand of cereal and calories.
Group or Levels:
Groups or levels are different groups within the same independent variable. In
the above example, your levels for “brand of cereal” might be Lucky Charms,
Raisin Bran, Cornflakes — a total of three levels. Your levels for “Calories”
might be: sweetened, unsweetened — a total of two levels.
If you are studying whether an alcoholic support group and individual
counseling combined is the most effective treatment for lowering alcohol
consumption, you might split the study participants into three groups or levels:
• Medication only,
• Counseling only,
• Medication and counseling combined.
If your groups or levels have a hierarchical structure (each level has unique
subgroups), then use a nested ANOVA for the analysis.
Replication: It’s whether you are replicating (i.e. duplicating) your test(s)
with multiple groups. With a two way ANOVA with replication, you have
two groups and individuals within that group are doing more than one
thing (i.e. two groups of students from two colleges taking two tests). If you
only have one group taking two tests, you would use without replication.
There are two main types: one-way and two-way. Two-way tests can be with or
without replication.
• One-way ANOVA between groups: used when you want to test two
groups to see if there’s a difference between them.
• Two way ANOVA without replication: used when you have one group and
you’re double-testing that same group. For example, you’re testing one set
of individuals before and after they take a medication to see if it works or
not.
• Two way ANOVA with replication: Two groups, and the members of those
groups are doing more than one thing. For example, two groups of patients
from different hospitals trying two different therapies.
One Way ANOVA
A one way ANOVA is used to compare two means from two independent
(unrelated) groups using the F-distribution. The null hypothesis for the test is
that the two means are equal. Therefore, a significant result means that the two
means are unequal.
Situation 1: You have a group of individuals randomly split into smaller groups
and completing different tasks. For example, you might be studying the effects
of tea on weight loss and form three groups: green tea, black tea, and no tea.
Situation 2: Similar to situation 1, but in this case the individuals are split into
groups based on an attribute they possess. For example, you might be studying
leg strength of people according to weight. You could split participants into
weight categories (obese, overweight and normal) and measure their leg strength
on a weight machine.
A one way ANOVA will tell you that at least two groups were different from
each other. But it won’t tell you which groups were different. If your test
returns a significant F-statistic, you may need to run a post hoc test (like
the Least Significant Difference test) to tell you exactly which groups had
a difference in means.
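For illustration, the tea example in Situation 1 can be run with scipy's
one-way ANOVA; the weight-loss figures below are invented:

from scipy import stats

# Hypothetical weight loss (kg) for the three tea groups
green_tea = [3.2, 4.1, 3.8, 4.5, 3.9]
black_tea = [2.9, 3.1, 2.7, 3.4, 3.0]
no_tea    = [1.2, 1.8, 1.5, 2.0, 1.6]

f_stat, p_value = stats.f_oneway(green_tea, black_tea, no_tea)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A significant F only says that at least two group means differ;
# a post hoc test (e.g. LSD or Tukey's HSD) identifies which ones.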
Two Way ANOVA
A Two Way ANOVA is an extension of the One Way ANOVA. With a One
Way, you have one independent variable affecting a dependent variable. With a
Two Way ANOVA, there are two independents. Use a two way ANOVA when
you have one measurement variable (i.e. a quantitative variable) and
two nominal variables. In other words, if your experiment has a quantitative
outcome and you have two categorical explanatory variables, a two way
ANOVA is appropriate.
For example, you might want to find out if there is an interaction between income
and gender for anxiety level at job interviews. The anxiety level is the outcome,
or the variable that can be measured. Gender and Income are the two categorical
variables. These categorical variables are also the independent variables, which
are called factors in a Two Way ANOVA.
The factors can be split into levels. In the above example, income level could be
split into three levels: low, middle and high income. Gender could be split into
three levels: male, female, and transgender. Treatment groups are all possible
combinations of the factors. In this example there would be 3 x 3 = 9 treatment
groups.
The results from a Two Way ANOVA will calculate a main effect and
an interaction effect. The main effect is similar to a One Way ANOVA: each
factor’s effect is considered separately. With the interaction effect, all
factors are considered at the same time. Interaction effects between factors are
easier to test if there is more than one observation in each cell. For the above
example, multiple anxiety scores could be entered into cells. If you do enter
multiple observations into cells, the number in each cell must be equal.
Two null hypotheses are tested if you are placing one observation in each cell.
For this example, those hypotheses would be:
H01: All the income groups have equal mean anxiety.
H02: All the gender groups have equal mean anxiety.
For multiple observations in cells, you would also be testing a third hypothesis:
H03: The factors are independent, or the interaction effect does not exist.
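For illustration, the income-and-gender example can be fitted with statsmodels;
the anxiety scores below are invented, C() marks a categorical factor, and the
':' term is the interaction (each cell holds an equal number of observations,
as required above):

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data: anxiety scores by gender and income level
df = pd.DataFrame({
    "anxiety": [5.1, 4.8, 6.2, 5.9, 4.1, 3.9, 6.8, 6.5, 5.0, 4.7, 5.5, 5.2],
    "gender":  ["M", "M", "F", "F"] * 3,
    "income":  ["low"] * 4 + ["mid"] * 4 + ["high"] * 4,
})

model = ols("anxiety ~ C(gender) + C(income) + C(gender):C(income)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # main effects and the interaction effect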
5.5 T-Test
The t test tells you how significant the differences between groups are. In
other words, it lets you know if those differences (measured in means) could
have happened by chance.
The T Score: The t score is the ratio of the difference between two groups to
the difference within the groups. The larger the t score, the more difference
there is between groups; the smaller the t score, the more similarity there is
between groups. A t score of 3 means that the groups are three times as
different from each other as they are within each other. When you run a t test,
the bigger the t-value, the more likely it is that the results are repeatable.
How big is “big enough”? Every t-value has a p-value to go with it. A p-value is
the probability that the results from your sample data occurred by chance.
P-values range from 0% to 100% and are usually written as a decimal; for
example, a p value of 5% is 0.05. Low p-values are good; they indicate your data
did not occur by chance. For example, a p-value of .01 means there is only a 1%
probability that the results from an experiment happened by chance. In most
cases, a p-value of 0.05 (5%) or less is accepted as statistically significant.
Choose the paired t-test if you have two measurements on the same item, person
or thing. You should also choose this test if you have two items that are being
measured under the same unique condition. For example, you might be measuring
car safety performance in vehicle research and testing by subjecting the cars
to a series of crash tests. Although the manufacturers are different, you might
be subjecting them to the same conditions.
With a “regular” two sample t test, you’re comparing the means of two
different samples. For example, you might test two different groups of customer
service associates on a business-related test, or test students from two
universities on their English skills. If you take a random sample from each
group separately and they have different conditions, your samples are
independent, and you should run an independent samples t test (also called
between-samples or unpaired-samples).
The null hypothesis for the independent samples t-test is μ1 = μ2; in other
words, it assumes the means are equal. With the paired t-test, the null
hypothesis is that the pairwise difference between the two tests is zero
(H0: µd = 0). The difference between the two tests is very subtle; which one
you choose is based on your data collection method.
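Both tests are available in scipy; a short sketch with invented scores shows
how each null hypothesis is tested:

from scipy import stats

# Independent samples: English scores from two universities (hypothetical)
uni_a = [72, 85, 78, 90, 66, 81]
uni_b = [68, 75, 70, 83, 62, 77]
t_ind, p_ind = stats.ttest_ind(uni_a, uni_b)     # H0: mu1 = mu2

# Paired samples: the same cars before and after a modification (hypothetical)
before = [3.1, 2.8, 3.5, 3.0, 2.9]
after  = [2.7, 2.5, 3.2, 2.8, 2.6]
t_rel, p_rel = stats.ttest_rel(before, after)    # H0: mean pairwise diff = 0

print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
print(f"paired:      t = {t_rel:.2f}, p = {p_rel:.4f}")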
5.6 Linear Regression
Linear regression is a basic and commonly used type of predictive analysis. The
overall idea of regression is to examine two things: (1) does a set of predictor
variables do a good job of predicting an outcome (dependent) variable? (2) Which
variables in particular are significant predictors of the outcome variable?
Naming the Variables: There are many names for a regression’s dependent
variable. It may be called an outcome variable, criterion variable, endogenous
variable, or regressand. The independent variables can be called exogenous
variables, predictor variables, or regressors.
Three major uses for regression analysis are:
• Determining the strength of predictors,
• Forecasting an effect, and
• Trend forecasting.
First, the regression might be used to identify the strength of the effect that
the independent variable(s) have on a dependent variable. Typical questions are:
what is the strength of the relationship between dose and effect, sales and
marketing spending, or age and income?
Second, it can be used to forecast effects or impact of changes. That is, the
regression analysis helps us to understand how much the dependent variable
changes with a change in one or more independent variables. A typical question
is, “how much additional sales income do I get for each additional $1000 spent
on marketing?”
Third, regression analysis predicts trends and future values. The regression
analysis can be used to get point estimates. A typical question is, “what will
the price of gold be in 6 months?”
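For illustration, the marketing question above maps onto a simple linear
regression; the spend and sales figures below are invented:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: marketing spend vs. sales income (both in $1000s)
spend = np.array([10, 15, 20, 25, 30, 35, 40])
sales = np.array([95, 130, 151, 185, 208, 240, 262])

X = sm.add_constant(spend)        # adds the intercept term
model = sm.OLS(sales, X).fit()

print(model.params)               # slope = extra sales per extra $1000 of spend
print(model.predict([[1, 50]]))   # point estimate of sales at a $50,000 spend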
5.7 Structural Equation Modeling
Structural equation modeling (SEM) was carried out as an extension of SPSS with
the help of AMOS. AMOS provides the following estimation methods:
• Maximum likelihood
• Unweighted least squares
• Generalized least squares
• Browne’s asymptotically distribution-free criterion
• Scale-free least squares
AMOS produces the following important output:
• Estimates: In the AMOS text output, the Estimates option gives the results
for regression weights, standardized factor loadings, residuals,
correlations, covariances, direct effects, indirect effects, total effects, etc.
• Model Fit: In the AMOS text output, the Model Fit option gives the
goodness-of-fit statistics. It shows all the goodness-of-fit
indexes, such as GFI, RMR, TLI, BIC, RMSEA, etc.
• Error Message: If there is any problem during the process of drawing the
model (for example, if we forget to draw the error term, if we draw the
covariance between two variables incorrectly, or if missing data is present),
then AMOS will either not calculate the result or will give an error message.
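For completeness, comparable models can be fitted outside AMOS with Python's
semopy package; in this sketch the factor names, item names and file name are
hypothetical, and calc_stats reports fit indices (GFI, TLI, RMSEA, etc.)
analogous to the AMOS Model Fit output:

import pandas as pd
import semopy

# Hypothetical model: a measurement part plus one structural regression
desc = """
Attitude  =~ a1 + a2 + a3
Intention =~ i1 + i2 + i3
Intention ~ Attitude
"""

data = pd.read_csv("sem_data.csv")   # hypothetical file with columns a1..a3, i1..i3

model = semopy.Model(desc)
model.fit(data)                      # maximum likelihood by default
print(model.inspect())               # loadings, regression weights, covariances
print(semopy.calc_stats(model))      # goodness-of-fit indices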