
Exploratory Factor Analysis

Factor Analysis Decision Process


Data Summarization & Data Reduction Assumptions

Submitted by,
Abhishek A S
S2 IEM
GECT
Factor Analysis

Factor analysis is used as a data reduction technique.

Factor analysis takes a large number of variables and reduces or summarizes
them into a smaller set of factors or components.

Factor analysis is a method for investigating whether a number of variables
of interest are related to a smaller number of unobservable factors. This is
done by grouping variables based on the inter-correlations among the set of
variables.
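
As a minimal illustration (a sketch, assuming scikit-learn is available; the data here are simulated, loosely echoing the nine store image elements discussed below), reducing nine correlated variables to three factors might look like:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Simulate 200 observations of 9 variables driven by 3 unobserved factors.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 9)) + 0.5 * rng.normal(size=(200, 9))

fa = FactorAnalysis(n_components=3).fit(X)
loadings = fa.components_.T   # (9 variables x 3 factors) loading matrix

Each column of the loading matrix describes one factor in terms of the original variables; X and loadings are reused in the sketches that follow.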
Exploratory Factor Analysis


Examines the interrelationships among a large number of variables and then
attempts to explain them in terms of their common underlying dimensions.

These common underlying dimensions are referred to as factors.

A summarization and data reduction technique that does not have
independent and dependent variables, but is an interdependence technique in
which all variables are considered simultaneously.
Example

The data represent the correlation matrix for nine store image elements.
Example

After the analysis some interesting patterns emerge.

First, four variables all relating to the in-store experience of shoppers are
grouped together.

Then, three variables describing the product assortment and availability are
grouped together.

Finally, product quality and price levels are grouped. Each group represents
a set of highly interrelated variables that may reflect a more general
evaluative dimension.

In this case, we might label the three variable groupings in-store
experience, product offerings, and value.
Factor Analysis Decision Process


Objectives of factor analysis

Designing a factor analysis

Assumptions in factor analysis

Deriving factors and assessing overall fit

Interpreting the factors

Validation of factor analysis

Additional uses of the factor analysis results
Objectives of factor analysis

Data summarization

· Definition of structure

Data reduction

· Purpose is to retain the nature and character of the original
variables, but reduce their number to simplify the subsequent
multivariate analysis
Designing a factor analysis


Both types of factor analysis use a correlation matrix as the input data.

With R-type factor analysis, we use the traditional correlation matrix among variables.

In Q-type factor analysis, the factor matrix identifies similar individuals.

Q-type factor analysis is based on the intercorrelations between respondents,
while cluster analysis forms groupings based on a distance-based similarity
measure between respondents' scores on the variables being analyzed.
Assumptions in factor analysis

Basic assumption: some underlying structure does
exist in the set of selected variables (ensure that observed
patterns are conceptually valid).

The sample is homogeneous with respect to the underlying
factor structure.

Departures from normality, homoscedasticity, and
linearity matter only to the extent that they diminish the
observed correlations.

Some degree of multicollinearity is desirable.
Assumptions in factor analysis


The researcher must ensure that the data matrix has sufficient correlations to
justify the application of factor analysis (i.e., the correlations are not all
equal or uniformly low).

Correlation among variables can also be analysed with partial correlations
(the correlation that remains unexplained when the effects of the other
variables are taken into account). High partial correlations mean factor
analysis is inappropriate. A rule of thumb is to consider a correlation above
0.7 as high.
Assumptions in factor analysis

Another method of determining the appropriateness of factor analysis is the
Bartlett test of sphericity, which provides a statistical test of whether the
correlation matrix contains significant correlations among at least some of
the variables.

The Bartlett test should be significant (i.e., p < 0.05); this means that the
variables are correlated highly enough to provide a reasonable basis for
factor analysis. It indicates that the correlation matrix is significantly
different from an identity matrix, in which the correlations between
variables are all zero.

Another measure is the measure of sampling adequacy (MSA). This index ranges
from 0 to 1, reaching 1 when each variable is perfectly predicted without
error by the other variables. It must exceed 0.5 for both the overall test
and each individual variable.
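
As a sketch, the Bartlett test statistic can be computed directly from the data with numpy and scipy (X is the data matrix from the earlier sketch; the chi-square formula below is the standard one for this test):

import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    # H0: the correlation matrix is an identity matrix (no correlations).
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, chi2.sf(statistic, df)

stat, p_value = bartlett_sphericity(X)   # p_value < 0.05 supports factoring

The third-party factor_analyzer package also ships ready-made helpers for this test and for the MSA (KMO) index, if an extra dependency is acceptable.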
Deriving factors and assessing overall fit

The method of extraction of factors is decided here:
· Common factor analysis
· Component factor analysis

The number of factors selected to represent the underlying structure in the data is also decided.

Factor extraction method

The decision depends on the objective of the factor analysis and on the concept of
partitioning the variance of a variable.

Variance is a value that represents the total amount of dispersion of values about
their mean.
Deriving factors and assessing overall fit

When a variable is correlated with another variable, it shares variance with it,
and the amount of sharing is the squared correlation. For example, two variables
with a correlation of .5 share .25 (25%) of their variance.

The total variance of a variable can be divided into three types of variance:

Common variance
Variance in a variable that is shared with all other variables in the analysis.
A variable's communality is the estimate of its shared variance.

Specific variance
Variance associated with only that specific variable. This variance cannot be
explained by the correlations with the other variables.
Deriving factors and assessing overall fit

Error variance

Unexplained variance

Common factor analysis considers only the common or shared variance.

Component analysis considers the full variance. It is more appropriate when data
reduction is a primary concern, and also when prior research shows that specific and
error variance represent a relatively small proportion of the total variance.

Common factor analysis is mostly used when the primary objective is to identify the
latent dimensions and the researchers have little knowledge about the amount of
specific and error variance.

In most applications, both common factor and component analysis arrive at essentially
identical results if the number of variables exceeds 30 or the communalities exceed
.60 for most variables.
Deriving factors and assessing overall fit
Criteria for the number of factors to extract

An exact quantitative basis for deciding the number of factors to extract has not
been developed. Several stopping criteria are used, as follows:

Latent root criterion

With component analysis, each variable contributes a value of 1 to the total
eigenvalue. Thus only the factors having latent roots (eigenvalues) greater than 1
are considered significant.

This method is most suitable when the number of variables is between 20 and 50.
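
As a minimal sketch (continuing with the data matrix X from the earlier examples), the latent root criterion is a one-line test on the eigenvalues of the correlation matrix:

import numpy as np

R = np.corrcoef(X, rowvar=False)            # correlation matrix
eigenvalues = np.linalg.eigvalsh(R)[::-1]   # sorted largest first
n_factors = int((eigenvalues > 1).sum())    # latent root (Kaiser) criterion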

A priori criterion

The researcher already knows how many factors to extract. Thus the researcher
instructs the computer to stop the analysis when the specified number of factors
has been extracted.
Deriving factors and assessing overall fit

Percentage of variance criterion

An approach based on achieving a specified cumulative percentage of the total
variance extracted by successive factors.

In the natural sciences, factor extraction usually should not stop until the
extracted factors account for at least 95% of the variance.

In the social sciences, a solution that accounts for 60% of the total variance is
often considered satisfactory.
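
Reusing the eigenvalues from the previous sketch, this criterion reduces to a cumulative sum (0.60 below is the social-science rule of thumb just mentioned):

explained = eigenvalues / eigenvalues.sum()             # proportion of variance per factor
cumulative = np.cumsum(explained)
n_factors = int(np.searchsorted(cumulative, 0.60) + 1)  # first solution reaching 60%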

Scree test criterion

The proportion of unique variance is substantially higher in the later factors.
The scree test is used to identify the optimum number of factors that can be
extracted before the amount of unique variance begins to dominate the
common variance structure.
Deriving factors and assessing overall fit

The scree test is derived by plotting the latent roots against the number of
factors in their order of extraction. The shape of the resulting curve is used as
the criterion for the cutoff point.

The point at which the curve first begins to straighten out is considered to
indicate the maximum number of factors to be extracted.

As a general rule, the scree test results in at least one, and sometimes two or
three, more factors being extracted than does the latent root criterion.
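
A scree plot takes a few lines with matplotlib, again reusing the eigenvalues computed above:

import matplotlib.pyplot as plt

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.axhline(1.0, linestyle="--")   # latent root cutoff, for reference
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue (latent root)")
plt.title("Scree plot")
plt.show()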
Deriving factors and assessing overall fit
Interpreting the factors

The process of factor interpretation includes three steps:
1) Estimate the factor matrix
-An initial unrotated factor matrix is computed, containing the factor loading of
each variable on each factor.
-Factor loadings are the correlations of each variable with each factor.
-They indicate the degree of correspondence between the variable and the factor.
-A higher loading indicates that the variable is more representative of the factor.
-Unrotated loadings achieve the objective of data reduction only.
Interpreting the factors
2) Factor rotation
-Because unrotated factors seldom provide information that allows an adequate
interpretation of the data, a rotational method is needed to achieve simpler, more
meaningful factor solutions.
-Factor rotation improves the interpretation of the data by reducing ambiguities.
-Rotation means that the reference axes of the factors are turned about the origin
until some other position is reached.
-The unrotated factor solution extracts factors in the order of the variance they
extract (i.e., the first factor accounts for the largest variance, and so on).
-The ultimate effect of rotation is to redistribute the variance from the earlier
factors to the later ones.
Interpreting the factors
-The two methods of factor rotation are:
· Orthogonal factor rotation
*Axes are maintained at 90 degrees.
*The most widely used, as almost all software packages include it.
*More suitable when the research goal is data reduction.
· Oblique factor rotation
*Axes are rotated, but they do not retain the 90-degree angle between the
reference axes.
*Oblique rotation is more flexible.
*Best suited to the goal of obtaining several theoretically meaningful factors.
Interpreting the factors
-Major orthogonal factor rotation approaches include:
*Quartimax
The goal is to simplify the rows of the factor matrix, i.e., it focuses on rotating
the initial factors so that a variable loads high on one factor and as low as
possible on the other factors.
*Varimax
The goal is to simplify the columns of the factor matrix. It maximizes the sum of
the variances of the required loadings of the factor matrix.
With varimax, some high loadings (close to +1 or -1) are likely, as are some
loadings near zero.
*Equimax
A compromise between quartimax and varimax. It has not gained wide acceptance.
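
As a sketch with scikit-learn (whose FactorAnalysis estimator supports 'varimax' and 'quartimax' rotations), continuing with the data matrix X from the earlier examples:

from sklearn.decomposition import FactorAnalysis

fa = FactorAnalysis(n_components=3, rotation="varimax", random_state=0)
fa.fit(X)
loadings = fa.components_.T   # rotated loadings: rows = variables, columns = factors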
Interpreting the factors
-Oblique Rotation Methods
*Promax Rotation: Promax is one of the most commonly used oblique rotation methods in
EFA. It simplifies the factor matrix by allowing the factors to be correlated. The
simplification helps in the interpretability of the factors.
*Oblimin Rotation: Oblimin is another widely used oblique rotation method. It does not
force the factors to be uncorrelated and uses a mathematical criterion to determine the degree
of correlation between factors.
*Direct Oblimin: Direct Oblimin is a specific type of Oblimin rotation that allows for simple
structure by minimizing the number of salient loadings on each factor and maximizing the
independence of the factors.
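
As a hypothetical sketch of an oblique solution using the third-party factor_analyzer package (an assumption: it must be installed separately, e.g. via pip), again with the data matrix X from the earlier examples:

from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=3, rotation="promax")
fa.fit(X)
pattern = fa.loadings_   # pattern matrix: unique contribution of each variable
phi = fa.phi_            # factor correlation matrix (reported for promax rotation)

Because the factors are allowed to correlate, an oblique solution reports both a pattern matrix and the factor correlations, which ties into the pattern/structure distinction discussed below.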
Interpreting the factors
3) Assessing Statistical Significance
-A factor loading represents the correlation between an
original variable and its factor.
-The concept of statistical power is used to determine what
size of factor loading can be considered significant for
various sample sizes.
-Table 3.2 contains the sample sizes necessary for
each factor loading value to be considered significant.
Interpreting the factors
Interpreting a Factor Matrix

Step 1: Evaluate the factor matrix of loadings
· The factor loading matrix contains the loading of each variable on each
factor.
· If an oblique rotation is used, it provides two matrices:
· Factor pattern matrix: loadings that show the unique contribution of each
variable to the factor.
· Factor structure matrix: simple correlations between the variables and the
factors, but these loadings contain both the unique variance and the
correlation among factors.
Interpreting the factors

Step 2: Identify the Significant Loading(s) for Each Variable
· Interpretation starts with the first variable on the first factor and then
moves horizontally. When the highest loading for a variable is identified,
underline it and then move on to the second variable.
· When an observed variable has a high loading (correlation) on more than
one factor, indicating that it is influenced by multiple latent variables,
this is called cross-loading.
· To address cross-loading, first try a different rotation method to remove
it; otherwise, delete the variable.
Interpreting the factors

Step 3: Assess the communalities of the variables
· Once the significant loadings have been identified, look for variables that
are not adequately accounted for by the factor solution.
· Identify any variable lacking at least one significant loading.
· Examine each variable's communality, i.e., the amount of its variance
accounted for by the factor solution.
· Identify all variables with communalities less than .50 as not having
sufficient explanation.
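
For an orthogonal solution, communalities are just row sums of squared loadings; a minimal sketch using the loadings from the rotation example above:

import numpy as np

communalities = (loadings ** 2).sum(axis=1)   # one value per variable
low = np.flatnonzero(communalities < 0.5)     # variables lacking sufficient explanation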
Interpreting the factors

Step 4: Respecify the factor model if needed
· Done when the researcher finds that a variable has no significant loadings, or
that even with a significant loading a variable's communality is low. The
following remedies may be considered:
· Ignore the problematic variables and interpret the solution as is, if the
objective is solely data reduction.
· Evaluate each of the variables for possible deletion.
· Employ an alternative rotation method.
· Decrease or increase the number of factors retained.
· Modify the type of factor model used (common versus component).
Interpreting the factors

Step 5: Label the factors
· The researcher assigns a name or label to each factor that accurately
reflects the variables loading on that factor.
Validation of factor analysis

Assessing the degree of generalizability of the results to the population and the
potential influence of individual cases on the overall result.
i. Use of a confirmatory perspective
· The most direct method of validating the results.
· Requires separate software, such as LISREL.

Assessing factor structure stability
· Factor stability depends on the sample size and on the number of cases per variable.
· The researcher may split the sample into two subsets and estimate the factor model
for each subset.
· Comparison of the two resulting factor matrices provides an assessment of the
robustness of the solution across the samples.
Data Reduction

Selecting Surrogate Variables

Creating Summated Scales

Computing Factor Scores
Data Reduction

Selecting Surrogate Variables

Surrogate variables are a method used in data reduction to
represent the original variables in a more compact, easier-to-analyze
form.

The selection of surrogate variables in data reduction involves using
statistical techniques like principal component analysis along with
domain knowledge to understand the relationships between variables.

The main objective of selecting surrogate variables is to reduce the
dimensionality of the data, making the dataset more manageable by
having fewer surrogate variables compared to the original variables.

The accuracy of the analysis and the results obtained are significantly
influenced by the choice of surrogate variables. Therefore, careful
consideration of trade-offs is essential when selecting them.
Data Reduction

It is important to balance the benefits of reducing complexity and simplifying
analysis with the potential loss of information or accuracy when choosing
surrogate variables.

The selection of the right surrogate variables directly impacts the accuracy
and reliability of data analysis results. Thus, choosing variables that retain
relevant information while reducing dataset size is crucial for effective data
reduction.
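
One simple, commonly used selection rule (a sketch, assuming the loadings matrix from the rotation examples above) is to take, for each factor, the variable with the highest absolute loading as its surrogate:

import numpy as np

surrogate_idx = np.abs(loadings).argmax(axis=0)   # index of the surrogate variable per factor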

Creating Summated Scales

The ability of a summated scale to portray complex concepts in a single
measure while reducing measurement error makes it a valuable addition to
any multivariate analysis.

Summated scales are created by aggregating responses across related items
that measure the same underlying construct or factor.
Data Reduction

For example, in psychological research, multiple survey items measuring
depression, anxiety, or self-esteem can be combined to form composite
scores for each construct.

This simplifies data analysis by reducing the number of variables while still
capturing essential dimensions of interest.

It's crucial to ensure the reliability and quality of the items being compiled and
to standardise the scores for comparability across different scales or subscales.
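
A minimal sketch (assuming items is a hypothetical array of shape respondents x items, all measuring one construct): the scale score is the mean of the standardized items, and Cronbach's alpha is the usual reliability check before compiling them:

import numpy as np

def cronbach_alpha(items):
    # Reliability of a summated scale: k/(k-1) * (1 - sum of item variances / variance of the total).
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)  # standardize each item
scale_score = z.mean(axis=1)                                  # one summated score per respondent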

Computing Factor Scores

Factor scores in factor analysis represent composite measures of each factor
calculated for each individual in a dataset.

These scores indicate the extent to which individuals exhibit characteristics
associated with the group of items having high loadings on a specific factor.
Data Reduction

Unlike summated scales, which combine only selected variables, factor scores
are computed based on the factor loadings of all variables associated with the
factor.

While variables with the highest loadings contribute more to the factor score,
lower-loading variables also influence the score to some extent.

Factor scores offer a nuanced understanding of individual differences in relation
to underlying factors, considering the entire set of variables rather than just a
subset.

Researchers must consider both the prominent and lesser loadings of variables
when interpreting factor scores to ensure a comprehensive understanding of the
factor's representation in the data.
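
A minimal sketch of one common approach, the regression method, again reusing X and loadings from the earlier examples; note that the weights involve every variable's loading, echoing the point above:

import numpy as np

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized data
R = np.corrcoef(X, rowvar=False)
W = np.linalg.solve(R, loadings)                  # factor score weights: R^{-1} * loadings
factor_scores = Z @ W                             # one score per factor for each respondent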
Data Summarization & Data Reduction Assumptions
1. Data Summarization Assumptions

- Variables used in factor analysis should be correlated with each other. This
assumption ensures that there is sufficient shared variance among the variables
to extract underlying factors.

- There should be an adequate sample size relative to the number of variables
being analyzed. A larger sample size provides more stable estimates of factor
loadings and reduces the risk of overfitting the model.
2. Data Reduction Assumptions

- The underlying factors are assumed to be independent of each other. This
assumption allows for the identification of distinct, non-overlapping factors that
capture unique aspects of the data.
Data Summarization & Data Reduction Assumptions

Factors are assumed to be linear combinations of the observed variables.
This assumption implies that each factor represents a weighted sum of the
original variables, with the weights represented by the factor loadings.

The observed variables are assumed to be influenced only by the underlying
factors and random error. This assumption suggests that any variance not
accounted for by the factors is due to measurement error or other extraneous
factors not captured in the analysis.
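
This last assumption corresponds to the standard linear factor model, which can be written as X = ΛF + ε, where X is the vector of observed variables, Λ is the matrix of factor loadings, F is the vector of underlying factors, and ε is the random error (unique) term.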
THANK YOU
