Aspects of Multivariate Analysis
1.1 Introduction
Scientific inquiry is an iterative learning process. Objectives pertaining to the
explanation of a social or physical phenomenon must be specified and then
tested by gathering and analyzing data. In turn, an analysis of the data gathered
by experimentation or observation will usually suggest a modified explanation of
the phenomenon. Throughout this iterative learning process, variables are often
added to or deleted from the study. Thus, the complexities of most
phenomena require an investigator to collect observations on many different
variables. This book is concerned with statistical methods designed to elicit
information from these kinds of data sets. Because the data include simultaneous
measurements on many variables, this body of methodology is called
multivariate analysis.
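As a minimal sketch of what such a data set looks like in practice, the Python snippet below stores five observations on three variables as the rows and columns of a table, then computes the sample mean of each variable and the sample covariance matrix. The values and variable names are invented for illustration and do not come from the text. The divisor n - 1 is one common convention, assumed here.

```python
# Hypothetical data: 5 observations (rows) on 3 variables (columns).
# The values and variable names are invented for illustration only.
variables = ["height_cm", "weight_kg", "age_yr"]
data = [
    [170.0, 65.0, 30.0],
    [160.0, 55.0, 25.0],
    [180.0, 80.0, 40.0],
    [175.0, 72.0, 35.0],
    [165.0, 58.0, 20.0],
]

n = len(data)      # number of observations
p = len(data[0])   # number of variables measured on each observation

# Sample mean of each variable (one value per column).
means = [sum(row[j] for row in data) / n for j in range(p)]


def sample_cov(j, k):
    """Sample covariance between variables j and k (divisor n - 1, an assumed convention)."""
    return sum((row[j] - means[j]) * (row[k] - means[k]) for row in data) / (n - 1)


# p x p sample covariance matrix; the diagonal holds the sample variances.
cov = [[sample_cov(j, k) for k in range(p)] for j in range(p)]

print("means:", means)
for name, row in zip(variables, cov):
    print(name, [round(v, 2) for v in row])
```

Each row is one observation and each column one variable, so the off-diagonal entries of the covariance matrix summarize how pairs of variables vary together, which is information a sequence of separate univariate summaries would miss.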
The need to understand the relationships between many variables makes
multivariate analysis an inherently difficult subject. Often, the human mind is
overwhelmed by the sheer bulk of the data. Additionally, more mathematics is
required to derive multivariate statistical techniques for making inferences than
in a univariate setting. We have chosen to provide explanations based upon
algebraic concepts and to avoid the derivations of statistical results that require
the calculus of many variables. Our objective is to introduce several useful
multivariate techniques in a clear manner, making heavy use of illustrative
examples and a minimum of mathematics. Nonetheless, some mathematical
sophistication and a desire to think quantitatively will be required.
Most of our emphasis will be on the analysis of measurements obtained
without actively controlling or manipulating any of the variables on which the
measurements are made. Only in Chapters 6 and 7 shall we treat a few
experimental plans (designs) for generating data that prescribe the active
manipulation of important variables. Although the experimental design is
ordinarily the most important part of a scientific investigation, it is frequently
impossible to control the generation of appropriate data in certain disciplines.
(This is true, for example, in business, economics, ecology, geology, and
sociology.) You should consult [6] and [7] for detailed accounts of design
principles that, fortunately, also apply to multivariate situations.
It will become increasingly clear that many multivariate methods are based
upon an underlying probability model known as the multivariate normal
distribution. Other methods are ad hoc in nature and are justified by logical or
commonsense arguments. Regardless of their origin, multivariate techniques
must, invariably, be implemented on a computer. Recent advances in computer
technology have been accompanied by the development of rather sophisticated
statistical software packages, making the implementation step easier.
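For reference, the density of the p-dimensional multivariate normal distribution mentioned above can be written as follows. The symbols are notation assumed here for illustration, since this passage fixes none: x is the observation vector, mu the mean vector, and Sigma a positive definite covariance matrix.

```latex
f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left\{ -\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\top}
\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) \right\}
```

With p = 1 this reduces to the familiar univariate normal density with mean mu and variance Sigma.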
Multivariate analysis is a mixed bag. It is difficult to establish a
classification scheme for multivariate techniques that both is widely accepted
and indicates the appropriateness of the techniques. One classification
distinguishes techniques designed to study interdependent relationships from
those designed to study dependent relationships. Another classifies techniques
according to the number of populations and the number of sets of variables being
studied. Chapters in this text are divided into sections according to inference
about treatment means, inference about covariance structure, and techniques for
sorting or grouping. This should not, however, be considered an attempt to place
each method into a slot. Rather, the choice of methods and the types of analyses
employed are largely determined by the objectives of the investigation. In
Section 1.2, we list a small number of practical problems designed to illustrate
the connection between the choice of a statistical method and the objectives of
the study. These problems, plus the examples in the text, should provide you with
an appreciation of the applicability of multivariate techniques across different
fields.
The objectives of scientific investigations to which multivariate methods
most naturally lend themselves include the following:
1. Data reduction or structural simplification. The phenomenon being studied
is represented as simply as possible without sacrificing valuable
information. It is hoped that this will make interpretation easier.
2. Sorting and grouping. Groups of similar objects or variables are created,
based upon measured characteristics. Alternatively, rules for classifying
objects into well-defined groups may be required.
3. Investigation of the dependence among variables. The nature of the
relationships among variables is of interest. Are all the variables mutually
independent or are one or more variables dependent on the others? If so,
how?
4. Prediction. Relationships between variables must be determined for the
purpose of predicting the values of one or more variables on the basis of
observations on the other variables.
5. Hypothesis construction and testing. Specific statistical hypotheses,
formulated in terms of the parameters of multivariate populations, are
tested. This may be done to validate assumptions or to reinforce prior
convictions.
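To make the first objective concrete, the sketch below illustrates data reduction on two correlated variables. It uses principal components as one possible technique; the list above does not name a specific method, so this choice is an assumption. The data values are invented. For two variables, the eigenvalues of the 2 x 2 sample covariance matrix have a closed form, so no linear algebra library is needed.

```python
import math

# Hypothetical paired measurements on two correlated variables
# (values invented for illustration).
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Entries of the 2 x 2 sample covariance matrix [[s_xx, s_xy], [s_xy, s_yy]].
s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Closed-form eigenvalues of a symmetric 2 x 2 matrix.
half_trace = (s_xx + s_yy) / 2
radius = math.sqrt(((s_xx - s_yy) / 2) ** 2 + s_xy ** 2)
lambda_1 = half_trace + radius   # variance along the first principal component
lambda_2 = half_trace - radius   # variance along the second principal component

# Share of the total variance retained if only the first component is kept.
explained = lambda_1 / (lambda_1 + lambda_2)
print(f"first component retains {explained:.1%} of the total variance")
```

When the retained share is close to 1, the two original variables can be replaced by a single derived variable with little loss of information, which is the sense of "structural simplification" in item 1.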
We conclude this brief overview of multivariate analysis with a quotation
from F. H. C. Marriott [19], page 89. The statement was made in a discussion of
cluster analysis, but we feel it is appropriate for a broader range of methods. You
should keep it in mind whenever you attempt or read about a data analysis. It
allows one to maintain a proper perspective and not be overwhelmed by the
elegance of some of the theory:
If the results disagree with informed opinion, do not admit a simple logical
interpretation, and do not show up clearly in a graphical presentation, they are probably
wrong. There is no magic about numerical methods, and many ways in which they can
break down. They are a valuable aid to the interpretation of data, not sausage machines
automatically transforming bodies of numbers into packets of scientific fact.
of objectives given in the previous section. Of course, many of our examples are
multifaceted and could be placed in more than one category.
Data reduction or simplification
Hypotheses testing