An Introduction to Rorschach Assessment1
GREGORY J. MEYER
DONALD J. VIGLIONE
Introduction
The Rorschach is a performance-based task or behavioral assessment measure2 that assesses a broad range of personality, perceptual, and problem-solving characteristics, including thought organization, perceptual accuracy
and conventionality, self-image and understanding of others, psychological
resources, schemas, and dynamics. The task provides a standard set of inkblot
stimuli, and is administered and coded according to standardized guidelines.
In many respects, the task is quite simple. It requires clients to identify what
a series of richly constructed inkblots look like in response to the query,
"What might this be?" Despite its seeming simplicity, the solution to this
task is quite complex, as each inkblot provides myriad response possibilities
that vary across multiple stimulus dimensions. Solving the problem posed
in the query thus invokes a series of perceptual problem-solving operations
related to scanning the stimuli, selecting locations for emphasis, comparing
potential inkblot images to mental representations of objects, filtering out
responses judged less optimal, and articulating those selected for emphasis
to the examiner. This process of explaining to another person how one looks
at things against a backdrop of multiple competing possibilities provides the
foundation for the Rorschach's empirically demonstrated validity. Unlike
interview-based measures or self-report inventories, the Rorschach does not
require clients to describe what they are like but rather it requires them to
RT20256_C008.indd 281
9/7/2007 5:07:10 PM
and utility of its scales, (c) does the instrument have a reasonable base of
normative data, (d) can it reasonably be applied across cultures, and (e) does
the evidence suggest certain modifications should be made to traditional
interpretive postulates?
Because it is not possible to learn how to do Rorschach administration,
scoring, and interpretation by reading a single book chapter, we assume that
readers interested in gaining applied proficiency with the instrument will
rely on other resources. As such, even though we provide readers with a
general understanding of the Rorschach and how it is administered, scored,
and interpreted, our goal in this chapter is to emphasize the psychometric
evidence and issues associated with its use.
Figure 8.1 An early inkblot created by Hermann Rorschach for possible use. (Used with permission of the Hermann Rorschach Archives and Museum.)
on response content. The coding criteria are theoretically derived from the
psychodynamic construct of orality (Schafer, 1954) and include imagery
such as food sources, oral activity, nurturance, passivity, and helplessness.
Another example is Blatt's Concept of the Object Scale (COS; Blatt, Brenneis, Schimek, & Glick, 1976). Like the GHR and PHR scores, the COS is
based on object relations theory. However, unlike GHR and PHR, the COS
coding criteria are derived entirely from theorizing about developmental
processes; they do not make allowances for the stimulus pull of the individual
inkblots and the extent to which that pull produces typical responses that do
not conform to theory. As a result, some of the things that people typically
or normatively see on the Rorschach receive less healthy COS scores than
do perceptions that are normatively atypical or unusual. For instance, the
stimulus features of Cards IV and IX pull for people to see quasi-human or
human-like figures (e.g., a monster or a wizard) rather than ordinary people.
Even though these responses are so common they are considered Popular,
the COS assigns them a less than optimal score because the optimal score is
reserved for perceptions of human beings.
There are at least three other models for understanding types of Rorschach
scores: those founded on (1) simple classification, (2) clinical observation, and (3) behavioral similarity. The first is the least important. These
are response features that are coded primarily to exhaust a coding category.
Probably the best examples are some of the content codes in the CS. Every
response is coded for the content it contains, though not all of the content
categories are interpretively valuable. For instance, the CS has separate categories for household objects, science based percepts, botany as distinct from
landscape content, and an idiographic category for not otherwise classifiable
objects. None of these distinctions factor into standard interpretation.
Clinical observation is a form of empirical keying, in that response features
are linked to personality characteristics through clinical experience even if
there is no obvious parallel between the response feature and the characteristic that is thought to be indicated by the score. As an example, clinical
observation suggested that the perception of moving inanimate objects (an
m score) is associated with environmental stress, internal tension, agitated
cognitive activity, and loss of control, while responses that are prompted
by the general shading features in the ink (Y scores) are associated with
disruptive experiences of anxiety or helplessness. In each example there are
nonobvious links between the score and the construct that it is hypothesized
to measure. The big difference between scores based on clinical observations
and those based on empirical keying is that the former may or may not demonstrate empirical relationships when actually tested. However, both of the
example scores (m and Y) have replicated data supporting their construct
validity (e.g., Hartmann, Nørbech, & Grønnerød, 2006; Hartmann, Sunde,
Kristensen, & Martinussen, 2003; Hartmann, Wang, Berg, & Sæther, 2003;
McCowan, Fink, Galina, & Johnson, 1992; Nygren, 2004; Perry et al., 1995;
Sultan, Jebrane, & Heurtier-Hartemann, 2002). As has been the case for m
and Y, other clinical observation scores that garner empirical support over
time also typically develop an experiential explanation or theory that links the
observed test behavior to the criterion construct. For instance, in hindsight
it is now not too difficult to see how at an experiential level a person who
feels considerable stress, tension, and agitation may see an elevated number
of nonliving objects in motion (e.g., percepts of objects exploding, erupting,
falling, spinning, tipping, or shooting).
Finally, many Rorschach scores are rationally constructed behavioral
representation scores, in that the response characteristic coded in the testing
situation closely parallels the real-life behavior that it is thought to measure
(Weiner, 1977). That is, what is coded in the microcosm of the test setting is
a representative sample of the behavior or experience that one expects to be
manifested in the macrocosm of everyday life (Viglione & Rivera, 2003). For
instance, the CS morbid score (MOR) is coded when dysphoric or sad affect
is attributed to an object or when an object is described as dead, injured, or
damaged in some manner. When responses of this type occur fairly often,
they are thought to indicate a sense of gloomy, pessimistic inadequacy. Thus,
the behavior coded in the testing situation is thought to be representative of
the dysphoric, negative, damaged mental set that the person generally uses
to interpret and filter life experiences. Similarly, the CS cooperative movement score (COP) is coded when two or more objects are described as
engaging in a clearly cooperative or positive interaction. Higher COP scores
are thought to assess a greater propensity to conceptualize relationships as
supportive and enhancing.
Probably the most well-known and best-validated behavioral representation scores on the Rorschach are the indicators of disordered thought and
reasoning. In the CS these are called the Cognitive Special Scores and they are
coded in a number of instances, including when responses are circumstantial
or digressive, when objects have an implausible or impossible relationship
(e.g., two chickens lifting weights), and when reasoning is strained or overly
concrete. In all these examples, the coded test behavior represents the extratest characteristic it is thought to measure. Thus, behavioral representation
scores require relatively few inferential steps to link what is coded on the
test to everyday behavior.
Basic Psychometrics
Reliability
Reliability is the extent to which a construct is assessed consistently. Once
assessed consistently, it is necessary to establish that what is being measured
is actually what is supposed to be measured (validity) and that the measured
examiner. Thus, repeated practice and calibration with criterion ratings are
essential for good practice.
Another issue is that most reliability research (for the Rorschach and for
other instruments) relies on raters who work or train in the same setting. To
the extent that local guidelines develop to contend with scoring ambiguity,
agreement among those who work or train together may be greater than
agreement across different sites or workgroups. As a result, existing reliability
data may then give an overly optimistic view of scoring consistency across
sites or across clinicians working independently. Another way to say this is
that scoring reliability (i.e., agreement among two fallible coders) may be
higher than scoring accuracy (i.e., correct coding).
This issue was recently examined for the CS. In a preliminary report of
the data, Meyer, Viglione, Erdberg, Exner, and Shaffer (2004) examined
40 randomly selected protocols from Exner's new CS nonpatient reference
sample (Exner & Erdberg, 2005) and 40 protocols from Shaffer, Erdberg,
and Haroian's (1999) nonpatient sample from Fresno, California. These 80
protocols were then blindly recoded by a third group of advanced graduate
students who were trained and supervised by the second author. To determine the degree of cross-site reliability, the original scores were compared
to the second set of scores. The data revealed an across-site median ICC of
.72 for summary scores. Although this would be considered good reliability according to established benchmarks, it is lower than the value of .85
or higher that typically has been generated by coders working together in
the same setting.
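The across-site comparison described above summarizes agreement with the intraclass correlation coefficient (ICC). As a rough illustration of the statistic itself (not of the study's actual data), here is a minimal Python sketch of a two-way random-effects, absolute-agreement ICC; the function name and example scores are invented for illustration.

```python
import numpy as np

def icc_a1(ratings):
    """Two-way random effects, absolute agreement, single-rater ICC
    (often labeled ICC(2,1) or ICC(A,1)).
    `ratings` is an (n protocols x k coders) array of summary scores."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between protocols
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between coders
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Invented example: one summary score coded for six protocols by the
# original site and by an independent recoding site.
site_a = [12, 7, 19, 4, 15, 9]
site_b = [11, 8, 17, 5, 16, 9]
print(round(icc_a1(np.column_stack([site_a, site_b])), 2))  # → 0.97
```

Values near 1.0 indicate near-perfect agreement; the cross-site median of .72 reported above would reflect visibly larger coder discrepancies than this toy example.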
Findings like this suggest there are complexities in the coding process
that are not fully clarified in standard CS training materials (Exner, 2001,
2003). As a result, training sites, such as specific graduate programs, may
develop guidelines or benchmarks for coding that help resolve these residual
complexities. However, these principles may not generalize to other training sites. To minimize these problems, students learning CS scoring should
find Viglione's (2002) coding text helpful and should thoroughly practice
their scoring relative to the across-site gold standard scores that can be
found in the 300 practice responses in Exner's (2001) workbook and in the
25 cases with complete responses in the basic CS texts (Exner, 2003; Exner
& Erdberg, 2005).
Beyond agreement in scoring the Rorschach, an important question is
the extent to which clinicians show consistency in the way they interpret
Rorschach results. Interclinician agreement when interpreting psychological
tests (not just the Rorschach) was studied fairly often in the 1950s and 1960s,
though it then fell out of favor (Meyer, Mihura, & Smith, 2005). The reliability
of Rorschach interpretation in particular has been challenged, with some
suggesting that the inferences clinicians generated said more about them than
Validity
Construct validity refers to evidence that a test scale is measuring what it is
supposed to measure. It is determined by the conglomerate of research findings related to both convergent and discriminant validity. Convergent validity
refers to expected associations with criteria that theoretically should be related
to the target construct, while discriminant validity refers to an expected lack
of association with criteria that theoretically should be independent of the
target construct. Evaluating the validity of a complex, multidimensional
measure like the Rorschach is challenging because it is difficult to systematically review the full historical pattern of evidence attesting to convergent
and discriminant validity for every test score. As such, we focus primarily
on results from meta-analytic reviews.
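Meta-analytic reviews of the kind emphasized here typically aggregate validity coefficients across studies, commonly by averaging correlations on Fisher's z scale with sample-size weights. A minimal sketch of that arithmetic follows; the three studies and their coefficients are invented for illustration.

```python
import math

def mean_correlation(rs, ns):
    """Sample-size-weighted mean of correlations via Fisher's z transform,
    a common first step when pooling validity coefficients across studies."""
    zs = [math.atanh(r) for r in rs]       # r -> Fisher z
    weights = [n - 3 for n in ns]          # inverse-variance weight for z
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)                # back-transform z -> r

# Invented validity coefficients from three hypothetical studies.
print(round(mean_correlation([0.25, 0.35, 0.30], [80, 120, 60]), 3))  # → 0.309
```

Real meta-analyses add further machinery (random-effects variance components, moderator tests), but this weighted pooling is the core step behind the summary effect sizes discussed below.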
Thousands of studies from around the world have provided evidence for
Rorschach validity (e.g., for narrative summaries of specific variables see
Bornstein & Masling, 2005; Exner & Erdberg, 2005; Viglione, 1999). Meyer
and Archer (2001) summarized the available evidence from Rorschach
meta-analyses, including four that examined the global validity of the test
and seven that examined the validity of specific scales in relation to particular criteria. The scales included CS and non-CS variables. For comparison,
they also summarized the meta-analytic evidence available on the validity
of the MMPI and IQ measures. Subsequently, Meyer (2004) compared the
validity evidence for these psychological tests to meta-analytic findings for
the medical assessments reported in Meyer et al. (2001).
Although the use of different types of research designs and validation tasks
makes it challenging to compare findings across meta-analyses, the broad
review of evidence indicated three primary conclusions. First, psychological
and medical tests have varying degrees of validity, ranging from scores that
are essentially unrelated to a particular criterion to scores that are strongly
associated with relevant criteria. Second, it was difficult to distinguish between medical tests and psychological tests in terms of their average validity; both types of tests produced a wide range of effect sizes and had similar
averages. Third, test validity is conditional and dependent on the criteria
used to evaluate the instrument. For a given scale, validity is greater against
some criteria and weaker against others.
Within these findings, validity for the Rorschach was much the same as
it was for other instruments; effect sizes varied depending on the variables
considered but, on average, validity was similar to other instruments. Thus,
Meyer and Archer (2001) concluded that the systematically collected data
showed the Rorschach produced good validity coefficients that were on par
with other tests:
"Across journal outlets, decades of research, aggregation procedures,
predictor scales, criterion measures, and types of participants, reasonable hypotheses for the vast array of Rorschach scales that have been
empirically tested produce convincing evidence for their construct
validity" (Meyer & Archer, 2001, p. 491).
Atkinson, Quarrington, Alp, and Cyr (1986) conducted one of the earliest meta-analytic reviews of the Rorschach and found good evidence for its
validity. They noted that the test is regularly criticized and challenged despite
the evidence attesting to its validity. To understand why, they suggested that
deprecation of the Rorschach is "a sociocultural, rather than scientific, phenomenon" (p. 244). Meyer and Archer (2001) reached a similar conclusion
about the evidence base and concluded that a dispassionate review of the evidence would not warrant singling out the Rorschach for particular criticism.
However, they also noted that the same evidence would not warrant singling
out the Rorschach for particular praise. Its broadband validity appears both
as good as and also as limited as that for other psychological tests.
Robert Rosenthal, a widely recognized and highly regarded expert in meta-analysis, was commissioned to conduct a comparative analysis of Rorschach
and MMPI validity for a Special Issue of the journal Psychological Assessment.
He and his coworkers (Hiller, Rosenthal, Bornstein, Berry, & Brunell-Neuleib,
1999; Rosenthal, Hiller, Bornstein, Berry, & Brunell-Neuleib, 2001) found
that on average the Rorschach and MMPI were equally valid. However, they
also identified moderators to validity for each instrument. Moderators are
factors that influence the size of the validity coefficients observed across
studies. The Rorschach demonstrated greater validity against criteria that
they classified as objective, while the MMPI demonstrated greater validity
against criteria consisting of other self-report scales or psychiatric diagnoses.5
The criteria they considered objective encompassed a range of variables that
were largely behavioral events, medical conditions, behavioral interactions
with the environment, or classifications that required minimal observer
judgment, such as dropping out of treatment, history of abuse, number of
driving accidents, history of criminal offenses, having a medical disorder,
cognitive test performance, performance on a behavioral test of ability to
delay gratification, or response to medication. Viglione (1999) conducted
a systematic descriptive review of the Rorschach literature and similarly
concluded that the Rorschach was validly associated with behavioral events
or life outcomes involving person-environment interactions that emerge
over time. In general, these findings are consistent with the types of spontaneous behavioral trends and longitudinally determined life outcomes that
McClelland, Koestner, and Weinberger (1989) showed were best predicted
by tests measuring implicit characteristics, as opposed to the conscious and
deliberately chosen near-term actions that were best predicted by explicit
self-report tests (also see Bornstein, 1998).
In the most recent Rorschach meta-analysis, which was not considered
in the previous reviews, Grønnerød (2004) systematically summarized the
literature examining the extent to which Rorschach variables could measure
personality change as a function of psychological treatment. The Rorschach
produced a level of validity that was equivalent to alternative instruments
based on self-report or clinician ratings. Grønnerød also examined moderators to validity and, consistent with expectations from the psychotherapy
literature, found that Rorschach scores changed more with longer treatment,
suggesting that more therapy produced more healthy change in personality.
Grønnerød also noted that effect sizes were smaller when coders clearly did
not know whether a protocol was obtained before or after treatment but larger
in studies that clearly described scoring reliability procedures and obtained
good reliability results using conservative statistics.
Overall, the meta-analytic evidence supports the general validity of the
Rorschach. Globally, the test appears to function as well as other assessment
instruments. To date, only a few meta-analyses have systematically examined
the validity literature for specific scales in relation to particular criteria. The
evidence has been positive and supportive for the ROD, the Rorschach Prognostic Rating Scale (RPRS), and the precursor to the PTI, the Schizophrenia
Index (SCZI), though it has not been supportive of the CS Depression Index
(DEPI) when used as a diagnostic indicator. As is true for other commonly
used tests, such as the MMPI-2, Personality Assessment Inventory (PAI; Morey, 1991), Millon Clinical Multiaxial Inventory (MCMI-III; Millon, 1994),
or Wechsler scales (e.g., Wechsler, 1997), additional focused meta-analytic
reviews that systematically catalog the validity evidence of particular Rorschach variables relative to specific types of criteria will continue to refine
and enhance clinical practice.
Utility
In general, the utility of an assessment instrument refers to the practical value
of the information it provides relative to its costs. The Rorschach takes time
to administer, score, and interpret. To make up for these costs, the Rorschach
needs to provide useful information that cannot be obtained from tests, interviews, or observations that are readily available and less time consuming.
One way to evaluate this issue in research is through incremental validity
analyses (see Hunsley & Meyer, 2003), where the Rorschach and a less time
intensive source of information are compared statistically. To demonstrate
incremental validity, the Rorschach would need to predict the criterion over
and above what could be predicted by the simpler method. Such a finding
demonstrates statistically that the Rorschach provides unique information.
Although utility cannot be equated with statistical evidence of incremental
validity, the latter is one commonly obtained form of evidence that can attest
to utility. Utility also can be demonstrated by predicting important real-world
behaviors, life outcomes, and the kind of ecologically valid criteria that are
important in the context of applied practice with the test. Research reviews
and meta-analyses show that the Rorschach possesses utility in all of these
forms, such that Rorschach variables predict clinically relevant behaviors and
outcomes and have demonstrated incremental validity over other tests, demographic data, and other types of information (Bornstein & Masling, 2005;
Exner & Erdberg, 2005; Hiller et al., 1999; Meyer, 2000a; Meyer & Archer,
2001; Viglione, 1999; Viglione & Hilsenroth, 2001; Weiner, 2001).
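The incremental-validity logic described above is commonly tested with hierarchical regression: enter the simpler predictor first, then check whether adding the Rorschach score increases the variance explained in the criterion. Here is a minimal sketch with synthetic data; the variable names, coefficients, and sample are all invented for illustration, not drawn from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated data: a criterion influenced by a self-report scale and,
# independently, by a Rorschach score (values are synthetic).
self_report = rng.normal(size=n)
rorschach = rng.normal(size=n)
criterion = 0.5 * self_report + 0.4 * rorschach + rng.normal(scale=0.8, size=n)

def r_squared(predictors, y):
    """Proportion of variance in y explained by an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared([self_report], criterion)               # simpler method only
r2_full = r_squared([self_report, rorschach], criterion)    # add Rorschach score

# Incremental validity: variance explained beyond the simpler measure.
print(round(r2_full - r2_base, 3))
```

A positive, nontrivial increment in R-squared is the statistical signature of unique information; published studies typically test its significance rather than merely inspecting the difference.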
We do not have the space to review more than a sampling of utility findings.
With respect to incremental validity, recent studies published in the United
States and Europe show the Rorschach yields important information that is
not attainable through simpler, less time consuming methods. The criteria
include predicting future success in Norwegian naval special forces training
(Hartmann et al., 2003), future delinquency in Swedish adolescents and adults
based on clinician ratings of ego strength from childhood Rorschach protocols (Janson & Stattin, 2003), future psychiatric relapse among previously
Quick Reference
• The Rorschach can evaluate personality and problem solving in psychiatric, medical, forensic, and nonclinical settings.
• It is used with children, adolescents, and adults in any language or culture.
• The task is individually administered in a collaborative two-step process that elicits responses with the prompt, "What might this be?", and then clarifies the what, where, and why of each percept.
• Responses are recorded verbatim. The CS requires a minimum of 14; data and cost-benefit considerations support prompting for at least two per card but obtaining no more than four.
• Proper administration, scoring, and interpretation require considerable training. Computer-assisted scoring is recommended and likely will become increasingly important.
Scoring
To score the Rorschach, codes are typically applied to each response and then
aggregated across all responses. In the CS the codes assigned to each response
form what is known as the Sequence of Scores and the tally of codes across
all responses is known as the Structural Summary. The scoring process can
be fairly simple for single construct scoring systems, like the ROD, or fairly
complex for multidimensional scoring systems, like the CS. However, scoring
according to any system requires the same ingredients: a clearly articulated
set of scoring guidelines, an understanding of those guidelines by the coder,
and the coder's repeated practice of scoring against gold standard example
material until proficiency is obtained. For a multidimensional system like the
CS, fairly substantial training is required for proficiency. Table 8.1 provides
a brief list of the standard CS codes that can be assigned to each response
to generate the Sequence of Scores. These scores are then summed across
responses and form the basis for about 70 ratios, percentages, and derived
scores that are given interpretive emphasis on the Structural Summary.
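The mechanics of this aggregation can be shown in miniature: codes assigned response by response are tallied across the record, and derived proportions are computed from the tallies. The sketch below is a simplified stand-in, not the actual CS algorithm; the code labels are real CS-style abbreviations but the response set and the derived percentage are invented.

```python
from collections import Counter

# Invented per-response codes: each response carries a location code and a
# form-quality code (a real CS response carries many more codes than this).
responses = [
    {"location": "W",  "form_quality": "o"},
    {"location": "D",  "form_quality": "o"},
    {"location": "Dd", "form_quality": "u"},
    {"location": "W",  "form_quality": "-"},
    {"location": "D",  "form_quality": "o"},
]

# Sequence-of-scores analogue: the ordered list of codes, response by response.
sequence = [(r["location"], r["form_quality"]) for r in responses]

# Structural-summary analogue: tallies of each code across all responses.
location_counts = Counter(r["location"] for r in responses)
fq_counts = Counter(r["form_quality"] for r in responses)

# One illustrative derived proportion: the share of responses with ordinary
# form quality (a stand-in for, not the definition of, any actual CS index).
ordinary_pct = fq_counts["o"] / len(responses)

print(location_counts)   # Counter({'W': 2, 'D': 2, 'Dd': 1})
print(ordinary_pct)      # 0.6
```

The real Structural Summary performs this tally-and-derive step for dozens of codes at once, which is why computational errors are a practical concern in hand scoring.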
Because of the complexity of this material, we do not provide a detailed
description. However, a full guide to interpretation can be found in standard interpretive texts (Exner, 2003; Exner & Erdberg, 2005; Weiner, 2003).
These sources make it clear that formal coding is only part of the data that
contributes to an interpretation. There are behaviors expressed during the
testing, themes associated with response imagery, and perceptual or content-based idiosyncrasies that are not captured by the formal scores but that may
nonetheless be very important for helping to develop an idiographic and
unique understanding of the client (e.g., Peebles-Kleiger, 2002).
The requirements for competent administration and interpretation are
similar to the requirements for coding. In order to perform an adequate
administration the examiner must first understand scoring in order to formulate suitable Inquiry questions. As with scoring, developing proficient
administration skills requires practice and accurate feedback about errors
or problems. The latter can be accomplished most adequately when a thoroughly trained supervisor is physically present to observe and correct the
student's practice administrations as they are occurring, though supervisory
feedback on videotaped administrations also can be quite helpful. The least
optimal training occurs when supervision feedback is only provided on
handwritten or typed protocols, as many nuances of nonverbal interaction are
not captured by this written record and it is not possible for the supervisor
to see how adequately the written record captured what actually transpired
during the administration.
Interpretation
Not surprisingly, Rorschach interpretation is the most complex or difficult
activity, as proficiency requires knowledge and skills in multiple areas. These
include:
Location
The client either makes use of the whole inkblot (W), one or more of its commonly perceived detail (D) locations, or one or more of its small or rarely used detail (Dd) locations. The background white space (S) can also be incorporated with each location (i.e., WS, DS, or DdS).
Pairs
A pair (2) is coded when the same object is identified on each side of the blot. This is a symmetry-based score, like the reflection response.
Contents
Organizational activity
Other special scores
the capacity for disciplined reasoning to rule in and rule out inferences; and
the ability to integrate Rorschach-based inferences with inferences
obtained from other tests, from observed behavior, and from history
as reported by the client and other sources of collateral information.
Of course, to adequately perform the last step of integration, the examiner
must also have parallel forms of knowledge about the other tests and sources
of information that are contributing inferences. That is, for each non-Rorschach data source, the clinician must understand the interpretive postulates
associated with the observation, understand the kind of information that
the data source can and cannot provide, know what forms of systematic bias
influence the data source, and know the reliability and validity evidence for
the alternative data source. To become proficient with the idiographic task
of correctly interpreting a complex array of personality test results, including
Rorschach scores, requires considerable closely supervised clinical experience
with a well-trained individual.
Computerization
Although computerized administration has been used in Rorschach research,
standard CS test administration does not lend itself to automated, computer-adapted administration or to computer-automated scoring. However,
computer-assisted scoring and interpretation for the CS is quite common,
with the two primary software programs being the Rorschach Interpretive
Assistance Program (RIAP), which is now in its 5th edition and authored
by John Exner and Irving Weiner, and ROR-SCAN, which is now in its 6th
edition and authored by Philip Caracena. Reviews of each program can be
found in Acklin (2000; for the 4th edition of RIAP) and Smith and Hilsenroth
(2003; for the 6th edition of ROR-SCAN).
Because the CS Structural Summary tabulates many different scores and
then generates numerous other ratios or derived scores, we strongly recommend computer-assisted scoring to minimize the prospect of computational
errors. For computer-assisted scoring, the examiner manually assigns codes to
each response on the sequence of scores, but allows the computer algorithms
to generate the final Structural Summary. Doing so has a number of benefits.
First, it allocates the clinician's time and expertise where it is required, which
is with judging what codes should be assigned to each response, and it leaves
the mundane (but error prone) mathematical operations to a machine that
is perfectly suited to these clerical tasks. Second, computer-assisted scoring
allows all users to obtain CS-based variables like the Ego Impairment
Index (EII-2; Perry & Viglione, 1991; Viglione, Perry, & Meyer, 2003) that
are too complex for hand scoring.
5 or 6 to elderly
Time to Administer: about 45 minutes
abstract coding criteria are challenging to apply to commonly given responses. For instance, the D1 area on Card VII is very commonly described
as a girl's or woman's head. Typically, the object is also described as having her
hair sticking up in the air and coders would benefit from specific guidelines
for when inanimate movement should be coded in this common response
(e.g., Viglione, 2002).
Finally, in many instances there is a degree of irreducible uncertainty
associated with scoring because of the ambiguity that is inherent in a
verbalized response. Much like a reversible figure or Necker cube, even
after adequate inquiry, some responses can be interpreted in two
notably different and mutually exclusive ways. This allows reasonably
trained people to disagree about what exactly the client perceived and
described, and thus to disagree about the scoring. At times, coders also can disagree on what is included in a response.
For example, clients sometimes change their perception from the Response
to the Inquiry phase, or examiners may be unsure when multiple objects
are identified if they constitute one combined response or several distinct
responses. Such ambiguities need to be addressed in the future to increase
the test's reliability.
Despite these limitations, the Rorschach offers clinicians a rich sample of
behavior on which to base carefully considered, disciplined, and synthesized inferences.
Important References
Exner (2003), Viglione (2002), Exner and Erdberg (2005), and Weiner (2003). Together these
four resources provide the basic information needed to learn standard CS administration, scoring, and interpretation. Exner also provides an overview of evidence for each
CS score, Viglione elaborates on and clarifies basic scoring principles, Exner and Erdberg
review relevant research in the context of an interpretive guide that addresses particular
referral questions, and Weiner complements the latter by providing an easy-to-read general interpretive guide.
Meyer (1999b) and Meyer (2001c). These citations reference a special series of eleven articles in
the journal Psychological Assessment. The authors in the series participated in a critical,
structured, sequential, evidence-based debate that focused on the strengths and limitations
of using the Rorschach for applied purposes. The debate took place over four iterations,
with later articles building upon and reacting to those generated earlier. This series gives
an overview of all the recent criticisms of the test.
Bornstein and Masling (2005). This text provides an overview of the evidence for seven approaches
to scoring the Rorschach that are not part of the CS. Scores that are covered include the
ROD for assessing dependency, as well as scales to measure thought disorder, psychological defenses, object relations, psychological boundaries, primary process thinking, and
treatment prognosis.
Society for Personality Assessment (2005). Drawing on the recent literature, this document is
an official statement by the Board of Trustees of the Society for Personality Assessment
concerning the status of the Rorschach in clinical and forensic practice. Their primary
conclusion was that the Rorschach produces evidence of reliability and validity similar to that of other personality tests, such that its responsible use in applied settings is justified.
Research Findings
In earlier sections we described the evidence base for the Rorschach in some
detail. We documented how meta-analyses have shown its scores can be
reliably assigned, are reasonably stable, and, when evaluated globally, are as
valid as those obtained from other personality assessment instruments. We
also documented how the Rorschach can validly assess a range of personal
characteristics that have meaningful utility for applied clinical practice, including diagnosing psychotic difficulties, planning treatment, and monitoring
the outcome of intervention. Here we focus on some of the relatively unique
challenges that are associated with documenting the construct validity of its
scores and validly interpreting them in clinical practice.
Foundation for Interpretive Postulates
Authors over the years have discussed challenges associated with validating
Rorschach-derived scales (e.g., Bornstein, 2001; Meehl, 1959; Meyer, 1996;
Weiner, 1977; Widiger & Schilling, 1980). One challenge arises because some
scores do not have an obvious or self-evident meaning. In other words, the
behavioral or experiential foundation for the response is not completely
obvious. Examples of these scores include diffuse shading (Y), use of the
white background (S), or the extent to which form features are primary
versus secondary in determinants (e.g., FC vs. CF; see Table 8.1 for score
descriptions). These are largely the scores we described above as being based
on clinical observation. Historically, these response characteristics have been
observed and studied in psychiatric settings with disturbed individuals where
the base rates of serious symptoms and failures in adaptation are high. As a
result, the standard interpretive algorithms (Exner, 2003) may be skewed or
biased toward negative and pathological inferences rather than toward the
positive or healthy inferences that may be relevant when such responses are
present in nonpsychiatric settings.
Unique Assessment Methodology
Another challenge relates to the uniqueness of the method itself. Because
of its uniqueness, the correlation between one Rorschach scale and another
Rorschach scale is rarely put forward as evidence for validity. For instance,
both the MOA (Mutuality of Autonomy Scale) and the HRV (Human Representation Variable) assess the quality of object relations and theoretically
should be related to each other. However, researchers have not tried to
validate either scale by showing that they are correlated. Although this type
of research is rare with the Rorschach, it is a pervasive practice with other assessment methods, where, for example, the correlation between two self-report scales or two performance tasks of cognitive ability is regularly put forward as validity evidence.
Instances when two scales from the same assessment method (e.g., two
Rorschach scales or two self-report scales) are correlated with each other
are known as monomethod validity coefficients (Campbell & Fiske, 1959)
and they are contrasted with the heteromethod validity coefficients obtained
when scales from two different assessment methods are correlated (e.g., when
a Rorschach scale is correlated with ratings of observed behavior). It has
been well-documented for the past half-century that monomethod validity
coefficients are substantially larger than heteromethod coefficients. This is
because method-specific sources of systematic error inflate the monomethod
coefficients (Campbell & Fiske, 1959; Meyer, 2002b).
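The inflation of monomethod coefficients is easy to demonstrate with a toy simulation (ours, not from the chapter; all names and parameter values are illustrative assumptions). Two scales that share a method also share that method's systematic variance, so they correlate more strongly with each other than either does with a scale from a different method, even when all scales tap the same trait equally well:

```python
import random
import statistics

def pearson(x, y):
    """Plain Pearson correlation for two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

random.seed(1)
n = 5000
trait = [random.gauss(0, 1) for _ in range(n)]     # shared construct (e.g., depression)
method_a = [random.gauss(0, 1) for _ in range(n)]  # systematic variance unique to method A
method_b = [random.gauss(0, 1) for _ in range(n)]  # systematic variance unique to method B

def scale(trait, method, noise_sd=0.8):
    """A scale score = trait + its method's systematic error + random noise."""
    return [t + m + random.gauss(0, noise_sd) for t, m in zip(trait, method)]

a1 = scale(trait, method_a)  # two scales sharing method A
a2 = scale(trait, method_a)
b1 = scale(trait, method_b)  # one scale using method B

mono = pearson(a1, a2)    # monomethod validity coefficient
hetero = pearson(a1, b1)  # heteromethod validity coefficient
print(f"monomethod r = {mono:.2f}, heteromethod r = {hetero:.2f}")
```

Nothing here depends on the particular numbers chosen; as long as each method contributes systematic variance to its scales, the monomethod coefficient exceeds the heteromethod one, which is the pattern Campbell and Fiske (1959) described.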
For instance, consider self-report questionnaires to assess depression. To
document convergent validity, depression scales on the MMPI-2 and PAI
have been correlated with each other and scales on both instruments have
been correlated with the Beck Depression Inventory (BDI; Beck, Steer, &
Brown, 1996). Several factors conspire to artificially inflate these correlations,
and these factors are forms of systematic error. First, and most importantly,
there is an issue of what is known as criterion contamination in these studies.
Standard psychometric texts (e.g., Anastasi & Urbina, 1997) define criterion
contamination as instances in which knowledge of a predictor variable can
potentially influence the criterion variable (e.g., IQ scores are to be validated
by teacher ratings of intelligence but teachers see their students' scores before making their ratings). These texts also document how it is essential to
avoid this problem in validity research to ensure validity coefficients are not
falsely inflated. In the case of two self-report scales, not only can knowledge of what is reported on one scale influence what is reported on the other, but in fact the same person (the respondent) determines the scores that will be present on both the predictor scale and the criterion scale. This circularity, in which the same person determines the data on all measures, is a serious methodological confound. Exacerbating the difficulty, people also strive for consistency when answering similar items on two different inventories; thus they will give consistent answers regarding sadness, tearfulness, or lack of energy on two different depression scales.
It is also the case that self-ratings on two measures of depression (or any
other construct) are artificially equated by virtue of psychological defenses,
states and experiences for various reasons (e.g., because they lack intrapersonal sophistication and insight or because they have defenses that push
these threatening feelings from awareness). The notion that clinicians should
not infer that a score necessarily implies a conscious and self-reportable
experience applies to a long list of constructs often considered in the course
of CS interpretation (Exner, 2003), including affective distress, depression,
sadness, stress, overloaded coping resources, inability to concentrate, needs
for closeness, loneliness, introspectiveness, self-criticism, emotional deprivation, emotional confusion, interest in or discomfort with affective stimuli,
oppositionality, hypervigilance, suicidality, passivity, dependence, inflated
sense of personal worth, negative self-esteem, bodily concerns, pessimism,
interest in others, or the expectation that relationships will be cooperative
and/or aggressive. Even though validity data indicate Rorschach variables
actively influence perception, behavior, and thought, research also indicates
these experiences may not be consistently accessible in consciousness and
available to self-report. Recognizing this constraint when interpreting data
and writing test reports will help ensure inferences are consistent with the
Rorschach's methodology and the evidence about its locus of effectiveness.
& Exner, 2001). In addition, Allen and Dana (2004) provided a thorough
review of existing evidence, as well as a detailed discussion of methodological
issues associated with cross-cultural Rorschach research.
Presley et al. (2001) compared CS data from 44 African Americans (AA) to
44 European Americans (EA) roughly matched on demographic background
using the old CS nonpatient reference sample norms. They examined 23 variables they thought might show differences but found only 3 that differed
statistically (the AA group used more white space, had higher SCZI scores,
and had fewer COP scores). While preparing this chapter, we examined ethnic
differences in the new CS reference sample of 450 adults (Exner & Erdberg,
2005). This sample contains data from 39 AAs and 374 EAs, with the remaining 37 participants having other ethnic heritages. We could not replicate the
findings of Presley et al. Although there were small initial differences on the
number of responses given by each group (AA M = 21.4, SD = 3.5; EA M =
23.8, SD = 5.9), once we controlled for overall protocol complexity, ethnicity
was not associated with any of the 82 ratios, percentages, or derived variables
on the Structural Summary (i.e., the variables found in the bottom half of
the standard CS structural summary page). Across these 82 scores, ethnicity
did not produce a point-biserial correlation larger than |.09|.
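A point-biserial correlation of the kind used in this comparison is simply a Pearson correlation in which one variable is a 0/1 group code. A minimal sketch (the helper function is ours, and the group labels and score values below are made up for illustration, not drawn from the reference sample):

```python
import statistics

def point_biserial(groups, scores):
    """Point-biserial r: Pearson correlation between a 0/1
    group code and a continuous score."""
    mg, ms = statistics.fmean(groups), statistics.fmean(scores)
    num = sum((g - mg) * (s - ms) for g, s in zip(groups, scores))
    den = (sum((g - mg) ** 2 for g in groups)
           * sum((s - ms) ** 2 for s in scores)) ** 0.5
    return num / den

# Hypothetical data: group membership (0/1) and a summary score
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [21.0, 23.5, 22.0, 24.5, 22.5, 23.0, 21.5, 24.0]
r = point_biserial(group, score)
print(f"r_pb = {r:.2f}")  # values near 0 indicate no group difference
```

In the analyses reported above, the group code distinguished ethnic groups, and the near-zero coefficients indicated no meaningful group differences on the structural summary variables.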
Meyer (2002a) compared European Americans to a sample of African
Americans and to a combined sample of ethnic minorities that also included
Hispanic, Asian, and Native American individuals using a sample of 432
patients referred to a hospital-based psychological assessment program.
He found no substantive association between ethnicity and 188 Rorschach
summary scores, particularly after controlling for Rorschach complexity
and demographic factors (gender, education, marital status, and inpatient
status). In addition, CS scores had the same factor structure across majority
and minority groups and in 17 validation analyses there was no evidence to
indicate the test was more valid for one group than the other.9 These data
clearly support using the CS across ethnic groups.
Meyer (2001a) contrasted Exner's (1993) original CS adult normative
reference sample to a composite sample of 2,125 protocols taken from nine
sets of adult CS reference data that were presented in an international symposium (Erdberg & Shaffer, 1999). Although the composite sample included
125 (5.8%) protocols collected by Shaffer et al. (1999) in the United States,
the vast majority came from Argentina, Belgium, Denmark, Finland, Japan,
Peru, Portugal, and Spain. Despite diversity in the composite sample due
to selection procedures, examiner training, examination context, language,
culture, and national boundaries, and despite the fact that the original CS
norms had been collected 2025 years earlier, relatively few differences were
found between the two samples. Across 69 composite scores, the average difference was about four tenths of a standard deviation (i.e., equivalent to about
partial rather than full human images, and showed a bit more disorganization in thinking.
As such, changes seen within the CS norms over time are very similar to
the differences that had been found when comparing the original CS norms
to the composite international sample. However, the new CS reference sample
does not eliminate differences with the composite international sample. In
particular, the current CS norms continue to show less use of unusual detail
locations, better form quality, and more color responding than is seen in the
reference samples collected by others.
To understand the factors that may account for this, we compared the
quality of administration and scoring for protocols in Exner's (Exner & Erdberg, 2005) CS norms relative to Shaffer et al.'s (1999) sample from
Fresno, CA (FCA; preliminary findings were reported in Meyer, Viglione,
Erdberg, Exner, & Shaffer, 2004). Two sets of results are notable. First, the
FCA protocols were less adequately administered and inquired, with more
instances when examiners failed to follow up on key words or phrases. This
is not surprising given that graduate student examiners collected all the
protocols, though it does indicate that some of the seeming simplicity in
the FCA records was an artifact of less thorough inquiry. Second, we found
that many of the seeming differences between the FCA and CS samples were
reduced or eliminated when 40 protocols from each sample were rescored by
a third group of examiners. This indicates that the Shaffer et al. records and
Exner protocols were coded according to somewhat different site-specific
scoring conventions. In general, the new scoring split the difference between
the CS and Shaffer et al. samples, making the CS protocols look a bit less
healthy than before and making the Shaffer et al. protocols look a bit more
healthy than before. There were two exceptions to this general trend. For
complexity, the rescored protocols resembled the CS norms more than the
FCA scores. In contrast, for form quality the rescored protocols resembled
the FCA scores more than the CS norms. The overall findings suggest that
site-specific administration and coding practices may contribute in important
and previously unappreciated ways to some of the seeming differences across
normative approximation samples (also see Lis, Parolin, Calvo, Zennaro, &
Meyer, in press).
Although this research has been conducted with adults, the issues appear
to be similar with children. For instance, Hamel, Shaffer, and Erdberg (2000)
provided reference data on 100 children aged 6 to 12. Although the children were rated as psychologically healthy, a number of their Rorschach scores diverged from the CS reference norms for children, at times dramatically. Many of the
differences were similar to those found with adults (e.g., lower form quality values, less color, more use of unusual blot locations, less complexity),
though the values Hamel et al. reported tended to be more extreme. At least
in part, this appears due to the fact that all protocols were administered
and scored by one graduate student who followed atypical procedures for
identifying inkblot locations. This in turn led to a very high frequency of
unusual detail locations and consequently to lower form quality codes (see
Viglione & Meyer, 2007). However, other child and adolescent samples in
the United States, France, Italy, Japan, and Portugal (Erdberg, 2005; Erdberg
& Shaffer, 1999) suggest clinicians should be cautious about applying the old
CS norms for children. The CS normative data for children have not been updated recently as they have been for adults.
Based on the available evidence, we recommend that examiners use the
new CS sample as their primary benchmark for adults, but adjust for those
variables that have consistently looked different in international samples,
including form quality, unusual locations, color, texture, and human representations (for specific recommendations see Table 8.2). The Shaffer et al.
sample can be viewed as an outer boundary for what might be expected from
reasonably functioning people within the limits of current administration,
inquiry, and scoring guidelines.
For children, we recommend using the available CS age-based norms
along with the adjusted expectations given in Table 8.2 for adults. Although
we do not recommend using the Hamel et al. sample as an outer boundary
for what could be expected for younger United States children, the data for
that sample illustrate how ambiguity or flexibility in current administration
and scoring guidelines can result in one obtaining some unhealthy looking
data from apparently normal functioning children. Besides Hamel et al.
(2000), child and adolescent reference samples have been collected by other
examiners in the United States, France, Italy, Japan, and Portugal (Erdberg &
Shaffer, 1999; Erdberg, 2005). Although these samples vary in age, they also
show unexpected variability in a number of scores, particularly Dd (small
or unusual locations), Lambda (proportion of responses determined just by
form), and form quality scores. These scores differ notably from sample to
sample. It is unclear if these differences reflect genuine cultural differences
in personality and/or in childrearing practices or if they are artifacts due to
variability in the way the protocols were administered, inquired, or scored.
However, the composite of data suggest that the adjustments offered above
for adults should be made for children too.
In addition, clinicians working with children should consider developmental trends. Wenar and Curtis (1991) illustrated these trends for Exner's
(2001) child reference data across the ages from 5 to 16. Although limited,
the available international data suggest similar developmental trends are
present, including age-based increases in complexity markers like DQ+,
Blends, and Zf, as well as increases in M and P. In addition, as children age
there is a decrease in WSum6 and to a lesser extent in DQv. Unlike Exner's
CS reference samples, however, the alternative reference samples for children
generally show that as children get older there is a decrease in Lambda and
[Table 8.2 appeared here. It lists adjusted normative expectations for selected CS variables as ranges of ratios, proportions, and frequencies (e.g., FC vs. CF+C comparisons and H vs. non-pure H comparisons); its row and column structure is not recoverable from this extraction.]
an increase in healthier form quality scores. The field would benefit from
additional carefully designed studies that examine developmental processes
as expressed on the Rorschach.
Although the research evidence reviewed in this section supports the validity of the Rorschach across ethnic groups in the United States and across
languages and cultures around the world, this does not mean that culture
and ethnicity are unimportant when using the Rorschach. To the contrary,
it is important for clinicians to recognize the ways in which culture and
acculturation influence the development, identity, and personality of any
particular individual. It is as important to take these issues into account when
interpreting the Rorschach as it is with any other personality test.
Current Controversies
The Rorschach has been controversial almost since its publication. Historically, clinicians have found it useful for their applied work, while academic
psychologists have criticized its psychometric foundation and suggested
that clinical perceptions of its utility are likely the result of illusory biases.
An early and prominent critique by Jensen (1965) gives a flavor of the sharp
tone that has characterized some of the criticisms. Jensen asserted that the Rorschach "is a very poor test and has no practical worth for any of the purposes for which it is recommended" (p. 501) and that "scientific progress in clinical psychology might well be measured by the speed and thoroughness with which it gets over the Rorschach" (p. 509). Although Exner's (1974,
2003) work with the CS quelled many of these earlier criticisms, over the
past decade there has been a renewed and vigorous series of critiques led by
James Wood, Howard Garb, and Scott Lilienfeld, including arguments that
psychology departments and organizations should discontinue Rorschach
training and practice (see e.g., Garb, 1999; Grove, Barden, Garb, & Lilienfeld,
2002; Lilienfeld, Wood, & Garb, 2000). Counterarguments and rejoinders
also have been published and at least seven journals have published a special
series of articles concerning the Rorschach.10
The most thorough of these special series was an 11-article series published
in Psychological Assessment (Meyer, 1999b; 2001c). Authors participated in a
structured, sequential, evidence-based debate that focused on the strengths
and limitations of using the Rorschach for applied purposes. The debate
took place over four iterations, with each containing contributions from
authors who tended to be either favorable toward or critical of the Rorschach's evidence base. At each step, authors read the articles that were prepared in the previous iteration(s) to ensure the debate was focused and cumulative. As
noted earlier, Robert Rosenthal was commissioned for this special series to
undertake an independent evidence-based review of the research literature
through a comparative meta-analysis of Rorschach and MMPI-2 validity.
In addition, the final summary paper in the series was written by authors
with different views on the Rorschach's merits (Meyer & Archer, 2001). They
attempted to synthesize what was known, what had been learned, and what
issues still needed to be addressed in future research. We strongly encourage
any student or psychologist interested in gaining a full appreciation for the
evidence and issues associated with the applied use of the Rorschach to read
the full series of articles (Dawes, 1999; Garb, Wood, Nezworski, Grove, &
Stejskal, 2001; Hiller et al., 1999; Hunsley & Bailey, 1999, 2001; Meyer, 1999a,
2001b; Meyer & Archer, 2001; Rosenthal et al., 2001; Stricker & Gold, 1999;
Viglione, 1999; Viglione & Hilsenroth, 2001; Weiner, 2001).
More recently, the Board of Trustees for the Society for Personality Assessment (2005) addressed the debate about the Rorschach. Drawing on
the recent literature, their official statement concluded that the Rorschach
produces evidence of reliability and validity that is similar to the evidence
obtained for other personality tests. Given this, the statement held that its responsible use in applied practice was justified.
Nonetheless, as we indicated in previous sections, there are still unresolved
issues associated with the Rorschachs evidence base and applied use. Some
of the most important issues concern recently recognized variability in the
way the CS can be administered and scored when examiners are trying to
follow Exners (2003) current guidelines, the related need to treat normative
reference values more tentatively, the impact of response-complexity on the
scores obtained in a structural summary, and the need for more research
into the stability of scores over time.
Another issue that we have not previously discussed concerns the evidence
base for specific scores. The meta-analytic evidence provides a systematic
review for several individual variables in relation to particular criteria (e.g.,
the ROD and observed dependent behavior; the Prognostic Rating Scale and
outcome from treatment), but much of the systematically gathered literature
speaks to the global validity of the test, which is obtained by aggregating
evidence across a wide range of Rorschach scores and a wide range of criterion variables. It would be most helpful to have systematically organized
evidence concerning the construct validity of each score that is considered
interpretively important. Accomplishing this is a daunting task that initially
requires cataloging the scores and criterion variables that have been examined
in every study over time. Subsequently, researchers would have to reliably
evaluate the methodological quality of each article so greater weight could be
afforded to more sturdy findings. Finally, researchers would have to reliably
classify the extent to which every criterion variable provides an appropriate
match to the construct thought to be assessed by each Rorschach score so
that one could meaningfully examine convergent and discriminant validity.
Although conducting this kind of research would be highly desirable, we
also note how no cognitive or personality test in use today has this kind of
focused meta-analytic evidence attesting to the validity of each of its scales in
relation to specific and appropriate criterion variables. We say this not as an
excuse or a deterrent, but simply as an observation. Because of the criticisms leveled against the Rorschach, however, having this kind of organized meta-analytic evidence is more urgent for it than for other tests.
Clinical Dilemma
Dr. A is a 30-year-old unmarried Asian man who has been in the United States
for 5 years and is employed as a university math professor. Two months before
being referred for psychological assessment, he was evaluated psychiatrically
for the first time in his life and diagnosed with major depression, for which
he was receiving antidepressants from a psychiatrist and weekly cognitive-behavioral psychotherapy from an outpatient psychotherapist. His depression has
been present for 2 years, with symptoms of weakness, low energy, sadness,
hopelessness, and an inability to concentrate that fluctuated in severity. At
the time of assessment, he taught and conducted research for about 40 hours
per week and spent almost all of his remaining time in bed. He denied any
previous or current hypomanic symptoms, had normal thyroid functions, and
reported no other health problems. In his home country, his father had been hospitalized for depression, his brother had been diagnosed with schizophrenia, and his sister was reported to have problems but had not received psychiatric
care. His father was physically abusive to his mother, his siblings, and him.
Dr. A reported that his father hit him in the face or head on an almost weekly
basis while growing up. He is the only one in his family in the United States
and he has no history of intimate relationships, though he sees several friends
for dinner approximately every other week.
Dr. A's outpatient therapist requested the evaluation to assess the severity of Dr. A's depression and to understand his broader personality characteristics.
In particular, the therapist wondered about potential paranoid characteristics. Dr. A was primarily interested in whether he had qualities similar to his
father or brother and, if so, what he could do to prevent similar conditions
from becoming full blown in him. The assessment involved an interview,
several self-report inventories (including the MMPI-2, BDI, and a personality
disorder questionnaire), and the Rorschach.
Dr. A produced a very complex Rorschach protocol with 42 responses,
of which only 8 were determined by straightforward form features (i.e., the
percent of pure form responses [Form%] was .19 and the proportion of pure
form to non-pure form responses [Lambda] was .24). As a result, his protocol was an outlier relative to the CS norms. The complexity of his record
appeared to be a function of his intelligence, his desire to be thorough in the
assessment, and also some difficulty stepping back from the task with a consequent propensity to become overly engaged with the stimuli (particularly
to the last three brightly colored cards, to which he produced almost half
of his responses [20 of 42]). After adjusting for the length and complexity
of his protocol, Dr. A exhibited some notable features. First, his thought
processes were characterized by implausible and illogical relationships, with
the weighted sum of cognitive special scores (see Table 8.1) several standard
deviations above what is typically seen in nonpatient or even outpatient
samples. Importantly, however, this occurred in the context of perceptions
that had typical and conventional form features (XA%, which is the percent
of all responses with adequate form quality, was .79 and WDA%, which is
the percent of responses to the whole card or to common detail locations
with adequate form quality, was .92). In addition, even though he would be
considered to have extensive assets for coping with life demands (M = 18,
Weighted Color = 14.5, Zf = 33, DQ+ = 22), he saw an unexpectedly large
number of inanimate objects in motion (m = 7), suggesting he was experiencing a considerable degree of uncontrollable environmental stress, internal
tension, and agitated cognitive activity. Finally, he had a marked propensity
to perceive objects engaged in aggressive activity (AG = 8) and to identify
percepts where objects were damaged, decaying, or dying (MOR = 10). This
combination of scores suggested he had an implicit depressive perceptual
filter in which he experienced himself as deficient, vulnerable, and incapable
of contending with a dangerous, menacing, and combative environment.
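The protocol-level indices cited above for Dr. A are simple ratios of response counts. A brief sketch using the counts reported in the text (the function names are ours):

```python
def form_percent(pure_f, total_r):
    """Form%: proportion of responses determined only by form features."""
    return pure_f / total_r

def lambda_index(pure_f, total_r):
    """Lambda: ratio of pure-form responses to all other responses."""
    return pure_f / (total_r - pure_f)

R = 42       # total responses in Dr. A's protocol
pure_F = 8   # responses determined by straightforward form features

print(f"Form% = {form_percent(pure_F, R):.2f}")   # 8/42 -> 0.19
print(f"Lambda = {lambda_index(pure_F, R):.2f}")  # 8/34 -> 0.24
```

These match the .19 and .24 values reported for Dr. A and show why a low Lambda marks a record in which most responses involve determinants beyond pure form.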
Although this chapter does not provide the actual inkblot images, we
include his responses from a number of the cards to give a flavor of the characteristics described above. As a general principle, response verbalizations
should be considered after examining the previously presented quantitative
data so as to minimize the prospect for erroneous speculations.
At the bottom of the second card, Dr. A saw, "Blood. Yeah, I don't really want to say, it's dirty words, but it looks like an asshole with blood coming out of it . . . spilling over, all over the place." A bit later, using the entire card, he saw, "the face of a human being . . . looks like it's weeping. It may be partly vomiting. The eyes look like they're teary . . . this is what it's vomiting." To the third card Dr. A saw two people meeting and bowing to each other, but "they're kind of hating each other . . . this red thing signifies the hatred between the two people." In his next response he saw "two ugly waitresses, actually they look like birds, who are bringing some strange plate or dish . . . I mean gruesome stuff like snakes, spiders, something like that." On the next card he saw "a gruesome monster as tall as a tower . . . it's about to come and crush me out. He looks very angry at me . . . these look like his hands but also like a weapon and it's very, very dangerous . . . the whole posture makes me feel like it's angry. I don't see any specific . . . maybe the only thing that makes me feel that way is the hidden expressions." The final response to this card consisted of "a small animal which has been killed on a street by a car, flattened out . . . sometimes you can see small animals dying on the road." On the fifth card he returned to the same themes, seeing "a butterfly which is kind of dying . . . injured and dying" and "a witch with two horns trying to approach me and catch me . . . some massive, dark object." On the ninth card he saw "a knife thrust into a body and blood is coming out as a result," which was followed by the perception of "two monsters who are maybe shaking hands," and then a new response of "three people sitting in a row controlling from behind . . . the red person controlling the green one and the green one is controlling the yellow one." On the final card, Dr. A saw an abdomen of organs "which are not functioning because of the various poisons. The organs are poisoned, as you see from the colors . . . weak and not functioning . . . very bad condition." In another response to the whole inkblot he saw "an island as you see it from the skies. Island where there is a military secret. So it's very secret. And they are hiding the ships and weapons in the very center of the island. So they make use of the very complicated coastline. And they made a lot of traps so that you can't very easily approach the center of the island . . . traps to capture the enemies." This response was followed by "interior walls of some organ, like stomach or heart . . . these look like ulcers . . . this portion looks deteriorated, somehow damaged." Next he saw "a flying monster which is about to attack . . . attack something with its chisel-like mouth." As his final response to the task, Dr. A saw "two people fighting with weapons . . . they don't have heads somehow."
Although this is incomplete information, the curious reader could stop
here and ponder several questions. To what extent do the scores and the
images or themes in his responses suggest that Dr. A is depressed? Dr. A's
outpatient therapist was concerned about paranoid characteristics. Do the
data suggest that concerns in this regard are warranted? Also, do the results
suggest that Dr. A might have other personality characteristics or personality
struggles that were not part of the initial referral question but that will be
important to consider? Dr. A was concerned about the possibility that he was
like his brother who had a schizophrenic disorder. What features of the data
would be consistent with a psychotic disturbance? Alternatively, are there
features of the data that would contradict a disorder on the psychotic spectrum? These are important questions, and how they are addressed
will have significant consequences for Dr. A. Thus, although we focus in this
chapter on just the Rorschach data, in actual practice the assessment clinician
would need to carefully consider each question while taking into account the
full array of available information from testing and from history.
With respect to the Rorschach data, Dr. A's vivid images provide idiographic insight into his particular way of experiencing the qualities suggested
by the relatively impersonal quantitative structural summary variables. We
learn and come to understand his deep fears, fragile vulnerabilities, and
powerful preoccupation with aggression and hostility. As suggested in his last
response, identification with aggression is likely to leave him feeling headless and out of control. Although generally it is not possible to determine
whether clients positively identify with aggressive images or fear them as
dangers emanating from the environment, the extensive morbid imagery
of damaged, decaying, dying, pierced, and poisoned objects suggests the
latter (as did his denial of anger and aggressiveness on self-report inventories). Depression, at least for some people, can be understood as aggression
turned toward the self rather than directed outward at its intended target.
Given the pervasiveness of aggressive imagery in his Rorschach protocol,
Dr. A's therapist could pursue this hypothesis in her work with him after he
stabilized at a more functional level.
Paranoid themes were also evident in Dr. A's responses (e.g., people bowing
in respect but internally hating each other, bird waitresses serving snakes or
spiders, creatures with weapons for appendages, hidden expressions, secretive
traps guarding weapons, external control by others). In combination with
the disrupted formal thought processes seen on his Rorschach and results
from the other tests he completed, Dr. A was considered to be experiencing a severe agitated depressive episode with psychotic features. This was
considered a conservative diagnosis because psychological assessment provides a snapshot of current functioning, so it was not possible to determine
whether a major depressive disorder was co-occurring with an independent
and longer standing delusional disorder. However, the latter seemed less
likely, given the pervasiveness of his affective turmoil and the fact that the
form quality of his perceptions remained healthy and conventional despite
such a lengthy and complex protocol. In feedback to Dr. A, his therapist,
and his psychiatrist, it was recommended that Dr. A begin antipsychotic
medication on at least a trial basis and that therapy be ego-supportive rather
than uncovering, with an emphasis on cognitive interventions to evaluate
suspicions and correct his propensity to misattribute aggressive intentions
onto others in the environment.
Chapter Summary
It is not possible to learn Rorschach administration, scoring, and interpretation from a chapter like this. Consequently, our goal was to provide readers
with an overview of the Rorschach as a task that aids in assessing personality. We described the instrument and the approaches that have been used
to develop test scores. We then focused on the psychometric evidence for
reliability, showing that its scores can be reliably assigned, are reasonably
stable over time, and can be reliably interpreted by different clinicians. We
also focused on evidence related to its validity and utility, showing that it is
a generally valid method of assessment that provides unique and meaningful
information for clinical practice. In the process, we pointed out the kinds of
information the test generally can and cannot provide and provided psychometrically based guidelines to aid with interpretation. Next, we reviewed
current evidence associated with its multicultural and cross-national use and
noted a need for tighter guidelines governing administration and scoring to
ensure consistency in the data that is collected across sites around the world.
Finally, we provided a case vignette that illustrated how a person's perceptions
could be meaningfully interpreted in idiographic clinical practice even in
the absence of the inkblot stimuli themselves.
Although additional research and refinement are needed on numerous
fronts, the systematically gathered data provide solid evidence supporting the Rorschach's basic reliability and validity. Overall, we advocate for
an evidence-based, behavioral-representation approach to conceptualizing
the test that attempts to focus on concrete and experience-near test-based
inferences at the expense of more elusive abstract ones. We hope readers will
pursue some of the additional readings we have suggested and other studies
we have cited. Also, we urge readers to seek out high quality training from
qualified supervisors so they can experience the Rorschach's strengths and
limitations first hand. Doing so will provide important experiential data about
the test's utility that will help when considering the evidence presented here
and the recurrent controversy about this unique instrument.
We close with a final caution to keep in mind when considering some of
the controversy associated with the Rorschach. Consistent with evidence-based
principles, we urge readers to attend to the systematically generated
evidence and to be wary of partial reviews or selective citations. On average,
personality and cognitive tests produce heteromethod validity coefficients
that are about equal to a correlation of .30 (Meyer et al., 2001). This means
that about half of the research literature will produce validity coefficients
that are lower than this and about half will produce coefficients that are
higher. Authors who selectively cite the literature or focus on just a subset
of individual studies can (inadvertently or intentionally) make the literature
seem more or less supportive than is actually warranted.
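The half-below, half-above arithmetic can be illustrated with a short simulation. The sketch below is purely illustrative: the study-level coefficients are randomly generated, and only the r = .30 field-wide average comes from Meyer et al. (2001); the spread of .12 is an arbitrary assumption.

```python
import random
import statistics

random.seed(0)

# Simulate 200 hypothetical study-level validity coefficients scattered
# around the field-wide average of r = .30 (Meyer et al., 2001). The
# standard deviation of .12 is an assumption made for illustration only.
coefficients = [random.gauss(0.30, 0.12) for _ in range(200)]

below = sum(1 for r in coefficients if r < 0.30)
above = len(coefficients) - below

print(f"mean r = {statistics.mean(coefficients):.2f}")
print(f"below .30: {below}, at or above .30: {above}")

# Roughly half of the simulated studies fall on each side of the average,
# which is why citing only the weaker (or stronger) half of a literature
# misrepresents the evidence as a whole.
```

The point of the sketch is simply that an "average" coefficient guarantees a sizable fraction of individual findings on either side of it, so a selective reviewer can always assemble a seemingly damning, or seemingly glowing, subset of studies.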
Notes
1. The authors would like to thank Joni L. Mihura and Aaron D. Upton for their helpful comments and suggestions.
2. Historically, the Rorschach was classified as a projective rather than objective test. However,
these archaic terms are global and misleading descriptors that should be avoided because they
do not adequately describe instruments or help our field develop a more advanced and differentiated understanding of personality assessment methods (see Meyer & Kurtz, 2006).
3. There are other inkblot stimuli that have been developed and researched over the years, including a complete system by Holtzman, a series by Behn-Eschenberg that was initially hoped to
parallel Rorschach's blots, a short 3-card series by Zulliger, an infrequently researched set by
Roemer, and the Somatic Inkblots, which are a set of stimuli that were deliberately created
to elicit responses containing somatic content or themes.
4. For ICC or kappa values, findings above .74 are considered excellent, above .59 are considered
good, and above .39 are considered fair (Cicchetti, 1994; Shrout & Fleiss, 1979).
5. At the same time, data clearly show that Rorschach scales validly identify psychotic diagnoses
and validly measure psychotic symptoms (Lilienfeld, Wood, & Garb, 2000; Meyer & Archer,
2001; Perry, Minassian, Cadenhead, & Braff, 2003; Viglione, 1999; Viglione & Hilsenroth,
2001; Wood, Lilienfeld, Garb, & Nezworski, 2000). Unlike most other disorders, which are
heavily dependent on the patient's self-reported symptoms, psychotic conditions are often
diagnosed based more on the patient's observed behavior than on their specific reported
complaints.
6. At present, one or more national Rorschach societies exist in the following countries:
Argentina, Brazil, Canada, Cuba, Czech Republic, Finland, France, Israel, Italy, Japan, The
Netherlands, Peru, Portugal, South Africa, Spain, Sweden, Switzerland, Turkey, United States,
and Venezuela.
7. Fully structured interviews can be differentiated from semistructured interviews. To some
degree, semistructured interviews allow a clinician's inferences to influence the final scores
or determinations from the assessment. However, the inferences and determinations remain
fundamentally grounded in the client's self-reported characteristics. Fully structured interviews are wholly dependent on this source of information.
8. The Rorschach's first factor is a dimension of complexity. The first factor of a test indicates
the primary feature it measures. The Rorschach's first factor typically accounts for about 25%
of the total variance in Rorschach scores. For self-report scales like the MMPI-2 or MCMI,
References
Acklin, M. W. (2000). Rorschach Interpretive Assistance Program: Version 4 for Windows [Software
review]. Journal of Personality Assessment, 75, 519–521.
Acklin, M. W., McDowell, C. J., & Verschell, M. S. (2000). Interobserver agreement, intraobserver
reliability, and the Rorschach Comprehensive System. Journal of Personality Assessment, 74,
15–47.
Allen, J., & Dana, R. H. (2004). Methodological issues in cross-cultural and multicultural Rorschach
research. Journal of Personality Assessment, 82, 189–206.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). New York: Macmillan.
Arbisi, P. A., Ben-Porath, Y. S., & McNulty, J. (2002). A comparison of MMPI-2 validity in African
American and Caucasian psychiatric inpatients. Psychological Assessment, 14, 3–15.
Aronow, E., Reznikoff, M., & Moreland, K. L. (1995). The Rorschach: Projective technique or psychometric test? Journal of Personality Assessment, 64, 213–228.
Atkinson, L., Quarrington, B., Alp, I. E., & Cyr, J. J. (1986). Rorschach validity: An empirical approach
to the literature. Journal of Clinical Psychology, 42, 360–362.
Balcetis, E., & Dunning, D. (2006). See what you want to see: Motivational influences on visual
perception. Journal of Personality and Social Psychology, 91, 612–625.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory-II. San
Antonio, TX: Psychological Corporation.
Bihlar, B., & Carlsson, A. M. (2001). Planned and actual goals in psychodynamic psychotherapies:
Do patients' personality characteristics relate to agreement? Psychotherapy Research, 11,
383–400.
Blatt, S. J., Brenneis, C. B., Schimek, J. G., & Glick, M. (1976). Normal development and psychopathological impairment of the concept of the object on the Rorschach. Journal of Abnormal
Psychology, 85(4), 364–373.
Bornstein, R. F. (1996). Construct validity of the Rorschach Oral Dependency Scale: 1967–1995.
Psychological Assessment, 8, 200–205.
Bornstein, R. F. (1998). Implicit and self-attributed dependency strivings: Differential relationships
to laboratory and field measures of help-seeking. Journal of Personality and Social Psychology, 75, 779–787.
Bornstein, R. F. (1999). Criterion validity of objective and projective dependency tests: A meta-analytic
assessment of behavioral prediction. Psychological Assessment, 11, 48–57.
Bornstein, R. F. (2001). Clinical utility of the Rorschach Inkblot Method: Reframing the debate.
Journal of Personality Assessment, 77, 39–47.
Bornstein, R. F. (2002). A process dissociation approach to objective-projective test score interrelationships. Journal of Personality Assessment, 78, 47–68.
Bornstein, R. F., & Masling, J. M. (Eds.). (2005). Scoring the Rorschach: Seven validated systems.
Mahwah, NJ: Erlbaum.
Bornstein, R. F., Hill, E. L., Robinson, K. J., Calabrese, C., & Bowers, K. S. (1996). Internal reliability
of Rorschach Oral Dependency Scale scores. Educational and Psychological Measurement,
56, 130–138.
Butcher, J. N., & Rouse, S. (1996). Clinical personality assessment. Annual Review of Psychology,
47, 87–111.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). MMPI-2: Minnesota Multiphasic Personality Inventory-2: Manual for administration and scoring. Minneapolis:
University of Minnesota Press.