Data Analysis and Decision Making
Alois Geyer
6 Describing relationships
6.1 Covariance and correlation
6.2 Simple linear regression
6.3 Regression coefficients and significance tests
6.4 Goodness of fit
6.5 Estimating the CAPM
6.6 Multiple regression analysis
7 Decision analysis
7.1 Elements of a decision model
7.2 Decisions under uncertainty
7.3 Decisions under risk
7.4 Decision trees
8 References
9 Exercises
10 Cases
10.1 Olson Diversified Marketing Services – Analysis of Receivables
10.2 Production System Inc. – Development of a Salary Model
10.3 American Motors – Predicting Fuel Economy and Price
10.4 ACME Marketing Strategy – A Multistage Decision Problem
1 Introduction
The term statistics often refers to quantitative information about particular subjects or objects (e.g. unemployment rate, income distribution, . . . ). In this text the term statistics is understood to deal with the collection, the description and the analysis of data. The objective of the text is to explain the basics of descriptive and analytical statistics.
The purpose of descriptive statistics is to describe observed data using graphics,
tables and indicators (mainly averages). It is frequently necessary to prepare or
transform the raw data before it can be analyzed. The purpose of analytical statistics
is to draw conclusions about the population on the basis of the sample. This
is mainly done using statistical estimation procedures and hypothesis tests. The
population consists of all those elements (e.g. people, companies, . . . ) which share
a feature of interest (e.g. income, age, height, stock price, . . . ). A sample from the
population is drawn if the observation of all elements is impossible or too expensive.
The sample is used to draw conclusions about the properties of that feature in the
population. Such conclusions may be used to prepare and support decisions.
Excel contains a number of statistical functions and analysis tools. This text includes short descriptions of selected Excel functions. Descriptions of the functions are provided in English; function names are specified in English and German.

The menu 'Tools/Data Analysis' (German: 'Extras/Analyse-Funktionen') contains the item 'Descriptive Statistics' ('Populationskenngrößen'). Upon activating 'Summary Statistics' ('Statistische Kenngrößen') a number of important sample statistics are computed. All results can be obtained using individual functions, too.

If the entry 'Data Analysis' is not available, use the add-in manager (available under 'Tools') to activate 'Data Analysis'.
Many examples in this text are taken from the book "Managerial Statistics" by Albright, Winston and Zappe (AWZ) (www.cengage.com). The title of the third edition is "Data Analysis and Decision Making". This book can be recommended as a source of reference and for further study. It covers the main areas of (introductory) statistics, it includes a large variety of (practically relevant) examples and cases, and is strongly tied to using Excel.
2.1 Types of data

Example 1 (Example 2.1 on page 29 in AWZ): The following summary statistics describe the variable 'salary' for 30 respondents:

Mean                        52,263
Standard Error               2,098
Median                      50,800
Mode                        62,000
Standard Deviation          11,493
Sample Variance        132,081,023
Kurtosis                      3.56
Skewness                      0.64
Range                       50,400
Minimum                     31,000
Maximum                     81,400
Sum                      1,567,900
Number of Observations          30
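Such summary statistics can also be reproduced outside Excel. A minimal Python sketch using the standard library (the salary values are illustrative, not the full 30-observation sample of example 1):

import statistics as st

# Illustrative salaries (not the full sample of example 1)
salaries = [31000, 40300, 50800, 52000, 62000, 62000, 75100, 81400]

print("mean     ", st.mean(salaries))
print("median   ", st.median(salaries))
print("mode     ", st.mode(salaries))      # most frequent value
print("std dev  ", st.stdev(salaries))     # sample standard deviation (divisor n-1)
print("variance ", st.variance(salaries))  # sample variance (divisor n-1)
print("range    ", max(salaries) - min(salaries))
print("count    ", len(salaries))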
A sample usually consists of variables (e.g. age, gender, state, children, salary, opinion) and observations (the record for each person asked). Samples can be categorized
either as cross-sectional data or time series data. Cross-sectional data is collected
at a particular point of time for a set of units (e.g. people, companies, countries,
etc.). Time series data is collected at different points in time (in chronological order)
as, for instance, monthly sales of one or several products.
Important categories of variables are numerical and categorical. Numerical (cardinal or metric) data such as age and salary can be subjected to arithmetic. Numerical variables can be subdivided into two types – discrete and continuous. Discrete data (e.g. the number of children in a household) arises from counts whereas continuous data arises from continuous measurements (e.g. salary, temperature).
It does not make sense to do arithmetic on categorical variables such as gender,
state and opinion. The opinion variable is expressed numerically on a so-called
Likert scale. The numbers 1–5 are only codes for the categories ’strongly disagree’,
’disagree’, ’neutral’, ’agree’, and ’strongly agree’. However, the data on opinion
implies a general ordering of categories that does not exist for the variables ’Gender’
and ’State’. Thus opinion is called an ordinal variable. If there is no natural
ordering, variables are classified as nominal (e.g. gender or state). Both ordinal and
nominal variables are categorical.
Some categorical variables can be coded numerically (e.g. male=0, female=1). For
some types of analyses recoding may be very useful (e.g. the mean of 0-1 data on
gender is equal to the percentage of women in the sample).
A special type of data are returns, which are mainly used in the context of financial economics. There are several possibilities to compute returns from stock or bond prices (or indices). Log returns are computed on the basis of changes in the logarithm of prices or indices:

$$\text{log return: } y_t = \ln p_t - \ln p_{t-1} = \ln\frac{p_t}{p_{t-1}}.$$

'ln' is the natural logarithm and pt is the price or the value of the index at time t. This definition corresponds to continuous compounding.
Simple returns are computed on the basis of relative price changes:

$$\text{simple return: } r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1.$$

If this return is used to obtain the value of the investment after one year – but interest accrues twice a year – we obtain the implied simple return r*2. Using six steps within a year, we obtain r*6 = 0.048989. In general, the implied simple return for compounding m times within a year is given by

$$r_m^* = m\left[\left(\frac{p_1}{p_0}\right)^{1/m} - 1\right].$$
Table 1 shows the results of computing log and simple returns using one year from the sample. The log return from December 1994 to January 1995, for instance, is computed from the index values of those two months.

Note that simple returns aggregate exactly in a portfolio:

$$r_{pt} = \sum_{i=1}^{m} w_i \cdot r_{it},$$

where wi is the weight of asset i in the portfolio. For log returns this relation only holds approximately:

$$y_{pt} \approx \sum_{i=1}^{m} w_i \cdot y_{it}.$$
Note: Returns will be expressed in percentage terms. Therefore some statistics based on returns will be interpreted as percentages or percentage points. However, the percentage sign will typically be omitted in the rest of the text.
2.2 Measures of location – mean, median and mode

The best-known measure of location is the arithmetic mean:

$$\text{arithmetic mean: } \bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t.$$

The mean is only meaningful for numerical data. It can be computed using the function AVERAGE(data range) (German: MITTELWERT(data range)). In example 1 the average salary ȳ equals $52,263.
The median is the value in the middle of a sorted sequence of data. (If there is an even number of cases, the median is the mean of the two values in the middle of the sequence.) Therefore 50% of the cases are less than (or greater than) the median. The median can be used for
numerical or ordinal data. The median is not affected by extreme values (outliers)
in the data. For instance, the sequence 1, 3, 5, 9, 11 has the same median as –11, 3,
5, 9, 11. The means of these two samples differ strongly, however.
In example 1 the median is $50,800. Half of the respondents earn more than this
number, and the other half earns less than that. The mean and the median salaries
are very similar in this example. Therefore we conclude that salaries are distributed roughly symmetrically around the center of the data. Since the median is slightly less than the mean, we conclude, however, that a few salaries are relatively high.
The mode is the most frequent value in a sample. Similar to the median, the mode
is not affected by extreme values. It can be interpreted as a ’typical’ salary under
’normal’ conditions.
The mode is typically applied to recoded nominal data or discrete data. For example,
if each state is coded using a different number, the mode identifies the most frequent
state. If the variable is continuous (e.g. temperature) the mode may not be defined.
In very small samples or when the data is measured very precisely it may be that no value occurs more than once. Such is the case with salaries in the present example, because the sample is small and the salaries are coded very precisely. This problem may be overcome by computing the mode of rounded values. The mode of rounded salaries equals $62,000.
The mode can be computed using the function MODE(data range) (German: MODALWERT). The function returns #N/A (German: #NV) if not a single number in the data range appears more than once. This can be avoided by using rounded values.
Example 2: Consider the data on sheet 'Shoes' – the shoe sizes purchased at a shoe store. We seek to find the best-selling shoe size at this store. Shoe sizes come in discrete increments, rather than a continuum.
Therefore it makes sense to find the mode, the size that is requested
most often. In this example it turns out that the best-selling shoe size
is 11.
The study by Reinhart and Rogoff (2010) has played a prominent role in the debate about government debt following the financial crisis of 2008. It has received even more attention after the student Thomas Herndon had found several mistakes in the Excel sheet used by Reinhart and Rogoff (RR). This example only focuses on one particular aspect: the potentially misleading results of computing "means of means". RR have excluded some data for some countries in some years without justification. The impact of this exclusion is ignored for the present purpose and the reduced dataset is used. The frequently mentioned "Excel coding error" in the media refers to the use of a wrong cell range in their spreadsheet, whereby five countries have been excluded. This mistake is not evaluated here.
RR have tried to analyze the relation between national debt levels and GDP growth rates. For that purpose they have classified debt (the ratio of public debt to GDP) into four categories. They first compute the average GDP growth rates for each country in each category. Subsequently, they compute the growth rate for each category by averaging averages across countries. Thereby, they identify a sharp drop in growth rates to 0.3% for debt ratios above 90%, compared to 3–4% growth rates for lower debt ratios (see the copy of RR's key figure in Figure 2). This piece of evidence had a key impact on the policy recommendation derived from their study.
Figure 2: Key results as reported in the study by Reinhart and Rogoff (2010).
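The pitfall of computing "means of means" is easy to reproduce. A small Python sketch with made-up growth rates (not RR's data):

# Illustrative (made-up) growth rates in high-debt years:
# country A has many such years, country B only one.
country_a = [1.0, 1.2, 0.8, 1.0]
country_b = [-7.6]

mean_a = sum(country_a) / len(country_a)
mean_b = sum(country_b) / len(country_b)

# Average of per-country averages: both countries get equal weight
mean_of_means = (mean_a + mean_b) / 2
# Pooled mean: every country-year gets equal weight
pooled = sum(country_a + country_b) / len(country_a + country_b)

print(mean_of_means)  # -3.3: the single year of country B dominates
print(pooled)         # -0.72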
We now consider simple and log returns of the DAX in more detail. (The rest of this subsection is only relevant for banking, finance or similar courses.) The arithmetic mean of DAX log returns using the entire sample is 0.56. This implies that an investment in the DAX yields – on average – a monthly interest rate of slightly more than one half percent. The mean using all simple returns in the sample is 0.73, which implies a much higher average interest rate.

We use the twelve log and simple returns from Table 1 to analyze this discrepancy more thoroughly. The mean log return is

$$\bar{y} = \frac{1}{12}\sum_{t=1}^{12} y_t = 0.563228,$$

whereas the mean simple return is

$$\bar{r} = \frac{1}{12}\sum_{t=1}^{12} r_t = 0.648957.$$

This discrepancy is not specific to the present example but holds in general.
Which average is correct? The mean return over a period should reflect the actual growth rate of the stock or index. According to financial calculus the internal rate of return – assuming discrete compounding – can be computed from

$$i^* = \left(\frac{p_t}{p_0}\right)^{1/t} - 1.$$

Assuming continuous compounding the internal rate of return is given by

$$i^* = (\ln p_t - \ln p_0)/t.$$

Over the twelve-month period in this example the rate of return is either given by

$$\left(\frac{2253.88}{2106.58}\right)^{1/12} - 1 = 0.564817$$

if discrete compounding is assumed, or by

$$\frac{\ln 2253.88 - \ln 2106.58}{12} = 0.563228$$

if continuous compounding is assumed. The (minor) difference between these two values is due to the different assumptions about compounding.
This shows that the average of log returns correctly reflects the change in value, whereas the average of simple returns systematically overstates the actual change in value. We conclude that the arithmetic mean of log returns is an unbiased measure of the average, whereas the arithmetic mean r̄ of simple returns (obtained from the same price series) is biased upwards. To obtain a correct measure for the mean of simple returns one needs to calculate the geometric mean (Excel: GEOMEAN; German: GEOMITTEL(Datenbereich)):

$$\left(\prod_{t=1}^{12} (1 + r_t)\right)^{1/12} - 1 = 0.564817.$$
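A short Python sketch illustrates these relations. The monthly index values are hypothetical, except for the start and end values 2106.58 and 2253.88 taken from the example:

import math

# Hypothetical monthly index values; only the first and last value
# (2106.58 and 2253.88) are taken from the text.
prices = [2106.58, 2050.00, 2180.00, 2253.88]

log_ret = [100 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
simple_ret = [100 * (p1 / p0 - 1) for p0, p1 in zip(prices, prices[1:])]

n = len(log_ret)
mean_log = sum(log_ret) / n
mean_simple = sum(simple_ret) / n

# Geometric mean of simple returns (in percent)
geo = (math.prod(1 + r / 100 for r in simple_ret) ** (1 / n) - 1) * 100

# Internal rates of return computed directly from the prices
irr_discrete = ((prices[-1] / prices[0]) ** (1 / n) - 1) * 100
irr_continuous = 100 * (math.log(prices[-1]) - math.log(prices[0])) / n

print(mean_log, irr_continuous)  # identical: mean log return = continuous IRR
print(geo, irr_discrete)         # identical: geometric mean = discrete IRR
print(mean_simple)               # larger: arithmetic mean of simple returns is biased upwards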
As it turns out, the mean, median and mode of both suppliers are identical and equal to 2.5 cm (Example 3.3 on page 78 in AWZ). Based on these measures, the two suppliers are equally good and right on the mark. Thus we require an additional measure for reliability or variability that allows Otis to distinguish among the suppliers. A look at the data shows that the variability of diameters from supplier 2 around the 2.5 cm mean is greater than that of supplier 1. This visual impression can be expressed in statistical terms using measures of dispersion (around the mean).
The mean (or other measures of location) is insufficient to describe the sample, since individual observations may deviate more or less strongly from the mean. The degree of dispersion can be measured with the standard deviation s. The standard deviation is based on the variance s², which is computed as follows:

$$\text{variance: } s^2 = \frac{1}{n-1}\sum_{t=1}^{n} (y_t - \bar{y})^2.$$
The essential feature of this formula is the focus on deviations from the mean. Taking
squares avoids that positive and negative deviations from the mean cancel out (the
sum or average of deviations from the mean is always zero!).
The standard deviation is a measure for the (average) dispersion around the mean. The advantage of using the standard deviation rather than the variance is the following: s has the same units of measurement as yt and can therefore be interpreted more easily.

Variance and standard deviation can be computed using the functions VAR(data range) and STDEV(data range) (German: VARIANZ(Datenbereich); STABW(Datenbereich)).
Table 2 shows the computation of variance and standard deviation using data from supplier 2. The variance is given by 0.0075. This number cannot be easily interpreted since it is measured in squared units of y (cm²). The standard deviation s = √0.0075 = 0.0866 can be interpreted as the average dispersion of yt around its mean, measured in cm. Note, however, that this is not a simple average. Because of the square in the definition of the variance, large deviations from the mean are weighted more strongly than small deviations.
The coefficient of variation g=s/ȳ – the ratio of standard deviation and mean – is
a standardized measure of dispersion. It is used to compare different samples. The
coefficient of variation is frequently interpreted as a percentage. For the variable
’salary’ in example 1 g=11,493/52,263=0.22: on average, salaries deviate from the
mean by 22%.
To obtain a complete picture of the dispersion of the data it is useful to compute the minimum, the maximum and the range – the difference between maximum and minimum. The range for supplier 2 is given by 0.25, which is much larger than the 0.1 range of supplier 1. The range, the minimum and the maximum again show that the deliveries of supplier 2 are less reliable.

2.4 Describing the distribution of data
2.4.1 Histogram
The menu 'Tools/Data Analysis' ('Extras/Analyse-Funktionen') contains the item 'Histogram' ('Histogramm'). The intervals are automatically selected if the field 'Bin Range' ('Klassenbereich') is left empty. Note that the tool computes absolute rather than relative frequencies! Absolute frequencies can also be computed using the function FREQUENCY(data array; bins array) (HÄUFIGKEIT(Datenbereich;Klassenbereich)).
Example 6: The histogram in Figure 3 shows that the range from -2.5
to 0.0 contains 18.3% and the interval [2.5,5.0] contains 19.1% of monthly
returns. 11.5% of the returns are less than –5.0. 39.8% of all returns
are negative. This percentage is obtained by summing up the relative
frequencies in all intervals from –25.0 to 0.0.
Figure 3: Histogram of monthly DAX returns – empirical relative frequencies compared to the normal distribution (bins of width 2.5 from –25 to 25 'and greater'; vertical axis 0% to 20%).
2.4.2 Skewness and kurtosis

$$\text{skewness: } \frac{1}{n}\sum_{t=1}^{n} \frac{(y_t - \bar{y})^3}{s^3},$$

$$\text{kurtosis: } \frac{1}{n}\sum_{t=1}^{n} \frac{(y_t - \bar{y})^4}{s^4}.$$
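A minimal Python sketch implementing these two formulas (the interarrival times are illustrative, not the data of the 'arrival' sheet used in the following example):

import math

# Illustrative interarrival times in minutes
y = [2.1, 2.5, 2.8, 3.0, 3.5, 4.0, 5.5, 12.0, 18.0]

n = len(y)
mean = sum(y) / n
s = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))  # sample std deviation

# Moment-based skewness and kurtosis as defined above
skew = sum(((v - mean) / s) ** 3 for v in y) / n
kurt = sum(((v - mean) / s) ** 4 for v in y) / n

print(skew)  # > 0: skewed to the right (a few very long interarrival times)
print(kurt)  # values above 3 indicate fatter tails than the normal distribution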
Example 8: The sheet 'arrival' lists the time between customer arrivals – called interarrival times – for all customers in a bank on a given
day. The skewness of interarrival times is given by 2.2. This indicates
a distribution which is positively skewed, or skewed to the right. The
skewed distribution can also be seen from a histogram of the data. Most
interarrival times are in the range from 2 to 10 minutes but some are
considerably larger. The median (2.8) is not affected by extremely large
values. Consequently, it is lower than the mean (4.2).
The distribution of many data sets can be described by the following "rules of thumb": approximately 68% of all observations lie in the interval ȳ ± s, approximately 95% in the interval ȳ ± 2·s, and approximately 99.7% in the interval ȳ ± 3·s.
Example 9: Applying the rules of thumb to the DAX returns (see Fig-
ure 4) shows that only the second rule seems to work. The empirical
(relative) frequencies and the probabilities based on the normal distri-
bution are very close. The discrepancies observed for the first and third
rule may be explained by the leptokurtosis of returns.
Example 11: Consider the variable ’Salary’ from example 1 again. The
empirical 25%-quantile of salaries is given by $44,675. In other words,
25% of the respondents earn less than $44,675. The 75%-quantile is
$59,675, so 25% of the respondents earn more than $59,675.
3.1 Random variables and probability
A conditional viewpoint does not appear to be necessary in this or similar cases. An empirical analysis may be used to find out whether conditional and unconditional probabilities differ.
The relation between unconditional and conditional probability is used to define
independence. The two random variables Y and X are said to be independent if
P[Y |X]=P[Y ].
$$\text{expected value: } \mu = E[Y] = \sum_{i=1}^{n} p_i \cdot y_i,$$

$$\text{variance: } \sigma^2 = \text{var}[Y] = E[(Y - \mu)^2] = \sum_{i=1}^{n} p_i \cdot (y_i - \mu)^2.$$
As another example we consider two investments where profits are assumed to depend on the so-called 'state of the world' (or economy). For each of the possible states ('bad', 'medium' and 'good') a probability and a profit/loss can be specified:
                              investment 1                    investment 2
state of             profit/  deviation   squared    profit/  deviation   squared
'the world'    pi    loss     from µ      deviation  loss     from µ      deviation
bad           0.2     -180     -209        43681      -10      -25.5        650.25
medium        0.5       10      -19          361        5      -10.5        110.25
good          0.3      200      171        29241       50       34.5       1190.25

exp. value µ            29                            15.5
variance σ²         17689.0                          542.3
std. dev.             133.0                           23.3
The variance is based on the squared deviations from the expected value:

$$\sigma_1^2 = (-180 - 29)^2 \cdot 0.2 + (10 - 29)^2 \cdot 0.5 + (200 - 29)^2 \cdot 0.3 = 17689.$$

Note that the variance is measured in units of squared profits. The standard deviation √17689 = 133 is measured in original (monetary) units.
$$\text{covariance: } \text{cov}[Y,X] = E[(Y - \mu_Y)\cdot(X - \mu_X)] = \sum_{i=1}^{n} p_i \cdot (y_i - \mu_Y)\cdot(x_i - \mu_X),$$

$$\text{correlation: } \text{corr}[Y,X] = \frac{\text{cov}[Y,X]}{\sigma_Y \sigma_X}.$$
The correlation is bounded between –1 and +1. Mean and (co)variance are also
called first and second moments of random variables.
Consider throwing a pair of dice. There are 36 possible realizations which are all equally likely: [y=1, x=1], [y=1, x=2], ..., [y=6, x=6]. As expected, the covariance between the resulting numbers is zero (pi is a constant equal to 1/36):

$$\frac{1}{36}\left[(1-3.5)(1-3.5) + (1-3.5)(2-3.5) + \cdots + (6-3.5)(6-3.5)\right] = 0.$$
If a pair of dice is thrown very often the empirical covariance (or correlation) between
the observed pairs of numbers should be close to zero.
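This statement can be checked by simulation; a small Python sketch:

import random

random.seed(1)
n = 100_000

# Simulate n throws of a pair of dice
pairs = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(n)]

my = sum(y for y, _ in pairs) / n   # close to 3.5
mx = sum(x for _, x in pairs) / n   # close to 3.5

# Empirical covariance of the two dice
cov = sum((y - my) * (x - mx) for y, x in pairs) / n
print(cov)  # close to 0: independent dice are uncorrelated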
If two random variables are normally distributed and their covariance is zero, the
two variables are said to be independent. For general distributions a covariance of
zero merely implies that the variables are uncorrelated. It is possible, however, that
(nonlinear) dependence prevails between the two variables.
The concept of conditional probability extends to the definition of conditional expectation and (co)variance, by using conditional (rather than unconditional) probabilities in the definitions above. For instance, if the conditional expected value E[Y|X] is assumed to be a linear function of X it can be shown that E[Y|X] is given by:

$$\text{conditional expectation: } E[Y|X] = E[Y] + \frac{\text{cov}[Y,X]}{\text{var}[X]} \cdot (X - E[X]).$$
This shows that a conditional viewpoint is necessary when the covariance between
Y and X differs from zero. In a regression analysis (see section 6) a sample is used
to determine if there is a difference between the conditional and the unconditional
expected value, and if it is necessary to take more than one condition into account.
The expected value of the sum of n random variables Y1, ..., Yn is the sum of their expectations:

$$E[Y_1 + Y_2 + \cdots + Y_n] = E[Y_1] + E[Y_2] + \cdots + E[Y_n].$$

The expected value of the sum of n random variables with identical mean µ equals n·µ:

$$E[Y_1 + Y_2 + \cdots + Y_n] = n \cdot \mu.$$

The variance of the sum of two uncorrelated random variables X and Y is the sum of their variances:

$$\text{var}[X + Y] = \text{var}[X] + \text{var}[Y].$$

The variance of the sum of n uncorrelated random variables is the sum of their variances:

$$\text{var}[Y_1 + Y_2 + \cdots + Y_n] = \sum_{i=1}^{n} \text{var}[Y_i].$$
The variance of the sum of n uncorrelated random variables with identical variance σ² is given by n·σ²:

$$\text{var}[Y_1 + Y_2 + \cdots + Y_n] = n \cdot \sigma^2.$$

If the random variables are correlated, the covariances have to be taken into account:

$$\text{var}[Y_1 + Y_2 + \cdots + Y_{n-1} + Y_n] = \sum_{i=1}^{n} \text{var}[Y_i] + \sum_{i=1}^{n}\sum_{i \neq j} \text{cov}[Y_i, Y_j].$$
As an example we assume that both investments mentioned above are realized and
we consider the sum of profit/loss in each state of the world. The covariance between
the two investments is given by
(–180–29)·(–10–15.5)·0.2 + (10–29)·(5–15.5)·0.5 + (200–29)·(50–15.5)·0.3 = 2935.5.
Since the covariance is not zero, the sum of the variances of the two investments is
not equal to the variance of the sums as shown in the following table:
                computing covariance        properties of the sum
                & correlation               of both investments
state of              product of deviations    profit/loss    squared deviation
'the world'    pi     from µ                   inv1+inv2      from µ
bad           0.2       5329.5                    -190           54990.25
medium        0.5        199.5                      15             870.25
good          0.3       5899.5                     250           42230.25

covariance            2935.5                µ of the sum:  44.5
correlation            0.948                24102  <= variance of the sum
                                            18231  <= sum of the variances
                                            24102  <= sum of the variances + 2×covariance
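The identities in the table can be verified with a few lines of Python using the numbers from the example:

p = [0.2, 0.5, 0.3]     # probabilities of the states bad / medium / good
inv1 = [-180, 10, 200]  # profit/loss of investment 1
inv2 = [-10, 5, 50]     # profit/loss of investment 2

def mean(x):
    return sum(pi * xi for pi, xi in zip(p, x))

def cov(x, y):
    # cov(x, x) is the variance of x
    mx, my = mean(x), mean(y)
    return sum(pi * (xi - mx) * (yi - my) for pi, xi, yi in zip(p, x, y))

total = [a + b for a, b in zip(inv1, inv2)]

print(cov(inv1, inv1), cov(inv2, inv2))  # 17689.0  542.25
print(cov(inv1, inv2))                   # 2935.5
print(cov(total, total))                 # 24102.25 = variance of the sum
print(cov(inv1, inv1) + cov(inv2, inv2) + 2 * cov(inv1, inv2))  # identical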
If we deal with a weighted sum we have to make use of the following fundamental property: for any constant a (not a random variable) and random variables W, X, Y, Z the following relations hold:

$$E[a \cdot Y] = a \cdot E[Y], \qquad \text{var}[a \cdot Y] = a^2 \cdot \text{var}[Y], \qquad \text{cov}[a \cdot W, X] = a \cdot \text{cov}[W, X],$$
$$\text{cov}[W + X, Y + Z] = \text{cov}[W,Y] + \text{cov}[W,Z] + \text{cov}[X,Y] + \text{cov}[X,Z].$$

4.1 The normal distribution

The density of the normal distribution with mean µ and variance σ² is given by

$$f(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(\frac{-(y - \mu)^2}{2\sigma^2}\right), \qquad -\infty \le y \le \infty.$$
For a particular range of values – e.g. between y1 and y2 – the area underneath the
density equals the probability to observe values within that range (see Figure 5).
Usually the normal distribution of a random variable Y is denoted by Y ∼N(µ, σ 2 ).
Ψα on the y-axis is the α-quantile under normality. It has the following property:
P[y ≤ Ψα ] = α,
where P[ ] is the probability for the event in brackets. The area to the left of Ψα
equals α – the probability to observe values less than Ψα . This implies that Ψα is
exceeded with probability 1−α.
Assuming a normal distribution for a variable y having mean ȳ and standard deviation s allows one to answer some interesting questions, as shown in the following subsections.

4.2 How likely is a value less than or equal to y*?

Example: ZTel's hiring decision depends partly on the results of an exam taken by job applicants. The applicants' scores have been examined closely. They are normally distributed with a mean of 525 and standard deviation of 55 (see sheet 'personnel').
The hiring process occurs in two phases: The first phase separates all
applicants into three categories: automatic accepts (exam score≥600),
automatic rejects (exam score≤425), and ”maybes”. The second phase
takes all the ”maybes” and uses their previous job experience, special
talents and other factors as hiring criteria.
ZTel’s personnel manager wants to calculate the percentage of applicants
who are automatic accepts and rejects, given the current policy.
Now the manager takes a reversed viewpoint. Rather than computing probabilities
he wants to pre-specify a probability and work out the corresponding threshold score
that is exceeded with that probability. These questions can be answered using the
α-quantile of a normal variable.
The 10%-quantile is given by 455. This score is exceeded with 90% probability; 10% of the scores are below this score. To achieve a 15% acceptance rate we need to know the 85%-quantile. This quantile is equal to 582 points and is exceeded in 15% of all cases.
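The probabilities and threshold scores of this example can be reproduced with a few lines of Python using the standard library:

from statistics import NormalDist

scores = NormalDist(mu=525, sigma=55)  # exam scores from the example

# Phase 1: shares of automatic accepts and rejects
p_accept = 1 - scores.cdf(600)  # P[score >= 600], about 8.6%
p_reject = scores.cdf(425)      # P[score <= 425], about 3.5%
print(p_accept, p_reject)

# Reversed viewpoint: threshold scores for pre-specified probabilities
print(scores.inv_cdf(0.10))  # 10%-quantile, about 455
print(scores.inv_cdf(0.85))  # 85%-quantile, about 582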
The α-quantile zα of the standard normal distribution is defined by

$$P[z \le z_\alpha] = \alpha.$$
The standard normal quantiles can be used to compute quantiles and intervals for a normal variable y having mean ȳ and variance s². The α-quantile of y is given by

$$\Psi_\alpha = \bar{y} + z_\alpha \cdot s.$$

(zα can be computed using the function NORMSINV(probability α), German STANDNORMINV; Ψα can be computed directly using the function NORMINV.)
Example 15: The monthly DAX returns have mean ȳ=0.56 and standard deviation s=5.8. To get some idea about the magnitude of extremely negative returns one may want to compute the 1%-quantile. Assuming that returns are normally distributed and using the 1%-quantile of the standard normal distribution (−2.326) yields

$$\Psi_{0.01} = 0.56 - 2.326 \cdot 5.8 = -12.93.$$

Under the normal assumption, a monthly return below −12.93 is expected in only 1% of all months.
4.4 Which interval contains a pre-specified percentage of cases?

Because of the symmetry of the standard normal distribution (e.g. 1.96 for a 95%-interval) the absolute value of the α/2-quantile is sufficient. The (1−α) interval for y∼N(ȳ, s²) is given by

$$\bar{y} \pm |z_{\alpha/2}| \cdot s.$$

The formula for computing the 95%-interval is

$$\bar{y} \pm 1.96 \cdot s.$$
The quantiles of the standard normal distribution are the basis of the rules of thumb
mentioned in section 2.4.3:
Example 16: Consider the DAX returns again. Assuming a normal distribution we want to compute an interval for returns that contains 95% of the data. Under the normal assumption the mean and standard deviation of the returns are sufficient to compute a 95% interval. Using ȳ=0.56 and s=5.8, 95% of all returns can be found in the interval

$$0.56 \pm 1.96 \cdot 5.8 = [-10.81, 11.93].$$
4.5 Estimating the duration of a project

The standard deviation of the entire duration is based on the variance of the sum of all individual tasks (we consider the sum of activities on the so-called 'critical path': any delay in the completion of such tasks leads to a delayed start of all subsequent activities, and leads to an increase in the overall duration of the project):

$$s_t^2 = \sum_{i=1}^{m} s_i^2.$$

This sum is only correct if the durations of the individual tasks are independent/uncorrelated among each other. If this is not the case, the covariances among activities must be taken into account as follows:

$$\text{var}[y_1 + y_2 + \cdots + y_{m-1} + y_m] = s_t^2 = \sum_{i=1}^{m} s_i^2 + \sum_{i=1}^{m}\sum_{i \neq j} \text{cov}[y_i, y_j].$$

The standard deviation of the total duration of the project st is the square root of s²t (the variance of the sum). In other words, it is not appropriate to sum up the standard deviations of individual tasks.
In practice, it may be questionable to describe the durations of individual activities
by a normal distribution. If only a small number of activities is considered, the
sum of durations cannot be assumed to be normal. Similarly, it may be difficult to
provide or estimate the means and standard deviations of activities. It may be easier for the management to summarize activity durations by specifying the minimum, maximum and most likely (i.e. mode) duration times. In project management, the beta distribution is widely used as an alternative to the normal, whereby the following approximations are typically used (they can be derived by choosing the parameters of the beta distribution to be α=3+√2 and β=3−√2):

$$\text{mean} = \frac{\text{min} + 4 \cdot \text{mode} + \text{max}}{6}, \qquad \text{standard deviation} = \frac{\text{max} - \text{min}}{6}.$$
The two parameters of the beta distribution α and β are related to mean and variance as follows:

$$\alpha = \frac{\text{mean} - \text{min}}{\text{max} - \text{min}} \cdot \left[\frac{(\text{mean} - \text{min}) \cdot (\text{max} - \text{mean})}{\text{variance}} - 1\right],$$

$$\beta = \alpha \cdot \frac{\text{max} - \text{mean}}{\text{mean} - \text{min}}.$$
Example 17: For the data on the sheet 'project duration' we obtain mean ȳt=55 and standard deviation st=5.2. Assuming uncorrelated activities and using a normal distribution we find that the probability to finish the project in less than 60 weeks is 83.3%. Using the beta distribution (Excel: BETADIST; German: BETAVERT(y*; α; β; min; max)) the corresponding probability is 81.5%. If the correlations/covariances among activities are taken into account, the standard deviation of the sum is 8.6 weeks, and the (normal) probability drops to ≈72%.
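A sketch of both computations in Python, assuming scipy is available. The minimum and maximum of the beta distribution are hypothetical values chosen for illustration (they are not given in the text):

from scipy.stats import norm, beta

mean_t, sd = 55, 5.2  # total duration (weeks) from the example

# Normal assumption: P[duration < 60 weeks]
print(norm.cdf(60, loc=mean_t, scale=sd))  # about 0.833

# Beta alternative; the bounds lo and hi are hypothetical
lo, hi = 40, 70
m = (mean_t - lo) / (hi - lo)     # mean rescaled to [0, 1]
v = sd**2 / (hi - lo)**2          # variance rescaled to [0, 1]
a = m * (m * (1 - m) / v - 1)     # method-of-moments parameters (formulas above)
b = a * (1 - m) / m
print(beta.cdf((60 - lo) / (hi - lo), a, b))  # P[duration < 60] under the beta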
4.6 The lognormal distribution
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(\frac{-(\ln x - \mu)^2}{2\sigma^2}\right), \qquad x \ge 0,$$

where µ and σ² are mean and variance of ln X, respectively. Mean and variance of X are given by

$$E[X] = \exp\{\mu + 0.5\sigma^2\}, \qquad \text{var}[X] = \exp\{2\mu + \sigma^2\}\left(\exp\{\sigma^2\} - 1\right).$$

The lognormal assumption is suitable for phenomena which are usually positive (e.g. time intervals or amounts). Probabilities of a lognormal variable can be computed with the function LOGNORMDIST (German: LOGNORMVERT(x*, ȳ, sy)).
4.7 The distribution of returns and prices (only relevant for banking, finance or similar courses)
We now consider the log return in t and treat it as a random variable (denoted by Yt; yt is the corresponding sample value or realization). µ and σ² are mean and variance of the underlying population of log returns. Assuming that log returns are normal random variables with Yt∼N(µ, σ²) implies that (1+Rt)=exp{Yt}, the simple, gross returns, are lognormal random variables with

$$E[1+R_t] = \exp\{\mu + 0.5\sigma^2\}, \qquad \text{var}[1+R_t] = \exp\{2\mu + \sigma^2\}\left(\exp\{\sigma^2\} - 1\right).$$

If the simple, gross return is lognormal, (1+Rt)∼LN(1+m, v), mean and variance of the corresponding log return are given by

$$E[Y_t] = \ln(1+m) - 0.5\sigma_Y^2, \qquad \sigma_Y^2 = \text{var}[Y_t] = \ln\left(1 + \frac{v}{(1+m)^2}\right). \tag{1}$$
What are the implications for the corresponding prices? Normality of Yt implies
that prices given by Pt =exp{Yt }Pt−1 or Pt =(1+Rt )Pt−1 are lognormal (for given,
non-random Pt−1 ). Thus, prices can never become negative if log returns are normal.
Another attractive feature of normal log returns is their behavior under temporal
aggregation. If single-period log returns are normally distributed Yt ∼N(µ, σ 2 ), the
multi-period log returns are also normal with Yt (h)∼N(hµ, hσ 2 ). This property is
called stability (under addition). It does not hold for simple returns.
Many financial theories and models assume that simple returns are normal. There are a number of conceptual difficulties associated with this assumption. First, simple returns have a lower bound of −1, whereas the normal distribution extends to −∞. Second, multi-period returns are not normal even if single-period (simple) returns are normal. Third, a normal distribution for simple returns implies a normal distribution for prices, since Pt=(1+Rt)Pt−1. Thus, a non-zero probability may be assigned to negative prices which is, in general, not acceptable. These drawbacks can be overcome by using log returns rather than simple returns. However, empirical properties usually indicate strong deviations from normality for both simple and log returns.
4.8 Value-at-Risk
The calculation of Value-at-Risk (VaR) is an important application of the quantiles
of a standard normal distribution. Value-at-Risk is the expected loss in the market
value pt of a risky asset or portfolio that will be exceeded with a given probability
α. Usually α is chosen to be very small (e.g. α=0.01 or α=0.05). Therefore the VaR
refers to a loss over a time-period, typically one day or one week. In statistical terms
VaR is the α-quantile of the distribution of the change in market value pt+1 −pt .
A simplified approach to VaR computations is based on assuming that returns of the asset or portfolio are normally distributed (a highly questionable assumption). It only relies on the current market value pt, the standard normal quantile zα, and the standard deviation of returns s. The mean of returns is assumed to be zero, which makes the resulting measure of risk only relevant in the short term. The VaR is given by

$$\text{VaR}(\alpha) = -p_t \cdot z_\alpha \cdot s.$$

Note that s in this formula has to be a decimal number.
Example 18: The daily VaR of a DAX investment with a current market value of 500 units can be derived from the monthly sample statistics. Assuming 21 trading days per month the daily standard deviation is given by s = 5.8/√21 = 1.266. (Important note: when returns have been measured in percentage terms, as in the present case, s has to be divided by 100!) Using α=0.05 the 5%-VaR is given by

$$\text{VaR}(0.05) = -500 \cdot (-1.645) \cdot 0.01266 = 10.4.$$
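A minimal Python sketch of the simplified formula with the numbers from example 18:

import math
from statistics import NormalDist

pt = 500                       # current market value
s = 5.8 / math.sqrt(21) / 100  # daily standard deviation as a decimal number
z = NormalDist().inv_cdf(0.05) # 5%-quantile of the standard normal, about -1.645

print(-pt * z * s)             # 5%-VaR, about 10.4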
4.9 Value-at-Risk (only relevant for banking, finance or similar courses)
The calculation of Value-at-Risk (VaR) is an important application of the quantiles
of a standard normal distribution. Value-at-Risk is the expected loss in the market
value of a risky asset or portfolio that will be exceeded with a given probability α:
P[(pt+1 − pt ) ≤ −VaR] = α.
pt is the market value of the asset at time t. Since α is usually very small (e.g.
α=0.01 or α=0.05) the VaR refers to a loss over a time-period, typically one day. In
statistical terms VaR is the α-quantile of the distribution of the change in market
value pt+1 −pt .
Under the lognormal assumption the 5%-VaR is given by

$$\text{VaR}(0.05) = -p_t \cdot \left(\exp\{\bar{y} - 0.5s^2 - 1.645 \cdot s\} - 1\right).$$

ȳ and s are mean and standard deviation of the returns of the asset, −1.645 is the 5%-quantile of the standard normal distribution, and exp{x}=2.718^x. The term 0.5s² in the exponent is due to the properties of the lognormal distribution. In general, the quantile zα corresponding to the chosen probability α replaces −1.645.
Example 19: The daily VaR of a DAX investment with a current market value of 500 units can be derived from the monthly sample statistics. Assuming 21 trading days per month the daily mean and standard deviation are given by ȳ = 0.56/21 = 0.027 and s = 5.8/√21 = 1.266. In computations involving exp{} decimal numbers have to be used in the exponent! When returns have been measured in percentage terms – as in the present case – ȳ and s have to be divided by 100! Using α=0.05 the 5%-VaR is given by

$$\text{VaR}(0.05) = -500 \cdot \left(\exp\{0.00027 - 0.5 \cdot 0.01266^2 - 1.645 \cdot 0.01266\} - 1\right) \approx 10.2.$$
4.10 The binomial distribution

A binomial experiment satisfies three conditions:

1. there is a fixed number n of identical trials;

2. each trial has two possible outcomes, usually called success and failure;

3. the trials are independent and the probability of success p is the same in each trial.

The probability of y successes in n trials is given by

$$f(y) = \binom{n}{y} p^y (1-p)^{n-y},$$

where

$$\binom{n}{y} = \frac{n!}{y!(n-y)!}, \qquad n! = 1 \cdot 2 \cdots n, \qquad 0! = 1.$$
Example: An airline has sold more tickets for a flight than there are seats on the plane, since some ticket holders do not show up. The airline wants to know the probability that (a) more than 205 passengers will show up, (b) more than 200 passengers will show up, (c) at least 195 seats will be filled, and (d) at least 190 seats will be filled. The first two of these are 'bad' events for the airline while the last two are 'good' events.

In order to answer the questions in this example we consider individual customers. For each ticket sold we carry out a binomial experiment, recalling the three necessary conditions listed above.
Consider the following example. The probability for a stock price in-
crease (u) is 0.6 and 0.4 for a decrease (d). Assume that successive price
changes are independent. After three periods the following sequences are
possible: (u, u, u), (u, u, d), (u, d, u), (u, d, d), (d, u, u), (d, u, d), (d, d, u),
(d, d, d). The probability for three consecutive increases (u, u, u) is given
by (y=3, n=3, p=0.6):
$$\binom{3}{3} 0.6^3 (1-0.6)^0 = 0.216.$$
The probability for two increases in three periods – (u, u, d), (u, d, u) or
(d, u, u) – is given by (y=2, n=3, p=0.6):
$$\binom{3}{2} 0.6^2 (1-0.6)^1 = 0.432.$$
Note that the probability for one decrease in three periods is not equal
to (1−0.432) but 0.432, too! Considering decreases implies changing the
meaning of y. Now a decrease is treated as success with probability
p=0.4. Thus the probability for one decrease in three periods is given by
(y=1, n=3, p=0.4)
$$\binom{3}{1} 0.4^1 (1-0.4)^2 = 0.432.$$
Now suppose that more than one event is considered – e.g. one or two
decreases. This requires computing the probability of each event and summing these probabilities. The probability for one or no decrease in three
periods is given by
$$\binom{3}{0} 0.4^0 (1-0.4)^3 + \binom{3}{1} 0.4^1 (1-0.4)^2 = 0.216 + 0.432 = 0.648.$$
This is equal to one minus the probability for two or three decreases in
three periods.
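The binomial probabilities of this example can be reproduced with a few lines of Python:

from math import comb

def binom_pmf(y, n, p):
    # f(y) = C(n, y) * p^y * (1-p)^(n-y)
    return comb(n, y) * p**y * (1 - p)**(n - y)

print(binom_pmf(3, 3, 0.6))  # 0.216: three increases
print(binom_pmf(2, 3, 0.6))  # 0.432: two increases
print(binom_pmf(1, 3, 0.4))  # 0.432: one decrease
print(binom_pmf(0, 3, 0.4) + binom_pmf(1, 3, 0.4))  # 0.648: at most one decrease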
5.1 Samples and confidence intervals
Example 22: We consider the data from example 1 and focus on the average salaries of respondents. The purpose of the analysis is threefold. First, we want to assess the effects of sampling errors. Second, we ask whether the average of the sample is compatible with a population mean of $47500 or strongly deviates from this reference. Third, the average salaries of females and males will be compared to see whether they deviate significantly from each other. (This rather loose terminology will subsequently be changed, whereby questions and answers will be formulated in a statistically more precise way.)

The average salary in the population is estimated by the sample mean

$$\bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t.$$
The mean is said to be estimated from the sample, and it is a so-called estimate. The estimate is a random variable – using a different sample results in a different estimate ȳ. It should be distinguished from the population mean µ – which is also called expected value. (The population consists of all elements which have the feature of interest. The contents of sections 5.1 and 5.3 are explained in terms of the mean of a random sample; similar considerations apply to other statistical measures.) The symbols µ and σ² are used to denote the population
mean and variance. The expected value µ can be considered to be the limit of the
empirical mean, if the number of observations tends to infinity:
$$\mu = E[Y] = \lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} y_t.$$
Usually ȳ will differ from the true population value µ. However, it is possible to compute a confidence interval that specifies a range which contains the unknown parameter µ with a given probability. When a confidence interval is derived one has to take into account the sample dependence and randomness of ȳ. In other words, the sample mean is a random variable and has a corresponding (probability) distribution.

The distribution of possible estimates ȳ is called sampling distribution. For large samples the central limit theorem states that the sample mean ȳ is normally distributed with expected value µ and variance s²/n: ȳ∼N(µ, s²/n). The theorem holds for arbitrary distributions of the population provided the sample is large enough (n>30); if the population is normal it holds for any n.
Using the properties of the normal distribution a confidence interval which contains the true mean µ with (1−α) probability can be derived. More precisely, (1−α) percent of all samples (randomly drawn from the same population) will contain µ. In general, the (1−α) confidence interval of µ is given by

$$\bar{y} \pm |z_{\alpha/2}| \cdot s/\sqrt{n}.$$

Using α=0.05 the 95% confidence interval is

$$\bar{y} \pm 1.96 \cdot s/\sqrt{n}.$$

The function CONFIDENCE(α; s; n) (German: KONFIDENZ) computes the value |zα/2|·s/√n.

For the salary data from example 1 the 95% confidence interval is given by

$$52263 \pm 1.96 \cdot 11493/\sqrt{30} = 52263 \pm 4113 = [48151, 56376].$$
Based on the sample, we conclude that the actual average µ can be found
in the interval [48151, 56376] with 95% probability. Note that this is not
an interval for the data, but an interval for the mean of the population.
Average salary
mean 52263
standard deviation 11493
number of observations 30
The (1−α) confidence interval for the (absolute) estimation error ȳ−µ is given by

$$\pm|z_{\alpha/2}| \cdot s/\sqrt{n};$$

for α=0.05 this is the interval [−1.96·s/√n, +1.96·s/√n]. s/√n is also called standard error (standard deviation of the estimation error). This formula is valid if the population has infinite size. If the size of the population is known to be N the standard error is given by √((N−n)/(N−1))·s/√n.

The boundaries of the interval can be used to make statements about the magnitude of the absolute estimation error. Using α=0.05 the boundaries of the interval in this example are given by

$$\pm 1.96 \cdot 11493/\sqrt{30} = \pm 4113.$$

In words: there is a 95% probability that the absolute estimation error for the average salary in the population is less than $4113.
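A sketch of the confidence interval computation in Python using the standard library:

import math
from statistics import NormalDist

ybar, s, n = 52263, 11493, 30
alpha = 0.05

# half-width |z_{alpha/2}| * s / sqrt(n) (what Excel's CONFIDENCE returns)
half = abs(NormalDist().inv_cdf(alpha / 2)) * s / math.sqrt(n)

print(half)                      # about 4113
print(ybar - half, ybar + half)  # about [48151, 56376]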
The confidence interval for the estimation error can be used as a starting point to derive the required sample size. (These considerations are based on the assumption that the standard deviation of the data s is known before the sample has been drawn.) For that purpose it is necessary to fix an acceptable magnitude δ of the (absolute) error; more specifically, the absolute error which can be exceeded with probability α. This value δ corresponds to the boundaries of the (1−α) confidence interval for the estimation error:

$$|z_{\alpha/2}| \cdot s/\sqrt{n} = \delta.$$

This expression can be rewritten to obtain a formula for the corresponding sample size:

$$n = \left(\frac{z_{\alpha/2} \cdot s}{\delta}\right)^2.$$

Suppose that a precision of δ=$500 is required and α=0.05 is used. This means that the (absolute) error in the estimation of the mean is accepted to be more than $500 in five percent of the samples. In this case the required sample size is given by

$$n = \left(\frac{1.96 \cdot 11493}{500}\right)^2 \approx 2030.$$

5.2 Sampling procedures
Drawing a sample from a population can be done on the basis of several princi-
ples. We consider three possibilities: random, stratified and clustered sampling.
Random sampling – which has been assumed in previous sections of the text – collects observations from the population (without replacement) according to a random
mechanism. Each element of the population has the same chance of entering the
sample. The objective of alternative sampling methods is to reduce the standard
errors compared to random sampling and to obtain smaller confidence intervals (cf. Bortz, J. and Döring, N. (1995): Forschungsmethoden und Evaluation, 2nd edition, Springer, p. 390). Alternative methods are chosen because they can be more efficient or cheaper (e.g. clustered sampling).
A random sample can be obtained by assigning a uniform random number to each of the N elements of the population. The required sample size n – as shown in section 5.1, n can be chosen on the basis of the required precision (or absolute estimation error) – determines the percentage α=n/N. The sample is drawn by selecting all those elements whose associated random number is less than α. The number of actually selected elements will be close to the required n if N is large. Exactly n elements are obtained if the selection is based on the α-quantile of the random numbers as shown on the sheet 'random sampling'.
Stratified sampling is based on separating the population into strata (or groups)
according to specific attributes. Typical attributes are age, gender, or geographical
criteria (e.g. regions). Random samples are drawn from each stratum. Stratified
sampling is used to ensure that the representation of specific attributes in the sample corresponds (or is similar) to the population. If the distribution of an attribute in the population is known (e.g. the proportion of age groups or provinces in
the population), the sample can be defined accordingly (e.g. each age group appears
in the sample with about the same frequency as in the population).
In the present example stratified sampling can be based on the type of a thesis
(empirical, theoretical, etc.) or the field of study (law, economics, engineering, etc.).
Stratified sampling is particularly important in relatively small samples to avoid that specific attributes (e.g. fields of study) are missing entirely or are incorrectly represented (too few or too many cases). The subject of the analysis (number of
pages) should be related to the stratification criterion (type of thesis).
The ratio of the number of observations nj in stratum j and the sample size n
defines weights wj =nj /n (j is one out of m strata; n is the sum of all nj ). If the
proportions of the attributes in the population are known (e.g. the percentage of
empirical theses in the population), the weights wj should be determined such that
the proportions of the sub-samples correspond exactly to the proportions of the
attributes in the population. If such information is not available and the sample is
large, the proportions in a random sample will approximate those in the population.
The (overall) mean of a stratified sample is the weighted average of the means of
each stratum ȳj :
m
X
ȳ = wj · ȳj .
j=1
This mean is equal to the mean obtained from all observations in the sample. If the weights correspond to the proportions in the population, the standard error of the stratified mean is given by

$$s_{\bar{y}}^2 = \sum_{j=1}^{m} w_j^2 \cdot s_{\bar{y}_j}^2.$$
If the weights deviate from those in the population, the standard error cannot be
reduced, or can even increase, compared to random sampling. At the same time, the
mean ȳ will be biased and will deviate from the mean of the population and from
random sampling.
If the dispersion in each stratum is rather small (i.e. individual strata are rather
homogeneous), the standard error can be lower compared to a random sample. This
will be the case if the stratification criterion is correlated with the subject of the
analysis (e.g. if the distribution of the number of pages depends on the type of the
thesis). For example, to analyze the intensity of internet usage, strata could be
defined on the basis of age groups. If the dispersion in sub-samples is about the
same as in the overall sample, or the means in each stratum are rather similar, there
is no need for stratification (or, another attribute has to be considered).
In the present example on the sheet 'stratified sampling' two strata based on the type of a thesis are used. A sample of n1=34 empirical and n2=16 theoretical theses is drawn from a population consisting of 136 and 64 theses, respectively; i.e. the proportions in the sample correspond to those in the population. The means and standard deviations of the two strata are given by ȳ1=68, sȳ1=2.8 and ȳ2=123, sȳ2=12. Empirical theses have fewer pages and less dispersion than theoretical theses. The stratified mean is given by 0.68·68+0.32·123 ≈ 86. Its standard error is given by

$$s_{\bar{y}} = \sqrt{0.68^2 \cdot 2.8^2 + 0.32^2 \cdot 12^2} = 4.3.$$
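A minimal sketch of the stratified computations with the numbers from the example:

import math

# Strata: empirical and theoretical theses (numbers from the example)
w = [34 / 50, 16 / 50]  # stratum weights
ybar_j = [68, 123]      # stratum means (pages)
se_j = [2.8, 12]        # standard errors of the stratum means

ybar = sum(wj * yj for wj, yj in zip(w, ybar_j))
se = math.sqrt(sum(wj**2 * sj**2 for wj, sj in zip(w, se_j)))

print(ybar)  # about 85.6
print(se)    # about 4.3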
In clustered sampling, entire groups (clusters) of elements are selected at random and all elements of the selected clusters are observed. When computing the standard error, the number of theses supervised by each professor is taken into account.
Clustered sampling can be more easily administered than other sampling procedures.
For example, the analysis of grades is based on only a few schools rather than
choosing students from many schools all over the country. The random element in
the sampling procedure is the choice of clusters. The procedure only requires a list
of all schools rather than a list of all students from the population. A list of all
students is only required for each school.
The ratio of the number of elements nj in each cluster (e.g. the number of theses
supervised by each professor) and the sample size n defines the weights wj =nj /n (j
is one of m clusters; n is the sum over all nj ). The coefficient of variation of n̄, the
mean over all nj , should be smaller than 0.2. The means ȳj of each cluster (e.g. of
each supervisor) are treated as the ”data”. The mean across all observations ȳ is
the weighted average of the cluster means ȳj :
$$\bar{y} = \sum_{j=1}^{m} w_j \cdot \bar{y}_j.$$
The standard error is based on the deviations of the cluster means from the overall mean:

$$s_{\bar{y}}^2 = \sum_{j=1}^{m} w_j^2 \cdot (\bar{y}_j - \bar{y})^2.$$
Since all observations of a cluster are sampled (which does not imply any estimation
error) the standard error only depends on the differences among clusters. There-
fore clusters should be rather similar whereas the dispersion within clusters can be
relatively large.
The computation of the standard error can be based on a more exact formula which
takes the ratio of selected clusters m and available clusters M into account (in the
present example 15/450):
$$s_{\bar{y}}^2 = \left(1 - \frac{m}{M}\right) \cdot \frac{m}{m-1} \cdot \sum_{j=1}^{m} w_j^2 \cdot (\bar{y}_j - \bar{y})^2.$$
Figure 8 shows data and results for the present example. Compared to stratified
sampling the confidence interval can be substantially reduced.
5.3 Hypothesis tests
Figure 9: Acceptance region and critical values for H0 : µ=µ0 using α=5%.
[The figure shows the sampling distribution centered at ȳ with the acceptance region between ȳ − 1.96·s/√n and ȳ + 1.96·s/√n; H0 is rejected if µ0 < ȳ − 1.96·s/√n or µ0 > ȳ + 1.96·s/√n.]
One possible decision rule is based on the confidence interval. For (1−α) percent of
all samples the population mean µ lies within the bounds of the confidence interval.
If the mean under the null hypothesis µ0 lies outside the confidence interval, the null
hypothesis is rejected (see Figure 9). In this case it is too unlikely that the sample
at hand comes from a population with mean µ0 . If µ0 lies outside the confidence
interval, the estimated mean ȳ is said to be significant (or significantly different
from µ0 ) at a significance level of α.
The test is based on critical values – these are the boundaries of the (1−α) confidence
interval – which are given by
$$\bar{y} \pm |z_{\alpha/2}| \cdot s/\sqrt{n},$$
where s is the estimated standard deviation from the sample. Then µ0 is compared to the critical values. Using α=0.05 the null hypothesis is rejected if µ0 is less than ȳ − 1.96·s/√n or greater than ȳ + 1.96·s/√n (see Figure 9). If µ0 lies in the acceptance region, H0 is not rejected. In a two-sided test H0 is rejected if µ0 is above or below the critical values. In one-sided tests (details can be found in section 10.2.2 of AWZ, 3rd edition), only one of the two critical values is relevant.
In example 22, the objective is to find out whether the sample mean is
consistent with the target average $47500. For that purpose a two-sided
test is appropriate since the kind of deviations (ȳ is above or below µ0 )
is not relevant. The 95% confidence interval is given by [48151,56376].
Using a significance level of 5% the null hypothesis is rejected, since
the target average $47500 is outside the confidence interval. The data
does not support the assumption that the sample has been drawn from
a population with an expected value of $47500. In other words: the
sample supports the notion that the average salary of respondents differs
significantly from the target mean.
two-sided test:
   µ0      α    lower bound  upper bound  decision   test statistic  critical value  decision   p-value  decision
H0 47500  0.05     48151        56376      reject        2.270           1.960        reject     0.023    reject
   47500  0.01     46859        57668      accept        2.270           2.576        accept     0.023    accept
Instead of determining the bounds of the confidence intervals and comparing the
critical values to µ0 , the standardized test statistic
$$t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$$
can be used. In this formula the difference between ȳ and µ0 is treated relative to the
standard error s/√n. When the null hypothesis is true, there is a (1−α) probability
to find the standardized test statistic within ±|zα/2 | (in a two-sided test). The null
hypothesis is rejected when the difference between ȳ and µ0 is too high relative to
the standard error. A decision is based on comparing the absolute value of t to the
absolute value of the standard normal α/2-quantile (see Figure 10).
The decision rule in a two-sided test is: If |t| is greater (less) than |zα/2 | the null
hypothesis is rejected (accepted).
Using the data from example 22 the standardized test statistic is given
by
$$t = \frac{52263 - 47500}{11493/\sqrt{30}} = 2.27.$$
Figure 10: Acceptance region and critical values for H0 : µ=µ0 using a standardized
test statistic.
[The figure shows the standard normal density; H0 is rejected if t = (ȳ − µ0)/(s/√n) < zα/2 or t > z1−α/2. The acceptance region lies between the two critical values.]
For the salary example the steps of the test are:

1. formulate the null hypothesis: H0: µ0 = 47500;

2. choose the significance level: α = 0.05;

3. compute the sample statistics: ȳ = 52263, s = 11493;

4. compute the standardized test statistic: t = (52263 − 47500)/(11493/√30) = 2.27;

5. compare the absolute value of the test statistic to the critical value and draw a conclusion: since |t| = 2.27 > 1.96, H0 is rejected.
Figure 11: Two-sided test with critical values ±1.96; H0 is rejected if t < −1.96 or t > +1.96 and accepted in between. The p-value is twice the area under the density to the right of t = 2.27.
P-value
For a given value of the test statistic the chosen significance level α determines whether the null hypothesis is accepted or rejected. Changing α may lead to a change in the acceptance/rejection decision. The p-value (or prob-value) of a test is the probability of observing values of the test statistic that are larger (in absolute terms) than the value of the test statistic at hand if the null hypothesis is true (see Figure 11). The more the standardized test statistic differs from zero the smaller the p-value. The p-value can also be viewed as that level of α for which there is indifference between accepting or rejecting the null hypothesis. The significance level α is the accepted probability of making a type I error. (A type II error occurs if a null hypothesis is not rejected although it is false; this type of error and the power of a test are not covered in this text.) H0 is rejected if the p-value is less than the pre-specified significance level α. (The conclusions based on the three approaches to test a hypothesis must always coincide.)
Decision rule: if the p-value is less (greater) than the pre-specified significance level
α the null hypothesis is rejected (accepted).
The value of the standardized test statistic based on the sample (ȳ=52263, s=11493) is given by 2.27. The associated p-value is 0.023; it can be computed from the standard normal distribution function (Excel: NORMSDIST, German STANDNORMVERT; for general normal probabilities NORMDIST, German NORMVERT). Rejecting the null hypothesis in this case implies a probability of 2.3% to commit a type I error. Given a significance level of 5% this probability is sufficiently small and H0 is rejected.
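A sketch of the standardized test and its p-value in Python, using the numbers from example 22:

import math
from statistics import NormalDist

ybar, s, n = 52263, 11493, 30
mu0 = 47500

t = (ybar - mu0) / (s / math.sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided p-value

print(t)        # about 2.27
print(p_value)  # about 0.023: reject at alpha=0.05, accept at alpha=0.01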
Frequently there is interest in testing whether two means differ significantly from each other. Examples are differences between treatment and control groups in medical tests, or differences between features of females and males. Two situations can
be distinguished: (a) a paired test applies when measurements are obtained for the
same observational units (e.g. the blood pressure of individuals before and after
a certain treatment); (b) the observational units are not identical (e.g. salaries of
females and males); this is referred to as independent samples.
In a paired test the difference between the two available observations for each element
of the sample is computed. The mean of the differences is subsequently tested
against a null hypothesis in the same way as described above. For example, the
effectiveness of a drug can be tested by measuring the difference between medical
parameter values before and after the drug has been applied. If the mean of the
differences is significantly different from zero, whereby a one-sided test will usually
be appropriate, the drug is considered to be effective.
If data has been collected for two different groups, the summary statistics for the two
groups will differ, and the number of observations may differ. It is usually assumed,
that the elements of each sample are drawn independently from each other. (This independence assumption does not hold in a paired test situation; the test statistic derived below cannot be applied in that case.) Suppose the means of the two groups are denoted by ȳ1 and ȳ2, the standard deviations for each group are s1 and s2, and the sample sizes are n1 and n2. For the null hypothesis that the difference between the means in the population is µ1−µ2 the standardized test statistic is given by

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$
The test statistic is compared to |zα/2 |, as described in the context of the standard-
ized test statistic.
(55083 − 48033) − 0
t= r = 1.5636.
119722 121822
+
18 12
This test statistic is less than the 5% critical value 1.96 and the p-value
is 11.8%. Although the difference between $48033 and $55083 is rather
large, it is not statistically significant (i.e. not significantly different
from zero). Thus, the sample provides insufficient evidence to claim that the
salaries of females and males are different. This can be explained by the
small sample, but also by the fact that other determinants of salaries are ignored.
56
NORMVERT
57
STANDNORMVERT
58
The independence assumption does not hold in case of a paired test situation. Therefore, the
subsequently derived test statistic cannot be applied in this case.
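For readers who want to verify the numbers with a few lines of code, a minimal Python sketch of the two-sample test (all inputs are the summary statistics reported in the example):

    from math import sqrt, erf

    def two_sample_z(y1, y2, s1, s2, n1, n2, diff0=0.0):
        """Test statistic for the difference between two means (independent samples)."""
        z = ((y1 - y2) - diff0) / sqrt(s1**2 / n1 + s2**2 / n2)
        phi = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))   # standard normal CDF at |z|
        return z, 2.0 * (1.0 - phi)                   # two-sided p-value

    z, p = two_sample_z(55083, 48033, 11972, 12182, 18, 12)
    print(round(z, 4), round(p, 3))   # 1.5636 0.118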
6.1 Covariance and correlation
Example 24: We consider the data from example 1 and focus on the
relation between salaries and age of respondents. The purpose of the
analysis is to estimate the average increase in salaries over the lifetime
of an individual.
Figure 12: Scatter diagram of 'age' (x) versus 'salary' (y); the correlation coefficient is 0.59.
correlation: ryx = syx/(sy·sx).

covariance: syx = 1/(n−1) · Σ_{t=1}^{n} (yt − ȳ)·(xt − x̄).
Note that the correlations (and covariances) are symmetrical: the correlation be-
tween y and x (ryx ) is identical to the correlation between x and y (rxy ).
60
KORREL; KOVAR
Table 4 illustrates the computation of the correlation coefficient using data from 10
observations (xt denotes 'age' and yt denotes 'salary'). First, the means of the data are
estimated. Next, the means are subtracted from the observations and the product
of the resulting deviations from the means is calculated. Dividing the sum of these
products (585080) by 9 (=n−1) yields the covariance syx=65009. The correlation ryx
is computed by dividing the covariance by the product of the standard deviations:
ryx=65009/(11257.4·10.9)=0.53. The covariance is measured in [units of y]×[units
of x], whereas the correlation coefficient has no dimension. The correlation coefficient
using all available data in the present example is 0.59.
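A minimal Python sketch of these two formulas (the short data lists are only an excerpt of the example data; applied to the full Table 4 columns the functions reproduce syx=65009 and ryx=0.53):

    from math import sqrt

    def sample_cov(y, x):
        """Sample covariance with denominator n-1."""
        n = len(y)
        ybar, xbar = sum(y) / n, sum(x) / n
        return sum((yt - ybar) * (xt - xbar) for yt, xt in zip(y, x)) / (n - 1)

    def correlation(y, x):
        """Correlation: covariance divided by the product of standard deviations."""
        return sample_cov(y, x) / sqrt(sample_cov(y, y) * sample_cov(x, x))

    # excerpt of the example data; substitute the full 'age' and 'salary' columns
    x = [55, 48, 51, 39, 45, 43]                     # age
    y = [58100, 56000, 53400, 39000, 61500, 37700]   # salary
    print(round(correlation(y, x), 2))   # correlation of the excerpt only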
The correlation coefficient can also be computed from standardized values or scores.
The standardization

y′t = (yt − ȳ)/sy

transforms the original values such that y′t has mean zero and variance one. The
covariance between y′t and x′t is equal to the correlation between yt and xt.
If more than two variables are considered, the covariances and correlations among
all pairs of variables are summarized in matrices. For example, the variance-
covariance matrix C and the correlation matrix R for three variables yt, xt
and zt have the following structure:

C = [ s²y  syx  syz ]      R = [ 1    ryx  ryz ]
    [ sxy  s²x  sxz ]          [ rxy  1    rxz ]
    [ szy  szx  s²z ]          [ rzy  rzx  1   ]

The rank correlation is based on ranks rather than the original values: the rank of each
observation in the sorted sequence of yt and xt is determined (see Table 4). The
rank correlation is computed using the differences among ranks dt:

rr = 1 − 6/(n(n²−1)) · Σ_{t=1}^{n} d²t.
If the ranks of both variables are identical, rr=1; if the ranks are exactly inverse,
rr=−1. In the present case the rank correlation hardly differs from the 'regular'
(linear) correlation.
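A sketch of the rank correlation in Python (ties are ignored for simplicity; the two test calls illustrate the limiting cases just described):

    def rank_correlation(y, x):
        """Rank correlation rr = 1 - 6*sum(d^2)/(n*(n^2-1)); no tie handling."""
        def ranks(v):
            order = sorted(range(len(v)), key=lambda i: v[i])
            r = [0] * len(v)
            for rank, i in enumerate(order, start=1):
                r[i] = rank
            return r
        ry, rx = ranks(y), ranks(x)
        n = len(y)
        d2 = sum((a - b) ** 2 for a, b in zip(ry, rx))
        return 1 - 6 * d2 / (n * (n**2 - 1))

    print(rank_correlation([1, 2, 3], [1, 2, 3]))   # identical ranks: 1.0
    print(rank_correlation([1, 2, 3], [3, 2, 1]))   # inverse ranks: -1.0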
6.2 Simple linear regression

Figure 13: A scatter of points approximated by a straight line with intercept c and slope b.

The relation between yt and xt is approximated by the regression line

yt = c + b·xt + et = ŷt + et.
ŷt is the fitted value (or the fit) and depends on xt . et is the error or residual and
is equal to the difference between the observation yt and the corresponding value on
the line ŷt =c+b·xt .
The coefficients c and b determine the level and slope of the line (see Figure 13).
A large number of similar straight lines can approximate the scatter of points. The
least-squares principle (LS) can be used to fix the exact position of the line.
This principle selects a ’plausible’ approximation. The LS criterion states that the
coefficients c and b are determined such that the sum of squared errors is minimized:
least-squares principle: Σ_{t=1}^{n} e²t → min.
Using this principle it can be shown that the slope estimate is based on the covariance
between yt and xt and the variance of xt and can also be computed using the
correlation coefficient:
slope: b = ryx·(sy/sx) = syx/s²x.
intercept: c = ȳ − b · x̄.
This definition guarantees that the average error equals zero. c has the same dimen-
sion as yt .
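A minimal Python sketch of the least-squares formulas (the data lists are placeholders; applied to the salary/age data of example 1 these functions reproduce the estimates c=20610 and b=634 reported below):

    def ols(y, x):
        """Least-squares slope and intercept of a simple regression y = c + b*x + e."""
        n = len(y)
        ybar, xbar = sum(y) / n, sum(x) / n
        s_yx = sum((yt - ybar) * (xt - xbar) for yt, xt in zip(y, x)) / (n - 1)
        s_xx = sum((xt - xbar) ** 2 for xt in x) / (n - 1)
        b = s_yx / s_xx       # slope: covariance divided by the variance of x
        c = ybar - b * xbar   # intercept: guarantees that the average error is zero
        return c, b

    c, b = ols([2.0, 3.0, 5.0], [1.0, 2.0, 4.0])   # placeholder data
    print(c, b)   # the points lie exactly on a line here: c=1.0, b=1.0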
Errors et=yt−ŷt occur for the following reasons (among others): (a) X is not the only
variable that affects Y. If more than one variable affects Y a multiple regression
analysis is required. (b) A straight line is only one out of many possible functions
and may be less suitable than other functions.
The coefficients c and b can be used to determine the conditional mean ŷ under the
condition that a particular value of xt is given:

ŷt = c + b·xt.
ŷt replaces the (unconditional) mean ȳ, which does not depend on X. In other words,
only the mean ȳ is available if X is ignored in the forecast of Y . Using the mean
ȳ corresponds to approximating the scatter of points by a horizontal line. If the
regression model turns out to be adequate – if X is a suitable explanatory variable
and a straight line is a suitable function – the horizontal line ȳ is replaced by the
sloping line ŷt =c+b·xt .
Example 25: We consider the data from example 1 and run a simple
regression using ’salary’ as the dependent variable and ’age’ as the ex-
planatory variable. The scatter of observations and the regression line
are shown in Figure 14. The regression line results from a least-squares
estimation of the regression coefficients. Estimation can be done with
suitable software. The results in Figure 15 are obtained with Excel.
The resulting output contains a lot of information which will now be
explained using the results from this example.
Figure 14: Scatter diagram of ’age’ versus ’salary’ and regression line.
Figure 15: Estimation results for the simple regression model.

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept           20610             8457     2.44     0.021        3286       37933
Age                   634              166     3.82     0.001         294         974
6.3 Regression coefficients and significance tests
The estimated coefficients are 634 (b) and 20610 (c). In order to interpret these
values assume that the current age of a respondent is 58 (this is the first observation
in the sample). The estimated regression equation
yt = 20610 + 634 · xt + et
can be used to compute the conditional mean of salaries for this age: 20610+634·58 ≈
57377 (using unrounded coefficient estimates). This value is located on the line in
Figure 14. The observed salary for
this person is 65400. The error et =yt −ŷt is given by 65400−57377=8023; it is the
difference between the observed salary (yt ) and the (conditional) expected salary
(ŷt ). The discrepancy is due to the fact that the regression equation represents an
average across the sample. In addition, it is due to other explanatory variables which
are not (or cannot be) accounted for.
If age increases by one year the salary increases on average by $634 (or, the condi-
tional expected salary increases by $634). If we consider a person who is five years
older, the conditional mean increases to 20610+634·(58+5)≈60547; i.e. its value
increases by 634·5=3170. Thus, the slope b determines the change in the conditional
mean. If xt (age) changes by ∆x units, the conditional mean increases by b·∆x
units. Note that the (initial) level of xt (or yt ) is irrelevant for the computed change
in ŷ.
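These computations can be scripted directly (a sketch; note that the rounded coefficients 20610 and 634 give 57382 and 60552, while the values 57377 and 60547 in the text are based on unrounded estimates):

    def conditional_mean(c, b, x):
        """Conditional mean (fit) of a simple regression."""
        return c + b * x

    c, b = 20610, 634                   # rounded estimates from Figure 15
    print(conditional_mean(c, b, 58))   # 57382 with the rounded coefficients
    print(conditional_mean(c, b, 63))   # 60552 with the rounded coefficients
    print(b * 5)                        # change in the conditional mean: 3170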
The intercept (or constant) c is equal to the conditional mean of yt if xt = 0. The
estimate for c in the present example is 20610 which corresponds to the expected
salary at birth (i.e. at an age of zero). This interpretation is not very meaningful if
the X-variable cannot attain or hardly ever attains a value of zero. It may not be
meaningful either, if the observed values of the X-variable in the sample are too far
away from zero, and thus provide no representative basis for this interpretation.
The role of the intercept can be derived from its definition c=ȳ−b·x̄. This implies
that the conditional expected value ŷt is equal to the unconditional mean of yt if xt is
equal to its unconditional mean. The sample means of yt and xt are 52263 and 49.9,
respectively, which agrees with the regression equation: 20610+634·49.9 ≈ 52263.
If the sample mean ȳ is used instead of the population mean µy an estimation error
results. For the same reason the position of the regression line is subject to an error,
since c and b are estimated coefficients. If data from a different sample was used,
the estimates c and b would change. The standard errors (the standard deviation of
estimated coefficients) take into account that the coefficients are estimated from a
sample.
When the mean is estimated from a sample the standard error is given by s/√n.
In a regression model the standard error of a coefficient decreases as the standard
deviation of residuals se (see below) decreases and as the standard deviation of xt
increases. The standard error of the slope b in a simple regression is given by

sb = se/(sx·√(n−1)).
The standard errors of b and c are 166 and 8457 (see Figure 15). These standard
errors can be used to compute confidence intervals for the values of the constant
term and the slope in the population. The 95% confidence interval for the slope is
given by
b ± 1.96 · sb .
This range contains the slope of the population β with 95% probability (given the
estimates derived from the sample). The confidence interval can be used for testing
the significance of the estimated coefficients. Usually the null hypothesis is β0=0;
i.e. the coefficient associated with the explanatory variable in the population is zero.
If the confidence interval does not include zero the null hypothesis is rejected and
the coefficient is considered to be significant (significantly different from zero).
The boundaries of the 95% confidence interval for b are both above zero. Therefore
the null hypothesis for b is rejected and the slope is said to be significantly different
from zero. This means that age has a statistically significant impact on salaries.
The constant term is also significant because zero is not included in the confidence
interval. Note that the mean of residuals equals zero if a constant term is included
in the model. Therefore the constant term is usually kept in a regression model even
if it is insignificant.
If the explanatory variable in a simple regression model has no significant impact
on yt (i.e. the slope is not significantly different from zero), there is no significant
difference between the conditional and the unconditional mean (ŷt and ȳ). In that
case one would need to look for other suitable explanatory variables.
Significance tests can also be based on the t-statistic

t = (b − β0)/sb.
The t-statistic corresponds to the standardized test statistic in section 5.3. The null
hypothesis is rejected if t is 'large enough', i.e. if it is beyond the critical value at
the specified significance level. The critical values at the 5% level are ±1.96 for large
samples.
The t-statistic for b is 3.82. The null hypothesis for b is rejected and the slope
is significantly different from zero. The constant term c is significant, too. These
conclusions have to agree with those derived from confidence intervals.
Significance tests can be based on p-values, too.65 As explained in section 5.3 the
p-value is the probability of making a type I error if the null is rejected. For a given
significance level, conclusions based on the t-statistic and the p-value are identical.
For example, if a 5% significance level is used, the null is rejected if the p-value is
less than 0.05.
In the present case the p-value of the slope coefficient is almost zero. In other
words, if the null hypothesis ”the coefficient equals zero” is rejected, there is a very
small probability to make a type I error. Therefore the null hypothesis is rejected
and the explanatory variable is kept in the model.
If the null hypothesis for the constant term is rejected, the probability of a type I
error equals 2.1%. Since this is less than α the null hypothesis is rejected and
the constant term is considered to be significant.
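All three approaches can be reproduced from the regression output with a few lines of Python (a sketch; note that the interval [294, 974] reported by Excel in Figure 15 is based on a quantile of the t distribution with 28 degrees of freedom, about 2.05, instead of the large-sample value 1.96):

    from math import sqrt, erf

    def significance(b, sb, beta0=0.0, z=1.96):
        """Confidence interval, t-statistic and (normal) two-sided p-value."""
        t = (b - beta0) / sb
        phi = 0.5 * (1.0 + erf(abs(t) / sqrt(2.0)))   # standard normal CDF at |t|
        p = 2.0 * (1.0 - phi)
        return (b - z * sb, b + z * sb), t, p

    (lo, hi), t, p = significance(b=634, sb=166)
    print(round(lo), round(hi), round(t, 2), round(p, 4))   # 309 959 3.82 0.0001

All three criteria lead to the same conclusion: the interval excludes zero, t exceeds 1.96, and the p-value is below 0.05.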
6.4 Goodness of fit

The standard deviation of residuals se is based on the variance of residuals

s²e = 1/(n−k−1) · Σ_{t=1}^{n} e²t,

where k denotes the number of explanatory variables (k=1 in a simple regression).
The multiple correlation coefficient measures the correlation between the observed
value yt and the conditional mean (the fit) ŷt . The multiple correlation coefficient
approaches one as the fit improves. The number 0.59 (see Figure 15) indicates an
acceptable, although not very high explanatory power of the model.
Coefficient of determination R²:66

R² = 1 − ((n−k−1)·s²e) / ((n−1)·s²y),    0 ≤ R² ≤ 1.

R² ranges from zero (the error variance is equal to the variance of yt) to one (the
error variance is zero). The number 0.34 (see Figure 15) shows that 34% of the variance
in salaries can be explained by the variance in age.
Note, however, that high values of the multiple correlation coefficient and R2 do not
necessarily indicate that the regression model is adequate. There exist further crite-
ria to judge the adequacy of a model, which are not treated in this text, however.
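The reported goodness-of-fit numbers can be verified from the summary statistics of the example (n=30, k=1, se=9480 from the simple regression, and the sample standard deviation of salaries sy=11493); a sketch:

    def r_squared(se, sy, n, k):
        """Coefficient of determination from the residual and total variance."""
        return 1 - ((n - k - 1) * se**2) / ((n - 1) * sy**2)

    r2 = r_squared(se=9480, sy=11493, n=30, k=1)
    print(round(r2, 2), round(r2**0.5, 2))   # 0.34 and a multiple R of about 0.59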
66
Bestimmtheitsmaß
67
Only relevant for banking, finance or similar courses.

6.5 Estimating the CAPM67

The CAPM relates the expected return µi of asset i to the risk-free rate rf and the
expected return of the market µm:

µi = rf + βi·(µm − rf).
The coefficient βi can be estimated from a simple linear regression of the asset's
returns yti on the market returns ytm (the market model):

yti = αi + βi·ytm + et.

The least-squares estimate of the slope is given by

βi = sim/s²m,
where sim is the (sample) covariance between the returns of asset i and the market
return and s2m is the (sample) variance of the market return. This formula can also
be derived on the basis of financial theory.
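A sketch of this computation in Python (the return series are hypothetical placeholders; the actual example below uses 80 monthly ATX and BA log returns):

    def beta(asset, market):
        """Market-model beta: covariance(asset, market) / variance(market)."""
        n = len(asset)
        abar, mbar = sum(asset) / n, sum(market) / n
        cov = sum((a - abar) * (m - mbar) for a, m in zip(asset, market)) / (n - 1)
        var = sum((m - mbar) ** 2 for m in market) / (n - 1)
        return cov / var

    # hypothetical monthly log returns (placeholders, not the ATX/BA data)
    market = [0.01, -0.02, 0.03, 0.00, 0.02]
    asset = [0.012, -0.015, 0.025, 0.001, 0.018]
    print(round(beta(asset, market), 3))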
As an example we use 80 monthly observations of ATX index values and stock prices
of Bank-Austria (BA) from January 1986 until August 1992. Using log returns the
estimated regression equation is

ŷti = 0.076 + 0.883·ytm.
The estimated coefficient βi is 0.883. The estimate is highly significant. The stan-
dard error of the coefficient equals 0.067 which implies a t-statistic of 13.2.
The coefficient 0.883 can be interpreted as follows. A change in the market return
by 10 percentage points implies a change in the asset’s expected return by 8.83
percentage points. However, if the market return equals 10% the expected return of
the asset is given by 8.91% (0.076+0.883 · 10.0=8.91).
R2 equals 0.693 which implies that almost 70% of the variance in the asset’s return
can be explained by – or is due to – the market return. Based on the market model
the total variance of an asset can be split into market-specific and firm-specific
variance as follows:

s²i = β²i·s²m + s²e.
Since R² can also be written as (β²i·s²m)/s²i it measures the proportion of market-
specific variance in total variance. The R² of the example shows that about 70% of
the asset's total variance is systematic or market-specific (non-diversifiable risk) and
about 30% is unsystematic or firm-specific (diversifiable risk).
Using the data from the present example the 5%-VaR of the BA stock can be com-
puted from these estimates.

6.6 Multiple regression analysis

In a multiple regression the dependent variable is related to k explanatory variables:

y = c + b1·x1 + · · · + bk·xk + e.
The intercept c is the fitted value for yt if all X-variables are equal to zero. At the
same time it is the difference between the mean of yt and the contributions of the
X-variables evaluated at their respective means:

c = ȳ − b1·x̄1 − · · · − bk·x̄k.
The coefficients from simple and multiple regressions differ when the explanatory
variables are correlated. A coefficient from a multiple regression measures the effect
of a variable by holding all other variables in the model constant (c.p. condition).
Thus, by taking into account the simultaneous variation of all other explanatory
variables, the multiple regression measures the ’net effect’ of each variable. The
effect of variables which do not appear in the model cannot be taken into account
in this sense. A simple regression ignores the effects of all other (omitted) variables
and assigns their joint impact to the single variable in the model. Therefore the
estimated coefficient (slope) in a simple regression is generally biased (too small or
too large).
Example 26: Obviously, a person’s salary not only depends upon age,
but also on factors like ability and qualifications. This aspect can be
measured (at least roughly) by the education time (schooling). A multi-
ple regression will now be used to assess the relative importance of age
and schooling for salaries.
The results of the multiple regression between salary and the explanatory variables
’age’ and ’schooling’ are summarized in Figure 16. By judging from the p-values we
conclude that both explanatory variables have a significant effect on salaries.
An increase in schooling by one year leads to an increase in expected salaries by
$1501, if age is held constant (ceteris paribus; i.e. for individuals with the same
age). The coefficient 723 for ’age’ can be interpreted as the expected increase in
salaries induced by getting older by one year, if education does not change (i.e. for
people with the same education duration). Note that this effect is stronger than
estimated in the simple regression (see Figure 15). The estimate 723 in the current
regression can be interpreted as the net effect of one additional year of age on expected
salaries, accounting for schooling. If salaries are only related to age (as done in the
simple regression) the effects of schooling on salaries are erroneously assigned to the
variable 'age' (since it is the only explanatory variable in the model).

Figure 16: Estimation results for the multiple regression model with the explanatory
variables 'age' and 'schooling'.

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept           -2350            10064    -0.23     0.817      -23000       18299
Age                   723              145     4.98     0.000         425        1021
School               1501              455     3.30     0.003         567        2434
To measure the joint effect of changes in several variables we use the general formula

∆ŷ = b1·∆x1 + · · · + bk·∆xk.

For example, comparing two individuals who differ in age (by 10 years) and schooling
(by 2 years) shows that their expected salaries differ by 723·10+1501·2=10232.
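Scripted, the joint effect is a sum of coefficient-weighted changes (a sketch):

    def joint_effect(b, dx):
        """Change in the conditional mean for changes dx in the explanatory variables."""
        return sum(bi * dxi for bi, dxi in zip(b, dx))

    print(joint_effect([723, 1501], [10, 2]))   # 10232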
The variable 'G01' is a dummy variable for gender (coded 1 for women and 0 for
men). The 0-1 coding allows for a meaningful application and interpretation in a
regression. Adding the variable G01 to the multiple regression equation and
estimating the coefficients yields the results in
Figure 17. The coefficient of G01 is −5601. It shows that women earn on average
$5601 less than men, holding everything else constant (i.e. compared to men with
the same age and education). This negative effect is not as significant as the effects
of age and schooling as indicated by the p-value 6.6%. Using a significance level of
5% the gender-specific difference in salaries is not statistically significant.
Figure 17: Estimation results for the multiple regression model including a dummy
variable ’G01’ for gender.
Regression Statistics
Multiple R 0.77
R Square 0.59
Adjusted R Sq. 0.54
Standard Error 7771
Observations 30
             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept 1367 9790 0.14 0.890 -18756 21490
Age 694 139 4.98 0.000 408 980
School 1500 434 3.46 0.002 609 2392
G01 -5601 2915 -1.92 0.066 -11592 390
Adding the third explanatory variable to the regression equation leads to a reduction
in the standard error from 9480 in the simple regression to 7771. Accordingly, the
coefficient of determination increases from R²=0.34 to R²=0.59. Note, however, that
R2 always increases when additional explanatory variables are included. In order to
compare models with a different number of X-variables the adjusted coefficient
of determination R̄2 should be used:
R̄² = 1 − s²e/s²y.
Out of several estimated multiple regression models the one with the maximum
adjusted R2 can be selected. Note, however, that there is a large number of other
criteria available to select among competing models.
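With the numbers reported in the text (se=7771 after adding the third variable, and sy=11493) the adjusted coefficient of determination can be checked directly; a sketch:

    def adjusted_r_squared(se, sy):
        """Adjusted coefficient of determination: one minus the variance ratio."""
        return 1 - se**2 / sy**2

    print(round(adjusted_r_squared(7771, 11493), 2))   # 0.54, as in Figure 17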
Given that a regression model contains some insignificant coefficients the following
guidelines can be used to select variables.
7 Decision analysis
Example 28: NEWDEC has developed and carefully tested a new elec-
tronic device. The marketing department is currently discussing a suit-
able marketing strategy. NEWDEC is aware of the fact that a selected
strategy need not necessarily achieve the desired results. Consumer
tastes and short-term fashions are hard to predict. In addition, the
main competitor – known to set prices well below NEWDEC – may also
be about to introduce a new device.
NEWDEC considers a range of marketing activities which may be char-
acterized in terms of three strategies:
An aggressive strategy entails substantial advertising expenditures and
aggressive pricing. This strategy includes hiring additional staff and
investing in additional production facilities to cope with the increased
demand associated with a successful marketing campaign.
In the basic strategy advertising levels will be moderately increased for
a few weeks. This will be supported by reduced prices during the intro-
ductory phase. Existing production facilities are planned to be modified
and slightly expanded. Only a limited number of additional staff is re-
quired in this case.
A cautious strategy is mainly based on the use of existing production
facilities and does not require to hire new personnel. Advertising would
be mainly done by local sales representatives.
The current market situation – which refers to the actual but unknown
state of the market – is unknown to NEWDEC. However, to facilitate the
search for a suitable marketing strategy, the following three possibilities
are considered: the readiness of the market to accept the new product
is considered to be high, medium or low. These categories are mainly
based on sales forecasts, from which probabilities can be assigned to each
category.
The management at NEWDEC carefully evaluates each possible case and
determines its monetary consequences (in terms of present values of the
coming two years). Expected payoffs (positive or negative cash-flows)
are summarized in Table 5.
This problem – the optimal choice of a marketing strategy given uncertainty about
the market conditions – can be solved using a decision analysis. One out of m
mutually exclusive alternatives (Ai , i=1, . . . , m) is chosen by taking into account
n uncertain, exogenous states (Zj , j=1, . . . , n). For each pair Ai -Zj the monetary
consequences (payoffs) must be specified. The choice is based on a suitable decision
criterion.
Note the sequence of steps associated with this approach. The decision is made (e.g.
alternative A3 is chosen). Then one of the anticipated states actually takes place
(e.g. Z2). Finally, the monetary consequences associated with the combination A3
and Z2 are realized. If the choice has an effect on the states of nature or the number
of possible states, the decision problem can be solved using a decision tree.

Table 5: Payoff-matrix.

                         acceptance level
                     high     medium     low
strategy              Z1        Z2       Z3
aggressive   A1      120        50      –40
basic        A2       80        60       20
cautious     A3       30        35       40
Figure 18: Probability distributions across the states Z1, Z2, Z3: (a) all states about equally likely; (b) one state (Z2) with a relatively high probability.
”Out of the previous 120 months sales dropped in 36 months.” The relative
frequency 36/120=0.3 can be used to set the probability of the state ’sales
down’ at 30%.
The probabilities in the present example are based on historical data: 25% for
state ”low”, 40% for state ”medium”, and 35% for state ”high”.
Probabilities describe the degree of available information of a decision maker
and can be used to characterize the decision problem as follows:
(a) Decisions under certainty: The probability for one of the states is one
and zero for all others.
(b) Decisions under uncertainty: It is not possible to specify probabilities at
all. Such cases are solved by referring to decision rules.
(c) Decisions under risk: Probabilities (different from one) can be assigned
to each state. The riskiness of a decision problem can be characterized on
the basis of the probability distribution. If all probabilities are about the
same – as in Figure 18(a) – the involved risk is larger than in a situation
where one of the states has a relatively high probability (e.g. state Z2 in
Figure 18(b)).
According to the maximax-criterion the maximum payoff (across states) for each
alternative is determined first. The optimal decision is the one which attains the
maximum of these values. That is why this criterion is considered to be optimistic.
The choice can be formalized by

max_i max_j Cij.
The Laplace-criterion assigns the same importance to each state. The decision
with the maximum (unweighted) average payoff is chosen:
max_i (1/n)·Σ_{j=1}^{n} Cij.
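Applied to the payoff-matrix in Table 5, the rules can be evaluated in a few lines of Python (a sketch; it reproduces the choices reported below for Figure 19):

    payoffs = {"aggressive": [120, 50, -40],
               "basic":      [80, 60, 20],
               "cautious":   [30, 35, 40]}

    maximin = max(payoffs, key=lambda a: min(payoffs[a]))   # pessimistic rule
    maximax = max(payoffs, key=lambda a: max(payoffs[a]))   # optimistic rule
    laplace = max(payoffs, key=lambda a: sum(payoffs[a]) / len(payoffs[a]))
    print(maximin, maximax, laplace)   # cautious aggressive basic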
These criteria have been criticized on the basis of examples which lead to unacceptable
solutions. Consider the payoff-matrix in Table 7. According to the maximin-criterion
alternative A2 is optimal. However, it is very plausible that even pessimists would
choose A1 because it dominates A2 in almost every case. In addition, the payoff
associated with A1 in state Z5 (in which A2 is preferable) is not much worse. Given
that a criterion leads to questionable results in such simple cases it is hard to justify
its use in more complex situations.
7.3 Decisions under risk

If probabilities pj can be assigned to the states Zj, decisions can be based on the
expected payoff of alternative Ai:

µi = Σ_{j=1}^{n} pj·Cij.
One important aspect is ignored if decisions are made on the basis of expected
values: the variability of outcomes across states. In the present example the payoffs
of the cautious strategy are rather similar across states. The aggressive strategy is
characterized by a substantial variation of payoffs.
This fact can be accounted for by the variance (or standard deviation) of payoffs.
The variance of alternative Ai is defined as
σ²i = Σ_{j=1}^{n} pj·(Cij − µi)².
A frequently used decision criterion can be defined in terms of the mean and variance
of payoffs. The optimal decision is based on the value of

µi − λ·σ²i.

λ is a factor which determines the trade-off between expectation and variance; it
reflects the risk aversion of the decision maker. More risk aversion is accounted for
by higher values of λ. Thereby the subjective risk attitude of the decision maker is
taken into account.
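With the probabilities stated above (35% 'high', 40% 'medium', 25% 'low') the µ-σ criterion can be evaluated as follows (a sketch; it reproduces the choices discussed below):

    p = [0.35, 0.40, 0.25]   # probabilities for the states high, medium, low
    payoffs = {"aggressive": [120, 50, -40],
               "basic":      [80, 60, 20],
               "cautious":   [30, 35, 40]}

    def mu_sigma(c, lam):
        mu = sum(pj * cj for pj, cj in zip(p, c))                # expected payoff
        var = sum(pj * (cj - mu) ** 2 for pj, cj in zip(p, c))   # variance of payoffs
        return mu - lam * var

    for lam in (0.022, 0.05):
        best = max(payoffs, key=lambda a: mu_sigma(payoffs[a], lam))
        print(lam, best)   # basic for lam=0.022, cautious for lam=0.05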
The value assigned to λ may be derived from interviewing the decision maker. The
questions are formulated in such a way that the personal trade-off between µ and σ 2
can be estimated. A frequently used approach is to present the decision maker with a
game. For example, he may win 30 or 60 units with 50% probability. The magnitude
of these amounts should correspond to the relevant magnitudes in the current, actual
decision problem. The decision maker is asked to specify a certain amount which
generates the same utility as playing the game (with uncertain outcomes). Suppose
7.3 Decisions under risk 73
the decision maker states an amount equal to 40. This amount is the so-called
certainty equivalent. The expected value of the game is given by 45 (0.5 · 30+0.5 ·
60=45). It is higher than the certainty equivalent and the difference is due to the
risk associated with the game. λ can be derived from the equation µ−λσ²=40. Since
the game implies µ=45 and σ=15 we obtain λ=5/225≈0.022.
Figure 19 presents the results from applying various criteria to NEWDEC’s decision
problem. According to the pessimistic maximin-criterion the cautious strategy is
chosen, the optimistic maximax-criterion chooses the aggressive marketing strategy,
and applying the Laplace-criterion yields the basic strategy.
If probabilities are taken into account, decisions can be based on maximizing the
expected payoff µ. It turns out that the basic strategy has the maximum expected
value of 57. If NEWDEC considers this criterion as appropriate, choosing the basic
strategy can be viewed as being indifferent between a certain amount of 57, and
receiving uncertain outcomes of 80, 60 or 20 (with the associated probabilities). The
expected value of the aggressive strategy has a similar order of magnitude, whereas
the expectation of the cautious strategy is much lower.
Maximizing the expected value does not take into account the risk aversion of a
decision maker. In fact, it can be shown that this criterion is only appropriate if she
or he is risk neutral. Accounting for mean and variance (and thereby for the risk
attitude) overcomes this deficiency. In the present case, the basic strategy is chosen
according to the µ-σ criterion. The cautious strategy would only be chosen for a
higher degree of risk aversion (e.g. λ=0.05).
A second possibility to account for the risk attitude of a decision maker is to max-
imize expected utility. According to the Bernoulli-criterion all payoffs are
replaced by utility units as shown below. Utility can be viewed as a measure of the
attractiveness of an outcome.
A utility function reflects the risk attitude of a decision maker. It can be derived in
a similar way as the trade-off parameter λ. The units of measurement are arbitrary.
It is useful to assign a value of one to the maximum payoff and zero to the minimum
payoff. From Table 5 we find U (120)=1 and U (−40)=0 (see Figure 20). Consider the
utility of the payoff C22 =60, for example. The decision maker is asked to specify a
probability p (between zero and one) such that he is indifferent between the following
alternatives: receive a certain payment 60, or play a game which yields either 120
with probability p or −40 with probability 1−p.
Suppose the decision maker is indifferent if p=0.75. In this case the expected value
of the game is 80 (0.75·120+0.25·(−40)=80). This amount is larger than the certain68
payoff 60. The decision maker requires a compensation for playing the risky game.
The probability p=0.75 is the utility assigned to the payoff C22 : U (60)=0.75. Using
the same procedure, utilities can be assigned to each payoff from Table 5. Based
on the utility function in Figure 20, the maximum expected utility is found for the
basic strategy (see sheet ’decision analysis’).
The concept of a utility function introduced above can also be applied by using
specific mathematical functions. One example is the exponential utility function
U (W ) = − exp{−W/T }.
W is the monetary outcome, typically the profit or wealth associated with a decision
problem. It is a random variable which is affected by the decisions and the uncertain
68
60 can be viewed as the certainty equivalent of a game with expectation 80.
outcomes. T is a coefficient which reflects the risk aversion of the decision maker. In
fact, it is a parameter of risk tolerance, which is inversely related to risk aversion.
As a rough guideline for the choice of T we can refer to empirical evidence (see AWZ,
p.359). Companies were found to choose T approximately as 6% of net sales, 120%
of net income, and 16% of equity.
A decision criterion based on this (or any other) utility function is to maximize the
expected utility (of wealth). This expectation depends on the statistical properties
of W. If W is assumed to be normally distributed it can be shown that maximizing
the expected value of the exponential utility function is equivalent to maximizing

µW − σ²W/(2T),

where µW and σ²W denote the mean and the variance of W.
For the current situation and a risk tolerance of T =0.125, the investor
should invest X ∗ =0.156. Using the numerical example with simulated
returns we obtain X ∗ =0.16.
SciTools believes that the possible bids from the competition (if there is
competition) and the associated probabilities are:
bid probability
less than $115,000 20%
between $115,000 and $120,000 40%
between $120,000 and $125,000 30%
greater than $125,000 10%
There are three elements to SciTools’ problem. The first element is that
they have two basic strategies – submit a bid or do not submit a bid. If
they decide to submit a bid they must determine how much they should
bid. The bid must be greater than $100,000 for SciTools to make a profit.
Given the data on past bids, and to simplify the subsequent calculations,
SciTools considers bidding either $115,000, $120,000, or $125,000.
The next element involves the uncertain outcomes and their probabili-
ties. The only source of uncertainty is the behavior of the competitors
– will they bid and, if so, how much? From past experience SciTools is
able to predict competitor behavior, thus arriving at an estimated 30%
for the probability of no competing bids.
The last element of the problem is the value model that transforms deci-
sions and outcomes into monetary values for SciTools. The value model
in this example is straightforward. If SciTools decides right now not to
bid, then its monetary value is $0 – no gain, no loss. If they make a
bid and are underbid by a competitor, then they lose $5000, the cost of
preparing the bid. If they bid B dollars and win the contract, then they
make a profit of B minus $100,000; that is, B dollars for winning the
bid, less $5000 for preparing the bid, less $95,000 for supplying the in-
struments. The monetary values are summarized in the following payoff
table (all entries in $1000):
competitors’ bid
strategy no bid <115 115–120 120–125 >125
no bid 0 0 0 0 0
bid 115 15 –5 15 15 15
bid 120 20 –5 –5 20 20
bid 125 25 –5 –5 –5 25
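Combining this payoff table with the probabilities (a 30% chance of no competing bid, otherwise the distribution given above) yields the expected payoff of each strategy; a sketch:

    # probabilities: no competing bid, then the four competitor-bid ranges
    p = [0.3] + [0.7 * q for q in (0.2, 0.4, 0.3, 0.1)]
    payoffs = {"no bid":  [0, 0, 0, 0, 0],
               "bid 115": [15, -5, 15, 15, 15],
               "bid 120": [20, -5, -5, 20, 20],
               "bid 125": [25, -5, -5, -5, 25]}

    for strategy, c in payoffs.items():
        ev = sum(pj * cj for pj, cj in zip(p, c))
        print(strategy, round(ev, 2))
    # bidding $115,000 has the largest expected payoff (12.2)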
8 References
(more recent editions may be available)
comprehensive, many examples, uses Excel: Albright S.C., Winston W.L., and
Zappe C.J. (2002): Managerial Statistics, 1st edition, Wadsworth; the title of
the third edition is Data Analysis and Decision Making.
comprehensive, many examples: Anderson D.R., Sweeney D.J., Williams T.A., Free-
man J., and Shoesmith E. (2007): Statistics for Business and Economics,
Thomson.
not too technical; for social sciences and psychology: Bortz J. and Döring N. (1995):
Forschungsmethoden und Evaluation, 2nd edition, Springer.
covers sampling procedures on a basic level and (very) advanced methods; many
examples: Lohr S.L. (2010): Sampling: Design and Analysis, 2nd edition, Brooks/Cole.
I like this one: Neufeld J.L. (2001): Learning Business Statistics with MS Excel,
Prentice Hall.
9 Exercises
The data for these exercises can be found in the file ’exercises.xls’.
Exercise 1⁷⁰
The Spring Mills Company produces and distributes a wide variety of manufactured
goods. Due to this variety, it has a large number of customers. Spring Mills
classifies these customers as small, medium and large, depending on the volume of
business each does with the company. Recently they have noticed a problem with
accounts receivable. They are not getting paid by their customers in as timely a
manner as they would like. This obviously costs them money.
Spring Mills has gathered data on 280 customer accounts (see sheet ’receive’). For
each of these accounts the data set lists three variables: ’Size’, the size of the cus-
tomer (coded 1 for small, 2 for medium, 3 for large); ’Days’, the number of days
since the customer was billed; ’Amount’, the amount the customer owes.
Consider the variables ’Days’ and ’Amount’, carry out the following calculations and
describe your findings. You may want to distinguish your analysis by the variable
’Size’.
Exercise 2
A team of physiotherapists wants to test the effectiveness of a new treatment. For
that purpose, a sample of 34 people is randomly selected. A key parameter, which
should respond to the treatment, is measured immediately before and one hour after
treatment.
70
Example 3.9 on page 95 in AWZ.
1. Compute the difference of the measurements before and after (before minus
after).
3. Use the sample to compute a 95% confidence interval for the average difference
in the population!
4. Use the sample to test the null hypothesis µ0=0 (no effect) using the signifi-
cance level α=0.05. For that purpose use
Exercise 3⁷¹
The sheet ’costs’ lists the number of items produced (items) and the total cost (costs)
of producing these items.
2. Compute the fitted values (expected costs) and the residuals (errors).
4. Comment on the goodness of fit of this model. What can you say about the
magnitude of errors?
Exercise 4⁷²
The sheet ’car’ contains annual data (1970-1987) on domestic auto sales in the
United States. The variables are defined as ’quantity’: annual domestic auto sales
(in thousand units); ’price’: real price index of new cars; ’income’: real disposable
income; ’interest’: prime rate of interest (in %).
1. Estimate a multiple regression model for ’quantity’ using ’price’, ’income’ and
’interest’ as explanatory variables.
71
Example 13.4 on page 696 in AWZ.
72
Example 13.5 on page 703 in AWZ (slightly modified).
2. Compute the fitted values (expected cars sold) and the residuals (errors).
4. Comment on the goodness of fit of this model. What can you say about the
magnitude of errors?
8. Suppose income drops by 100 units and the interest rate increases by two per-
centage points. What is the required change in the price index to compensate
for the associated effect on the expected car sales?
Exercise 5
O’Drill Inc. plans to drill for oil in a promising area. O’Drill is using a new drilling
station which has a new drilling head already built in. A drilling head has to be
replaced after drilling for 2000 meters. O’Drill does not know how deep it has to
drill until oil is found. According to the estimates, there is a 30% probability to find
oil between 0m and 2000m. The probability to find oil between 2000m and 4000m is
considered to be 50%. 20% is the probability to find oil between 4000m and 6000m.
O’Drill rules out the case of finding oil below 6000m. Given this uncertainty it is
unclear how many additional drilling heads – in addition to the one already built in
– are required.
Drilling heads can be ordered from two different suppliers. Supplier A charges 60
for each head ordered right now (a special deal). If ordered at a later date, supplier
A charges 100 for an additional head. Drilling heads which have not been used can
be sold back to supplier A for 20. Installing an additional head costs 40.
Supplier B offers an all-inclusive contract, and charges 120 for delivering and in-
stalling any additional head.
1. Determine the states of nature and the set of decisions (strategies) O’Drill can
choose from.
2. Which costs are associated with each pair of decisions and states of nature?
3. Which strategy should O’Drill choose? Use a suitable criterion to support its
decision!
10 Cases
The following three cases are taken from the book ”Business Statistics: For Con-
temporary Decision Making” by Ken Black (Wiley). The fourth case is taken from
AWZ (Example 7.5).
10.1 Olson Diversified Marketing Services – Analysis of Receivables

George Reale proposes introducing a new computerized billing system. This system
would require project managers
to enter all purchase orders and staff hours as soon as the services are completed.
Suppliers would be asked to submit their charges electronically, and these charges
would be added to the charges data file as soon as received. At the end of the
project, a list of outstanding charges from suppliers would be prepared, and the
project manager would direct an effort to obtain all remaining supplier charges.
Previously, senior management argued that developing such a system would be too
expensive and that operating it would necessitate too much extra work. After your
meeting, you and George agree that you will focus on the potential cost savings
and then determine whether these are sufficient to justify the computerized system
proposed by George.
Technical background
Accounts receivable are funds owed by clients for goods or services provided. In ac-
counting terms, accounts receivable are recognized when the services are performed.
In addition to the actual charges, the receivables include a markup to cover profit,
management, and allocated overhead. Receivables impose a cost on Olson because
the company must borrow funds to cover payments to its suppliers before payment
is received from the client. Olson maintains a line of credit with a local bank at
an annual interest rate of 10 percent to cover these receivables. Thus, every $1,000
held as receivables for the year results in an interest charge of $100. If Ol-
son can reduce the time for a $1,000,000 client charge by one day, the net savings is
$274 [0.10·$1,000,000/365], and a savings of ten days is worth $2,740. In general, the
amount of savings obtainable by reducing the billing time for each client contract is
equal to the dollar amount of the client bill multiplied by the 0.10 bank interest rate,
with the result then multiplied by the reduction in billing time (in days) divided by
365.
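The savings rule translates into a one-line function (a sketch):

    def savings(bill, days_reduced, rate=0.10):
        """Interest savings from reducing the billing time by a number of days."""
        return bill * rate * days_reduced / 365

    print(round(savings(1_000_000, 1)))    # 274
    print(round(savings(1_000_000, 10)))   # 2740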
You have been asked to study the pattern of bill preparation times cross-referenced
by the size of client bills. From this research, you are to indicate the potential savings
that could result from reducing billing time. You might, for example, recommend
that efforts be concentrated on bills for amounts above a certain level. Or you might
recommend that efforts be applied uniformly to all bills. Generally, larger bills
entail substantially more cost items and thus necessitate more work to accumulate
all charges.
The data file supplied by George Reale contains 445 observations on two variables.
The first column indicates the total charges billed for the project, and the second
column indicates the number of days required to prepare and submit the bill to the
client. The data are stored in a file named ”olson.xlsx”. Your case project includes
the following tasks.
1. Describe the statistical properties of the number of days and the amount of
the client’s bill.
2. What are the chances (i.e. percentage of cases) that the company reaches its
goal of preparing the bill within 45 days?
10.2 Production System Inc. – Development of a Salary Model

Sally Parsons, president of Production System Inc., has asked you to assist in ana-
lyzing the company's salary data. She has recently received a series of complaints
that women employees are receiving lower wages for comparable jobs. The com-
plaints came as a surprise to Sally because she thought salary increases were based
on experience and performance. Sally was aware that the average wages of women
staff were less overall than those of men. However, she also knew that women staff
had less experience and thus would be expected to have lower wages. Given the
complaints Sally understands that she must have objective information.
Production Systems Inc. is a regional computer systems development company that
specializes in work with banks and insurance companies. The company started as
a service department for a regional accounting company in the 1960s. A few of the
current employees actually came from the accounting company. In the early 1970s,
Production Systems Inc. became an independent company. The company has been
quite successful, experiencing steady growth over its entire life. Employment has
not grown as rapidly as have the company’s total billings because of productivity
innovations introduced during the past ten years.
The company has tended to hire experienced professionals with masters degrees in
technical fields including business, management science, computer science, engineer-
ing, economics, and mathematics. Most of the employees have come with significant
experience. The youngest employee in the professional group is 29, and the most
senior is 65. Experience with Production Systems varies over a wide range, with
one-fourth having less than seven years and one-fourth having more than 22 years.
Most of the women employees have less experience with the company.
The professional staff has only three levels: systems analyst, team systems analyst,
and project systems analyst. Salary ranges are quite wide within each of the levels.
Project systems analyst is the highest level, followed by team systems analyst. Pro-
motion to the higher ranks is awarded by a committee of project systems analysts,
and advancement to each level usually requires a minimum of six years’ experience
and significant project work. In general, persons at the higher levels are more pro-
ductive and tend to direct projects. It is possible, however, for a group that includes
several project systems analysts to be directed by a team systems analyst. For ev-
ery project under contract, a team of the best available people is created to carry
out the work. It is also well known that some persons at the highest level are less
productive than others at lower levels. Thus higher status and salary is a reward
for past performance and not a reliable measure of present contribution.
Salary adjustments are sometimes made to recognize certain specialty skills that
demand a high price in the labor market. Persons who work in database systems
programming have unique skills that are highly sought by other companies. Another
special category is technical systems developers – people who prepare specialized
high-performance software for key parts of large systems. People with either of
these skills are in great demand, so they must be paid a premium if they are to
be retained. Such specialists work at all three professional staff levels, depending
on their experience with the company, but the company has not provided premium
salaries merely by promoting the specialists. The personnel policy has been to base
promotions on a wide range of work and project management skills. Special skills
are compensated by a separate adjustment. Because promotions to higher levels
are related to experience, the company has sought to avoid confusing levels and
specialized skills that have a market premium.
Problem analysis
Your project analysis begins with a series of meetings you have with Sally Parsons
and the director of human resources, Gilbert Chatfield. Both Sally and Gilbert in-
dicate that wages tend to increase with experience in the company. The managers
conduct an annual employee review, which relies heavily on input from project lead-
ers who are directing teams at various remote locations. Project leaders shift as
projects are completed and new teams are assigned. Thus, obtaining consistent
information to provide the basis for a high-quality employee evaluation is difficult.
Most of the employees at Production Systems enjoy their independence and challeng-
ing work; salary levels have not been a major concern for most employees. Certain
people are recognized as strong performers, and their increases and promotions are
generally accepted by the professional staff.
In recent years, however, concerns have been raised about the fairness of the system
of awarding salary increases. The complaint by women employees is the most serious,
but other complaints have been made over the past several years. In view of these
concerns, you recommend that a salary regression model be developed. This model
would use data based on the current salaries paid to professional staff and important
variables that define the experience and skill levels of the staff. Such a model would
indicate the effect of various factors that contribute to salary level, and it would
identify persons whose salaries are above and below the expected salary. The model
could also be used to determine whether an employee’s gender indicates a salary
that is higher or lower than would be expected on the basis of experience and
qualifications.
After some discussion, Sally and Gilbert agree that this model should be developed.
It would be useful for answering the present complaint, and it would provide a
tool for reviewing the complete professional staff salary structure. After reviewing
the employee data records, you select a candidate set of variables for the model
development. These variables, which are contained in a file named ”prodsys.xlsx”,
are described in the table below. To protect the confidentiality of each employee’s
salary record, there is no variable to identify individual employees in this file. At
the completion of the study, you will provide Gilbert with a list of employees who
are substantially below or above the standard predicted by the model. Since he
has the identification key for each employee, and has access to other performance
information, he can decide whether certain persons’ salaries should actually be above
or below the standard.
file: prodsys.xlsx
variable name description
age age of the employee
yearsexp years of experience
yearslv2 number of years as a team systems analyst
yearslv3 number of years as a project systems analyst
gender gender: 1=female; 0=male
spec1 specialty: 1=database systems development skill; 0=else
spec2 specialty: 1=technical systems development skill; 0=else
salary annual salary (in dollars)
Your final discussion concludes with your agreeing to include the following tasks in
your study.
2. Does the average compensation paid to women differ from that paid to men?
Answer that question
In each case carry out statistical tests in order to judge whether the observed
difference is statistically significant! Discuss the reasons for obtaining (poten-
tially) different results from (a) and (b) regarding the difference in average
salary.
3. Evaluate the salary effects associated with specialty skills. Derive an estimate
of the premium that has to be paid for these skills, and judge the (statisti-
cal) significance of the premium for specialty skills. Compare the results you
obtain from the complete regression to another one which ignores all other
characteristics (i.e. only includes the dummy variables for specialty skills).
10.3 American Motors – Predicting Fuel Economy and Price
After years of declining sales and facility downsizing American Motors has begun
to experience increased demand for its vehicles. Design, performance, and quality
measures indicate that its vehicles compare favorably with those produced by other
foreign and domestic companies.
Reflecting on the years of major business problems, senior management realized that
during the 1970s it had not been sensitive to the business environment. Consumer
tastes changed, and foreign competitors produced cars that responded better to
consumer needs. In addition American Motors' quality had decreased so that per-
formance was substandard for the industry. To prevent future problems, American
developed a long-range market planning group responsible for monitoring the mar-
ket environment and recommending change to help maintain American’s competitive
position.
The long-range market planning group prepares alternative strategies for developing
future cars, and it monitors American’s performance compared to the rest of the in-
dustry. Each company in the industry has various automobile models that resemble
those in American’s product mix. However, the number of units sold for each of var-
ious models differs for each company. Therefore, it is difficult to compare companies
by the product mix. Instead the market planning group has developed compar-
isons based on important vehicle performance characteristics, including number of
cylinders, horsepower, acceleration, engine displacement, and vehicle weight.
You have been asked to develop mathematical models to describe fuel economy
and vehicle selling price as a function of these performance characteristics. These
models will be used to estimate which performance characteristics have the greatest
effect on purchasing decisions. In addition, the effect of various combinations of
characteristics will be estimated. A representative sample of the various automobiles
in the present national vehicle mix has been obtained for your analysis.
The analysis will determine which performance variables are significant factors and
will estimate the importance of those variables. Before beginning the analysis, you
meet with a number of experienced engineers to learn more about the vehicle per-
formance characteristics and their relationship to fuel economy. You also talk with
a number of experienced members of the marketing staff to learn how various con-
sumer groups rate the importance of the different performance characteristics.
The fuel economy variable has become increasingly important for vehicle market-
ing. Initially, fuel economy represented an important national policy objective after
major producer countries restricted supply and increased price. The national policy
contained a number of energy conservation measures, including fuel economy (miles
per gallon) minimums, imposed on each manufacturer. A manufacturer can meet
these standards either by selling a greater proportion of small fuel-efficient cars or
by improving the fuel economy of its larger and higher-priced cars. The company
would prefer the latter strategy, because larger cars provide higher revenue and a
larger contribution to overhead and profit. Fuel economy improvements can be ob-
tained by reducing vehicle size and weight, by reducing engine size and horsepower,
or by providing a combination of weight and horsepower reductions. Having smaller
size and weight usually reduces comfort, while the lower horsepower reduces vehicle
performance. Fuel economy can also be improved by instituting better engine design
and engine operating control. For example, the combination of sensors and a com-
puter processor control can provide the ideal fuel/air mixture for different vehicle
load conditions. The present overall vehicle mix results from a number of consumer
choices. Thus, analysis of the present mix offers a way of measuring consumer pref-
erences.
After meeting to define specific objectives for the study, the planning staff decides
that the study should identify key factors that affect fuel economy and key factors
that affect vehicle price. These factors could then be used as planning parameters
for developing new vehicle designs. American Motors also wants to know how it
compares with the rest of the industry as gauged by the driving factors, and it
wants to know the relationship of these factors to fuel economy and vehicle price.
The planning group has asked you to perform an analysis to answer these questions.
The variables in the file ”motors.xlsx” are described in the table below. Your analysis
should include the following steps.
1. Run a regression to describe fuel economy (miles per gallon) using all available
explanatory variables (don’t use the columns ’country’ and ’company’ which
are only included in the sheet to derive the regressors ’US’ and ’AmMo’).
2. Identify which factors have a significant effect on fuel economy using a signif-
icance level of 10%.
3. Determine whether the fuel economy for American Motors vehicles is signifi-
cantly above or below – or roughly matches – the fuel economy of its competi-
tors.
4. Run a regression to describe the vehicle selling price using all available ex-
planatory variables.
5. Indicate what factors have a significant effect on the selling price (using a
significance level of 10%).
6. Determine whether the selling price for American Motors vehicles is signifi-
cantly above or below – or roughly matches – the industry level.
7. Determine whether the selling price of cars built in Europe or Japan differs
significantly or roughly matches the price of U.S. cars.
file: motors.xlsx
variable name description
price price in dollars
milpgal miles per gallon (measure for fuel economy)
cylinder number of cylinders in the car
displace cubic inches of engine displacement
horspwr horsepower generated by the engine
accel acceleration (in seconds) to 60 miles per hour
year model year of the vehicle
weight vehicle weight (in pounds)
country country of origin (1=U.S., 2=Europe, 3=Japan)
company manufacturer (1=American Motors; 2=Apex Motors;
3=Smith Motors; 4=other)
10.4 ACME Marketing Strategy – A Multistage Decision Problem
The three basic elements of this decision problem are: the possible strategies, the
possible outcomes and their probabilities, and the value model. The possible strate-
gies are clear: ACME must first decide whether to conduct the test market. Then
it must decide whether to introduce the product nationally. If ACME decides to
conduct a test market they will base the decision to market nationally on the test
market results. In this case its final strategy will be a contingency plan, where it
conducts the test market, then introduces the product nationally if it receives suffi-
ciently positive test market results, and abandons the product if it receives negative
test market results. The optimal strategies from many multistage decision problems
involve similar contingency plans.
Conditional Probabilities
Approaching this problem requires knowing the probabilities of the test market
outcomes and the conditional probabilities of national market outcomes given
the test market outcomes. However, suppose ACME decides not to run a test
market and then decides to market nationally. Then what are the probabilities of
the national market outcomes? You cannot simply assess three new probabilities.
These probabilities are implied by the given probabilities. This follows from the rule
of conditional probability. Let T1, T2, and T3 be the test market outcomes and N be
any national market outcome; then the probability of a particular national market
outcome is given by

P(N) = P(N|T1)·P(T1) + P(N|T2)·P(T2) + P(N|T3)·P(T3).
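A sketch of this computation (the probability values are placeholders, not ACME's actual assessments):

    def total_probability(p_T, p_N_given_T):
        """P(N) implied by the test market probabilities and the conditionals."""
        return sum(pt * pn for pt, pn in zip(p_T, p_N_given_T))

    # placeholder probabilities for T1, T2, T3 and for P(N | Ti)
    print(total_probability([0.3, 0.5, 0.2], [0.8, 0.4, 0.1]))   # 0.46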