01 Introduction
Introduction
1. Model specification
Model: a summary of the theory under consideration, translated into a set of empirically measurable
and testable hypotheses. It approximates the phenomena to be studied.
2. Facts
Facts: refer to the events in the real world relating to the phenomena under investigation.
The two ingredients are combined by estimation of an econometric model using the facts.
Employ econometric techniques to estimate parameters. The estimated model provides a
way of measuring and testing relationships suggested by economic theory.
Thus, the econometric approach combines theory and facts in a particular way. It can be
seen as
a) an application of real-world data to economic theory.
b) a systematic way of studying economic theory using facts.
1.1.2 The purpose of econometrics
Three principal purposes of econometrics; a study may target any one or more of them
structural analysis,
forecasting and
policy evaluation.
Structural analysis: the use of estimated results for quantitative measurement of
economic relationships. It
a) facilitates the comparison of rival theories proposed for the same phenomena.
b) is a means of understanding real-world phenomena by quantitatively measuring,
testing and validating economic relationships.
Forecasting: use estimated parameters to predict outcomes outside the sample data.
Policy evaluation: use estimated values to choose between alternative policies.
a) Verbal/logical models
This approach uses verbal analogies, and the result is sometimes called a paradigm. Such
models often treat the system ‘as if’ it were in some sense purposeful. The earliest, and
still two of the best, paradigms developed in economics are those of Adam Smith: the
division of labour and the invisible hand in markets.
b) Physical models
An appropriate scaling down or up of the item investigated. A model airplane is a scaled-down
version of a real airplane, while in molecular biology protein molecules are usually scaled
up so that they can be manipulated.
c) Geometric models
Geometric models are used to represent relationships geometrically, usually in two
dimensions, to indicate the principal relationships between the major variables representing
the phenomena under investigation.
[Figure: the consumption function C = f(Y), drawn as the line C = a + bY, with consumption C on the vertical axis and income Y on the horizontal axis]
d) Algebraic models
Both the verbal and geometric models have to be expressed algebraically before they can
be transformed into an econometric model. For example, the macroeconomic income
determination graph
can be expressed algebraically as follows:
C = f(Y)    (1)
Y0 ≡ C0 + G    (2)
where G denotes exogenous government expenditure. Substituting Y0 from Equation 2 into Equation 1 we get equilibrium consumption as:
C0 ≡ f(C0 + G)    (3)
This is a solution to the system of (structural) equations and is known as the reduced-form
equation. Both Equations 2 and 3 are identities because they define Y0 and C0, respectively. The
advantages of the algebraic over the geometric model stem from:
1. Ease of manipulation. For instance, combining Equations 1 and 2 gives Y0 ≡ f(Y0) + G; differentiating with respect to G yields
dY0/dG = f′(Y0)·(dY0/dG) + 1  →  dY0/dG − f′(Y0)·(dY0/dG) = 1
→  (1 − f′(Y0))·(dY0/dG) = 1  →  dY0/dG = 1/(1 − f′(Y0))    (4)
1/(1 − f′(Y0)) is known as the multiplier.
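The multiplier can be checked numerically. The sketch below assumes a hypothetical linear consumption function C = a + bY with a = 200 and b = 0.8 (illustrative values, not from the text) and treats government expenditure G as exogenous:

```python
# Numerical check of the multiplier: with C = a + b*Y and exogenous G,
# equilibrium income solves Y = a + b*Y + G, i.e. Y0 = (a + G) / (1 - b).
# The parameter values a = 200, b = 0.8 are illustrative only.
def equilibrium_income(a, b, G):
    return (a + G) / (1 - b)

a, b, G = 200.0, 0.8, 100.0
dY = equilibrium_income(a, b, G + 1.0) - equilibrium_income(a, b, G)
multiplier = 1.0 / (1.0 - b)
print(dY, multiplier)  # a one-unit rise in G raises Y0 by 1/(1-b) = 5
```

The finite difference in equilibrium income matches 1/(1 − b), as the derivation above predicts.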
2. Ease of adding new variables into the equation.
A model usually consists of several equations.
Behavioural equations, equilibrium conditions, production functions, etc. can be included
to give us what we call the structural equations of the model.
Variables determined by the model are called endogenous variables. Variables
determined outside the model, but which influence the endogenous variables, are known
as exogenous variables.
The model also contains certain parameters, which are usually estimated by
econometric methods.
1.2.2. Econometric models: an econometric model is a stochastic model, i.e., a
mathematical model that includes one or more random variables. It will be either linear
or non-linear in parameters. The linearity assumption is important for two basic reasons:
a) computational ease
b) ease of proving mathematical and statistical theorems.
For instance, we can represent the consumption function we introduced earlier as
f(Y) = a + bY    (5)
In this case a and b are the relevant parameters; b is the marginal propensity to
consume, assumed to be constant in this case, and the multiplier is
dY0/dG = 1/(1 − b)    (6)
The assumption of linearity in parameters results in convenient computation, and it is the
most important and common assumption utilised in econometric work. Its restrictiveness
should not be exaggerated, however, for the following reasons:
1. many of the relationships in economics and related fields are linear: e.g., equilibrium
conditions and definitions of expenditure, revenue, costs and profits are linear.
2. this assumption applies only to parameters, and not to variables in the model. Thus
we can introduce, say, a quadratic form for the consumption function, as follows:
f(Y) = a + bY + cY²    (7)
Thus, while this function is linear in parameters it is non-linear in variables.
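As a sketch of why linearity in parameters is what matters for estimation, the snippet below (with made-up numbers, not from the text) fits the quadratic consumption function by ordinary least squares, simply treating Y and Y² as two separate regressors:

```python
import numpy as np

# Sketch: f(Y) = a + b*Y + c*Y**2 is non-linear in Y but linear in (a, b, c),
# so ordinary least squares still applies. The 'true' values 50, 0.8, -0.002
# and the noise are illustrative assumptions.
rng = np.random.default_rng(0)
Y = rng.uniform(10, 100, size=200)
C = 50 + 0.8 * Y - 0.002 * Y**2 + rng.normal(0, 1, 200)

X = np.column_stack([np.ones_like(Y), Y, Y**2])  # regressors: 1, Y, Y^2
coef, *_ = np.linalg.lstsq(X, C, rcond=None)
print(coef)  # roughly [50, 0.8, -0.002]
```

The design matrix contains a non-linear function of Y, yet the estimation problem remains a linear one.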
3. Often non-linear models can be transformed into linear ones. For example, the Cobb-
Douglas production function of the form
Q(K, L) = A·K^α·L^β
can be transformed into a linear one by taking a logarithmic transformation as follows:
ln Q(K, L) = ln A + α·ln K + β·ln L    (8)
which is linear in the parameters ln A, α and β.
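A quick numerical sketch of the log transformation, using illustrative values A = 2, α = 0.3, β = 0.7 (these numbers are assumptions, not from the text):

```python
import numpy as np

# Sketch: after taking logs, the Cobb-Douglas parameters can be recovered
# by a linear regression of ln Q on ln K and ln L. No noise is added here,
# so the recovery should be exact up to rounding.
rng = np.random.default_rng(1)
K = rng.uniform(1, 50, size=300)
L = rng.uniform(1, 50, size=300)
A, alpha, beta = 2.0, 0.3, 0.7          # illustrative values
Q = A * K**alpha * L**beta

X = np.column_stack([np.ones(300), np.log(K), np.log(L)])
coef, *_ = np.linalg.lstsq(X, np.log(Q), rcond=None)
print(np.exp(coef[0]), coef[1], coef[2])  # 2.0, 0.3, 0.7 (up to rounding)
```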
4. any smooth function can be reasonably approximated in an appropriate range by a
linear function (e.g., using a Taylor expansion). Consider the following general
production function:
Q = f(K, L)    (9)
If this function is continuously differentiable, it can be approximated by a linear function
in an appropriate range by taking the linear portion of its Taylor-series expansion.
Expanding this equation about the base levels (K0, L0) yields
f(K, L) = f(K0, L0) + fK(K0, L0)·(K − K0) + fL(K0, L0)·(L − L0)    (10)
where the function and the partial derivatives are all evaluated at the base level. Thus, in
the neighbourhood of the point (K0, L0) we may approximate the function as
f(K, L) ≅ a + bK + cL    (11)
where, denoting the partial derivatives fK and fL by the marginal products MPk and MPl, we obtain
a = f(K0, L0) − MPk(K0, L0)·K0 − MPl(K0, L0)·L0
b = MPk(K0, L0)
c = MPl(K0, L0)    (12)
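The quality of the linear approximation can be checked numerically. The sketch below assumes an illustrative production function f(K, L) = K^0.3·L^0.7 and a base point (K0, L0) = (10, 20); both choices are assumptions for illustration:

```python
# Sketch: first-order Taylor (linear) approximation of an illustrative
# production function around a base point, following Equations 10-12.
def f(K, L):
    return K**0.3 * L**0.7

K0, L0 = 10.0, 20.0
MPk = 0.3 * K0**-0.7 * L0**0.7       # partial derivative df/dK at (K0, L0)
MPl = 0.7 * K0**0.3 * L0**-0.3       # partial derivative df/dL at (K0, L0)
a = f(K0, L0) - MPk * K0 - MPl * L0  # intercept, as in Equation 12

def f_linear(K, L):
    return a + MPk * K + MPl * L

print(f(11, 21), f_linear(11, 21))   # close together near the base point
print(f(30, 5), f_linear(30, 5))     # far from it, the approximation degrades
```

Near (K0, L0) the two functions almost coincide; far from the base point the linear approximation is no longer reliable, which is why Equation 11 holds only "in an appropriate range".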
1.2.3. Deterministic versus stochastic models
Suppose, to begin with, that sales (y) are an exact, deterministic function of advertising
expenditure (x), as in the following schedule:
x      y
0      2500
20     4100
50     5000
100    2500
Now suppose instead that, for each level of x, actual sales deviate from these values by a disturbance ε, where
ε = +500 with probability 0.5
ε = −500 with probability 0.5
In this case we cannot determine the values of y given x exactly. The schedule would be
x y Y* (a possible realization)
0 2000 or 3000 2000
20 3600 or 4600 4600
50 4500 or 5500 5500
100 2000 or 3000 2000
The disturbance indicates the level above or below the average value such that, with a given degree of
confidence, sales fall within the defined interval. Thus, the actual realisation of y can be any
of the eight possible outcomes. For instance, the actual realisation may be the series Y* shown in the last column.
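The chance mechanism behind the schedule can be sketched in a few lines; the code below simply draws one possible realisation of the table above:

```python
import random

# Sketch of the two-point disturbance: for each advertising level x, realised
# sales equal the mean value plus eps, where eps is +500 or -500 with
# probability 0.5 each (eight possible y values in all across the four rows).
random.seed(42)
mean_sales = {0: 2500, 20: 4100, 50: 5000, 100: 2500}

def realisation():
    # Draw one possible realisation of the whole sales schedule.
    return {x: m + random.choice((-500, 500)) for x, m in mean_sales.items()}

y_star = realisation()
print(y_star)  # each value is its row's mean plus or minus 500
```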
The width of this interval can be determined by assuming that y is itself a random variable with a
particular density function. If the error term, ε, has a continuous distribution, say the normal
distribution with mean 0 and variance 1, then for each value of x we have a normal
distribution for y, and the observed values of y will be drawn from this distribution. If the
relationship is given by
y = a + bx + ε;    ε ~ N(0, 1)    (14)
[Figure: the density of y given x, centred at a + bx, with 90% of the distribution in a central interval around a + bx and 5% in each tail]
then for each value of x, y will have a normal distribution, and with 90% probability the
observed value falls within the indicated interval. The relationship between y and x in such
cases is known as a stochastic relationship.
‘Average’ here means the mean, or expected value:
E[y] = E[a + bx + ε] = a + bx + E[ε] = a + bx
since x is assumed given and E[ε] = 0.
Given the distribution, the width of the interval could be chosen such that a given percentage (in the example,
90%) of the distribution is included in the confidence interval and each of the tails
contains 5% of the distribution.
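This choice of interval can be checked by simulation. The sketch below assumes illustrative values a = 2500, b = 50, x = 20, and uses delta = 1.645, the cut-off that leaves 5% of a standard normal distribution in each tail:

```python
import random

# Sketch of Equation 14: y = a + b*x + eps with eps ~ N(0, 1). About 90% of
# simulated draws should fall within a + b*x +/- 1.645. The parameter values
# are illustrative assumptions, not from the text.
random.seed(0)
a, b, x = 2500.0, 50.0, 20.0
delta = 1.645                          # 5% in each tail of N(0, 1)
mean = a + b * x
draws = [mean + random.gauss(0, 1) for _ in range(100_000)]
inside = sum(abs(y - mean) <= delta for y in draws) / len(draws)
print(inside)  # close to 0.90
```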
An econometric model specifies the probability distribution of each endogenous variable,
given the values of
a) the exogenous variables and
b) all the parameters of the model.
The distribution we discussed so far was for a single value of x. Now consider all possible values of x:
for each, y has its own appropriate distribution, say normal, as follows.
[Figure: f(Y|X), the distribution of Y given each value of X, centred on the line Y = a + bX, together with the band formed by the 90% confidence intervals]
Connecting the upper and lower levels of the 90% confidence intervals leads to a 90%
confidence interval for the entire sales function.
What is known about the stochastic and linear relationship between sales and
advertising expenditure is summarised by
1. The resulting band representing the confidence levels for sales,
2. The point values on the function, representing the mean values of sales.
The deterministic (non-stochastic) relationship can then be interpreted as one in which the
variance of the relevant probability distribution vanishes. But this assumption is
unwarranted since, as stated earlier, not all is known about the relationship.
Algebraically, the stochastic nature of the relationship is represented as
y = a + bx + ε    (15)
where ε is an additive stochastic disturbance term. This term plays the role of a chance
mechanism. Each equation in an econometric model, other than definitions, equilibrium
conditions, and identities, is assumed to contain an additive stochastic disturbance term.
The stochastic terms are unobservable random variables: values taken by these terms are
not known with certainty. We assume certain properties about them (namely, their mean,
variance and covariance).
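As a sketch of how such a disturbance enters estimation, the snippet below simulates Equation 15 with illustrative parameter values and recovers a and b by ordinary least squares:

```python
import numpy as np

# Sketch: simulate y = a + b*x + eps (Equation 15) with eps ~ N(0, 1) and
# recover a and b by least squares. The 'true' values 3.0 and 1.5 are
# illustrative assumptions.
rng = np.random.default_rng(7)
a_true, b_true = 3.0, 1.5
x = rng.uniform(0, 10, size=500)
y = a_true + b_true * x + rng.normal(0, 1, 500)

X = np.column_stack([np.ones_like(x), x])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a_hat, b_hat)  # close to 3.0 and 1.5
```

The disturbance itself is never observed; only its assumed properties (zero mean, constant variance) make the estimates interpretable.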
1.2.4. Static and dynamic models
a) Static model: such models involve no explicit dependence on time.
b) Dynamic model: models in which time plays an essential role.
Note that the simple addition of time subscripts into the variables under consideration
does not make a model dynamic. It is only if lagged variables and/or differences of
variables over time are part of the model that we say we have a dynamic model.
C_t = a + b·Y_t + ε_t is not a dynamic model; while
C_t = a + b·Y_t + c·C_{t-1} + ε_t is.
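The dynamic version can be sketched by simulation; the parameter values and the income path below are illustrative assumptions only:

```python
import random

# Sketch of a dynamic model with a lagged dependent variable:
# C_t = a + b*Y_t + c*C_{t-1} + eps_t. All values here are illustrative.
random.seed(3)
a, b, c = 100.0, 0.5, 0.3

Y = [1000.0 + 10.0 * t for t in range(20)]   # an assumed income path
C = [0.0] * 20
C[0] = a + b * Y[0]                          # initial period: no lag available
for t in range(1, 20):
    C[t] = a + b * Y[t] + c * C[t - 1] + random.gauss(0, 5)

print(C[:3])  # consumption now depends on its own past, not only on Y_t
```

Merely adding the time subscript t would not have made the model dynamic; it is the lagged term C_{t-1} that does.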
1.3 Different types of data for econometric analysis
Economic theory on its own typically goes only as far as comparative-static
analysis: it may tell us the sign of the change in a variable caused by exogenous
shocks to the system. If we want to know by how much the variable changes, we
need to know the parameters (in the reduced-form or the structural equations).
To do so, we need relevant data. If one is thinking of estimating the parameters in the
prototype macro model in order to estimate the MPC or the multipliers in the model, one
will need data set on income, consumption, and government expenditure.
The data relevant to a particular study summarise the facts concerning the phenomena
under investigation. These facts may be of different type and derived from different
sources. They may be quantitative or qualitative or a mixture of both. Whatever their
form or source, however, they have to be expressed quantitatively in order to be used in
econometrics. The set of such quantitatively expressed facts is the data of the study.
An econometric model requires data on all the variables in the model in order to be
estimated. One of the most serious problems in making econometric estimates is the
availability of relevant data. Sometimes the data may not be available in the form needed
for the econometric model. This forces practitioners in the field to use proxies for certain
variables: for example, the use of a time trend as a proxy for changing tastes or technology.
1.3.1. Quantitative versus qualitative data
Data, as a matter of definition, are quantitative. Thus facts, which are already expressed
as numbers, lead directly to data in the form of these numbers or suitable transformation
thereof.
However, some facts may be available only in qualitative form. Often such
qualitative facts refer to either/or situations:
a) Something either happened or not; an attitude or position was adopted or not; a job
was taken or not, etc.
b) Variables that are qualitative by nature: male or female; married or unmarried; old or
young, etc
c) Variables which show qualitative shifts over time or space: war-time or peace-time;
industrialised or developing countries etc.
Such qualitative information is usually quantified by what are known as dummy variables
that take values of 1 when the qualitative variable has certain characteristics and 0
otherwise.
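A minimal sketch of constructing such dummy variables (the records below are made up for illustration):

```python
# Sketch: quantifying qualitative facts with dummy variables that take the
# value 1 when the characteristic is present and 0 otherwise.
people = [
    {"name": "A", "sex": "female", "married": True},
    {"name": "B", "sex": "male", "married": False},
    {"name": "C", "sex": "female", "married": False},
]

rows = [
    {"female": 1 if p["sex"] == "female" else 0,
     "married": 1 if p["married"] else 0}
    for p in people
]
print(rows)  # qualitative facts expressed as 0/1 regressors
```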
1.3.2. Time series versus cross section data
Econometrics amalgamates economic theory, data and statistical methods, and it is this
mixing of different subjects that makes the field challenging. As a seminar presenter
quipped, according to Verbeek: "Econometrics is much easier without data!"
Cross-section data: these measure a given variable at a point in time for different entities.
The entities can be different countries, regions in a country, firms, industries, families,
households, or individuals. Cross-sectional data are ideally a random sample; if the sample
is not random, we have a sample-selection problem.
Time-series data: these have a separate observation for each successive time period (at different
dates): yearly, monthly, quarterly, weekly, daily, and these days even hourly. It
is preferable if the spacing between periods is even.
Pooled, cross-section/time-series data: these are data sets that pool the cross-section data
sets over time, so the same sample is observed over successive periods. One then needs
only to account for time differences.
In general, the results one obtains from time-series and cross-section data are different,
and results are not comparable. For instance, the income elasticities obtained from time-
series data are less than those obtained from cross-section data. Note, however, that
neither of these estimates is ‘wrong’ as such, and which to use depends on the purpose. If
one wants to look at long-run elasticities it may be appropriate to use those from cross-
section data, while if one is interested in short-run forecasting it may be
appropriate to use time-series data.
1.3.3. Nonexperimental versus experimental data
Nonexperimental data are obtained from observations of a system that is not subject to
experimental control, while experimental data are obtained from controlled experiments.
A distinction between the social and the natural sciences is based on the type of data they use:
in the natural sciences most phenomena of interest are amenable to experimental control,
which is not usually the case in the social sciences.
Note, however, that most of the data used in economics, such as the national accounts as
well as household surveys, are not experimental data. All of them can nevertheless be
usefully employed in the construction of models. The fact that most data in economics are
nonexperimental in nature is, however, not free of problems:
1. degrees-of-freedom problem – the data may include an insufficient number of
observations to allow an adequate estimate of the model, and in nonexperimental
situations it is impossible to replicate the data to increase the degrees of freedom.
2. multicollinearity problem – variables in a data set may tend to move together rather
than spread out. The most notorious problem in this regard is that observed in time-
series data, where variables tend to exhibit the same trend—cyclical and secular—
over time.
3. serial-correlation – when one uses time series data, underlying changes occur very
slowly over time. Thus, the conditions in time periods that are close together tend to
be similar.
4. structural-change problems – this happens when there is a discontinuous change in
the real world and thus the data refer to different populations.
5. errors-in-measurement – variables in a data set may be measured subject to various
inaccuracies and biases. The most fundamental problem in this regard is the error that
arises due to lack of precision in conceptualising the variable. For example, what is
included as consumption may differ not only across countries but also over time in a
country. This necessitates refining the data to make them comparable and consistent.
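The multicollinearity problem (point 2 above) can be illustrated with a short simulation: two series that share nothing but a time trend are nevertheless almost perfectly correlated in a finite sample:

```python
import random

# Sketch of the multicollinearity problem: two otherwise unrelated time
# series with a common trend are nearly perfectly correlated. All numbers
# are illustrative.
random.seed(1)
n = 50
x = [10.0 * t + random.gauss(0, 5) for t in range(n)]  # trending series 1
y = [3.0 * t + random.gauss(0, 5) for t in range(n)]   # trending series 2

mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
corr = cov / (sx * sy)
print(corr)  # close to 1: the shared trend dominates the noise
```

With regressors this collinear, separating their individual effects on a dependent variable becomes very imprecise.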
These and related problems will be treated in the course as they arise.