Chapter 1
Assessment
• Performance: 10%
• Midterm test + project: 30%
• Final-term test: 60%
Instruction for your project
• Each group should write and present a short report (maximum 15 pages,
everything included) based on the data and introduction given during the course.
• The report should be organized as follows:
1. Introduction
Give a brief statement about the purpose of the study.
2. Literature Review
- Summarize the main published work concerning your research question.
- It should be a synthesis and analysis of the relevant published work, linked at all times
to your research question.
3. Methodology and data
- An introduction to your model (dependent and independent variables)
- A description of the data must be provided here. You should discuss the
data sources and the definitions of the variables, and report summary
statistics for each variable in a table, such as minimum and maximum
values, means, and standard deviations.
4. Results: Estimation results are provided in a table and discussed in this
section.
5. Conclusion: you should summarize the results here.
Outline
• Chapter 1: Introduction to Econometrics
• Chapter 2: Simple Regression
• Chapter 3: Multiple Regression
• Chapter 4: Statistical Inference
• Chapter 5: Diagnosing Model Problems
• Reading papers, replicating empirical research, and
presentation
Examples of empirical research
❑ This thesis examines the relationship between the
probability of financial distress and some specific financial
ratios in order to identify internal factors causing distress
for firms. (Phu Kim Yen, K49 CLC)
• Findings: Size has negative coefficients that are
statistically significant at the 1% level in all
estimations. This finding is consistent with the earlier study
by Ohlson (1980). The author concludes that size affects the
probability of financial distress of Vietnamese listed firms,
especially those on HOSE. In practice, large-cap companies
often have a stronger trading position with counterparties
and better access to sources of financing. Therefore, it is
easier for them to weather unexpected downturns.
Introduction to Econometrics
The Nature and Purpose of Econometrics
1. Why do you need to learn Econometrics?
2. What is Econometrics? What will you
learn from the course?
3. How do you learn? Methodology of
Econometrics
4. Terminology and notation
5. Types of data
Why do you need to learn Econometrics?
Economics suggests important relationships, often with policy
implications, but virtually never suggests quantitative
magnitudes of causal effects.
• What is the quantitative effect of reducing class size on
student achievement?
• How does another year of education change earnings?
• What is the price elasticity of cigarettes?
• What is the effect on output growth of a 1 percentage point
increase in interest rates by the Fed?
• What is the effect on housing prices of environmental
improvements?
What is Econometrics?
• Econometrics = “economic measurement”.
• “Econometrics may be defined as the social science in
which the tools of economic theory, mathematics, and
statistical inference are applied to the analysis of
economic phenomena” (Goldberger 1964).
In this course you will:
• Learn methods for estimating causal effects using
observational data
• Focus on applications – theory is used only as needed to
understand the “why”s of the methods;
• Learn to evaluate the regression analysis of others – this
means you will be able to read/understand empirical
economics papers in other econ courses;
• Get some hands-on experience with regression analysis in
your problem sets.
1.2. Methodology of Econometrics
1. Statement of theory or hypothesis
2. Specification of the mathematical model of the
theory
3. Specification of the statistical, or econometric,
model
4. Collecting the data
5. Estimation of the parameters of the econometric
model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes.
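Example
1. Statement of Theory or Hypothesis
• Keynes postulated that the marginal propensity to consume (MPC), the
rate of change of consumption for a unit change in income, is greater
than zero but less than 1.
2. Specification of the Mathematical Model of Consumption
• $Y = \beta_1 + \beta_2 X, \quad 0 < \beta_2 < 1$   (1)
• Geometrically, (1) is a straight line with intercept $\beta_1$ and slope
$\beta_2$, the MPC.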
3. Specification of the Econometric Model of Consumption
• Other variables can also affect consumption expenditure, such as family
size, the ages of family members, and family religion; hence the
relationships between economic variables are inexact.
• To allow for the inexact relationships between economic variables,
(1) is modified as follows:
• $Y = \beta_1 + \beta_2 X + u$   (2)
• where u is the disturbance, or error, term: a random (stochastic)
variable that has well-defined probabilistic properties.
• u may well represent all those factors that affect consumption but
are not taken into account explicitly.
• (2) is an example of a linear regression model, i.e., it hypothesizes
that Y is linearly related to X, but that the relationship between the
two is not exact; it is subject to individual variation. The econometric
model of (2) can be depicted as shown in Figure 2.
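To see what the disturbance term does, the minimal Python sketch below generates consumption data from (2). The parameter values, the X grid, and the normal distribution for u are hypothetical choices made only for illustration; setting `sigma` to 0 reproduces the exact relationship (1), and the differences between the two series are the disturbances u.

```python
import random

def simulate_consumption(x_values, beta1, beta2, sigma, seed=0):
    """Generate Y = beta1 + beta2 * X + u for each X, with u ~ N(0, sigma^2).

    The normal distribution for u is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return [beta1 + beta2 * x + rng.gauss(0.0, sigma) for x in x_values]

# Hypothetical values, chosen only for illustration
x = [1000.0, 2000.0, 3000.0, 4000.0]
y_exact = simulate_consumption(x, beta1=-180.0, beta2=0.7, sigma=0.0)   # no disturbance: model (1)
y_noisy = simulate_consumption(x, beta1=-180.0, beta2=0.7, sigma=50.0)  # with disturbance: model (2)

# The disturbances u are the vertical gaps between the noisy points and the exact line
u = [yn - ye for yn, ye in zip(y_noisy, y_exact)]
print(y_exact)
print(u)
```

Because each observation gets its own draw of u, two observations with the same X can have different Y, which is exactly the "individual variation" that (2) allows and (1) does not.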
4. Obtaining Data
• Y = personal consumption expenditure (PCE)
• X = gross domestic product (GDP)
5. Estimation of the Econometric Model
• Regression analysis is the main tool used to obtain the
estimates. We obtain $\hat{\beta}_1 = -184.08$ and $\hat{\beta}_2 = 0.7064$,
so the estimated consumption function is
$\hat{Y}_i = -184.08 + 0.7064 X_i$   (3)
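The slides report the estimates but not how they are computed, and the PCE and GDP series themselves are not reproduced here. Assuming ordinary least squares (OLS), the standard estimator for this kind of regression, the two-variable estimates have the closed form $\hat{\beta}_2 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$. The sketch below implements these formulas and then plugs a hypothetical GDP value into (3) to illustrate step 7 (forecasting); the function names are illustrative.

```python
def ols_two_variable(x, y):
    """OLS estimates (b1, b2) for Y = b1 + b2 * X + u."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X does not vary, so the slope cannot be estimated")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx            # slope: co-movement of X and Y relative to variation in X
    b1 = y_bar - b2 * x_bar   # intercept: the fitted line passes through the means
    return b1, b2

def predict(b1, b2, x_new):
    """Fitted value Y-hat = b1 + b2 * x_new."""
    return b1 + b2 * x_new

# Sanity check on points lying exactly on Y = 1 + 2X
print(ols_two_variable([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)

# Estimates from (3); X = 12000 is a hypothetical GDP value
print(predict(-184.08, 0.7064, 12000.0))
```

The prediction is about 8292.72. The slope 0.7064 is the estimated MPC: on average, a one-unit increase in GDP is associated with an increase of about 0.71 units in PCE.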
Terminology and notation
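• In the literature, the terms dependent variable
and explanatory variable are described
variously. A representative list is:
Dependent variable ⇔ Explanatory variable
Explained variable ⇔ Independent variable
Predictand ⇔ Predictor
Regressand ⇔ Regressor
Response ⇔ Stimulus
Endogenous ⇔ Exogenous
Outcome ⇔ Covariate
Controlled variable ⇔ Control variable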
1.3. Types of data
• Three types of data are used in empirical
analysis: time series, cross-section, and panel
data.
• Time series data: a set of observations on the
values that a variable takes at different times,
collected at regular intervals such as daily,
weekly, monthly, quarterly, or annually.
Examples: weekly stock returns, monthly interest
rates, GDP growth, CPI, and so on.
• Cross-section data: data on one or more
variables collected at the same point in time.
Examples: the population census conducted by
the Vietnam General Statistics Office every 10
years; profits of listed firms in 2014.
• Panel data / pooled data: a combination of time
series and cross-section data. In panel data, the
same cross-sectional units (e.g., the same firms)
are observed over time.
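The three data types differ only in how observations are indexed by unit and by time. The minimal Python sketch below classifies a dataset from its (unit, period) keys; the function name, the labels, and the example keys are hypothetical. It also separates panel data (at least one unit observed in more than one period) from pooled cross-sections (different units in different periods).

```python
from collections import defaultdict

def classify_data(keys):
    """Classify observations, given as (unit, period) pairs, by data type."""
    keys = list(keys)
    if not keys:
        raise ValueError("no observations")
    periods_by_unit = defaultdict(set)
    for unit, period in keys:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values())
    if len(periods_by_unit) == 1:
        return "time series"            # one unit, observed over time
    if len(all_periods) == 1:
        return "cross-section"          # many units, one point in time
    if all(len(p) == 1 for p in periods_by_unit.values()):
        return "pooled cross-sections"  # many units and periods, but no unit is followed over time
    return "panel"                      # the same units are observed in several periods

# Hypothetical keys
print(classify_data([("VN", 2012), ("VN", 2013), ("VN", 2014)]))
print(classify_data([("FirmA", 2014), ("FirmB", 2014)]))
print(classify_data([("FirmA", 2013), ("FirmA", 2014), ("FirmB", 2013), ("FirmB", 2014)]))
```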
Example of panel data
The accuracy of data
The results of research are only as good as
the quality of the data.
Measurement Scales of Variables
• Four broad categories: ratio scale, interval scale,
ordinal scale and nominal scale.
• Ratio scale: GDP growth rate, interest rate, ROE.
Most economic variables belong to this category.
• Interval scale: differences between values are
meaningful but ratios are not, e.g., the distance
between two time periods, say (2000 − 1995)
• Ordinal scale: income class (upper, middle,
lower), grading systems (A, B, C grades)
• Nominal scale: gender (male, female), marital
status (married, unmarried, divorced, separated)
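The four scales differ in which comparisons between two values are meaningful, and each scale allows everything the previous one does plus one more operation. The sketch below encodes that usual textbook ordering; the enum, the function name, and the operation labels are hypothetical, chosen only for illustration.

```python
from enum import IntEnum
from typing import List

class Scale(IntEnum):
    """Measurement scales, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def meaningful_operations(scale: Scale) -> List[str]:
    """Comparisons between two values that make sense on the given scale."""
    ops = ["equal / not equal"]        # every scale: same category or not (e.g., gender)
    if scale >= Scale.ORDINAL:
        ops.append("greater / less")   # ranking is meaningful (e.g., A, B, C grades)
    if scale >= Scale.INTERVAL:
        ops.append("difference")       # distances are meaningful (e.g., 2000 - 1995)
    if scale >= Scale.RATIO:
        ops.append("ratio")            # ratios are meaningful (e.g., one interest rate twice another)
    return ops

for s in Scale:
    print(s.name, meaningful_operations(s))
```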
1.4 Review of statistics
• Empirical problem: Class size and
educational output
– Policy question: What is the effect on test
scores (or some other outcome measure) of
reducing class size by one student per class?
– We must use data to find out (is there any way
to answer this without data?)
Example: The California Test Score Data Set
Do districts with smaller classes have higher test scores?
• Test the “null” hypothesis that the mean test scores in the two types of districts are the same, against the “alternative” hypothesis that they differ.
a. Estimation
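Estimation compares the average test scores in the two types of districts. In the minimal sketch below, the estimate is the difference between the two sample means, and its standard error (here the version that allows the two groups to have different variances) measures its sampling uncertainty. The function name and the toy scores are hypothetical, not the California data.

```python
import math
from statistics import mean, variance

def diff_in_means(small, large):
    """Estimated difference in mean test scores (small minus large classes) and its standard error."""
    n_s, n_l = len(small), len(large)
    if n_s < 2 or n_l < 2:
        raise ValueError("each group needs at least two observations")
    estimate = mean(small) - mean(large)
    # variance() uses the n - 1 divisor; equal group variances are not assumed
    se = math.sqrt(variance(small) / n_s + variance(large) / n_l)
    return estimate, se

# Hypothetical toy scores for the two types of districts
print(diff_in_means([660, 655, 670, 662], [650, 648, 652, 645]))
```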
b. Hypothesis testing
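For the hypothesis test, the usual large-sample approach divides the estimated difference by its standard error and compares the resulting t-statistic with the standard normal distribution. The sketch below assumes that approximation and a 5% significance level by default; the function name is hypothetical, and its inputs are an estimate and a standard error such as those produced in step a.

```python
from statistics import NormalDist

def difference_test(estimate, se, alpha=0.05):
    """Two-sided test of H0: difference = 0 against H1: difference != 0.

    Uses the large-sample standard normal approximation for the t-statistic.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    t_stat = estimate / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value, p_value < alpha   # last item: reject H0 at level alpha?

# Hypothetical inputs: an estimated difference of 3 points with a standard error of 1
print(difference_test(3.0, 1.0))
```

At the 5% level this rule is equivalent to rejecting when |t| exceeds about 1.96.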
Lehangmyhanh.cs2@ftu.edu.vn 39
c. Confidence interval
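A confidence interval collects the values of the difference that the two-sided test would not reject. Under the same large-sample normal approximation, a 95% interval is the estimate plus or minus about 1.96 standard errors. The sketch below (hypothetical function name) computes it for any confidence level.

```python
from statistics import NormalDist

def confidence_interval(estimate, se, level=0.95):
    """Large-sample confidence interval: estimate +/- z * se."""
    if se <= 0 or not 0 < level < 1:
        raise ValueError("need se > 0 and 0 < level < 1")
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level = 0.95
    return estimate - z * se, estimate + z * se

# Hypothetical inputs: estimated difference of 3 points, standard error 1
print(confidence_interval(3.0, 1.0))
```

If the interval excludes 0, the two-sided test at the matching significance level rejects the null that the two types of districts have the same mean.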
1.5 Review of probability
a. Population, random variable, and distribution
b. Moments of a distribution (mean, variance,
standard deviation, covariance,
correlation)
c. Conditional distributions and conditional
means
d. Distribution of a sample of data drawn
randomly from a population: Y1, …, Yn
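The moments in item b have sample counterparts computed from data. The sketch below computes them for two variables using the n − 1 divisor for the variance and covariance; the function name and the toy data are hypothetical.

```python
import math

def sample_moments(x, y):
    """Sample mean, variance, standard deviation, covariance, and correlation."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    corr_xy = cov_xy / (sd_x * sd_y)   # unit-free, always between -1 and 1
    return {"mean_x": mean_x, "mean_y": mean_y, "var_x": var_x, "var_y": var_y,
            "sd_x": sd_x, "sd_y": sd_y, "cov_xy": cov_xy, "corr_xy": corr_xy}

# Hypothetical toy data: y is exactly 2 * x, so the correlation equals 1 up to rounding
m = sample_moments([1, 2, 3, 4], [2, 4, 6, 8])
print(m)
```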
Population distribution of Y
• The probabilities of the different values of Y that occur
in the population, e.g., Pr(Y = 650) (when Y is
discrete)
• Or: the probabilities of sets of these values, e.g.,
Pr(640 ≤ Y ≤ 660) (when Y is continuous)
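For a discrete Y, the population distribution is a table of probabilities; for a continuous Y, probabilities of intervals come from the cumulative distribution function. The sketch below assumes, purely for illustration, a hypothetical probability table and a normal distribution with mean 650 and standard deviation 20; neither comes from the course data.

```python
from statistics import NormalDist

def prob_discrete(pmf, value):
    """Pr(Y = value) for a discrete Y described by a probability table."""
    return pmf.get(value, 0.0)

def prob_interval(dist, a, b):
    """Pr(a <= Y <= b) for a continuous Y with the given distribution."""
    if a > b:
        raise ValueError("need a <= b")
    return dist.cdf(b) - dist.cdf(a)

# Hypothetical probability table for a discrete Y (probabilities sum to 1)
pmf = {640: 0.2, 650: 0.5, 660: 0.3}
print(prob_discrete(pmf, 650))           # 0.5

# Hypothetical continuous Y ~ N(650, 20^2)
y_dist = NormalDist(mu=650, sigma=20)
print(prob_interval(y_dist, 640, 660))   # about 0.383
```

For a continuous Y, the probability of any single value such as Pr(Y = 650) is zero, which is why its distribution is described through probabilities of sets of values.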
Distribution of Y1,…, Yn under simple random sampling
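Because individuals #1 and #2 are selected at random, the value of
Y1 carries no information about Y2. Thus:
• Y1 and Y2 are independently distributed;
• Y1 and Y2 come from the same distribution, i.e., they are identically distributed.
That is, under simple random sampling, Y1, …, Yn are independently and identically distributed (i.i.d.).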
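Simple random sampling can be simulated directly: every draw is made the same way from the same population, so the draws are identically distributed, and no draw depends on another. The minimal sketch below assumes sampling with replacement, which makes the independence exact (sampling without replacement from a large population is approximately i.i.d.); the function name and the population of scores are hypothetical.

```python
import random
from statistics import mean

def simple_random_sample(population, n, seed=None):
    """Draw Y1, ..., Yn with replacement; every member is equally likely on every draw."""
    if not population:
        raise ValueError("population is empty")
    rng = random.Random(seed)
    return rng.choices(population, k=n)

# Hypothetical population of 100 test scores: 600, 601, ..., 699
population = list(range(600, 700))
sample = simple_random_sample(population, n=1000, seed=1)
print(mean(population), mean(sample))
```

With a large i.i.d. sample, the sample mean tends to be close to the population mean, which is what makes such samples useful for estimation.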