Introduction To Stata: Li-Pin Juan
Contents
1 Basics of Stata
1.1 Example Commands
1.3 Group Average
1.11.1 From wide to long format: listing your data in country-year order
1.11.2 From long to wide format
1.14 Documenting your code, adding breaking lines for readability and so on
2 Graph
2.1 Example Commands
2.2 Scatter Plot
2.3 Bubble Plot
3 Programming
3.2 Looping commands
Reference
1 Basics of Stata
1.1 Example Commands
use oldfile
save newdata , replace The replace option means to discard the existing file with the same filename. If no file already exists, use save newdata .
compress Convert all variables to their most efficient storage types to save memory, for example before a save command.
browse Open the spreadsheet-like Data Browser for viewing the data (no editing allowed).
browse
price mileage
if
year
year
price
and
mileage
later.
list Lists the data in table format. Use list, display for displaying large datasets.
list x y in 10/50 Lists the x and y values for the 10th through 50th observations.
list x y in -4/l Lists the x and y values for the last four observations.
1 In the Data Editor or Data Browser, string variable values appear in red, distinguishing them from numeric (black) or labeled numeric (blue) variables.
tabulate x if y < 65
tabulate var , gen(dum ) will generate whole sets of dummy variables dum1 , dum2 ,
and so on.
You can find out what values the dummy variables stand for from describe.
table
var1 var2 ,
c(mean
for
year , c(mean var3 ) shows a one-way table with means of var3 for each year
tabulate variable , missing counts the number of missing observations under the column variable .
table
describe
codebook examines the variable names, labels, and data to produce a codebook describing the dataset. It is a convenient way to get information about string variables.
label variable
drop _all
dropmiss
dropmiss, obs
Drops from the dataset in memory any observations that have missing values on all variables.
drop in 12/13
drop
format
price
stringvar.
8-digit width, with two digits always shown after the decimal.
plus
unemp mlife ife Gives the correlation between unemp mlife and ife.
relational operator   meaning
==                    is equal to
!=                    is not equal to
>                     is greater than
<                     is less than
>=                    is greater than or equal to
<=                    is less than or equal to

logical operator      meaning
&                     and
|                     or
!                     not
drawnorm m1 ,
generate
newvar
form distribution
sample 10
Drops all the observations in memory except for a 10% random sample.
replace oldvar = 10 *
oldvar
n = 40.
previous values.
clear
sort
x.
order place unemp mlife ife pop Controls the order of variables within a dataset.
drop if regexm( var1 , "E.S.") > 0 drops observations that contain a specific string. In this case, we drop those observations whose variable var1 contains E.S.
1.2
list if unemp < 8 | (mlife >= 75 & ife >= 81)
The condition
drop if
2 Stata permits up to 27 different missing value codes. The other 26 codes are represented internally as numbers even larger than .
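use canada2 , clear
list
tabulate type
tabulate type , generate(type ) The tabulate command will create dummy variables when given the generate option.
generate mlife2 = 0 if mlife < 75
replace mlife2 = 1 if mlife >= 75 & mlife <. Add the condition mlife <. so that observations with a missing mlife stay missing in mlife2 instead of being coded 1 (Stata treats missing values as larger than any number).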
generate mlife3 = autocode(mlife , 3,70,76) Provides automatic grouping of measurement variables. It creates a new ordinal variable
list
     +-------------------------------------------------+
     | place                   mlife   mlife2   mlife3 |
     |-------------------------------------------------|
  1. | Canada                   75.1        1       76 |
  2. | Newfoundland             73.9        0       74 |
  3. | Prince Edward Island     74.8        0       76 |
  4. | Nova Scotia              74.2        0       76 |
  5. | New Brunswick            74.8        0       76 |
     |-------------------------------------------------|
  6. | Quebec                   74.5        0       76 |
  7. | Ontario                  75.5        1       76 |
  8. | Manitoba                   75        1       76 |
  9. | Saskatchewan             75.2        1       76 |
 10. | Alberta                  75.5        1       76 |
     |-------------------------------------------------|
 11. | British Columbia         75.8        1       76 |
 12. | Yukon                    71.3        0       72 |
 13. | Northwest Territories    70.2        0       72 |
     +-------------------------------------------------+
The mlife2 and ordinal mlife3 variables correspond to mlife.
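The recoding steps above can be tried on a handful of typed-in values. This is a minimal sketch: the three mlife values below are illustrative assumptions, not the canada2 data.

```stata
clear
input mlife
75.1
73.9
.
end
* Dichotomize at 75; mlife2 stays missing where mlife is missing
generate mlife2 = 0 if mlife < 75
* Missing values sort above every number, so guard the upper group with mlife < .
replace mlife2 = 1 if mlife >= 75 & mlife < .
* Three equal-width groups between 70 and 76, each coded by its upper bound
generate mlife3 = autocode(mlife, 3, 70, 76)
list
```

Dropping the mlife < . guard would code the missing observation as 1, which is the mistake the guard is meant to prevent.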
1.3 Group Average
1. sort year
2. by year : egen inc_male = mean(inc ) if age >= 25 & age <= 60 & sex == 1 Creates inc_male , the mean of inc by year among observations with age between 25 and 60 and sex equal to 1.
1.4
Example 1:
1. generate gap = ife - mlife
2. label variable gap "Female minus male life expectancy gap"
3. format gap %4.1f
Example 2:
1. replace age ... Replaces the value of age for observation number 1453.
2. replace age = 2008 - born if age >=. | age < 2008 - born Replaces values of variable age with 2008 - born wherever age is missing or smaller than 2008 - born .
use wbdata , clear
generate type = 1
replace type = 2 if place == "USA" | place == "Canada"
replace type = 3 if place == "USA"
label variable type "Country Name"
label define typelab 1 "Others" 2 "Canada" 3 "USA"
label values type typelab Label values specify to which variable these labels apply (here, type ).
8.
list
typelab,
tabulate novint
tabulate novint , nolabel Shows the underlying values without displaying value labels. The output shows that the missing value codes are 98 and 99, for Don't know and No answer. Left in the data, these codes skew the statistics of the dataset.
Interest in |
   Nov 2006 |
   election |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        102       19.81       19.81
          2 |        174       33.79       53.59
          3 |        171       33.20       86.80
          4 |         60       11.65       98.45
         98 |          5        0.97       99.42
         99 |          3        0.58      100.00
------------+-----------------------------------
      Total |        515      100.00
generate novint2 = novint
mvdecode novint2 , mv(98=.c \ 99=.d) Recodes the values 98 and 99 of novint2 into the missing value codes .c and .d. In other cases, you may use a list of variables in place of novint2 (see footnote 5).
tabulate novint2 , missing
    novint2 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        102       19.81       19.81
          2 |        174       33.79       53.59
          3 |        171       33.20       86.80
          4 |         60       11.65       98.45
         .c |          5        0.97       99.42
         .d |          3        0.58      100.00
------------+-----------------------------------
      Total |        515      100.00

5 These may stand for different reasons the values are missing, such as responses of Refused to answer or Don't know, and Not applicable on a questionnaire.
In Stata, missing values do not enter into calculations of statistics such as means or correlations, which removes the inflated mean caused by the original codes.
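The effect of recoding can be seen on a tiny dataset. The sketch below uses four made-up values of novint (an assumption for illustration) and compares the summary before and after the codes 98 and 99 become extended missing values.

```stata
clear
input novint
1
2
98
99
end
* Summary with 98 and 99 still treated as real answers
summarize novint
generate novint2 = novint
* Turn the two codes into the extended missing values .c and .d
mvdecode novint2, mv(98=.c \ 99=.d)
tabulate novint2, missing
* The same summary now uses only the two valid answers
summarize novint2
```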
6.
1.5
novint.
generate expinc = exp(income ) Creates a new variable expinc , equal to the exponential of income .

function              meaning
abs(x)                absolute value of x
exp(x)                exponential (e to power x)
normal(z)             cumulative standard normal
int(x)                truncating toward zero
ln(x)                 natural (base e) logarithm
sum(x )
date(s1 , s2 [, y])
mdy(M, D, Y )
normalden(x, m, s)
display exp(2)+10
Returns
if
if false.
Creates a variable
as the maximum
return list
from
summarize
summarize
and
code(x ,a ,b )
s1
M , D,
summarize unemp
return list Retrieves the list of statistics after summarize.
generate unempdiff = unemp - r(mean)
egen stdpop = std(pop ) Creates stdpop , holding standardized values of pop .
bysort place : egen mlifeMed = median(mlife ) Creates mlifeMed , equal to the median of mlife within each subgroup with the same place value.
egen avg = rowmean(w x y z ) Creates avg , equal to the row mean of w , x , y , and z .
egen total = rowtotal(w x y z )
egen xrank = rank(x ) Creates a new variable xrank , holding ranks corresponding to values of x . By default xrank = 1 for the observation with the lowest x , xrank = 2 for the second lowest, and so on; add the field option to rank the highest x first.
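A short sketch of the egen functions listed above, on three made-up observations (the variables x and w and their values are assumptions for illustration). It also contrasts the default ranking with the field option.

```stata
clear
input x w
10 1
30 2
20 6
end
* Standardized values of x (mean 0, standard deviation 1)
egen xstd = std(x)
* Row-wise mean and total across x and w
egen avg = rowmean(x w)
egen total = rowtotal(x w)
* Default ranks run from the lowest value upward
egen xrank = rank(x)
* field ranks the highest value first
egen xrank_field = rank(x), field
list
```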
1.6
Labeled-
2.
list
make foreign
foreign.
foreign
make,
and also a
remains a numeric
list make foreign , nolabel Shows the underlying numbers beneath the labels.
encode make , generate(makenum ) Generates a labeled-numeric variable makenum
list
decode
foreign ,
make.
generate(
labeled-numeric variable.
6.
7.
8.
list
string variables do not matter. Only labeled-numeric and numeric variables enter into
calculation.
9.
drop if regexm(
var1 ,"E.S.")
string comparison.
10.
generate
generate
tostring
year month , replace converts variables from numeric to string format. See
generate
if length(
To strip the non-numeric characters from var1 , use the following command (in which we are replacing all letters and special characters with nothing):
gen var2 = regexr(var1 ,"[.\}\)\*a-zA-Z]+","")
Example 1 (Converting string variables embedded with numeric characters):
race
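     +------------------------------------------------+
     |  id   gender   race   schtyp   read   science |
     |------------------------------------------------|
  6. | 113        m      1      pub     44        63 |
  7. |  50        m      3      pub     50        53 |
  8. |  11        m      2      pub     34        39 |
  9. |  84        m      1      pub     63         . |
 10. |  48        m      3      pub     57        50 |
     |------------------------------------------------|
 11. |  75        m      1      pub     60        53 |
 12. |  60        m      X      pub     57        63 |
 13. |  95        m      1      pub     73        61 |
 14. | 104        m      1      pub     54        55 |
 15. |  38        m      3      pub     45        31 |
     +------------------------------------------------+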
generate
numeric values.
destring A more flexible way of converting string variables to numeric. This line accomplishes the same thing as above.
destring, replace Alternatively, you may go all out and reach the same result for all variables except for race , gender , and schtyp .
When real() encounters a non-numeric value, it sets the variable equal to missing in that case and moves on. Without advanced instruction, destring removes the specified nonnumeric characters and moves on, which means that, for example, a4 can be converted to 4. This property of destring
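The difference between real() and destring can be checked on a few typed-in strings. In this sketch the variables race and code and their values are assumptions for illustration; force and ignore() are standard destring options.

```stata
clear
input str2 race str2 code
"1" "a4"
"3" "b7"
"X" "12"
end
* real() returns missing for any value that is not a number
generate race_real = real(race)
* destring with force also converts nonnumeric values to missing
destring race, generate(race_force) force
* ignore() strips the listed characters first, so a4 becomes 4
destring code, generate(code_num) ignore("ab")
list
```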
1.7
generate obsID = _n
sort obsID If the data are later rearranged in another order, we can return to the earlier order as listed in obsID . Creating identification numbers that store the order of observations at an early stage of dataset development greatly facilitates later data management.
display mpg[3]
generate diffvar1 = var1 - var1 [_n-1]
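The use of _n and explicit subscripts can be sketched with the auto dataset that ships with Stata (using it here is an assumption; any dataset in memory would do).

```stata
sysuse auto, clear
* Store the original order of the observations
generate obsID = _n
* Rearrange the data, then return to the stored order
sort price
sort obsID
* Value of mpg in the third observation
display mpg[3]
* Difference from the previous observation; missing for the first one
generate diffprice = price - price[_n-1]
list make price diffprice in 1/5
```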
1.8
infile variable-list using filename.raw Reads an ASCII file, such as filename.raw, in which the values are separated by space(s). variable-list is optional, and applies when you want to assign a list of names to imported variables. infile can only handle string variables, whether or not spaces are embedded within them, when they are enclosed by double quotes.
infile str18 ... If a string variable exists, a str# statement needs to precede its variable name. Here, str18 indicates the
make
using
gender ),
Table 2: Append
one.dta (id , var1 , var2 , var3 ; rows 1991, 1992, 1993) + two.dta (id , var1 , var2 , var3 ; rows 1994, 1995, 1996)
insheet variable-list using filename.raw Issue this command to load comma-delimited spreadsheet-like data with the first row of the file containing a single-word variable name for each column (i.e. the column headings). Note that insheet cannot handle a file that uses a mixture of commas and tabs as delimiters.
insheet variable-list using filename.raw , tab Issue this command to load tab-delimited data written by spreadsheet programs, such as Excel. If no variable name is shown in the first row, Stata automatically assigns variable names
infix str make 1-13 mpg 15-16 weight 18-21 price 23-26 using auto.raw, clear Use this command if the dataset is created in fixed-column format, where the values are not necessarily delimited at all, but occupy predefined column positions. For example,
AMC Spirit    22 2640 3799
Buick Century 20 3250 4816
The make
compress Once data are loaded in memory, issue this command to ensure no variable
1.9
1.9.1
As long as the variables in the two files are the same and all you need to do is add observations from one file to the other, this is a vertical combination.
use one
append using two (see Table 2)
1.9.2
Table 3: one.dta (id , var1 , var2 , var3 ) and two.dta (id , var4 , var5 , var6 )
use
one , clear
two
(see Table 3)
two.dta
id
two.dta.
one.dta
matches perfectly
one.dta
is
is Mary.
variable
one , clear
merge 1:1 id using two
use
id
variables
use
id sex
by id sex : assert _N==1
sort
13
id
variables is
If the two datasets have more than one variable in common, it is advisable to use more of them as id variables, to avoid the problem that the person identified as Bob in one dataset is Mary in the other. In this case, we can introduce the common variable
one , clear
merge 1:1 id sex
sex.
use
variables.
using
two
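A minimal sketch of a merge on two key variables, with a check that the keys identify the observations. The datasets are typed in for illustration and are assumptions, not the one.dta and two.dta files of the text; tempfile keeps the example from writing permanent files, so it should be run as a do-file.

```stata
clear
input id str1 sex var4
1 "m" 5
2 "f" 7
3 "f" 9
end
tempfile two
save `two'

clear
input id str1 sex var1
1 "m" 10
2 "f" 20
4 "m" 40
end
* isid stops with an error unless id and sex jointly identify each observation
isid id sex
merge 1:1 id sex using `two'
* _merge is 1 for master only, 2 for using only, 3 for matched
tabulate _merge
```

Checking the keys with isid before merging plays the same role as the sort and assert _N==1 test shown earlier.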
It is undesired to code
merge 1:1
id sex
using
two ,
keep(matched)
sort id
// test 3:
by
id :
assert _N==1
keep if _merge==3
generate
sort
list in 1/5
drop
or
sort diff
list in 1/5
diff
list in -5/l
Example: many-to-one merge
use
personid
use
that appears in sample.dta also appears in payroll.dta.
sort division date
by division date: keep if _n==1
merge 1:m division date using payroll , keep(master match) Checks to see if every division that appears in sample.dta also appears in payroll.dta.
With three key variables, the possible pairs are (personid, date), (personid, division), and (division, date). We may follow the same procedure to examine which pair leads to the data that loses the least information due to the merge, and to spot potential inconsistency in the files.
use sample , clear
sort division date
by division date : keep if _n==1
merge 1:m division date using payroll , keep(master match)
1.9.3
egen c = count(_n), by(id year month ) Counts the sample size for each combination of id , year , and month .
list if c>1
browse if c > 1
egen tag = tag(id year month ) Creates an indicator (a dummy variable) which will be 1 for only one observation per combination of id , year , and month , and 0 for all other observations of the same combination.
keep if tag
drop tag
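The duplicate check can be sketched on a few typed-in rows (the values and the variable value are assumptions for illustration).

```stata
clear
input id year month value
1 2003 1 5.0
1 2003 1 5.0
1 2003 2 6.1
2 2003 1 4.2
end
* Number of observations in each id-year-month combination
egen c = count(id), by(id year month)
list if c > 1
* tag is 1 for exactly one observation per combination
egen tag = tag(id year month)
keep if tag
drop tag c
list
```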
1.10
return list
ereturn list
display 3*_b[x ]
When programming, it can be useful to remember that Stata saves the results of the latest summarize command. Among the ones easily accessible are: _result(1), the number of observations; _result(3), the mean; _result(4), the variance; _result(5), the minimum observation; and _result(6), the maximum observation. Typing display _result(3) after running the summarize command on var1 , for example, would tell Stata to display the mean of var1 . Besides, the count command likewise saves its counting number.
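In current Stata documentation the same stored results are described through r(), as in the r(mean) example earlier in these notes. A minimal sketch, assuming the auto dataset that ships with Stata:

```stata
sysuse auto, clear
summarize price
* Show everything summarize left behind
return list
display r(mean)
display r(N)
* count stores the number of matching observations in r(N)
count if price > 10000
display r(N)
```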
1.11
In this section, we reshape datasets prepared by economic organizations, such as the World Bank and the International Monetary Fund. Figure 1 shows the sample file.
1.11.1
Before putting this data in Stata, we need to (1) add a character to the column headings, and (2) replace any non-numerical record (such as .. indicating a missing value) in a numerical variable with a blank. Both tasks can easily be completed through Excel's Find and Replace dialogue window, after highlighting the cells taken by the column headings.
Step 1: Make sure the numbers are numbers. Go to Format Cells, select Number in the Number tab and click OK. Then save the file in *.csv format. We name it gdp.csv (see Figure 2).
insheet using gdp.csv It is clear that due to the inserted blanks under columns x1995 and x1996
generate ax1995 = real(x1995 )
generate ax1996 = real(x1996 )
drop x1995 x1996
rename ax1995 x1995
rename
variable
mixed with
var1
and
var2
ax1996 x1996
Now, to reshape our data from the wide form to the long form, the procedure is as follows:
generate id = _n
reshape long x , i(id ) j(year ) The numeric suffixes of the x variables (here, the years) are put in a variable called year . If you have more than one variable you can list them as follows:
reshape long
var1
var1
and
and
var2
var2, respectively.
x y z , i(id ) j(year ).
variable
year
and
country.
The
encode
of each variable.
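The wide-to-long step can be sketched on two typed-in countries. The country names and values are assumptions for illustration; the variable names x1995 and x1996 follow the text.

```stata
clear
input str10 country x1995 x1996
"Canada" 1.0 2.0
"Mexico" 3.0 4.0
end
generate id = _n
* One row per id-year; the suffixes 1995 and 1996 go into year
reshape long x, i(id) j(year)
list, sepby(id)
* The same command with wide restores the original layout
reshape wide x, i(id) j(year)
list
```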
2.
label save
numvar
using
varname ,
created.
18
varname.do
is
varname.do from
var1
and
var2 , separately.
numvar
Step 3:
1.
2.
3.
4.
5.
6.
1.11.2
Below is a more complex example, where the data are in the long form and we would like to change them to the wide form (see Figure 6).
1.
2.
use
generate
if length(
creates the date variable holding the value of, for example, 2003_01, to imply that the
data is dated in January, 2003.
4.
replace
if date==""
5. drop year month
6. order id date
7. reshape wide r i , i(id ) j(date ) str We add 'str' because date is a string variable.
[Figure ??]
The resulting dataset shows that returns and interest rates are together for the same period. If you want to have all returns and interest rates together, you need to take one more step:
1.
xpose, clear
name.
varname
sort _
3.
xpose, clear
4.
order
5.
outsheet
id r*
outsheet
i*
1.12
_var-
varname
2.
6.
id
testr.csv.
using
using
testr.csv ,
Sometimes you have data files that need to be collapsed to be useful to you. For example, you might have student data but you really want classroom data, or you might have weekly data but you want monthly data, etc.
1.
2. list

     famid   kidname   birth   age   wt   sex
  1.     1      Beth       1     9   60     f
  2.     1       Bob       2     6   40     m
  3.     1      Barb       3     3   20     f
  4.     2      Andy       1     8   80     m
  5.     2        Al       2     6   50     m
  6.     2       Ann       3     2   20     f
  7.     3      Pete       1     6   60     m
  8.     3       Pam       2     4   40     f
  9.     3      Phil       3     2   20     m

3. collapse age , by(famid ) Collapses across all the observations to make a single record
4. collapse (mean) avgage = age , by(famid ) The average of age is named avgage .
5. list

     famid     avgage
  1.     1          6
  2.     2   5.333333
  3.     3          4

collapse (mean) ... (count) ... After reloading the same dataset, this command gets the average for age and wt , and the number of kids, numkids .
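A sketch of a collapse that produces averages and a count in one pass, using the family data listed above. The new variable name avgwt and the choice of birth as the counted variable are assumptions, since the original command is not fully shown.

```stata
clear
input famid str4 kidname birth age wt str1 sex
1 "Beth" 1 9 60 "f"
1 "Bob"  2 6 40 "m"
1 "Barb" 3 3 20 "f"
2 "Andy" 1 8 80 "m"
2 "Al"   2 6 50 "m"
2 "Ann"  3 2 20 "f"
3 "Pete" 1 6 60 "m"
3 "Pam"  2 4 40 "f"
3 "Phil" 3 2 20 "m"
end
* One record per family: mean age, mean weight, and number of kids
collapse (mean) avgage = age avgwt = wt (count) numkids = birth, by(famid)
list
```

Because collapse replaces the data in memory, the dataset has to be reloaded before a different collapse is tried.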
Suppose we want a count of the number of boys and girls in the family. The procedure is as follows: (1) create a dummy variable for boys and another for girls, each holding a value of 1 if true and 0 if not. (2) The sum of the boy (girl) dummy variable is the number of boys (girls).
1.
2. tabulate sex , generate(sexdum )
3. list famid sex sexdum1 sexdum2

     famid   sex   sexdum1   sexdum2
  1.     1     f         1         0
  2.     1     m         0         1
  3.     1     f         1         0
  4.     2     m         0         1
  5.     2     m         0         1
  6.     2     f         1         0
  7.     3     m         0         1
  8.     3     f         1         0
  9.     3     m         0         1

collapse (count)

     famid   girls   numkids
  1.     1       2         3
  2.     2       1         3
  3.     3       1         3
1.13
The trick to inputting dates in Stata is to forget they are dates, treat them as character strings, and then later convert them into a Stata date variable. For example, you have dates1.raw looking like
1. infix str name 1-4 str bday 6-17 using dates1.raw Since bday is a string variable, you cannot do any kind of date computations with it until you make a date variable from it. You can generate a date version of bday using the date() function.
2.
generate
bday.
birthday
Jan 1, 1960 which is convenient for the computer storing and performing date computations.
generate
the change in centuries and indicates the last year of the series.
3.
format birthday %d Tells Stata that birthday should be displayed using the %d
format to make it easier for humans to read.
4.
generate
to quarterly data.
Even for datasets with messy dates, such as the one below, Stata can handle them well:
John Jan 1 1960
Mary 07/11/1955
Kate 11.12.1962
Mark Jun/8 1959
2.
3.
list
4.
format
1.
inx str
birthday
%d
4121990
4.12.1990
Apr 12, 1990
Apr12,1990
April 12, 1990
4/12.1990
Apr121990
2.
3.
list
4.
format
1.
inx str
birthday
%d
     bday              birthday
  1. 4121990          12apr1990
  2. 4.12.1990        12apr1990
  3. Apr 12, 1990     12apr1990
  4. Apr12,1990       12apr1990
  5. April 12, 1990   12apr1990
  6. 4/12.1990        12apr1990
  7. Apr121990                .
Note that Stata was able to handle Apr12,1990 even though there was not a delimiter between the month and day. The only date that did not work was Apr121990 and that is because there was no delimiter between the day and year. As you can see, the date() function can handle just about any date as long as there are delimiters separating the month, day, and year.
On the other hand, we may have the month, day, and year stored as numeric variables in a dataset. For example, look at the dataset dates4.raw below:
7 11 1948
1 1 1960
10 15 1970
12 10 1971
1.
2.
3.
What if the year is stored with its first two digits omitted, for example 70 instead of 1970? For example, the dataset dates5.raw
7 11 48
1 1 60
10 15 70
12 10 71
3.
4.
list
5.
gen
1.
2.
inx
= month(
birthday ) Conversely, we can have the month, day, year, and the
gen
birthday )
= day(
7. gen y = year(birthday )
8. gen weekday = dow(birthday )
9. gen age2000 = (mdy(1,1,2000)-birthday )/365.25 Calculates everyone's age on January 1, 2000.
10.
gen
- [1 < day(
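The date steps of this section can be sketched end to end on the messy-date listing shown earlier. The string mask "MDY" and the %td display format follow current Stata syntax (older releases used "mdy" and %d); the variable names m and d are assumptions.

```stata
clear
input str4 name str12 bday
"John" "Jan 1 1960"
"Mary" "07/11/1955"
"Kate" "11.12.1962"
"Mark" "Jun/8 1959"
end
* Convert the string into the number of days since January 1, 1960
generate birthday = date(bday, "MDY")
format birthday %td
* Split the date back into its components
generate m = month(birthday)
generate d = day(birthday)
generate y = year(birthday)
generate weekday = dow(birthday)
* Age in years on January 1, 2000
generate age2000 = (mdy(1,1,2000) - birthday)/365.25
list
```

If a string cannot be parsed, date() returns a missing value, so a quick list after the conversion is a useful check.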
1.14
Comments can begin with an asterisk (*) and end with a carriage return, or they can begin with two slashes (//) and end with a carriage return, or they can be bracketed by (/*) and (*/) and span as many lines as needed.
In a do-file (the detail is given later), Stata assumes that each command is no more than one line long, and that each line ends with a carriage return (when you press the Enter key, a text editor inserts a carriage return symbol).
If you want to type a command more than one line long, you need to tell Stata to look for a semi-colon with #delimit ; . After that, you must end each command, whether one or more lines long, with a semi-colon. To switch back to carriage return, use #delimit cr.
There are other ways to continue a single command across more than one line. One way is to comment out the carriage return: type /* at the end of one line and */ at the beginning of the next.
set more off
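The commenting and line-continuation styles can be put side by side in a short do-file. This is a sketch meant to be run as a do-file rather than typed interactively; the auto dataset and the three-slash continuation (///) are assumptions not taken from the text above.

```stata
* A comment that starts with an asterisk
sysuse auto, clear   // a comment after a command
/* A bracketed comment
   that spans two lines */

* Switch the end-of-command marker to a semi-colon
#delimit ;
summarize price
    mpg
    weight ;
#delimit cr

* Comment out the carriage return to continue a command
regress price mpg /*
    */ weight

* Three slashes also continue a command on the next line
regress price mpg ///
    weight
```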
1.15
1. use ENEU_xxxx.dta
2. tab skill , gen(skd ) creates a set of dummy variables for skill
3. reg inc skd* edad runs regression analysis
4. predict r , resid assigns prediction residuals to r
5. egen sd_regression = sd(r )
6. gen Educontr = _b[skd1 ]*skd1 +_b[skd2 ]*skd2 +_b[skd3 ]*skd3 +_b[skd4 ]*skd4 calculates the fitted values of income when education level is controlled for
7. egen sdEducontr = sd(Educontr )
1.16
1. use ENEU_xxxx.dta
2. sort year
3. by year : gen totp1_f=1 if edad >=25 & edad <=60 & sex ==2
4. by year : gen empl_f = 1 if edad >=25 & edad <=60 & inc >0 & sex ==2
5. by year : egen sum_tot_f = sum(totp1_f )
6. by year : egen sum_empl_f = sum(empl_f )
7. by year : gen empl_rate_f = sum_empl_f /sum_tot_f
8. gen vmp = empl_rate_f
9. table year , c(mean vmp p90 vmp p50 vmp p10 vmp ) f(%9.3f ) creates a table
10.
Year of   |
Survey    |  mean(sex)
----------+-----------
     1987 |   1.523618
     1988 |   1.524052
     1989 |   1.523699

table ... sd vmp ) f(%9.3f )

Gender    |
and Year  |
of Survey |  mean(inc)
----------+-----------
1         |
     1987 |   .1715026
     1988 |   343.8696
     1989 |   455.3309
----------+-----------
2         |
     1987 |   .0520051
     1988 |    94.3484
     1989 |   123.9082
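Group-level rates like the ones built step by step above can also be obtained as the group mean of a 0/1 indicator. The sketch below swaps in that shortcut and uses the auto dataset that ships with Stata, so the variables (foreign, mpg, price) and the threshold of 25 are assumptions rather than the survey data of the text.

```stata
sysuse auto, clear
* Group mean stored on every observation of the group
bysort foreign: egen price_mean = mean(price)
* 0/1 indicator, left missing where mpg is missing
generate byte high = mpg > 25 if mpg < .
* The group mean of the indicator is the group rate
bysort foreign: egen high_rate = mean(high)
* Compact table of the group means
tabstat price high, by(foreign) statistics(mean)
```

Keeping the indicator missing where its source is missing avoids counting unknown cases as zeros or ones.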
1.17
Graph
2.1
Example Commands
histogram
histogram
10 units wide, starting from 0. Adds a normal curve based on the sample mean and standard deviation.
histogram
includes a
kdensity
price ,
total
x,
by (
region , total )
region,
generate(
xpoints , xdensity )
which the
by overlaying two
with
yx
|| scatter
yx
country.
on
lfitci. To add the confidence interval on the basis of the standard error of the forecast, substitute
it is called the ()-binding notation. It doesn't matter which notation you use.
make.
7 The tag subcommand along with the generate() option flags duplicate observations by assigning 1 to duplicates in the variable duple
Constructs a scatterplot of y against x , with the x axis labeled at 0, 10, ..., 100. The y axis is labeled at -3, -2, ..., 6, with labels written horizontally instead of vertically (the default).
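Axis labeling of this kind can be sketched with the auto dataset that ships with Stata. The variables and the label ranges below are assumptions chosen to suit that dataset, not the y and x of the text.

```stata
sysuse auto, clear
* x axis labeled every 1000 from 2000 to 5000, y labels written horizontally
graph twoway scatter mpg weight, xlabel(2000(1000)5000) ylabel(10(10)40, angle(horizontal))
```

The pattern first(step)last inside xlabel() and ylabel() is what produces evenly spaced labels.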
x2.
foreign ) can be written as graph twoway qfitci mpg weight , stdf by(foreign ) || scatter mpg weight
graph twoway scatter y x1 [fweight = population ], msymbol(Oh) Draws a
graph twoway
scatterplot of
qtci
against
x1.
population.
Type
y time
A time plot of
against
time.
y1 y2 time
have the same scale, with connected data points without data point markers shown.
scatter
y1 time ,
1
2
3
4
5
y2 time ,
variables that have different scales, by overlaying two individual line plots.
mpg weight ,
foreign ,
by(
2.2
yaxis(1) || line
Scatter Plot
2.3
Bubble Plot
2.4
2.5
#delimit ;
line le_wm year, yaxis(1 2) xaxis(1 2)
The attribute
axis(2)
because Stata does not like to draw grid lines too close to the axis.
angle(horizontal)
The attribute
ylabel() assumes.
Discards the x-axis title that is associated with the right y-axis.
le_bm.
2.6
le_wm
and
;
#delimit cr
label(1 acutal)
position(2)
ring(0)
rows(2)
2.7
pos( )
scatter
position( ).
line
hat gnppc , sort If the data are already in the order of gnppc, the sort is unnecessary.
xsca(log)
2.8
2.9
2.10
2.11
To give an informative illustration, it is desired to sort total spending in descending or ascending order, and thus we issue the command
descending) stack.
2.12
country ,
over(
sort(
total )
over(region)
bargap(30) nofill
ytitle("Degrees Fahrenheit")
legend( label(1 "July") label(2 "January") )
title("Average July and January temperatures")
subtitle("by region and division of the United States")
note("Source: U.S. Census Bureau, U.S. Dept. of Commerce") ;
#delimit cr
tempjan ,
over(
division )
region )
over(
The comparison of tempjuly and tempjan is made in each combination of division and region. Variable region provides the uppermost groups, which are further decomposed into the subgroups of division ; it is therefore written at the end of the command.
region.
temperature,
tempjuly
and
tempjan
and
month,
bargap(-30) nofill by
2.13
division
Programming
3.1
A macro is a string of characters that stands for another string of characters. For example, you can use the macro xlist in place of price weight . A local macro can be accessed only within a given do-file or in the interactive session. Global macros are defined with the global command. To access what is stored in a global macro, put the character $ immediately before the macro name.
Possible scenarios where we may call for the use of global macros include:
1. When several different models are to be fitted with the same regressor list, substituting the list with a global macro makes a single change for all instances easier.
2. When a key parameter is used commonly in several models, we may need to change its value back and forth many times. For example, in the early stages of our analysis, exploratory data analysis might set the parameter to a small value such as 5 to save computational time, whereas final results set the parameter to an appropriately higher value such as 100.
A macro can be used in place of a scalar so that the macro is not dropped after the execution of the command. Local macros are defined with the local command. To access what is stored in a local macro, enclose the macro name in single quotes. For example, consider a regression on several regressors. We define the local macro xlist and access its contents by enclosing the name in single quotes as `xlist'.
We can also define a local macro through evaluation of a function. For example,
local z = 2+2
display `z'
Local macros apply only to the current program and have the advantage of no potential conflict with other programs.
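Both kinds of macro can be sketched in a few lines, run together as one do-file so that the local macros stay in scope. The auto dataset and the macro names xlist and ylist are assumptions for illustration.

```stata
sysuse auto, clear
* A global macro is read back with a leading dollar sign
global xlist weight mpg
regress price $xlist
* A local macro is read back between a backquote and an apostrophe
local ylist price
regress `ylist' $xlist
* A local macro defined by evaluating an expression
local z = 2 + 2
display `z'
```

Changing the regressor list in the single global line changes every model that uses $xlist .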
3.2
Looping commands
The foreach construct loops over items in a list, where the list can be a list of variable names (possibly given in a macro) or a list of numbers. The forvalues construct loops over consecutive values of numbers. A while loop continues until a user-specified condition is not met.
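The three looping constructs can be sketched as follows, run together as one do-file. The auto dataset and the variables looped over are assumptions for illustration.

```stata
sysuse auto, clear
* foreach: loop over a list of variables
foreach v of varlist price mpg weight {
    quietly summarize `v'
    display "`v': mean = " r(mean)
}
* forvalues: loop over consecutive numbers
forvalues i = 1/3 {
    display `i'^2
}
* while: repeat until the condition fails
local i = 1
while `i' <= 3 {
    display "iteration `i'"
    local i = `i' + 1
}
```

The while loop needs its own counter update inside the braces; without it the condition never fails.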
3.3
gives the information on IV commands available both within Stata and on the Internet. Left-click on the highlighted text st0030_3 to open a new window with installation details. By left-clicking on the
ado-directory.
3.4
http://www.stata.com/statalist/archive/2008-01/msg00837.html
Reference
Microeconometrics Using Stata, A. Colin Cameron and Pravin K. Trivedi, Stata Press
Statistics with Stata version 10, Lawrence C. Hamilton, Brooks/Cole, Cengage Learning, 2009
Stata Tutorial. Carolina Population Center, the University of North Carolina at Chapel Hill. From http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial