A Short Guide to Stata Commands, M. Elsherif
DR. MOHAMED ELSHERIF
Stata language
The standard Stata syntax
[by:] [command] [variablename(s)] [if] [in] [, options]
Variablename(s): either a single variable [varname] or several variables [varlist], depending on the command
in: restricts the command to a selected range of observations (rows) in the dataset
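As a rough illustration of how the pieces of the syntax fit together, the sketch below uses Stata's bundled auto dataset. That dataset is an assumption made only so the lines can be run as written; its variables (price, mpg, foreign) are not the ones used in the rest of this guide.

```stata
* Load the example dataset shipped with Stata (assumed here for illustration)
sysuse auto, clear

* command + varlist + if + in + option:
* summarize price and mpg for foreign cars only, within rows 1 to 60,
* with the detailed output requested by the option after the comma
summarize price mpg if foreign == 1 in 1/60, detail

* by prefix: repeat the command within each category of foreign
bysort foreign: summarize price
```

Everything before the comma selects what is analysed; everything after it changes how the command behaves or what it reports.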
Stata language, helpful commands
by [varlist]
Prefix that runs the command separately within each category of the specified variables. The data must
first be sorted by those variables.
by [varlist]: [command] [variablename(s)] [if] [in] [, options]
by Sex: sum Age
sort [varlist]
Sorts the observations in the dataset from low to high based on the variables listed.
sort Sex
bysort [varlist]
It sorts the data and runs the command separately within each category in one step (sort + by)
bysort Sex: sum Age
[if] condition
Runs the analysis only on the subset of the data that meets a specific condition
Operators that can be used with if:
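A short sketch of the relational and logical operators commonly used with if is given below. It assumes Stata's bundled auto dataset (not the dataset used in this guide) so that each line can be run as written.

```stata
sysuse auto, clear

* Relational operators:  ==  !=  >  <  >=  <=
* equal to
count if foreign == 1
* not equal to
count if foreign != 1
* greater than or equal to
count if mpg >= 30

* Logical operators:  & (and)   | (or)   ! (not)
count if foreign == 1 & mpg >= 30
count if mpg < 15 | mpg > 30
count if !missing(rep78)

* Missing numeric values are treated as larger than any number,
* so exclude them explicitly when using > or >=
count if rep78 > 3 & !missing(rep78)
```

Note that equality is tested with a double equals sign (==); a single = is used for assignment, as in generate.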
display
To use Stata as a calculator.
display 2+3
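A few more calculator-style lines, offered as a small sketch: display follows the usual operator precedence, accepts built-in functions such as sqrt(), and can mix quoted text with expressions.

```stata
* Multiplication is evaluated before addition
display 2 + 3 * 4

* Parentheses change the order of evaluation
display (2 + 3) * 4

* Built-in functions can be used inside expressions
display sqrt(16)

* Quoted text can be combined with a computed value
display "Mean difference: " 10 - 7.5
```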
help [command/text]
To get information about that command, or search for the text
help display
Exploring data
describe [varlist]
It reports the name of the dataset, the number of variables and observations, and a list of all
variables with their names, labels, and storage types.
describe
describe Age Drug
inspect [varlist]
It shows the number of missing and non-missing values and the number of unique values.
inspect
inspect Age Drug
codebook [varlist]
It shows the name, label, type, range, number of observations and number of missing values of
each variable in the dataset.
codebook
codebook Age Drug
codebook [varlist] , compact
This option shows the number of observations (obs), number of different values (unique),
mean, lowest value (min), highest value (max) and the variable label (label).
codebook , compact
codebook Age Drug , compact
Exploring data, listing observations
list [varlist] [if]
It lists the observations that meet the condition, showing the values of the variables named in [varlist]
list Height Weight Marital if Age == 35
list Height Weight Marital
list if Age == 35
Summarizing data, numeric variables
summarize [varlist]
It shows the number of observations, mean, SD, minimum, and maximum.
sum
sum Age Height Weight
summarize [varlist] , detail
It adds extra details such as percentiles, the four smallest and four largest values, variance, skewness, and kurtosis.
sum , detail
sum Age Height Weight , detail
mean [varlist]
It gives the mean, its standard error, and the 95% CI of the mean.
mean Age
tabstat [varlist] , statistics ( name of required statistics )
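As a minimal sketch of the tabstat syntax, the lines below request a custom set of statistics. They assume Stata's bundled auto dataset (not the dataset used in this guide); the statistic names shown (n, mean, sd, median, iqr, min, max) are standard tabstat keywords.

```stata
sysuse auto, clear

* Chosen statistics for three numeric variables;
* columns(statistics) puts one statistic per column and one variable per row
tabstat price mpg weight, statistics(mean sd median iqr min max) columns(statistics)

* The by() option repeats the table for each category of a grouping variable
tabstat price mpg, statistics(n mean sd) by(foreign)
```

Compared with summarize, tabstat lets you pick exactly which statistics appear and how the table is laid out.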
Summarizing data, numeric variables in groups
bysort [varlist] : summarize [varlist]
It shows the summary statistics for each category of the variables listed after by or bysort.
bysort Sex: sum Age Height Weight
With tabstat, it shows the requested statistics for each category of the variables listed after by or bysort.
bysort Sex: tabstat Age Height Weight , s( mean sd median iqr ) col(s)
Summarizing data, categorical variables (two way)
tabulate [varname1] [varname2]
It gives a two-way table for the two specified variables
tab Sex Edu
tabulate [varname1] [varname2] , row
It gives a two-way table for the two specified variables with row percentages
tab Sex Edu , row
tabulate [varname1] [varname2] , col
It gives a two-way table for the two specified variables with column percentages
tab Sex Edu , col
Testing for normality of distribution
Shapiro-Wilk test for normality
swilk [varlist]
It is used for normality testing of specified variables
swilk Age Height
Testing for equality of variance
F test for equality of variance (homogeneity of variance)
sdtest [varname] , by ([groupingvariable])
It is used for testing equality of variance between groups for a specified variable
sdtest Age , by (Sex)
Levene’s test for equality of variance (homogeneity of variance)
robvar [varname] , by ([groupingvariable])
It is used for testing equality of variance between groups for a specified variable
robvar Age , by (Sex)
Bartlett's test for equality of variance
oneway [varname] [groupingvariable]
It is used for testing equality of variance, and it is part of the oneway command output
oneway Age Sex
One-sample t-test
ttest [varname] =[value]
To test if the mean of a variable is equal to a specified value (for a single group)
ttest Age =35
Paired samples t test
ttest [varname1] = [varname2]
To test whether the means of the two variables are equal (that is, whether the mean difference is zero).
The data must be paired.
ttest Exampre = Examafter
Independent samples t-test
With equality of variance
ttest [varname] , by ([groupingvariable])
To compare the mean of two groups (testing if the mean in the two groups is equal)
Before running the test, we need to check for the normality of distribution and equality of
variance.
ttest Age , by (Sex)
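The heading above covers the equal-variance case. When the variance check suggests that the two groups have unequal variances, ttest offers an unequal option, which uses Satterthwaite's approximation for the degrees of freedom. The sketch below assumes Stata's bundled auto dataset (not the dataset used in this guide), with mpg as the outcome and foreign as the grouping variable.

```stata
sysuse auto, clear

* Step 1: check equality of variance between the two groups
sdtest mpg, by(foreign)

* Step 2a: independent samples t-test assuming equal variances
ttest mpg, by(foreign)

* Step 2b: independent samples t-test allowing unequal variances
ttest mpg, by(foreign) unequal
```

Which of the two t-tests to report is a judgment call that depends on the result of the variance check.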
One-way ANOVA
oneway [varname] [groupingvariable]
To compare the mean of more than two groups (testing if the mean in the groups is equal)
Before running the test, we need to check for the normality of distribution and equality of
variance. The test for equality of variance (Bartlett's test) is part of the output.
oneway Age Edu
Correlation
Pearson’s correlation
pwcorr [varlist]
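A minimal sketch of pwcorr follows, assuming Stata's bundled auto dataset (not the dataset used in this guide). The sig and obs options are standard pwcorr options that add the p-value and the number of observations under each coefficient.

```stata
sysuse auto, clear

* Pairwise Pearson correlation coefficients for three variables
pwcorr price mpg weight

* Same matrix, with the p-value and number of observations for each pair
pwcorr price mpg weight, sig obs
```

Because pwcorr works pair by pair, each coefficient uses all observations available for that pair, so the number of observations can differ between cells when some values are missing.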
Spearman’s correlation
spearman [varlist]
To get Spearman’s correlation coefficients between the specified variables.
spearman Age Height Weight
Mann-Whitney test (two-sample Wilcoxon rank-sum test)
ranksum [varname] , by ([groupingvariable])
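The rank-sum test is the non-parametric counterpart of the independent samples t-test for comparing two groups. A minimal sketch, assuming Stata's bundled auto dataset (not the dataset used in this guide), with mpg as the outcome and foreign as the two-category grouping variable:

```stata
sysuse auto, clear

* Compare the distribution of mpg between the two categories of foreign
ranksum mpg, by(foreign)
```

The grouping variable given in by() must have exactly two categories; for more than two groups, the Kruskal-Wallis test is the usual choice.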
Wilcoxon signed-rank test
It is the non-parametric equivalent of the paired samples t-test, used to compare two variables
(paired data)
signrank Exampre = Examafter
Kruskal Wallis test
kwallis [varname] , by ([groupingvariable])
It is the non-parametric equivalent of one-way ANOVA to compare more than two groups
kwallis Age , by (Edu)
Chi-square test
tabulate [varname1] [varname2] , chi2
It is a non-parametric test of the association between two categorical variables
tab Sex Edu , chi2
Chi-square test, with percentages per row
tabulate [varname1] [varname2] , chi2 row
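A minimal sketch of the chi-square test with percentages, assuming Stata's bundled auto dataset (not the dataset used in this guide), where foreign and rep78 stand in for two categorical variables. The expected option is a standard tabulate option that prints expected frequencies.

```stata
sysuse auto, clear

* Chi-square test with row percentages in each cell
tabulate foreign rep78, chi2 row

* Chi-square test with expected frequencies, useful for spotting small expected counts
tabulate foreign rep78, chi2 expected
```

Observations with a missing value in either variable are left out of the table unless the missing option is added.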
Logistic regression
logit [dependent var] [independent varlist]
logit StatsTraining Age Height Weight
Logistic regression reporting OR
logit [dependent var] [independent varlist] , or
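A minimal sketch of the two forms of output, assuming Stata's bundled auto dataset (not the dataset used in this guide), where the binary variable foreign stands in for the dependent variable. The logistic command is included because it reports odds ratios by default.

```stata
sysuse auto, clear

* Coefficients on the log-odds scale
logit foreign mpg weight

* Same model, reported as odds ratios (OR)
logit foreign mpg weight, or

* logistic fits the same model and shows odds ratios without needing the or option
logistic foreign mpg weight
```

The or option changes only how the results are displayed; the fitted model is the same in all three cases.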