0% found this document useful (0 votes)

7 views

ML 2024 Part6 Classification Unsupervised

Uploaded by

jfang1

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

7 views

ML 2024 Part6 Classification Unsupervised

Uploaded by

jfang1

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 43

Machine Learning for Microeconometrics

Part 6: Classi…cation and Unsupervised

A. Colin Cameron
Univ.of California - Davis
.

April 2024

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 1 / 43
Course Outline
1. Variable selection and cross validation
2. Shrinkage methods
I ridge, lasso, elastic net
3. ML for causal inference using lasso
I OLS with many controls, IV with many instruments
4. Other methods for prediction
I nonparametric regression, principal components, splines
I neural networks
I regression trees, random forests, bagging, boosting
5: More ML for causal inference
I ATE with heterogeneous e¤ects and many controls.
Part 6. Classi…cation and unsupervised learning
I classi…cation (categorical y ) and unsupervised learning (no y ).
A. Colin Cameron Univ.of California - Davis .ML
() Part 6: Classi…cation and Unsupervised April 2024 2 / 43
1. Introduction

1. Introduction

To date considered supervised learning with a continuous measure (or

a count or binary where model probabilities).
Now consider very brie‡y classi…cation and unsupervised learning.
Classi…cation is supervised learning with y categorical
I The loss function is the number of misclassi…cations rather than MSE.
I Traditional methods select the category with the highest predicted
probability.
I Some ML methods instead directly select the category.
Unsupervised learning there is no y , only x
I Principal components.
I k-means clustering.
Good reference is ISL2.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 3 / 43
1. Introduction

Overview

1 Classi…cation (categorical y )
1 Loss function
2 Logit
3 Local logit regression
4 k-nearest neighbors
5 Discriminant analysis
6 Support vector machines
7 Regression trees and random forests
8 Neural networks
2 Unsupervised learning (no y )
1 Principal components analysis
2 Cluster analysis

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 4 / 43
1. Classi…cation

1. Classi…cation: Overview

Regression methods
I predict probabilities based on log-likelihood rather than MSE
I assign to class with the highest predicted probability (Bayes classi…er)
F in binary case yb = 1 if p
b 0.5 and yb = 0 if p
b < 0.5.
I parametric: logistic regression, multinomial regression
I nonparametric: local logit, nearest-neighbors logit
Discriminant analysis
I additionally assumes a normal distribution for the x’s
I predict probabilities
I use Bayes theorem to get Pr[Y = k jX = x] and Bayes classi…er.
I used in many other social sciences

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 5 / 43
1. Classi…cation

1. Classi…cation: Overview (continued)

Support vector classi…ers and support vector machines

I directly classify (no probabilities)
I machine learning methods developed in the 1990’s
I are more nonlinear so may classify better
I use separating hyperplanes of X and extensions.
Random forests
I in simplest case minimize the classi…cation error rate rather than the
MSE
I in practice better is to use the Gini index or entropy.
Neural networks
I can work very well for complex classi…cation such as images.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 6 / 43
1. Classi…cation 1.1 Loss Function

1.1 A Di¤erent Loss Function: Error Rate

Instead of MSE we use the error rate
I the number of misclassi…cations
1 n
Error rate =
n ∑ i = 1 1 [ yi 6= ybi ],

F where for K categories yi = 0, ..., K 1 and ybi = 0, ..., K 1.

F and indicator 1[A ] = 1 if event A happens and = 0 otherwise.

The test error rate is for the n0 observations in the test sample
1
∑i =1 1[y0i 6= yb0i ].
n0
Ave(1[y0 6= yb0 ]) =
n0
Cross validation uses number of misclassi…ed observations. e.g.
LOOCV is
1 1
∑i =1 Erri = n ∑i =1 1[yi 6= yb(
n n
CV(n ) = i ) ].
n
A. Colin Cameron Univ.of California - Davis .ML
() Part 6: Classi…cation and Unsupervised April 2024 7 / 43
1. Classi…cation 1.1 Loss Function

Classi…cation Table

A classi…cation table or confusion matrix is a K K table of counts

of (y , yb)
In 2 2 case with binary y = 1 or 0
I sensitivity is % of y = 1 with prediction yb = 1 (true positive)
I speci…city is % of y = 0 with prediction yb = 0 (true negative)
I receiver operator characteristics curve (ROC) curve plots sensitivity
against 1 sensitivity as threshold for yb = 1 changes.
F given tradeo¤s between sensitivity and speci…city may choose the
preferred threshold.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 8 / 43
1. Classi…cation 1.1 Loss Function

Bayes classi…er
The Bayes classi…er selects the most probable class
I the following gives theoretical justi…cation.
b (x)) = 1[yi 6= ybi ]
L(G , G
b (x)) is 0 on diagonal of K
I L(G , G K table and 1 elsewhere
I b is predicted categories.
where G is actual categories and G
Then minimize the expected prediction error
EPE b (x))]
= EG ,x [L(G , G
h i
= Ex ∑ k = 1 L ( G k , G
K
b (x)) Pr[Gk jx]

Minimize EPE pointwise (for each value of x)

h i
b (x) = arg ming 2G ∑K
G k =1 L ( G k , g ) Pr [ G k j x ]
= arg ming 2G [1 Pr[g jx]] given 0-1 loss
= maxg 2G Pr[g jx]
So select the most probable class.
A. Colin Cameron Univ.of California - Davis .ML
() Part 6: Classi…cation and Unsupervised April 2024 9 / 43
1. Classi…cation 1.2 Logit

1.2 Logit

Directly model p (x) = Pr[y jx].

Logistic (logit) regression for binary case obtains MLE for
p (x)
ln 1 p (x)
= x0 β.

Statisticians implement using a statistical package for the class of

generalized linear models (GLM)
I logit is in the Bernoulli (or binomial) family with logistic link
I logit is often the default.
Logit model is a linear (in x) classi…er
I yb = 1 if p b(x) > 0.5
I b > 0 since p
i.e. if x0 β b ) and Λ(0) =
b(x ) = Λ (x0 β e0 = 0.5.
1 +e 0

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 10 / 43
1. Classi…cation 1.2 Logit

Logit Example
Example considers supplementary health insurance for 65-90 year-olds.

. * Data for 65-90 year olds on supplementary insurance indicator and regressors
. use mus203mepsmedexp.dta, clear

. global xlist income educyr age female white hisp marry ///
> totchr phylim actlim hvgg

. describe suppins $xlist

storage display value

variable name type format label variable label

suppins float %9.0g =1 if has supp priv insurance

income double %12.0g annual household income/1000
educyr double %12.0g Years of education
age double %12.0g Age
female double %12.0g =1 if female
white double %12.0g =1 if white
hisp double %12.0g =1 if Hispanic
marry double %12.0g =1 if married
totchr double %12.0g # of chronic problems
phylim double %12.0g =1 if has functional limitation
actlim double %12.0g =1 if has activity limitation
hvgg float %9.0g =1 if health status is excellent,
good or very good

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 11 / 43
1. Classi…cation 1.2 Logit

Logit Example (continued)

Summary statistics
I ȳ = 0.58 so not near extreme of most y = 0 or most y = 1.

. * Summary statistics
. summarize suppins $xlist

Variable Obs Mean Std. Dev. Min Max

suppins 3,064 .5812663 .4934321 0 1

income 3,064 22.47472 22.53491 -1 312.46
educyr 3,064 11.77546 3.435878 0 17
age 3,064 74.17167 6.372938 65 90
female 3,064 .5796345 .4936982 0 1

white 3,064 .9742167 .1585141 0 1

hisp 3,064 .0848564 .2787134 0 1
marry 3,064 .5558094 .4969567 0 1
totchr 3,064 1.754243 1.307197 0 7
phylim 3,064 .4255875 .4945125 0 1

actlim 3,064 .2836162 .4508263 0 1

hvgg 3,064 .6054178 .4888406 0 1

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 12 / 43
1. Classi…cation 1.2 Logit

Logit Example
Logit model coe¢ cient estimates
(∂ Pr[y = 1]/∂xj = βj Λ(x0 β)f1 Λ(x0 β)g)

. * logit model
. logit suppins $xlist, nolog

Logistic regression Number of obs = 3,064

LR chi2(11) = 345.23
Prob > chi2 = 0.0000
Log likelihood = -1910.5353 Pseudo R2 = 0.0829

suppins Coef. Std. Err. z P>|z| [95% Conf. Interval]

income .0180677 .0025194 7.17 0.000 .0131298 .0230056

educyr .0776402 .0131951 5.88 0.000 .0517782 .1035022
age -.0265837 .006569 -4.05 0.000 -.0394586 -.0137088
female -.0946782 .0842343 -1.12 0.261 -.2597744 .070418
white .7438788 .2441096 3.05 0.002 .2654327 1.222325
hisp -.9319462 .1545418 -6.03 0.000 -1.234843 -.6290498
marry .3739621 .0859813 4.35 0.000 .205442 .5424823
totchr .0981018 .0321459 3.05 0.002 .0350971 .1611065
phylim .2318278 .1021466 2.27 0.023 .0316242 .4320315
actlim -.1836227 .1102917 -1.66 0.096 -.3997904 .0325449
hvgg .17946 .0811102 2.21 0.027 .0204868 .3384331
_cons -.1028233 .577563 -0.18 0.859 -1.234826 1.029179

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 13 / 43
1. Classi…cation 1.2 Logit

Logit Example (continued)

Classi…cation table manually
I error rate = (737 + 347) /3064 = 1084/3064 = 0.354

. * Classification table manually

. predict ph_logit
(option pr assumed; Pr(suppins))

. generate yh_logit = ph_logit >= 0.5

. generate err_logit = (suppins==0 & yh_logit==1) | (suppins==1 & yh_logit==0)

. summarize suppins ph_logit yh_logit err_logit

Variable Obs Mean Std. Dev. Min Max

suppins 3,064 .5812663 .4934321 0 1

ph_logit 3,064 .5812663 .1609388 .0900691 .9954118
yh_logit 3,064 .7085509 .4545041 0 1
err_logit 3,064 .3537859 .4782218 0 1

. tabulate suppins yh_logit

=1 if has
supp priv yh_logit
insurance 0 1 Total

0 546 737 1,283

1 347 1,434 1,781

Total 893 2,171 3,064

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 14 / 43
1. Classi…cation 1.2 Logit

Logit Example (continued)

Classi…cation table using estat classi…cation postestimation command
I problem: reversed ordering in table makes hard to compare to other
models given later.
. * Classification table
. estat classification

Logistic model for suppins

True
Classified D ~D Total

+ 1434 737 2171

- 347 546 893

Total 1781 1283 3064

Classified + if predicted Pr(D) >= .5

True D defined as suppins != 0

Sensitivity Pr( +| D) 80.52%

Specificity Pr( -|~D) 42.56%
Positive predictive value Pr( D| +) 66.05%
Negative predictive value Pr(~D| -) 61.14%

False + rate for true ~D Pr( +|~D) 57.44%

False - rate for true D Pr( -| D) 19.48%
False + rate for classified + Pr(~D| +) 33.95%
False - rate for classified - Pr( D| -) 38.86%

Correctly classified 64.62%

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 15 / 43
1. Classi…cation 1.3 Nonparametric local logit regression

1.3 Nonparametric local logit regression

Extension of local linear to the logit model

I replace squared residual with log density.
At x = x0 maximize w.r.t. α0 and β0 the weighted logit log density

∑ni=1 wh (xi x0 ) fyi ln Λ(α0 + (xi x0 )0 β0 )

+(1 yi ) ln[1 Λ(α0 + (xi x0 )0 β0 )]g

Stata add-on command locreg in ivqte package.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 16 / 43
1. Classi…cation 1.4 Nonparametric k-nearest neighbors

1.4 Nonparametric k-nearest neighbors

For each observation i consider the K neighboring observations that

have the closest x value and estimate Pr[Y = j ] by the fraction of the
K neighboring observations with y = j.
k-nearest neighbors (K-NN) for many classes
I Pr[Y = j jx = x0 ] = K1 ∑i 2N 0 1[yi = j ]
I where N0 is the K observations on x closest to x0 .
There are many measures of closeness
I default is Euclidean distance between observations i and j
n o1 /2
p
∑a=1 (xai xja )2 where there are p regressors

Obtain predicted probabilities

I then assign to the class with highest predicted probability.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 17 / 43
1. Classi…cation 1.4 Nonparametric k-nearest neighbors

k-nearest neighbors example

Here use Euclidean distance and set K = 11
I and results here don’t use looclass option
I 584 + 394 = 978 are misclassi…ed (versus logit 737 + 347 = 1084).

. * K-nn classification table with leave-one out cross validation not as good
. estat classtable, nototals nopercents // without LOOCV

Resubstitution classification table

Key

Number

Classified
True suppins 0 1

0 889 394

1 584 1,197

Priors 0.5000 0.5000

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 18 / 43
1. Classi…cation 1.5 Discriminant Analysis

1.5 Linear Discriminant Analysis

Developed for classi…cation problems such as “Is a skull Neanderthal

or Homo Sapiens” given various measures of the skull.
Discriminant analysis speci…es a joint distribution for (Y , X).
Linear discriminant analysis with K categories
I assume XjY = k is N (µk , Σ) with density fk (x) = Pr[X = xjY = k ]
F note that only the mean of X varies with the category k
I and let π k = Pr[Y = k ]
The desired Pr[Y = k jX = x] is obtained using Bayes theorem

Pr[Y = k & X = x] π k f k (x)

Pr[Y = k jX = x] = = .
Pr[X = x] ∑K
j =1 π j f j (x )

Assign observation X = x to class k with largest Pr[Y = k jX = x].

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 19 / 43
1. Classi…cation 1.5 Discriminant Analysis

Linear Discriminant Analysis (continued)

Upon simpli…cation assignment to class k with largest

Pr[Y = k jX = x] is equivalent to choosing model with largest
discriminant function
1 0
δ k (x ) = x0 Σ 1
µk µ Σ 1
µk + ln π k
2 k

I bk = xk , Σ c [xk ] and π
b = Var bk = 1
∑N
use µ N i = 1 1 [ yi = k ] .
Called linear discriminant analysis as δk (x) linear in x.
I logit also gives separation linear in x.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 20 / 43
1. Classi…cation 1.5 Discriminant Analysis

Linear Discriminant Analysis Example

638 + 513 = 1141 are misclassi…ed (versus logit 737 + 347 = 1084).

. * Linear discriminant analysis

. discrim lda $xlist, group(suppins) notable

. predict yh_lda
(option classification assumed; group classification)

. estat classtable, nototals nopercents

Resubstitution classification table

Key

Number

Classified
True suppins 0 1

0 770 513

1 638 1,143

Priors 0.5000 0.5000

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 21 / 43
1. Classi…cation 1.5 Discriminant Analysis

Quadratic Discriminant Analysis

Quadratic discriminant analysis

I additionally allow di¤erent variances so XjY = k is N (µk , Σk )
Upon simpli…cation, the Bayes classi…er assigns observation X = x to
class k which has largest
1 0 1 1 0 1 1
δk (x) = x Σ k x + x0 Σ k 1 µ k µ Σ µ ln jΣk j + ln π k
2 2 k k k 2

I called quadratic discriminant analysis as δk (x) is quadratic in x

Use rather than LDA only if have a lot of data as requires estimating
many parameters.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 22 / 43
1. Classi…cation 1.5 Discriminant Analysis

Quadratic Discriminant Analysis Example

815 + 292 = 1107 are misclassi…ed (versus logit 737 + 347 = 1084).

. * Quadratic discriminant analysis

. discrim qda $xlist, group(suppins) notable

. predict yh_qda
(option classification assumed; group classification)

. estat classtable, nototals nopercents

Resubstitution classification table

Key

Number

Classified
True suppins 0 1

0 468 815

1 292 1,489

Priors 0.5000 0.5000

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 23 / 43
1. Classi…cation 1.5 Discriminant Analysis

LDA versus Logit

ESL ch.4.4.5 compares linear discriminant analysis and logit

I Both have log odds ratio linear in X
I LDA is joint model if Y and X versus logit is model of Y conditional
on X .
I In the worst case logit ignoring marginal distribution of X has a loss of
e¢ ciency of about 30% asymptotically in the error rate.
I If X 0 s are nonnormal (e.g. categorical) then LDA still doesn’t do too
bad.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 24 / 43
1. Classi…cation 1.5 Discriminant Analysis

ISL Figure 4.9: Linear and Quadratic Boundaries

LDA uses a linear boundary to classify and QDA a quadratic

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 25 / 43
1. Classi…cation 1.6 Support Vector Machines

1.6 Support Vector Classi…er

Build on LDA idea of linear boundary to classify when K = 2.
Maximal margin classi…er
I classify using a separating hyperplane (linear combination of X )
I if perfect classi…cation is possible then there are an in…nite number of
such hyperplanes
I so use the separating hyperplane that is furthest from the training
observations
I this distance is called the maximal margin.
Support vector classi…er
I generalize maximal margin classi…er to the nonseparable case
I this adds slack variables to allow some y ’s to be on the wrong side of
the margin
I Maxβ,ε M (the margin - distance from separator to training X ’s)
subject to β0 β 6= 1, yi ( β0 + xi0 β) M (1 εi ), εi 0 and
∑ni=1 εi C.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 26 / 43
1. Classi…cation 1.6 Support Vector Machines

Support Vector Machines

The support vector classi…er has a linear boundary (in x0 )

I f (x0 ) = β0 + ∑ni=1 αi x00 xi , where x00 xi = ∑pj=1 x0j xij .
The support vector machine has nonlinear boundaries
I f (x0 ) = β0 + ∑ni=1 αi K (x0 , xi ) where K ( ) is a kernel
I polynomial kernel K (x0 , xi ) = (1 + ∑pj=1 x0j xij )d
I radial kernel K (x0 , xi ) = exp( γ ∑pj=1 (x0j xij )2 )
Can extend to K > 2 classes (see ISL ch. 9.4).
I one-versus-one or all-pairs approach
I one-versus-all approach.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 27 / 43
1. Classi…cation 1.6 Support Vector Machines

ISL Figure 9.9: Support Vector Machine

In this example a linear or quadratic classi…er won’t work whereas
SVM does.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 28 / 43
1. Classi…cation 1.6 Support Vector Machines

Support Vector Machines Example

Use Stata add-on svmachines (Guenther and Schonlau)
224 + 463 = 687 are misclassi…ed (versus logit 737 + 347 = 1084).

. * Support vector machines - need y to be byte not float and matsize > n
. set matsize 3200

. global xlistshort income educyr age female marry totchr

. generate byte ins = suppins

. svmachines ins income

. svmachines ins $xlist

. predict yh_svm

. tabulate ins yh_svm

yh_svm
ins 0 1 Total

0 820 463 1,283

1 224 1,557 1,781

Total 1,044 2,020 3,064

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 29 / 43
1. Classi…cation 1.6 Support Vector Machines

Comparison of model predictions

The following compares the various category predictions.

SVM does best but we did in-sample predictions here
I especially for SVM we should have training and test samples.

. * Compare various in-sample predictions

. correlate suppins yh_logit yh_knn yh_lda yh_qda yh_svm
(obs=3,064)

suppins yh_logit yh_knn yh_lda yh_qda yh_svm

suppins 1.0000
yh_logit 0.2505 1.0000
yh_knn 0.3604 0.3575 1.0000
yh_lda 0.2395 0.6955 0.3776 1.0000
yh_qda 0.2294 0.6926 0.2762 0.5850 1.0000
yh_svm 0.5344 0.3966 0.6011 0.3941 0.3206 1.0000

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 30 / 43
1. Classi…cation 1.7 Regression Trees and Random Forests

1.7 Regression Trees and Random Forests

Regression trees, bagging, random forests and boosting can be used
for categorical data.
Let pbmk be the proportion of training observations in region m that
are from class k.
From ISL2 section 8.1.2 splits can be determined by
Error rate 1 bmk )
max(p
k
Gini index ∑K bmk (1 p
k =1 p bmk )
K
Entropy ∑k =1 mk
b
p ln b
pmk

Stata user-written rforest supports classi…cation in addition to

regression.
Stata user-written boost applies to Gaussian (normal), logistic and
Poisson regression
I it uses as loss function for cross-validation the
pseudo-R 2 = 1 ln L(full model)/ln L(intercept-only model)
I Matthias Schonlau (2005), The Stata Journal, 5(3), 330-354.
A. Colin Cameron Univ.of California - Davis .ML
() Part 6: Classi…cation and Unsupervised April 2024 31 / 43
1. Classi…cation 1.8 Neural Networks

1.8 Neural Networks

Neural networks work very well for classi…cation such as images.

The development of neural nets was originally for classi…cation such
as images.
As neural nets ability improved they supplanted support vector
machines for classi…cation.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 32 / 43
2. Unsupervised Learning

2. Unsupervised Learning

Challenging area: no y , only x.

Example is determining several types of individual based on responses
to many psychological questions.
Principal components analysis.
Clustering Methods
I k-means clustering.
I hierarchical clustering.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 33 / 43
2. Unsupervised Learning 2.1 Principal Components

2.1 Principal Components

Initially discussed in section on dimension reduction.
For p regressors goal is to …nd a few (m ) linear combinations of X
that explain a good fraction of the total variance
p p
∑j =1 Var (Xj ) = ∑j =1 n1 ∑ni=1 xij2 for mean 0 X ’s.
Zm = ∑pj=1 φjm Xj where ∑pj=1 φ2jm = 1 and φjm are called factor
loadings.
A useful statistic is the proportion of variance explained (PVE)
I a scree plot is a plot of PVEm against m
I and a plot of the cumulative PVE by m components against m.
I choose m that explains a “sizable” amount of variance
I ideally …nd interesting patterns with …rst few components.
Easier when used PCA earlier in supervised learning as then observe
Y and can treat m as a tuning parameter.
Stata pca command.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 34 / 43
2. Unsupervised Learning 2.2 Cluster Analysis

2.2 Cluster Analysis: k-Means Clustering

Goal is to …nd homogeneous subgroups among the X .

K-Means splits into K distinct clusters where within cluster variation
is minimized.
Let W (Ck ) be measure of variation
I MinimizeC 1 ,...,C k ∑K
k =1 W (Ck )
p
I Euclidean distance W (Ck ) = n1 ∑K i ,i 0 2C k ∑j =1 (xij xi 0 j ) 2
k

Global maximum requires K n partitions.

Instead use algorithm 10.1 (ISL p.388) which …nds a local optimum
I run algorithm multiple times with di¤erent seeds
I choose the optimum with smallest ∑K k =1 W (Ck ).

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 35 / 43
2. Unsupervised Learning 2.2 Cluster Analysis

ISL Figure 10.5

Data is (x1 .x2 ) with K = 2, 3 and 4 clusters identi…ed.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 36 / 43
2. Unsupervised Learning 2.2 Cluster Analysis

k-means Clustering Example

Use same data as earlier principal components analysis example.

. * k-means clustering with defaults and three clusters

. use machlearn_part2_spline.dta, replace

. graph matrix x1 x2 z // matrix plot of the three variables

. cluster kmeans x1 x2 z, k(3) name(myclusters)

. tabstat x1 x2 z, by(myclusters) stat(mean)

Summary statistics: mean

by categories of: myclusters

myclusters x1 x2 z

1 .8750554 .503166 1.34776

2 -.8569585 -1.120344 -.5772717
3 .1691631 .6720648 -.3493614

Total .0301211 .0226274 .0664539

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 37 / 43
2. Unsupervised Learning 2.2 Cluster Analysis

Hierarchical Clustering

Do not specify K .
Instead begin with n clusters (leaves) and combine clusters into
branches up towards trunk
I represented by a dendrogram
I eyeball to decide number of clusters.
Need a dissimilarity measure between clusters
I four types of linkage: complete, average, single and centroid.
For any clustering method
I it is a di¢ cult problem to do unsupervised learning
I results can change a lot with small changes in method
I clustering on subsets of the data can provide a sense of robustness.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 38 / 43
3. Conclusions

3. Conclusions
Guard against over…tting
I use K -fold cross validation or penalty measures such as AIC.
Biased estimators can be better predictors
I shrinkage towards zero such as Ridge and LASSO.
For ‡exible models popular choices are
I neural nets
I random forests.
Though what method is best varies with the application
I and best are ensemble forecasts that combine di¤erent methods.
Machine learning methods can outperform nonparametric and
semiparametric methods
I so wherever econometricians use nonparametric and semiparametric
regression in higher dimensional models it may be useful to use ML
methods
I though the underlying theory still relies on assumptions such as sparsity.
A. Colin Cameron Univ.of California - Davis .ML
() Part 6: Classi…cation and Unsupervised April 2024 39 / 43
4. Software for Machine Learning

4. Software for Machine Learning

Many ML functions are in Python and R.
Stata 17 covers LASSO, ridge, elastic net, PCA, NP regression, series
regression, splines, LDA, QDA, but add-ons are needed for neural
networks (brain) or random forests (rforest) or support vector
machines (svmachines).
Stata has integration with Python
I Giovanni Cerulli (2020), Machine Learning using Stata/Python,
https://arxiv.org/pdf/2103.03122v1.pdf
F Stata add-on r_ml_stata.ado and r_ml_stata.ado are Stata wrappers
for tree, boosting, random forest, regularized multinomial, neural
network, naive Bayes, nearest neighbor, support vector machine
F https://sites.google.com/view/giovannicerulli/machine-learning-in-
stata
To run R in Stata the user-written Rcall package integrates R within
Stata
I https://github.com/haghish/rcall
A. Colin Cameron Univ.of California - Davis .ML
() Part 6: Classi…cation and Unsupervised April 2024 40 / 43
4. Software for Machine Learning

Some R Commands (possibly superseded)

Basic classi…cation
I logistic: glm function
I discriminant analysis: lda() and qda functions in MASS library
I k nearest neighbors: knn() function in class library
Support vector machines
I support vector classi…er: svm(... kernel="linear") in e1071 library
I support vector machine: svm(... kernel="polynomial") or svm(...
kernel="radial") in e1071 library
I receiver operator characteristic curve: rocplot in ROCR library.
Unsupervised Learning
I principal components analysis: function prcomp()
I k-means clustering: function kmeans()
I hierarchical clustering: function hclust()

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 41 / 43
5. References

5. References

ISLR2: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibsharani
(2021), An Introduction to Statistical Learning: with Applications in R, 2nd Ed.,
Springer.
I Free PDF from https://www.statlearning.com/ and $40 softcover book via
Springer Mycopy.

ISLP: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibsharani and
Jonathan Taylor (2023), An Introduction to Statistical Learning: with Applications
in Python, Springer.
I Free PDF from https://www.statlearning.com/ and $40 softcover book via
Springer Mycopy.

Geron2: Aurelien Geron (2019), Hands-On Machinle Learning with Scikit-Learn,

Keras and Tensor Flow, Second edition, O’Reilly
I excellent book using Python for ML writtten by a computer scientist.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 42 / 43
5. References

References (continued)

ESL: Trevor Hastie, Robert Tibsharani and Jerome Friedman (2009), The
Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer.
I More advanced treatment.
I Free PDF and $40 softcover book at
https://link.springer.com/book/10.1007/978-0-387-84858-7

Chapter 28.6.7-26.8.8 “Machine Learning for prediction and inference” in A. Colin

Cameron and Pravin K. Trivedi (2023), Microeconometrics using Stata, Second
edition.
I covers classi…cation and unsupervised learning only very brie‡y.

A. Colin Cameron Univ.of California - Davis .ML

() Part 6: Classi…cation and Unsupervised April 2024 43 / 43

Carraro Tlb1
87% (15)
Carraro Tlb1
147 pages
Tuo Zhao Notes
No ratings yet
Tuo Zhao Notes
47 pages
Lecture 6
No ratings yet
Lecture 6
19 pages
Lecture Notes 6 Logistic Regression
No ratings yet
Lecture Notes 6 Logistic Regression
8 pages
class
No ratings yet
class
102 pages
Logreg
No ratings yet
Logreg
26 pages
Basic ML Algorithm
No ratings yet
Basic ML Algorithm
74 pages
Logistic Regression
No ratings yet
Logistic Regression
30 pages
Unit II
100% (1)
Unit II
13 pages
Logistic Regression
No ratings yet
Logistic Regression
34 pages
Chp2 Logistic Regression
No ratings yet
Chp2 Logistic Regression
6 pages
Binary Logistic Regression - 6.2
No ratings yet
Binary Logistic Regression - 6.2
34 pages
Generalized Linear Model
No ratings yet
Generalized Linear Model
67 pages
A Simple But Effective Logistic Regression Derivation
No ratings yet
A Simple But Effective Logistic Regression Derivation
6 pages
Logistic Regression
No ratings yet
Logistic Regression
18 pages
Lecture 05
No ratings yet
Lecture 05
5 pages
Chap10 Logistic Regression
No ratings yet
Chap10 Logistic Regression
36 pages
Lecture3 Logistic Regression Regularization
No ratings yet
Lecture3 Logistic Regression Regularization
39 pages
Logistic Regression
No ratings yet
Logistic Regression
25 pages
04 Probability and Learning PDF
No ratings yet
04 Probability and Learning PDF
34 pages
ML Unit 3
No ratings yet
ML Unit 3
40 pages
ML - Unit 2
No ratings yet
ML - Unit 2
155 pages
Probit and Logit Models Stata Program and Output PDF
No ratings yet
Probit and Logit Models Stata Program and Output PDF
10 pages
ML DSBA Lab2
No ratings yet
ML DSBA Lab2
4 pages
Logistic Regression
No ratings yet
Logistic Regression
20 pages
Logistic Regression
No ratings yet
Logistic Regression
9 pages
Binary Logistic (5)
No ratings yet
Binary Logistic (5)
29 pages
CS229 Lecture 3 PDF
100% (1)
CS229 Lecture 3 PDF
35 pages
W8 - Logistic Regression
No ratings yet
W8 - Logistic Regression
18 pages
Machine Learning - Unit 2
No ratings yet
Machine Learning - Unit 2
104 pages
1. Classification
No ratings yet
1. Classification
56 pages
Attachment LogR
No ratings yet
Attachment LogR
10 pages
Slides 3
No ratings yet
Slides 3
25 pages
Logistic Regression Loss
No ratings yet
Logistic Regression Loss
7 pages
Logistic regression by Nirzona
No ratings yet
Logistic regression by Nirzona
11 pages
3. LR, decision tree
No ratings yet
3. LR, decision tree
48 pages
3-LG_Eval
No ratings yet
3-LG_Eval
52 pages
Linear Classification: 1 1 N N I D I
No ratings yet
Linear Classification: 1 1 N N I D I
33 pages
09_23ECE216_LogisticRegression
No ratings yet
09_23ECE216_LogisticRegression
40 pages
Datamining-lect4 - Other Classification Techniques. Nearest Neighbor Classifiers, Support Vector Machines, Logistic Regression, Naive Bayes Classification. Supervised Learning
No ratings yet
Datamining-lect4 - Other Classification Techniques. Nearest Neighbor Classifiers, Support Vector Machines, Logistic Regression, Naive Bayes Classification. Supervised Learning
79 pages
Lecture Note #9_PEC-CS701E
No ratings yet
Lecture Note #9_PEC-CS701E
41 pages
Chap10_LogisticRegression
No ratings yet
Chap10_LogisticRegression
19 pages
Lec5 Class
No ratings yet
Lec5 Class
14 pages
Regression3 Slides
No ratings yet
Regression3 Slides
47 pages
07 Logistics Regression
No ratings yet
07 Logistics Regression
23 pages
Logistic Regression and Discriminant Analysis: Jerry D.T. Purnomo, PH.D
No ratings yet
Logistic Regression and Discriminant Analysis: Jerry D.T. Purnomo, PH.D
54 pages
13 Logistic Regression Main
No ratings yet
13 Logistic Regression Main
14 pages
Chapter 10 - Logistic Regression: Data Mining For Business Intelligence
No ratings yet
Chapter 10 - Logistic Regression: Data Mining For Business Intelligence
20 pages
Advanced Regression
No ratings yet
Advanced Regression
13 pages
Ourse Notes Ogistic Egression: Course Notes: Descriptive Statistics Course Notes: Descriptive Statistics
No ratings yet
Ourse Notes Ogistic Egression: Course Notes: Descriptive Statistics Course Notes: Descriptive Statistics
6 pages
ML Assignment
No ratings yet
ML Assignment
20 pages
Lecture 03 Logistic Regression
No ratings yet
Lecture 03 Logistic Regression
34 pages
Class23 26 LinearClassification NeuralNetworks - 05 15nov2019
No ratings yet
Class23 26 LinearClassification NeuralNetworks - 05 15nov2019
35 pages
Linear Models For Classification: Logreg - PDF - May 4, 2010 - 1
No ratings yet
Linear Models For Classification: Logreg - PDF - May 4, 2010 - 1
7 pages
2223hk1 Slide03 ML2022
No ratings yet
2223hk1 Slide03 ML2022
33 pages
Multivariate classification
No ratings yet
Multivariate classification
7 pages
Intro LOGIT
No ratings yet
Intro LOGIT
46 pages
S4-LogisticRegression-15Jan2025
No ratings yet
S4-LogisticRegression-15Jan2025
25 pages
Lecture15 Binary Dependent Variables
No ratings yet
Lecture15 Binary Dependent Variables
38 pages
Learn Statistics Fast: A Simplified Detailed Version for Students
From Everand
Learn Statistics Fast: A Simplified Detailed Version for Students
Hesbon R.M
No ratings yet
Exam Prep for:: Environmental Statistics with R
From Everand
Exam Prep for:: Environmental Statistics with R
Mzn Lnx
No ratings yet
xxxx-Matrix Analysis for Scientists and Engineers-solutions
No ratings yet
xxxx-Matrix Analysis for Scientists and Engineers-solutions
15 pages
ML 2024 Part1 CrossValidation
No ratings yet
ML 2024 Part1 CrossValidation
43 pages
common-distributions-one-pager
No ratings yet
common-distributions-one-pager
1 page
Measure Theory
No ratings yet
Measure Theory
110 pages
Xxxx Hypothesis Testing
No ratings yet
Xxxx Hypothesis Testing
101 pages
Applied Linear Algebra
No ratings yet
Applied Linear Algebra
121 pages
Anthony Rimmington - From Military To Industrial Complex The Conversion of Biological Weapons' Facilities in The Russian Federation
No ratings yet
Anthony Rimmington - From Military To Industrial Complex The Conversion of Biological Weapons' Facilities in The Russian Federation
35 pages
A Process of Finding and Attracting Capable Applicants For Employment
No ratings yet
A Process of Finding and Attracting Capable Applicants For Employment
63 pages
Product Data Sheet: Link 150 - Ethernet Gateway - 2 Ethernetport - 24 Vdcandpoe
No ratings yet
Product Data Sheet: Link 150 - Ethernet Gateway - 2 Ethernetport - 24 Vdcandpoe
3 pages
BCA
No ratings yet
BCA
32 pages
Description: Tags: 08bylevel
No ratings yet
Description: Tags: 08bylevel
2 pages
The Game of Chess (Goodman, 1914)
No ratings yet
The Game of Chess (Goodman, 1914)
40 pages
Session 02 the Central Nervous System
No ratings yet
Session 02 the Central Nervous System
19 pages
Srinivas Puni, FRM: Director at Quantart FX
No ratings yet
Srinivas Puni, FRM: Director at Quantart FX
2 pages
IP PPT Group 3
No ratings yet
IP PPT Group 3
21 pages
The Emerging Power of Social Media - Prospects and Problems
No ratings yet
The Emerging Power of Social Media - Prospects and Problems
2 pages
Mod 8
No ratings yet
Mod 8
7 pages
Puppetry
50% (2)
Puppetry
183 pages
FICHA TECNICA DE CILINDRO HIDRAULICOS DE SIMPLE EFECTO ANTI FALLOS
No ratings yet
FICHA TECNICA DE CILINDRO HIDRAULICOS DE SIMPLE EFECTO ANTI FALLOS
1 page
Module 1 Highway and Railroad Engg
No ratings yet
Module 1 Highway and Railroad Engg
21 pages
SMT Lead Site Supervision Assesment Checklist
No ratings yet
SMT Lead Site Supervision Assesment Checklist
18 pages
Dog Psychology and Training Part 1
No ratings yet
Dog Psychology and Training Part 1
4 pages
N890.090
No ratings yet
N890.090
1 page
The Consilience Project | Technology is Not Values Neutral: Ending the Reign of Nihilistic Design - The Consilience Project
No ratings yet
The Consilience Project | Technology is Not Values Neutral: Ending the Reign of Nihilistic Design - The Consilience Project
40 pages
SR6015039 Pranamit Sen
No ratings yet
SR6015039 Pranamit Sen
2 pages
Gwalior Bhopal: Departure Arrival
No ratings yet
Gwalior Bhopal: Departure Arrival
3 pages
Group 1 - Heineken Report - CPM
No ratings yet
Group 1 - Heineken Report - CPM
17 pages
Vehicle Quantity
No ratings yet
Vehicle Quantity
1 page
Mathematical Logic Lecture Notes
No ratings yet
Mathematical Logic Lecture Notes
11 pages
7 Sacred Resins
No ratings yet
7 Sacred Resins
2 pages
CV AJ Menzies Update 02142012
No ratings yet
CV AJ Menzies Update 02142012
8 pages
9.7 - Portfolio PDF
No ratings yet
9.7 - Portfolio PDF
15 pages
Syllabus-Lhs Madrigal 2022
No ratings yet
Syllabus-Lhs Madrigal 2022
2 pages
METHOD-STATEMENTS-ELECTRICAL - Testing
No ratings yet
METHOD-STATEMENTS-ELECTRICAL - Testing
3 pages
ISO 17025 Training Day 1
No ratings yet
ISO 17025 Training Day 1
2 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.