ML 2024 Part 6: Classification and Unsupervised Learning
A. Colin Cameron
University of California - Davis
April 2024
1. Introduction
Overview
1 Classification (categorical y)
  1 Loss function
  2 Logit
  3 Local logit regression
  4 k-nearest neighbors
  5 Discriminant analysis
  6 Support vector machines
  7 Regression trees and random forests
  8 Neural networks
2 Unsupervised learning (no y)
  1 Principal components analysis
  2 Cluster analysis
1. Classification: Overview
Regression methods
- predict probabilities based on the log-likelihood rather than MSE
- assign to the class with the highest predicted probability (Bayes classifier)
  - in the binary case $\hat{y} = 1$ if $\hat{p} \geq 0.5$ and $\hat{y} = 0$ if $\hat{p} < 0.5$.
- parametric: logistic regression, multinomial regression
- nonparametric: local logit, nearest-neighbors logit
Discriminant analysis
- additionally assumes a normal distribution for the x's
- predicts probabilities
- uses Bayes theorem to get $\Pr[Y = k \mid X = x]$ and the Bayes classifier
- used in many other social sciences.
The test error rate for the $n_0$ observations in the test sample is
$$\mathrm{Ave}(1[y_0 \neq \hat{y}_0]) = \frac{1}{n_0}\sum_{i=1}^{n_0} 1[y_{0i} \neq \hat{y}_{0i}].$$
Cross validation uses the number of misclassified observations, e.g. LOOCV is
$$CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Err}_i = \frac{1}{n}\sum_{i=1}^{n} 1[y_i \neq \hat{y}_{(i)}],$$
where $\hat{y}_{(i)}$ is the prediction for observation $i$ from the fit that leaves out observation $i$.
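A minimal Stata sketch (not from the slides) of the test error rate formula, using the supplementary insurance example introduced in the logit section below; the 80/20 split, the seed, and the variable names created here are assumptions.
. * Split into training and test samples (split fraction and seed are assumptions)
. set seed 10101
. generate train = runiform() < 0.8
. * Fit the classifier on the training sample only
. logit suppins $xlist if train, nolog
. * Predicted probability, 0.5-threshold classification, and test error rate
. predict phat, pr
. generate yhat = (phat >= 0.5) if phat < .
. generate err = (suppins != yhat) if !train
. summarize err if !train
The mean of err over the test observations is the test error rate $\mathrm{Ave}(1[y_0 \neq \hat{y}_0])$.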
1.1 Loss Function
Classification Table
Bayes classifier
The Bayes classifier selects the most probable class
- the following gives the theoretical justification.
The loss function is
$$L(G, \hat{G}(x)) = 1[y_i \neq \hat{y}_i]$$
- $L(G, \hat{G}(x))$ is 0 on the diagonal of the $K \times K$ classification table and 1 elsewhere
- where $G$ is the actual category and $\hat{G}$ is the predicted category.
Then minimize the expected prediction error
$$EPE = E_{G,x}\big[L(G, \hat{G}(x))\big] = E_{x}\Big[\sum_{k=1}^{K} L(G_k, \hat{G}(x))\, \Pr[G_k \mid x]\Big].$$
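The slide stops at the EPE expression; to complete the argument (the standard step, as in ESL), minimize pointwise in $x$ using the 0-1 loss:
$$\hat{G}(x) = \arg\min_{g}\sum_{k=1}^{K} L(G_k, g)\,\Pr[G_k \mid x] = \arg\min_{g}\big(1 - \Pr[g \mid x]\big) = \arg\max_{k}\Pr[G_k \mid x],$$
so the class with the highest conditional probability minimizes EPE, which is exactly the Bayes classifier.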
1.2 Logit
Logit Example
Example considers supplementary health insurance for 65-90 year-olds.
. * Data for 65-90 year olds on supplementary insurance indicator and regressors
. use mus203mepsmedexp.dta, clear
. global xlist income educyr age female white hisp marry ///
> totchr phylim actlim hvgg
Summary statistics
- $\bar{y} = 0.58$, so not near the extremes where most $y = 0$ or most $y = 1$.
. * Summary statistics
. summarize suppins $xlist
Logit Example
Logit model coefficient estimates
($\partial \Pr[y = 1]/\partial x_j = \beta_j\, \Lambda(x'\beta)\{1 - \Lambda(x'\beta)\}$)
. * logit model
. logit suppins $xlist, nolog
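The slide reports only the coefficient estimates; a minimal sketch (not on the slide) of how the marginal effects implied by the formula above, and the classified variable yh_logit used later, could be obtained. The margins call and the 0.5-threshold construction are assumptions.
. * Average marginal effects dPr[y=1]/dx_j (assumed, not shown on the slide)
. margins, dydx($xlist)
. * Predicted probabilities and classification at the 0.5 threshold (construction assumed)
. predict p_logit, pr
. generate yh_logit = (p_logit >= 0.5) if p_logit < .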
[Logit classification tables: suppins (=1 if has supp priv insurance) tabulated against yh_logit, and the estat classification table (Classified D vs ~D); cell counts not recoverable from the extraction.]
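The k-NN fit behind the classification table below is not in the extraction; a minimal sketch, assuming Stata's discrim knn with an illustrative k(5):
. * Hypothetical reconstruction: k-nearest-neighbors discriminant analysis (k=5 is illustrative)
. discrim knn $xlist, group(suppins) k(5)
. predict yh_knn
The leave-one-out table mentioned in the comment below would presumably add the looclass option to estat classtable.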
. * K-nn classification table with leave-one out cross validation not as good
. estat classtable, nototals nopercents // without LOOCV
Key: Number
              |    Classified
 True suppins |        0         1
--------------+---------------------
            0 |      889       394
            1 |      584     1,197
- use $\hat{\mu}_k = \bar{x}_k$, $\hat{\Sigma} = \widehat{\mathrm{Var}}[x_k]$ and $\hat{\pi}_k = \frac{1}{N}\sum_{i=1}^{N} 1[y_i = k]$.
Called linear discriminant analysis as $\delta_k(x)$ is linear in $x$
- logit also gives separation that is linear in $x$.
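The fitting command is missing from the extraction; a minimal sketch, assuming Stata's discrim lda, of the fit that would precede the predict below:
. * Hypothetical reconstruction: linear discriminant analysis of suppins on $xlist
. discrim lda $xlist, group(suppins)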
. predict yh_lda
(option classification assumed; group classification)
Key: Number
              |    Classified
 True suppins |        0         1
--------------+---------------------
            0 |      770       513
            1 |      638     1,143
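Similarly, the quadratic discriminant fit is missing from the extraction; a sketch assuming Stata's discrim qda:
. * Hypothetical reconstruction: quadratic discriminant analysis
. discrim qda $xlist, group(suppins)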
. predict yh_qda
(option classification assumed; group classification)
Key: Number
              |    Classified
 True suppins |        0         1
--------------+---------------------
            0 |      468       815
            1 |      292     1,489
. * Support vector machines - need y to be byte not float and matsize > n
. set matsize 3200
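The SVM fit itself is not in the extraction; one possibility, assuming the user-written svmachines wrapper for libsvm (ssc install svmachines), is sketched here with default options:
. * Hypothetical reconstruction: outcome must be stored as byte for svmachines
. recast byte suppins
. svmachines suppins $xlist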
. predict yh_svm
[Classification table of suppins against yh_svm; cell counts not recoverable from the extraction.]
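The correlation matrix below compares the actual outcome with each method's classification; the command is not shown in the extraction but would presumably be along these lines:
. * Correlations between suppins and the classifications from each method (command assumed)
. correlate suppins yh_logit yh_knn yh_lda yh_qda yh_svm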
             |  suppins yh_logit   yh_knn   yh_lda   yh_qda   yh_svm
-------------+-------------------------------------------------------
     suppins |   1.0000
    yh_logit |   0.2505   1.0000
      yh_knn |   0.3604   0.3575   1.0000
      yh_lda |   0.2395   0.6955   0.3776   1.0000
      yh_qda |   0.2294   0.6926   0.2762   0.5850   1.0000
      yh_svm |   0.5344   0.3966   0.6011   0.3941   0.3206   1.0000
2. Unsupervised Learning
[Cluster analysis example: listing of myclusters, x1, x2 and z; output not recoverable from the extraction.]
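A minimal sketch of how such a cluster assignment could be generated in Stata; the names myclusters, x1, x2 and z follow the slide, while k-means with two groups is an assumption:
. * Hypothetical reconstruction: k-means clustering on x1 and x2,
. * storing group assignments in the new variable myclusters
. cluster kmeans x1 x2, k(2) name(myclusters)
. list myclusters x1 x2 z in 1/5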
Hierarchical Clustering
Do not specify K.
Instead begin with n clusters (leaves) and combine clusters into branches, working up towards the trunk
- represented by a dendrogram
- eyeball the dendrogram to decide the number of clusters.
Need a dissimilarity measure between clusters
- four types of linkage: complete, average, single and centroid (a short Stata sketch follows this list).
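A minimal Stata sketch of hierarchical clustering, using complete linkage as one of the four linkage choices above; the variables x1 and x2, the analysis name hccomp, and the cut at three groups are assumptions:
. * Hypothetical reconstruction: agglomerative clustering with complete linkage
. cluster completelinkage x1 x2, name(hccomp)
. * Dendrogram; eyeball it to choose the number of clusters
. cluster dendrogram hccomp
. * Cut the tree into, say, three groups
. cluster generate gp3 = groups(3), name(hccomp)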
For any clustering method
- unsupervised learning is a difficult problem
- results can change a lot with small changes in method
- clustering on subsets of the data can provide a sense of robustness.
3. Conclusions
Guard against overfitting
- use K-fold cross validation or penalty measures such as AIC.
Biased estimators can be better predictors
- shrinkage towards zero, such as ridge and LASSO.
For flexible models popular choices are
- neural nets
- random forests.
Though which method is best varies with the application
- and best of all are ensemble forecasts that combine different methods.
Machine learning methods can outperform nonparametric and semiparametric methods
- so wherever econometricians use nonparametric and semiparametric regression in higher-dimensional models it may be useful to use ML methods
- though the underlying theory still relies on assumptions such as sparsity.
4. Software for Machine Learning
Basic classification (in R)
- logistic regression: glm() function
- discriminant analysis: lda() and qda() functions in the MASS library
- k nearest neighbors: knn() function in the class library
Support vector machines
- support vector classifier: svm(..., kernel="linear") in the e1071 library
- support vector machine: svm(..., kernel="polynomial") or svm(..., kernel="radial") in the e1071 library
- receiver operating characteristic curve: rocplot using the ROCR library.
Unsupervised learning
- principal components analysis: function prcomp()
- k-means clustering: function kmeans()
- hierarchical clustering: function hclust()
5. References
ISLR2: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (2021), An Introduction to Statistical Learning: with Applications in R, 2nd ed., Springer.
- Free PDF from https://www.statlearning.com/ and $40 softcover book via Springer MyCopy.
ISLP: Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani and Jonathan Taylor (2023), An Introduction to Statistical Learning: with Applications in Python, Springer.
- Free PDF from https://www.statlearning.com/ and $40 softcover book via Springer MyCopy.
References (continued)
ESL: Trevor Hastie, Robert Tibshirani and Jerome Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer.
- More advanced treatment.
- Free PDF and $40 softcover book at https://link.springer.com/book/10.1007/978-0-387-84858-7