MAI Lecture 01 Introduction
Machine Learning (ML)
Programs that improve with experience.
Revolutionizing Science and Technology
“A breakthrough in machine learning would be worth ten Microsofts.” (Bill Gates, Microsoft)
[Figure: three panels of computer, program, and output]
Applications of ML
Use past data to …
Other Examples:
Fraud Detection, Flagging inappropriate social media posts,
Natural Language Processing, Document Classification,
Designing Economic Mechanisms, Computational Advertising, …
[Figure: Machine Learning at the intersection of many fields: Computer Science, Math, Statistics, Information Theory, Biology, Ethics, Robotics, Cognitive Science, Control Theory, Economics, ECE, and Neuroscience]
The Turing Test, 1950
Checkers Program, 1952
Perceptron, 1957
Predecessor of deep networks.
1960s: Lots of hope for AI to solve everything!
Foundations of ML, 1980s-present
Formal notions of learnability from data.
• When is data-driven learning possible, and how much data is required?
Probably Approximately Correct (PAC) learning, by Valiant.
• What’s the difference between great and mediocre learners?
Boosting, by Freund and Schapire: improving the performance of a learning algorithm.
• How do we deal with difficult and noisy learning problems?
(Soft-margin) Support Vector Machines, by Cortes and Vapnik.
• What do we do when the learning task evolves over time?
The online learning framework.
TD-Gammon, 1992
Gerald Tesauro at IBM trained a neural network to play Backgammon.
Deep Blue, 1997
IBM’s Deep Blue defeated Kasparov in chess.
Expanding the reach, 2000s
• Learning to rank: powering search engines (Google, Bing, …).
• Topic modeling: detecting and organizing documents by subject matter; making sense of the unstructured data on the web.
• Online economy: ad placement and pricing; product recommendation.
Surrounded by Machine Learning
Nika Haghtalab
Your instructor?
“With great power, there must also come – great responsibility!”
Data Privacy
• Learning models leak training data (Fredrickson et al. ’15).
• Learning algorithms detect sexual orientation better than people (Wang & Kosinski ’17).
Learning and Society
• Bad dynamics, perpetuating and worsening stereotypes and biases.
• Who carries the burden of bad prediction?
• How to design good dynamics?
Challenging Questions
Levels of Measurement
• Nominal: non-numeric category.
• Ordinal: non-numeric category with order.
• Interval: numeric, with an arbitrary zero.
• Ratio: numeric, with an actual, non-arbitrary zero.
Normalizing Observations
All input to and output from machine learning algorithms is typically a vector of floating-point numbers, yet observations may be nominal, ordinal, interval, or ratio. Nominal and ordinal data are not inherently numeric, so they must be encoded. Some algorithms also expect values in the range -1 to +1, or 0 to +1.
Why is normalization necessary?
Features can sit on very different scales: for example, a stock’s trading volume is in the millions, while the number of stocks is around 10. The large numbers overwhelm the small ones.
Solution: rescale, e.g., to percentages (5%, 10%).
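The scale problem can be seen in a tiny distance computation. A minimal sketch, assuming hypothetical volume and count features:

```python
# Two observations: (trading volume, number of stocks).
# Volume is in the millions; the count is around 10.
a = (5_000_000.0, 12.0)
b = (4_000_000.0, 95.0)

# Raw Euclidean distance: the volume difference dominates completely,
# and the count difference (83) is invisible next to 1,000,000.
raw = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

# Rescale each feature to a fraction of its column maximum
# (the percentage idea above); both now contribute comparably.
max_vol, max_cnt = 5_000_000.0, 100.0
a_n = (a[0] / max_vol, a[1] / max_cnt)
b_n = (b[0] / max_vol, b[1] / max_cnt)
scaled = ((a_n[0] - b_n[0]) ** 2 + (a_n[1] - b_n[1]) ** 2) ** 0.5
```

After rescaling, the distance reflects differences in both features rather than only the one with the largest units.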
Example: the iris dataset
Each record carries five pieces of information: sepal length, sepal width, petal length, petal width, and species.
"Sepal Length","Sepal Width","Petal Length","Petal Width","Species"
5.1,3.5,1.4,0.2,"setosa"
4.9,3.0,1.4,0.2,"setosa"
4.7,3.2,1.3,0.2,"setosa"
...
7.0,3.2,4.7,1.4,"versicolor"
6.4,3.2,4.5,1.5,"versicolor"
6.9,3.1,4.9,1.5,"versicolor"
...
6.3,3.3,6.0,2.5,"virginica"
5.8,2.7,5.1,1.9,"virginica"
7.1,3.0,5.9,2.1,"virginica"
Normalizing Nominal Observations
5.1,3.5,1.4,0.2,"setosa"
7.0,3.2,4.7,1.4,"versicolor"
6.3,3.3,6.0,2.5,"virginica"
One-of-n normalization, with range -1 to +1:
• Setosa: 1, -1, -1
• Versicolor: -1, 1, -1
• Virginica: -1, -1, 1
With range 0 to +1, how do we encode?
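One-of-n encoding takes only a few lines. A minimal sketch (the helper name and `lo`/`hi` parameters are mine; they switch between the -1..+1 and 0..+1 ranges):

```python
def one_of_n(label, classes, lo=-1.0, hi=1.0):
    """Encode a nominal value as a vector with `hi` in its
    class position and `lo` everywhere else."""
    return [hi if c == label else lo for c in classes]

classes = ["setosa", "versicolor", "virginica"]
one_of_n("versicolor", classes)          # -> [-1.0, 1.0, -1.0]
one_of_n("setosa", classes, lo=0.0)      # -> [1.0, 0.0, 0.0]
```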
Normalizing Ordinal Observations
Equilateral Normalization
An alternative to one-of-n: the n classes are encoded as n equidistant points in n-1 dimensions, so every wrong class costs the same error regardless of which class it is.
For comparison, with one-of-n the ideal output might be -1, -1, 1 while the actual output is -1, 1, -1.
Implementation
• “Practical Neural Network Recipes in C++” by Masters (1993), who cited an article in PC AI as the actual source (Guiver, 1991).
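One way to implement equilateral encoding is the standard recursive simplex construction (a sketch under that assumption; the helper names are mine, not from the Masters/Guiver sources cited above):

```python
import math

def equilateral(n):
    """Return an n x (n-1) matrix whose rows encode the n classes as
    vertices of a regular simplex: every pair of codes is equidistant."""
    m = [[0.0] * (n - 1) for _ in range(n)]
    m[0][0] = -1.0
    m[1][0] = 1.0
    for k in range(2, n):
        # Shrink the existing k vertices toward the origin,
        f = math.sqrt(k * k - 1.0) / k
        for i in range(k):
            for j in range(k - 1):
                m[i][j] *= f
        # push them down in the new dimension, and add the new apex.
        for i in range(k):
            m[i][k - 1] = -1.0 / k
        m[k][k - 1] = 1.0
    return m

def decode(m, output):
    """Map a (possibly noisy) output vector to the nearest class index."""
    return min(range(len(m)), key=lambda i: math.dist(m[i], output))
```

For three classes this produces the vertices of an equilateral triangle, e.g. setosa, versicolor, and virginica become three 2-dimensional codes with equal pairwise distances.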
Additional Normalizations
• Z normalization: z = (x - μ) / σ, giving each feature zero mean and unit standard deviation.
• Min-max normalization: x′ = lo + (x - min)(hi - lo) / (max - min), mapping each feature into a chosen range [lo, hi].
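Both rescalings are a few lines each. A minimal sketch (function names are mine):

```python
def min_max(xs, lo=0.0, hi=1.0):
    """Map values linearly so min(xs) -> lo and max(xs) -> hi."""
    mn, mx = min(xs), max(xs)
    return [lo + (x - mn) * (hi - lo) / (mx - mn) for x in xs]

def z_norm(xs):
    """Center at the mean and divide by the (population) std deviation."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / std for x in xs]

min_max([2.0, 4.0, 6.0])   # -> [0.0, 0.5, 1.0]
```

Min-max preserves the shape of the distribution within a fixed range; z normalization makes features comparable even when outliers would squash a min-max scale.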
• Data classification
• Regression analysis
• Clustering
• Time series
Classification
Example: credit scoring. Differentiating between low-risk and high-risk customers from their income and savings.
[Figure: labeled customer data and the learned model]
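A toy version of such a discriminant can be sketched as below. The thresholds are invented for illustration; a real classifier would learn the decision boundary from the labeled data:

```python
def credit_risk(income, savings, income_cut=50_000.0, savings_cut=10_000.0):
    """Classify a customer as low- or high-risk from two features.
    The cut-off values here are hypothetical, purely for illustration."""
    if income > income_cut and savings > savings_cut:
        return "low-risk"
    return "high-risk"

credit_risk(80_000, 25_000)   # -> 'low-risk'
credit_risk(30_000, 2_000)    # -> 'high-risk'
```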
Classification: Applications
Face Recognition
[Figure: test images]
Regression Applications
Time Series
Example: financial analysis.
• Encode the data
• Normalize (sliding window)
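The sliding-window encoding mentioned above can be sketched in a few lines (a minimal helper; the name is mine): each window of past values becomes an input, and the value that follows it becomes the target.

```python
def sliding_window(series, width):
    """Turn a time series into (window, next value) training pairs:
    each window of `width` past values predicts the value that follows."""
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]

sliding_window([1, 2, 3, 4, 5], 3)   # -> [([1, 2, 3], 4), ([2, 3, 4], 5)]
```

In practice each window would also be normalized (e.g., as percentage changes) so the model sees comparable values regardless of the price level.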
Resources: Datasets
Textbook and Course Material
• Textbooks
– Mathematics for Machine Learning. Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. Cambridge University Press, 2020.
– Pattern Recognition and Machine Learning. Christopher Bishop.
– Machine Learning: A Probabilistic Perspective. Kevin P. Murphy.
• References
– Machine Learning. Tom Mitchell.
– The Elements of Statistical Learning. Trevor Hastie, Robert Tibshirani, Jerome Friedman.
• Course Notes
– Slides available on the course Google Classroom.