MAI Lecture 01 Introduction

This document provides an introduction to machine learning for artificial intelligence. It discusses how machine learning programs improve with experience by learning from large amounts of data. It outlines some of the key applications of machine learning, including detecting spam, predicting the weather, and classifying images. The document then reviews some of the major developments in the field, including early work on checkers-playing programs in the 1950s and breakthroughs with neural networks that led to today's renaissance in deep learning. It closes by discussing some of the challenging questions around how machine learning will shape the future and impact society.

Mathematics for AI

LECTURE 1 Introduction

Machine Learning (ML)
Programs that improve with experience.
Revolutionizing Science and Technology
“A breakthrough in machine learning would be worth ten
Microsofts.” (Bill Gates, Microsoft)

“It will be the basis and fundamentals of every successful
huge IPO win in 5 years.” (Eric Schmidt, Google / Alphabet)

“AI and machine learning are going to change the world
and we really have not begun to scratch the surface.”
(Jennifer Chayes, Microsoft / Berkeley)

“ML is transforming sector after sector of the economy, and
the rate of progress only seems to be accelerating.”
(Daphne Koller, Stanford / Coursera / Insitro)

“Machine learning is the next Internet.” (Tony Tether, DARPA)
What is Machine Learning?
Traditional computing: given an Input (2, 1, 0, -1) and a Program (Fun(x): x > 0?),
the Computer produces the Output (Yes, Yes, No, No).

Machine learning: given an Input (2, 1, 0, -1) and the desired Output (Yes, Yes, No, No),
the Computer produces a Program (Fun(x): x > 0.5?).
What is Machine Learning?
Machine learning: from Input (2, 1, 0, -1) and Output (Yes, Yes, No, No),
the Computer learns the Program Fun(x): x > 0.5?.

Traditional computing: running that Program on new Input (1, 0.5, 0, -1)
produces the Output (Yes, No, No, No).
What is Machine Learning?
Tom Mitchell, 1997:
“A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.”

Training: from Input (2, 1, 0, -1) and Output (Yes, Yes, No, No),
the Computer learns the Program Fun(x): x > 0.5?.

Testing: the learned Program is applied to new Input (1, 0.5, 0, -1),
producing Output (Yes, No, No, No).
Learning to Detect Spam
• Use past e-mails and whether or not they were flagged as spam (Spam or Not).
• Learn a program that takes a future e-mail and decides whether it is spam.
• E.g., if the e-mail is from an unknown sender, has a misspelling, and
contains “Million Dollars”, flag it as spam.
(In Mitchell's terms: T = classifying e-mails, P = the fraction classified
correctly, E = the past labeled e-mails.)
Applications of ML
Use past data to …

Detect spam Predict weather Classify images

Other Examples:
Fraud Detection, Flagging inappropriate social media posts,
Natural Language Processing, Document Classification,
Designing Economic Mechanisms, Computational Advertising, …
[Figure: Machine Learning at the center of the fields it draws on and contributes to:
Computer Science, Computational Biology, Information Theory, Ethics, Robotics,
Cognitive Science, Math, Control Theory, Statistics, Economics, ECE, and Neuroscience.]
The Turing Test, 1950

Alan Turing: A machine is intelligent if its answers
are indistinguishable from a human’s.
Checkers Program, 1952

Arthur Samuel created a checkers-playing program
that got better over time.

He also introduced the term “Machine Learning”.
Perceptron, 1957
Predecessor of deep networks.

Frank Rosenblatt (@ Cornell!): separating two classes of objects
using a linear threshold classifier.

Provable learning and convergence guarantees.
1960s: Lots of hope for AI to solve everything!

AI didn’t live up to the hype!


• 1966: Machine Translation failed.
• 1970: Minsky and Papert argued against Perceptron.
• 1971: Speech Understanding failed.
• 1973: The Lighthill report tore AI apart:
“In no part of the field have the discoveries made so far
produced the major impact that was then promised.”
• 1974: The UK and US stopped funding AI research.

The AI Winter, 1974-1980
Rebirth as Machine Learning
Machine Learning:
• Originally, a bit of a name game to get funding.
• Fundamentally a different approach to intelligence:

Machine Learning: data-driven, a bottom-up approach.
Artificial Intelligence: knowledge-based, heavy use of logic, a top-down approach.
Foundations of ML, 1980s-present
Formal notions of learnability from data.
• When is data-driven learning possible?
 ▪ Probably Approximately Correct (PAC) learning by Valiant.
 ▪ How much data is required?
• What is the difference between great and mediocre learners?
 ▪ Improving the performance of a learning algorithm.
 ▪ The boosting algorithm of Freund and Schapire.
• How do we deal with difficult and noisy learning problems?
 ▪ (Soft-margin) Support Vector Machines by Cortes and Vapnik.
• What do we do when the learning task evolves over time?
 ▪ The online learning framework.
TD-Gammon, 1992
Gerald Tesauro at IBM taught a
neural network to play
Backgammon.

The net played 100K+ games
against itself and beat the world
champion.

The algorithm found new techniques
that people had erroneously ruled
out.
Deep Blue, 1997
IBM’s Deep Blue won against
Kasparov in chess.

The crucial winning move was
made due to machine learning
methods developed by Gerald
Tesauro.
Expanding the reach, 2000s
Learning to rank
 Powering search engines: Google, Bing, …

Topic Modeling:
 Detecting and organizing documents by subject matter.
 Making sense of the unstructured data on the web.

Online economy:
 Ad placement and pricing.
 Product recommendation.

Machine learning became profitable!
Return of Neural Networks, 2010s
Neural networks return and excel at image
recognition, speech recognition, …

The 2018 Turing Award was given to Yoshua
Bengio, Geoff Hinton, and Yann LeCun.
Surrounded by Machine Learning

Nika Haghtalab

“With great power, there must also come great responsibility!”
Data Privacy
• Learning models leak training data (Fredrickson et al. ’15).
• Learning algorithms detect sexual orientation better than people (Wang & Kosinski ’17).

[Figure: a training image leaked from a model, next to the real image.]

Formal definitions of data privacy:
• k-anonymity (Latanya Sweeney).
• Differential Privacy (Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith).
Robust and Secure ML

• Image recognition: misreading traffic signs (Eykholt et al.).
• Speech recognition: hiding commands in noise (Carlini & Wagner).
• Poisoning attacks: Tay (a chat bot) became inflammatory in 16 hours.

How do we create robust and secure machine learning algorithms?
Learning and Society
• Bad dynamics, perpetuating and worsening stereotypes and biases.
• Who carries the burden of bad predictions?
• How do we design good dynamics?
Challenging Questions

Machine learning and Artificial Intelligence will shape the future;
what kind of a future do we want?

What is the role of machine learning?
 ▪ ML for good versus ML for profit.

How do automation and learning change the quality of life?
 ▪ Job loss and displacement, life satisfaction, safety and security?

How do we approach machine learning and (inter-)national security?
 ▪ The weaponization of machine learning and AI?
Levels of Measurement
In statistics, data is divided into two types:
 ▪ Qualitative: qualities or descriptions.
 ▪ Quantitative: quantities or numbers.
Example: a cup of coffee
 ▪ Qualitatively (non-numerical qualities):
   ▪ Brown
   ▪ Strong aroma
   ▪ White cup
   ▪ Hot to the touch
 ▪ Quantitatively:
   ▪ 12 fluid ounces
   ▪ 106 calories
   ▪ 65 degrees Celsius
   ▪ $4.99 cost
Levels of Measurement
According to “On the Theory of Scales of Measurement” (Stevens, 1946), there are four levels:
 ▪ Nominal data
 ▪ Ordinal data
 ▪ Interval data
 ▪ Ratio data
Observations
 ▪ Nominal: non-numeric categories.
 ▪ Ordinal: non-numeric categories with an order.
 ▪ Interval: has an arbitrary zero, whereas
 ▪ Ratio: has an actual, non-arbitrary zero.
Normalization Observation
 ▪ All input and output of machine learning algorithms are
   typically vectors of floating-point numbers,
   whether the data are nominal, ordinal, interval, or ratio.
 ▪ However, nominal and ordinal data are not inherently numeric.
 ▪ Some algorithms expect values in the range -1 to +1 or 0 to +1.
 ▪ Why is normalization necessary?
   ▪ Values can be on very different scales: e.g., a stock’s trading
     volume may be in the millions while the number of stocks is around 10.
   ▪ The large numbers overwhelm the small ones.
   ▪ Solution: use percentages, e.g., 5%, 10%.
Example: the iris dataset
"Sepal Length","Sepal Width","Petal Length","Petal Width","Species"
5.1,3.5,1.4,0.2,"setosa"
4.9,3.0,1.4,0.2,"setosa"
4.7,3.2,1.3,0.2,"setosa"
...
7.0,3.2,4.7,1.4,"versicolor"
6.4,3.2,4.5,1.5,"versicolor"
6.9,3.1,4.9,1.5,"versicolor"
...
6.3,3.3,6.0,2.5,"virginica"
5.8,2.7,5.1,1.9,"virginica"
7.1,3.0,5.9,2.1,"virginica"

Five pieces of information per row:
• Sepal length
• Sepal width
• Petal length
• Petal width
• Species
Normalizing Nominal Observations
 ▪ One-of-n normalization, e.g., for the three iris species:
   5.1,3.5,1.4,0.2,"setosa"
   7.0,3.2,4.7,1.4,"versicolor"
   6.3,3.3,6.0,2.5,"virginica"
 ▪ Range -1, 1:
   ▪ Setosa: 1, -1, -1
   ▪ Versicolor: -1, 1, -1
   ▪ Virginica: -1, -1, 1
 ▪ Range 0, 1? How do we encode? (See the sketch below.)
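A minimal sketch of one-of-n encoding in Python (the function name and class list are our own, not from the slides); moving to the 0, 1 range only changes the "off" value:

def one_of_n(label, classes, off=-1.0, on=1.0):
    # Encode a nominal label as a one-of-n vector: "on" at the
    # label's position, "off" everywhere else.
    return [on if c == label else off for c in classes]

classes = ["setosa", "versicolor", "virginica"]
print(one_of_n("versicolor", classes))           # [-1.0, 1.0, -1.0]
print(one_of_n("versicolor", classes, off=0.0))  # [0.0, 1.0, 0.0]  (range 0, 1)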
Normalizing Ordinal Observations
 ▪ Ordinal data are not necessarily numeric,
   but have an implied ordering.
 ▪ Example: education level.
 ▪ One simple encoding assigns each of the N categories its rank
   i = 0, 1, ..., N-1 and maps the rank into the encoding range:
     f(i) = i (nH - nL) / (N - 1) + nL,
   where nH and nL are the high and low ends of the encoding range.
Normalizing Quantitative Observations
 ▪ Quantitative observations are always numeric,
   so we may not need to normalize.
 ▪ Given dataHigh dH, dataLow dL, normalizedHigh nH,
   and normalizedLow nL, range normalization maps
     f(x) = (x - dL)(nH - nL) / (dH - dL) + nL.

Example: Normalizing Quantitative Observations
 ▪ Normalizing the weight of a car:
   • dataHigh: 4,000
   • dataLow: 100
   • normalizedHigh: 1
   • normalizedLow: -1
 ▪ Given weight = 1,000:
     f(1000) = (1000 - 100)(1 - (-1)) / (4000 - 100) + (-1) = 1800/3900 - 1 ≈ -0.54
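As a check on the arithmetic above, a minimal sketch of range normalization in Python (the function names are ours):

def normalize_range(x, d_low, d_high, n_low=-1.0, n_high=1.0):
    # Map x from the data range [d_low, d_high] into [n_low, n_high].
    return (x - d_low) * (n_high - n_low) / (d_high - d_low) + n_low

def denormalize_range(y, d_low, d_high, n_low=-1.0, n_high=1.0):
    # Invert normalize_range: map y back into [d_low, d_high].
    return (y - n_low) * (d_high - d_low) / (n_high - n_low) + d_low

print(normalize_range(1000, 100, 4000))  # -0.538..., the car-weight example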
Other Ways of Normalization
 ▪ Reciprocal normalization: f(x) = 1/x.
 ▪ Equilateral normalization (next slide).
Equilateral Normalization
Ideal output: -1, -1, 1
Actual output: -1, 1, -1

Listing 2.1: Calculated Class Equilateral Values, 3 Classes

: -0.8660, -0.5000
:  0.8660, -0.5000
:  0.0000,  1.0000

Advantages of Equilateral Encoding
• Requires one fewer output than one-of-n.
• Spreads the “blame” better than one-of-n.
Equilateral Encoding Examples
 ▪ 2 categories  ▪ 3 categories  ▪ 4 categories
[Figure: the encodings plotted as vertices of a segment, an equilateral triangle, and a regular tetrahedron.]

Implementation
• “Practical Neural Network Recipes in C++” by Masters (1993), who
cited an article in PCAI as the actual source (Guiver, 1991).
A sketch of the construction follows below.
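A sketch of that construction in Python, as we understand it from Masters (1993); the code is our own transcription, not from the slides. It places the n classes at the vertices of a regular simplex in n-1 dimensions, each vertex a unit vector:

import numpy as np

def equilateral(n):
    # One row per class; row k is the (n-1)-dimensional encoding of class k.
    m = np.zeros((n, n - 1))
    m[0, 0], m[1, 0] = -1.0, 1.0
    for k in range(2, n):
        # Shrink the existing k vertices to make room on a new axis...
        f = np.sqrt(k * k - 1.0) / k
        m[:k, : k - 1] *= f
        # ...push them down on that axis, and put class k at its top.
        m[:k, k - 1] = -1.0 / k
        m[k, k - 1] = 1.0
    return m

print(np.round(equilateral(3), 4))
# [[-0.866 -0.5  ]
#  [ 0.866 -0.5  ]
#  [ 0.     1.   ]]   -- matches Listing 2.1

Decoding picks the class whose vertex is closest to the network's output, which is how the "blame" for an error gets spread across all n-1 outputs.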
Additional Normalizations
 ▪ Z (Gaussian) normalization
 ▪ Min-max normalization
 ▪ Unit vector normalization
 ▪ Mean normalization
(Sketches of each follow below.)
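Brief sketches of the four in Python, using the standard textbook definitions (an assumption, since the slide gives only the names):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

z_norm    = (x - x.mean()) / x.std()              # Z (Gaussian): zero mean, unit variance
min_max   = (x - x.min()) / (x.max() - x.min())   # min-max: rescale into [0, 1]
unit_vec  = x / np.linalg.norm(x)                 # unit vector: scale to length 1
mean_norm = (x - x.mean()) / (x.max() - x.min())  # mean: center, then scale by the range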
Machine Learning Models

 Data classification
 Regression analysis
 Clustering
 Time Series

Classification

 ▪ Example: credit scoring
 ▪ Differentiating between low-risk and high-risk customers
   from their income and savings.

[Figure: customers plotted by income and savings; the two thresholds
split the plane into low-risk and high-risk regions.]

Discriminant (the model): IF income > θ1 AND savings > θ2
                          THEN low-risk ELSE high-risk
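A minimal sketch of this discriminant as code; the threshold values θ1 and θ2 below are hypothetical, since in practice they are learned from the data:

THETA1 = 30_000  # hypothetical income threshold (θ1)
THETA2 = 10_000  # hypothetical savings threshold (θ2)

def credit_risk(income, savings):
    # IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk.
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(credit_risk(45_000, 15_000))  # low-risk
print(credit_risk(45_000, 5_000))   # high-risk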
Classification: Applications

 ▪ Also known as pattern recognition.

 ▪ Face recognition: pose, lighting, occlusion (glasses, beard),
   make-up, hair style.
 ▪ Character recognition: different handwriting styles.
 ▪ Speech recognition: temporal dependency.
   ▪ Use of a dictionary or the syntax of the language.
 ▪ Sensor fusion: combining multiple modalities, e.g., visual (lip image)
   and acoustic, for speech.
 ▪ Medical diagnosis: from symptoms to illnesses.
 ▪ Web advertising: predict whether a user clicks on an ad on the Internet.
Face Recognition

[Figure: training examples of a person, and test images.]

AT&T Laboratories, Cambridge UK
http://www.uk.research.att.com/facedatabase.html
Regression

 ▪ Example: price of a used car.
 ▪ x: car attributes
   y: price
 ▪ Linear model: y = w x + w0
 ▪ General model: y = g(x | θ),
   where g(·) is the model and θ its parameters.
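A minimal sketch of fitting the linear model y = w x + w0 by least squares in Python; the car data below are invented for illustration:

import numpy as np

# Hypothetical used cars: x = mileage (thousand km), y = price (thousand $).
x = np.array([20.0, 50.0, 80.0, 110.0, 140.0])
y = np.array([18.5, 14.0, 10.5, 7.0, 4.5])

# Least squares on the columns [x, 1] yields the parameters (w, w0).
A = np.stack([x, np.ones_like(x)], axis=1)
(w, w0), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"y = {w:.4f} * x + {w0:.4f}")
print(w * 100.0 + w0)  # predicted price of a car with 100,000 km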
Regression Applications

 ▪ Navigating a car: angle of the steering wheel
   (CMU NavLab).
 ▪ Kinematics of a robot arm: given a target position (x, y),
   find the joint angles α1 = g1(x, y) and α2 = g2(x, y).

[Figure: a two-joint robot arm reaching the point (x, y),
with joint angles α1 and α2.]
Time Series
 ▪ Example: financial analysis.
 ▪ Encode the data.
 ▪ Normalize over a sliding window (see the sketch below).
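A minimal sketch of sliding-window normalization in Python (the window width and the per-window min-max scaling are our assumptions about what the slide intends):

import numpy as np

def sliding_windows(series, width):
    # Cut the series into overlapping windows and min-max normalize each
    # window to [0, 1], so only the shape inside the window remains.
    out = []
    for i in range(len(series) - width + 1):
        w = np.asarray(series[i : i + width], dtype=float)
        lo, hi = w.min(), w.max()
        out.append((w - lo) / (hi - lo) if hi > lo else np.zeros(width))
    return np.array(out)

prices = [10.0, 10.5, 11.0, 10.8, 11.5, 12.0]
print(sliding_windows(prices, 4))  # three windows of width 4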
Resources: Datasets

 UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html


 UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
 Statlib: http://lib.stat.cmu.edu/
 Delve: http://www.cs.utoronto.ca/~delve/

Textbook and Course Material
• Textbooks
  • Mathematics for Machine Learning, Marc Peter Deisenroth,
    A. Aldo Faisal, and Cheng Soon Ong, Cambridge University Press, 2020.
  • Pattern Recognition and Machine Learning, Christopher Bishop.
  • Machine Learning: A Probabilistic Perspective, Kevin P. Murphy.
• References
  – Machine Learning, Tom Mitchell.
  – The Elements of Statistical Learning, Trevor Hastie,
    Robert Tibshirani, and Jerome Friedman.
• Course Notes
  – Slides available on the course Google Classroom.
