Deep Unsupervised Learning
Tutorial – Part 1
Alex Graves, Marc’Aurelio Ranzato
– Yann LeCun
Example
● ImageNet training set contains ~1.28M images, each assigned one of
1000 labels
● If labels are equally probable, complete set of randomly shuffled labels
contains ~log2(1000)*1.28M ≈ 12.8 Mbits
● Complete set of images, uncompressed at 128×128, contains ~500
Gbits: > 4 orders of magnitude more
● A large conv net (~30M weights) can memorise randomised ImageNet
labellings. Could it memorise randomised pixels?
Understanding Deep Learning Requires Rethinking Generalization, Zhang et al. (2016)
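A quick back-of-the-envelope check of the numbers above (assuming 8-bit RGB pixels at 128×128):

```python
import math

n_images = 1.28e6           # ImageNet training set size
n_classes = 1000

# Information content of the labels: log2(1000) bits per image
label_bits = n_images * math.log2(n_classes)
print(f"labels: {label_bits / 1e6:.1f} Mbits")    # ~12.8 Mbits

# Raw pixel content: 128 x 128 pixels x 3 channels x 8 bits each
pixel_bits = n_images * 128 * 128 * 3 * 8
print(f"pixels: {pixel_bits / 1e9:.0f} Gbits")    # ~503 Gbits

print(f"ratio: {pixel_bits / label_bits:.0f}x")   # ~4e4, i.e. > 4 orders of magnitude
```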
Supervised Learning
● Given a dataset D of inputs x labelled with targets y, learn to predict
y from x, typically by maximum likelihood (objective written out below)
● Goal is to learn the ‘true’ distribution from which the data was drawn
● Means attempting to learn everything about the data
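Written out, the maximum-likelihood objective referred to above is the standard one (notation assumed, since the slide’s formula is not reproduced here):

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(x,\,y) \in D} \log p_{\theta}(y \mid x)
```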
Where to Look
Not everyone agrees that trying to understand everything is a good
idea. Shouldn’t we instead focus on things that we believe will one day
be useful for us?
… we lived our lives under the constantly changing sky without sparing it a
glance or a thought. And why indeed should we? If the various formations had
had some meaning, if, for example, there had been concealed signs and
messages for us which it was important to decode correctly, unceasing
attention to what was happening would have been inescapable…
For decades, the quintessentially New York city has elevated its streets to the status of an icon.
van den Oord, A., et al. “WaveNet: A Generative Model for Raw Audio.” arXiv (2016).
PixelRNN - Model
● Fully visible
● Model pixels with Softmax
● ‘Language model’ for images
van den Oord, A., et al. “Pixel Recurrent Neural Networks.” ICML (2016).
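A minimal sketch of the idea rather than the PixelRNN architecture itself: flatten the image into a sequence of 256-way symbols and train an autoregressive model with a softmax over intensities, exactly as in a language model (PyTorch; sizes are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPixelLM(nn.Module):
    """Autoregressive model over flattened pixel intensities (0..255).
    Illustrative only -- the real PixelRNN uses row/diagonal LSTMs or
    masked convolutions over the 2-D image."""
    def __init__(self, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(256, hidden)   # one symbol per intensity
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 256)        # softmax over 256 intensities

    def forward(self, pixels):
        # pixels: (batch, seq_len) integer intensities
        h, _ = self.rnn(self.embed(pixels))
        return self.out(h)                       # (batch, seq_len, 256) logits

# Training: predict pixel t+1 from pixels <= t, by maximum likelihood
model = TinyPixelLM()
x = torch.randint(0, 256, (8, 64))               # e.g. 8 flattened 8x8 crops
logits = model(x[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 256), x[:, 1:].reshape(-1))
loss.backward()
```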
PixelRNN - Samples
van den Oord, A., et al. “Pixel Recurrent Neural Networks.” ICML (2016).
Conditional Pixel CNN
van den Oord, A., et al. “Conditional Image Generation with PixelCNN Decoders.” NIPS (2016).
Autoregressive over slices, then pixels within a slice
[Figure: source and target pixel grids divided into four interleaved slices (Slice 1–4); the numbers give the generation order, slice by slice, then raster order within each slice]
J. Menick et al., Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling (2018)
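A rough sketch of the slice ordering in the figure above (my reading of it, not the paper’s code): the image is split into S×S interleaved slices, and pixels are generated slice by slice, in raster order within each slice.

```python
def subscale_order(height, width, s=2):
    """Return pixel coordinates in subscale order: slice by slice
    (offsets (0,0), (0,1), (1,0), (1,1) for s=2), raster order within a slice."""
    order = []
    for dy in range(s):
        for dx in range(s):
            for y in range(dy, height, s):
                for x in range(dx, width, s):
                    order.append((y, x))
    return order

order = subscale_order(4, 4, s=2)
# The first slice covers every other row/column; later slices fill in the rest,
# conditioned on the slices already generated.
print(order[:4])   # [(0, 0), (0, 2), (2, 0), (2, 2)]
```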
256 x 256 CelebA-HQ
Component Weights
Distribution over Sequences
Autoencoder
[Diagram: input → encoder → latent representation → decoder → reconstruction; trained with a reconstruction cost]
Slide: Irina Higgins, Loïc Matthey
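A minimal autoencoder matching the diagram (PyTorch; layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim))

    def forward(self, x):
        z = self.encoder(x)              # latent representation
        return self.decoder(z)           # reconstruction

model = Autoencoder()
x = torch.rand(16, 784)                  # e.g. flattened MNIST digits
x_hat = model(x)
reconstruction_cost = nn.functional.mse_loss(x_hat, x)
```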
Variational AutoEncoder
Kingma et al. (2014); Rezende et al. (2014)
[Diagram: input → encoder → latent distribution → decoder → reconstruction; trained with a coding cost on the latent plus a reconstruction cost]
Slide: Irina Higgins, Loïc Matthey
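The corresponding VAE objective as a sketch (a diagonal Gaussian encoder and a Bernoulli decoder are assumptions, chosen for concreteness):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_logits, mu, logvar):
    """Negative ELBO = reconstruction cost + coding cost (KL to the prior N(0, I)).
    mu, logvar are the encoder's outputs; x_logits are the decoder's outputs."""
    # Reconstruction cost: -log p(x|z) for a Bernoulli decoder
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum')
    # Coding cost: KL( N(mu, diag(exp(logvar))) || N(0, I) ), in closed form
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# The latent sample is drawn inside the encoder with the reparameterisation trick:
#   z = mu + exp(0.5 * logvar) * eps,  eps ~ N(0, I)
```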
Minimum Description Length for VAE
● Alice wants to transmit x as compactly as possible to Bob, who knows
only the prior p(z) and the decoder weights
● The coding cost is the number of bits required for Alice to transmit a
sample from qθ(z|x) to Bob (e.g. bits-back coding)
● The reconstruction cost is the number of additional error bits Alice
must send so that Bob can reconstruct the data exactly, given the
latent sample (e.g. with arithmetic coding)
● The sum of the two costs is the total length of the message Alice needs
to send to Bob to allow him to recover x (c.f. variational inference)
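Putting the two costs together gives the usual description-length / negative-ELBO identity (standard VAE result; notation assumed to match the slides):

```latex
L(x) \;=\; \underbrace{\mathrm{KL}\!\left(q_{\theta}(z \mid x)\,\|\,p(z)\right)}_{\text{coding cost}}
\;+\; \underbrace{\mathbb{E}_{q_{\theta}(z \mid x)}\!\left[-\log p(x \mid z)\right]}_{\text{reconstruction cost}}
\;=\; -\,\mathrm{ELBO}(x)
```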
…one must take seriously the idea of working with datasets, rather than datapoints, as
the key objects to model.
– Edwards & Storkey, Towards a Neural Statistician, (2017)
Associative Compression Networks
● ACNs modify the VAE loss by replacing the unconditional prior p(z) with a
conditional prior p(z|z’), where z’ is the latent representation of an
associated data point (one of the K nearest Euclidean neighbours to z)
● p(z|z’) – parameterised by an MLP – models only part of the latent space,
rather than the whole thing, which greatly reduces the coding cost
● Implicit amortisation: the more clustered the codes, the cheaper they are
● Result: rich, informative codes are learned, even with powerful decoders.
Graves et al., Associative Compression Networks for Representation Learning (2018)
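A rough sketch of how the conditional-prior term could be computed, paraphrasing the bullets above rather than reproducing the paper’s code (`prior_net`, a small MLP returning a prior mean and log-variance, is a hypothetical component):

```python
import torch

def acn_prior_kl(mu, logvar, codes, prior_net, k=5):
    """KL between q(z|x) = N(mu, diag(exp(logvar))) and the conditional prior p(z|z'),
    where z' is drawn from the K nearest Euclidean neighbours of this code
    among the stored codes (the code itself would be excluded in practice)."""
    dists = torch.cdist(mu, codes)                       # (batch, n_codes) distances
    knn = dists.topk(k, largest=False).indices           # indices of K nearest codes
    pick = knn[torch.arange(mu.size(0)),
               torch.randint(0, k, (mu.size(0),))]       # one random neighbour each
    z_prime = codes[pick]

    # Conditional prior p(z|z') parameterised by an MLP (assumed interface)
    prior_mu, prior_logvar = prior_net(z_prime)

    # KL between two diagonal Gaussians, per latent dimension
    kl = 0.5 * (prior_logvar - logvar
                + (logvar.exp() + (mu - prior_mu).pow(2)) / prior_logvar.exp()
                - 1)
    return kl.sum(dim=-1)
```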
MDL for ACN
● Alice now wants to transmit the entire dataset to Bob, in any order
(justified for IID data?)
● Bob has the weights of the associative prior, decoder and encoder
● Alice chooses an ordering of the data that minimises the total coding cost
(a travelling-salesman-style problem) and sends the datapoints to Bob one at a time
● After receiving each latent code + error bits, he decodes the datapoint,
then re-encodes it and uses the result to determine the associative
prior for the next code
[Equation annotation: the terms shown in red differ from the standard VAE; the rest is the same]
Unordered: KL from unconditional prior
Ordered: KL from conditional ACN prior
Binary MNIST reconstructions: leftmost column shows test set images
CelebA Reconstructions: leftmost column from test set
‘Daydream’ sampling: encode data, sample latent from conditional prior,
generate new data conditioned on latent, repeat
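The daydream loop as schematic Python (`encoder`, `decoder` and `prior_net` stand in for the trained ACN components; the method names are placeholders):

```python
def daydream(x, encoder, decoder, prior_net, steps=10):
    """Repeat: encode -> sample latent from the conditional prior -> decode."""
    samples = []
    for _ in range(steps):
        z_context = encoder(x)            # encode the current data point
        z = prior_net.sample(z_context)   # sample a latent from p(z | z_context)
        x = decoder.sample(z)             # generate new data conditioned on the latent
        samples.append(x)
    return samples
```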
Mutual Information
● Want codes that ‘describe’ the data as well as possible
● Mathematically, we want to maximise the mutual information
between the code z and the data x
van den Oord et al., Representation Learning with Contrastive Predictive Coding (2018)
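For reference, the quantity being maximised (discrete form):

```latex
I(x; z) \;=\; \sum_{x,\,z} p(x, z) \log \frac{p(x, z)}{p(x)\,p(z)} \;=\; H(x) - H(x \mid z)
```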
Gutmann et al., Noise-Contrastive Estimation (2009)
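CPC is trained with the InfoNCE loss, a noise-contrastive objective: classify the true future encoding among negatives taken from the rest of the batch. A minimal sketch with a bilinear score and in-batch negatives (both details are assumptions about the setup):

```python
import torch
import torch.nn.functional as F

def info_nce(context, future, W):
    """context: (batch, c_dim) summaries c_t; future: (batch, z_dim) encodings z_{t+k};
    W: (c_dim, z_dim) bilinear weights. Each context's positive is the future at
    the same batch index; the other batch entries act as negatives."""
    scores = context @ W @ future.t()            # (batch, batch) scores f(z, c)
    labels = torch.arange(context.size(0))       # positives sit on the diagonal
    return F.cross_entropy(scores, labels)

# Minimising this loss lower-bounds the mutual information between context and
# future:  I(z; c) >= log(N) - loss,  with N the number of candidates per example.
```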
Speech - LibriSpeech
van den Oord et al., Representation Learning with Contrastive Predictive Coding (2018)
Images - ImageNet
M. Jaderberg et al., Reinforcement Learning with Unsupervised Auxiliary Tasks (2016)
Unsupervised RL Baselines
M. Jaderberg et al., Reinforcement Learning with Unsupervised Auxiliary Tasks (2016)
Sparse Rewards? More Cherries!
Auxiliary Losses
[Plot legend: Batched A2C; Aux loss]
Automated Curriculum Learning for Neural Networks, Graves et al. (2017)
Curiouser and Curiouser…
● Complexity Gain: seek out data that maximise the decrease in bits of
everything the agent has ever observed (!). In other words, find (or
create) the thing that makes the most sense of the agent’s life
so far: science, art, music, jokes…
Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty,
Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes, Schmidhuber, 2008
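One schematic way to write the complexity-gain reward, in the spirit of Schmidhuber’s compression-progress principle (my paraphrase, not a formula from the slides): the intrinsic reward at time t is the number of bits saved on the whole observation history h when the compressor parameters are updated from φ_{t-1} to φ_t.

```latex
r^{\text{int}}_{t} \;\propto\; C\!\left(h_{\le t};\, \phi_{t-1}\right) \;-\; C\!\left(h_{\le t};\, \phi_{t}\right)
```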
Empowered Agents
Instead of curiosity, an agent can be motivated by empowerment: attempting to
maximise the mutual information between the agent’s actions and the
consequences of its actions (e.g. the state the actions will lead to). The agent
wants to have as much control as possible over its future.
Klyubin et al., Empowerment: A Universal Agent-Centric Measure of Control (2005)
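Formally (following Klyubin et al.; notation assumed), the empowerment of a state s is the channel capacity from the agent’s actions a to the resulting state s':

```latex
\mathcal{E}(s) \;=\; \max_{p(a)} \; I\!\left(a;\, s' \mid s\right)
```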