Lecture 1 - Intro

This document provides an overview of the course CE4172 - Tiny ML. The key points are: 1. The course teaches how to deploy machine learning models on microcontrollers with limited memory and power. It focuses on using TensorFlow Lite for microcontrollers like those in the Arduino Nano 33 BLE Sense board. 2. Students will learn how to develop embedded systems, understand machine learning architectures, and use relevant tools to perform on-device analytics with sensors. 3. The course involves building simple machine learning models, training them, converting models to TensorFlow Lite, implementing applications, and deploying to microcontrollers. Labs and a course project applying these skills are required.

Uploaded by

Yi Heng

Assoc Prof Nicholas Vun

School of Computer Science and Engineering

CE4172 - Tiny ML
Course Overview
What is TinyML about?
Deployment of Machine Learning models on tiny devices
• low-cost microcontrollers with integrated sensors and memory in the kilobyte (kB) range
• bare metal system - no operating system support
• very low power consumption

This course is based on


• TensorFlow Lite for Microcontrollers
• ARM Cortex-M family

In this course
• Arduino Nano 33 BLE Sense board
• Cortex M4 microcontroller with several sensors
• 256kB of RAM
ML vs TinyML
Machine Learning
• detects complex events that rule-based systems struggle to identify
• usually needs a large amount of data to train
• runs in the cloud (data centres) on high-performance computers using GPUs and accelerators with large memory and storage
TinyML
• runs on very low-cost microcontrollers with ‘low’ computing horsepower
• processes small amounts of data (for inference only); training is not done on the microcontroller itself
• small amount of memory (< 1 MB, down to < 100 kB)
• no need for network connectivity to the cloud
• no transfer of data - saves power
• no transfer of data – data privacy
• no transfer of data - very short latency, fast response
• runs on a very energy-efficient ‘computer’
• target: power consumption in the mW range, on a battery of tens to 1000 mAh (lasting for years)
• can be deployed anywhere without any maintenance
What does the study of TinyML involve?
• Embedded system development – Hardware and software
• Understanding of Machine Learning architectures
• Learning the relevant techniques and tools
These are needed to deploy models on low-cost, low-power devices and systems that
• perform on-device analytics for a variety of sensing methods in the physical world (e.g. phones, cars, buildings, industrial machines, home appliances, medical devices, human bodies)
• Audio
• Vision (image and video)
• Motion
• Chemical
Development of low-cost low-power smart devices
IoT + AI ≡ AIoT
Some (Possible) Use-case Examples
• Wake detection in daily devices – by voice, by motion, by vision
• Monitoring health by using wearable devices (heartbeat, blood pressure,
body weight etc)
• Detection and identification of human presence and condition
• Monitoring premises security
• Monitoring of industrial machinery for maintenance to avoid downtime
• Detection of defective parts in factory production
• Monitoring of stock on shelves in supermarkets/department stores
• Smart meters to save energy consumption
• Monitoring the number of customers waiting in the queue to improve
customer service
• Monitoring of environment and endangered animals
All of these operate
• without connecting to the network – fast response and data privacy
• by machine self-learning - without the need for domain knowledge
Aside: Domain Knowledge vs Machine Learning
Heuristic - a function that ranks alternatives in search algorithms at each
branching step based on available information to decide which branch to
follow (Wikipedia)
• typically needs domain knowledge gained from experience or provided by an expert/specialist - e.g., a doctor

An example:
During the 2012 ImageNet computer image recognition competition
• Alex Krizhevsky used a machine learning based deep learning algorithm
• It was the first time that a machine learning based algorithm beat handcrafted software written by computer vision experts.
What we will do in this course
• Basic introduction of ML and Deep Learning (Neural Network)
• Building a “Hello World” equivalent of TinyML
• Training the model
• Converting model for TensorFlow Lite
• Implementing an application with the model
• Deploying to Microcontrollers
• Case Studies
• Optimization Techniques

Lab exercises:
Hands-on exercises to practice the above on the embedded device
Course project (Open ended):
To design and implement a TinyML application using the embedded device
What we will use
Device
Arduino Nano 33 BLE Sense

Programming Language
Python, C++

Tools and Platform (Either Windows or Linux)


Anaconda
• Jupyter Notebook
• TensorFlow and Keras
• TensorFlow Lite
Arduino IDE
Netron
What you will learn in this course
Course Aim:
Students will learn the techniques to implement machine learning on
resource constrained devices that are to be deployed as smart IoT
devices which form the crucial end components in Edge computing.

Intended Learning Outcome:


Upon the successful completion of this course, students shall be able to:
• build deep learning based models for IoT applications
• train the model and deploy inference engine on Microcontroller
• analyze, evaluate and conduct performance optimization to support
highly efficient smart IoT implementations
• develop applications based on smart IoT devices
Course plan and Assessment
Weekly Lecture
• Tuesday 11:30am-1:30pm
• Pre-recorded videos (if needed)

Tutorial/Lab* (Exact schedule will be announced)


• Wednesday 2:30pm-4:30pm*
• Venue: Software Lab 1

Assessment:
• Exam (1-hr) – 30%
• Lab Quizzes (two) – 20%
• Course project – 50%

Project Assessment Rubric
Assessment will be based on demonstration/presentation, a (short) report and
code submission
1. Features (complexity and novelty) – 10%
2. Model Training process - 5% (Bonus mark)
3. Inference performance - 15%
4. Optimization techniques used in the implementation – 10%
• Code size minimization
• Real time performance
• Power efficiency optimization
5. Demo/Presentation and report – 15%

The system will be based on the Arduino Nano 33 BLE Sense board, which
• can be interfaced with extra sensor(s)
• can be used to control a system (e.g., mounted on a robot)

Submission deadline: Monday of Week 12


Main References
a) TinyML, Pete Warden & Daniel Situnayake

b) TinyML Cookbook, Gian Marco Iodice

c) TensorFlow Lite for Microcontrollers framework,

https://www.tinyml.org/

NTU Library e-books


CE4172: IoT – TinyML

A bit about Artificial Intelligence
AI History and its Renaissance

ARTIFICIAL INTELLIGENCE: Machines that are capable of performing tasks that typically require human intelligence

1950s: The term ‘Artificial Intelligence’ was first coined
• LISP programming language and computers

1973: The first AI Winter

1982: The second Spring
• Japan’s Fifth Generation Computer Project

1985: The second Winter

2000: The Renaissance

Present: The Golden Age?
AI Present Day Renaissance

Powerful computers:
Became widely available, through cloud computing and GPUs

Big Data:
Availability of large amounts of data due to the internet and smart mobile phones

Software Algorithms:
Machine Learning, Deep Learning
AI Implementations

Rule-based expert systems

Fuzzy logic

Approaches that need data:

Supervised Learning with labelled data (e.g., regression and classification)

Unsupervised Learning without labelled data (e.g., clustering)

Reinforcement Learning with the use of an ‘agent’ that learns to maximise ‘reward’ in an environment
AI Implementations – Deep Learning

Implementation of ML based on Deep Neural Networks that mimic the human brain

Artificial Neural Network (ANN)
• Classifying number-based data

Convolutional Neural Network (CNN)
• Classifying images

Recurrent Neural Network (RNN)
• Time series data (e.g., audio)

Deep Reinforcement Learning

Transfer Learning

Source: https://ai.plainenglish.io/


Neurons and Neural Network

[Figure: a neural network with an input layer, hidden layers and an output layer]

Hidden Layer
• consists of learnable parameters (the neurons)
• the ‘algorithm’ that can learn and improve by itself

Mimics the human brain to recognize patterns
• ability to learn
Deep Neural Network for Deep Learning
[Figure: an input layer, multiple hidden layers and an output layer]

Deep Neural Network
• multiple hidden layers
• much more sophisticated algorithms can be learnt
Aside: AlexNet

At the 2012 ImageNet computer image recognition competition:
• Alex Krizhevsky used a machine learning based deep learning algorithm (CNN)

It was the first time that a machine learning based algorithm
• beat, by a huge margin, handcrafted software written by computer vision domain experts.
CNN for Image Recognition

[Figure: CNN for image recognition, including pooling]

(More on this later)
Aside: Image (2D) Convolution in CNN
Applying ‘convolutions’ to an image consisting of 7x7 pixels
• Correlation operations using dot products
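The sliding dot product described above can be sketched in plain Python. This is a minimal illustration, not the course's implementation: the function name `correlate2d`, the edge-detecting kernel and the image values are all hypothetical, and the sketch assumes a square image, a square kernel, no padding and a stride of 1.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image; each output value is a dot product."""
    k = len(kernel)
    out_size = len(image) - k + 1  # no padding, stride 1
    output = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 7x7 image: three dark columns (0) followed by four bright columns (1)
image = [[0, 0, 0, 1, 1, 1, 1] for _ in range(7)]

# Hypothetical 3x3 kernel that responds to a dark-to-bright vertical edge
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = correlate2d(image, kernel)  # 5x5 output
```

With a 3x3 kernel the 7x7 image shrinks to a 5x5 feature map, and the largest values appear where the window straddles the edge.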

E.g. Face Detection Training and Inference

Aside: The Future of Deep Learning
The future of deep learning, according to its pioneers and Turing Award winners Yoshua Bengio, Yann LeCun and Geoffrey Hinton
(Communications of the ACM, July 2021, Vol. 64 No. 7, Pages 58-65)
1. Supervised learning requires too much labelled data.
2. Reinforcement learning requires far too many trials.
3. Current systems are not as robust to changes in distribution as humans, who can quickly adapt to such changes with very few examples.
4. Current deep learning is most successful at perception tasks and generally what are called system 1 tasks (image classification, object detection and face recognition).
Using deep learning for system 2 tasks that require a deliberate sequence of steps (e.g., to code a program) is an exciting area that is still in its infancy.

But with ChatGPT …..


IoT and Machine Learning
Microcontrollers
Microcontrollers are single-device computers that are highly integrated with peripherals and (nowadays) various sensors
• used in embedded systems for specific functions
E.g. car, mobile phone, notebook, webcam
Many are based on 8-bit processors but increasingly come with 32-bit processors (ARM Cortex-M family)

Main features
• small physical size and low cost
• power efficiency
• but resource constrained
• small amount of memory (< 1MB)
• ‘low’ computing horsepower
• usually don’t support an OS (no MMU)
Microcontroller Market

Microcontroller for IoT
Internet of Things (IoT) devices are based on microcontrollers
• incorporate sensors to sense the surrounding environment
• have network connectivity

IoT devices can pass data to each other
• to coordinate and perform ‘smart’ functions and be more responsive

Examples:
A modern jet engine that is filled with thousands of sensors collecting and transmitting data back to make sure it is operating efficiently.

A motion sensor at home that detects motion and sends messages to turn on the light bulb and security camera and to inform the house owner
• but what if the motion is caused by a cat?
IoT and AI
Global IoT market to grow from 7.6 billion devices in 2019 to 24.1 billion devices in 2030
• generating $1.5 trillion in revenue.
(Source: Transforma Insights,
https://transformainsights.com/news/iot-market-24-billion-usd15-trillion-revenue-2030)

Many will be deployed with AI/ML technologies
• boost performance
• more secure
• more reliable

Example: Networked smart sensors in a factory
• collect operational and performance data for analysis
• improve production system performance
• predict machine service requirements
IoT key requirements
Very energy efficient, such that devices can be deployed anywhere without any maintenance.
But transferring data through the network requires the most energy
• also takes time to transfer
• long latency
Excessive capturing of data
• needs more memory
• needs to be transferred more often
Hence the need to operate more intelligently
• by processing and analyzing the data locally without transmission
• and achieving better data privacy

Is it possible to incorporate ML (Deep Learning) on a memory-constrained microcontroller?
Memory bound and Compute-Bound
Application computation performance can be
• memory bound or compute bound
Memory bound
• time taken to complete a computation depends on the amount and access speed of the memory that holds the data for processing
Compute-bound
• time taken to complete a computation depends on the speed of the CPU
Deep Neural Network performing Inference
• mainly doing multiplication of matrices
• same data are used repeatedly in different combinations
• low memory requirements (tens to hundreds of kB)
• i.e., mainly compute-bound
• well suited to the latest generation of 32-bit microcontrollers
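To get a feel for the "tens to hundreds of kB" figure, the parameters of a small fully connected network can be counted by hand. The sketch below is only a rough estimate under stated assumptions: the layer sizes are hypothetical, each parameter is assumed to be stored as a 4-byte float, and memory for intermediate results is ignored.

```python
def dense_parameter_count(layer_sizes):
    """Count the weights and biases of a fully connected network.

    layer_sizes lists the number of input values followed by the
    number of neurons in each layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input per neuron
        total += n_out         # one bias per neuron
    return total


def model_size_kb(parameter_count, bytes_per_parameter=4):
    """Storage needed for the parameters alone, in kB (1 kB = 1024 bytes)."""
    return parameter_count * bytes_per_parameter / 1024


# Hypothetical network: 64 inputs, two hidden layers, 4 outputs
layer_sizes = [64, 128, 64, 4]
params = dense_parameter_count(layer_sizes)
size_kb = model_size_kb(params)
```

For these hypothetical sizes the network has 16,836 parameters, or roughly 66 kB as 32-bit floats, which fits within the 256kB of RAM quoted for the course board.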
ML Training and Inference

Deep Learning Workflow
Training and Building a Deep Learning based Model
The following are typical steps involved in developing a model based on
machine learning (deep learning)

1. Decide on a goal
2. Select, collect and label a dataset
3. Design a model architecture
4. Train the model using the dataset
5. Convert the model
6. Run the inference
7. Evaluate and troubleshoot

Decide on a goal
Need to first decide what we want to predict
• to enable us to decide
• what dataset to collect
• which model architecture

Example:
Predicting whether a factory machine is about to break down
• classification problem with two possible outcomes
• Normal
• Abnormal

Selecting a Dataset
What data to collect?
• only select those that are relevant
• ignore obviously irrelevant data
Example:
Predicting whether a factory machine is about to break down
• Temperature of the machine – Yes
• Vibration of the machine – Yes
• Noise emitted by the machine – Yes (but may be related to vibration)
• Speed of operation - ?
• Duration from last service – ?
• Ambient noise in the factory – No
Usually need domain expertise and some experiments to decide
• also consider the ease of collecting the data
Collecting Data
After deciding on the dataset to collect
• how much data to collect?
• rule of thumb - the more the better.

Example:
Predicting whether a factory machine is about to break down
• collect data that represent the full range of conditions
• data corresponding to different possible modes of failure(?)
• from summer to winter
• collect as a set of time series data
• vibration level every 10 seconds
• temperature every minute
• rate of production once every two minutes
Labelling Data
After collecting the dataset
• label the data as normal and abnormal
• assume ‘just before’ failure = abnormal

Example:
Predicting whether a factory machine is about to break down

Source: “TinyML” by Pete Warden & Daniel Situnayake
Which Model Architecture to use
Many possibilities, depending on
• type of problem
• type of data available (e.g. time series)
• data transformation (e.g. in frequency domain)
• personal experience (i.e. domain knowledge)

Also consider the resource constraint of the device to deploy the model
• memory size limitation
• processor capability
• microcontroller special features (e.g. hardware accelerator)

Start with a simpler model (fewer layers of neurons)
• iterate until we obtain a useful result
E.g., Time Series Data and Transformation (Keyword Detection)

Aside: Tensor
A tensor is a list (array) that contains either numbers or other tensors
Examples:
A vector is a 1D tensor with shape (5,):
[1 2 3 4 5]

A matrix is a 2D tensor with shape (3, 4):
[[1 2 3 4]
 [5 6 7 8]
 [9 8 7 6]]

Higher-dimension tensors, e.g. with shape (2, 3, x):
[[[…]
  […]
  […]]
 [[…]
  […]
  […]]]
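The shapes quoted above can be checked with nested Python lists. This is only a sketch: the helper `shape` is a hypothetical function written for illustration, it assumes every sub-list at a given level is non-empty and has the same length, and the innermost size of the 3D example (2) is chosen arbitrarily to stand in for the unspecified x.

```python
def shape(tensor):
    """Return the shape of a nested list as a tuple of dimension sizes."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]  # descend into the first element
    return tuple(dims)


vector = [1, 2, 3, 4, 5]

matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 8, 7, 6]]

# Hypothetical 3D tensor: 2 blocks, each with 3 rows of 2 values
cube = [[[0, 0], [0, 0], [0, 0]],
        [[0, 0], [0, 0], [0, 0]]]

vector_shape = shape(vector)
matrix_shape = shape(matrix)
cube_shape = shape(cube)
```

Libraries such as TensorFlow report the same information through a tensor's shape attribute.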
Features and Window
Feature – a particular type of information that the model will accept for training
• a 1-D tensor with 3 entries may represent 3 features
• images may be represented with higher-dimension tensors

Example: Factory Machine
Feature = [Production, Temp, Vibration] = [102, 34, 0.18]

The features are time series sampled at unequal intervals
• choose a Window of 1 minute and take the average value of each feature over that interval
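Windowing can be sketched as averaging all samples whose timestamps fall inside the chosen interval. The sample values below are made up for illustration, and `window_mean` is a hypothetical helper; how to handle a window that contains no sample (here it returns None) is a design decision the slides do not specify.

```python
def window_mean(samples, start, length=60):
    """Average the values whose timestamp (seconds) lies in [start, start + length)."""
    values = [value for t, value in samples if start <= t < start + length]
    if not values:
        return None  # no sample fell inside this window
    return sum(values) / len(values)


# Hypothetical readings as (timestamp in seconds, value) pairs
vibration = [(0, 0.10), (10, 0.20), (20, 0.30), (30, 0.20), (40, 0.10), (50, 0.30)]
temperature = [(0, 34.0), (60, 35.0)]      # one reading per minute
production = [(0, 102.0), (120, 101.0)]    # one reading every two minutes

# Feature tensor for the first 1-minute window
feature = [
    window_mean(production, 0),
    window_mean(temperature, 0),
    window_mean(vibration, 0),
]
```

The six vibration readings collapse to a single average, so all three features end up on the same one-minute time base.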
Normalization
Input tensors are usually filled with floating-point values
• simpler for the training algorithms if all values are similar in size

Normalization
• change the values to lie within the range of 0 to 1

Example: a 3 x 3-pixel grayscale image (pixel values 0-255)
• normalize by multiplying each value by 1/255
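A minimal sketch of this normalization in plain Python follows. The pixel values are invented for illustration and `normalize_image` is a hypothetical helper; dividing by 255 is used here as the equivalent of multiplying by 1/255.

```python
def normalize_image(pixels, max_value=255):
    """Scale every pixel from the range 0..max_value to the range 0..1."""
    return [[p / max_value for p in row] for row in pixels]


# Hypothetical 3 x 3 grayscale image with 8-bit pixel values
image = [[0, 51, 102],
         [153, 204, 255],
         [255, 128, 0]]

normalized = normalize_image(image)
```

After scaling, the darkest pixel maps to 0.0 and the brightest to 1.0, so all inputs are of similar size.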

Source: “TinyML” by Pete Warden & Daniel Situnayake
Training the Model
The process by which the model learns
• to produce the correct output for a given set of inputs

Training data are input into the model
• model predicts the output value
• compare the predicted output against the expected output
• calculate the error
• make small iterative adjustments to the model’s parameters until the error is acceptable (Backpropagation)

Each neuron in the neural network contains two kinds of parameters
• weights (one per input) and a bias
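The predict, compare and adjust cycle can be sketched for a single neuron with one input. This is an illustrative assumption rather than the course's training code: the data, starting values and learning rate are made up, the error measure is the mean squared error, and the gradients are written out by hand. Backpropagation generalises this gradient calculation to networks with many layers.

```python
def train_neuron(inputs, targets, learning_rate=0.05, epochs=1000):
    """Fit output = weight * x + bias by repeated small adjustments."""
    weight, bias = 0.5, 0.0  # hypothetical starting values
    n = len(inputs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, expected in zip(inputs, targets):
            predicted = weight * x + bias      # model predicts the output
            error = predicted - expected       # compare with the expected output
            grad_w += 2 * error * x            # gradient of squared error w.r.t. weight
            grad_b += 2 * error                # gradient of squared error w.r.t. bias
        weight -= learning_rate * grad_w / n   # small adjustment to the weight
        bias -= learning_rate * grad_b / n     # small adjustment to the bias
    return weight, bias


# Hypothetical training data generated from y = 2x + 1
inputs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]

weight, bias = train_neuron(inputs, targets)
```

After training, the weight and bias approach 2 and 1, the values used to generate the data.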

A 2-layer Deep Learning Network
[Figure: a 2-layer network showing the input, Layer 1, Layer 2, the output and the biases]
Neuron Structure
A neuron produces a single output through a linear transformation
• weighted sum of the inputs
• plus a constant value called Bias

(Source – TinyML Cookbook)

But a neuron with only a linear transformation can only solve simple linear problems
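The linear transformation described above can be written in a few lines. The input, weight and bias values are hypothetical, and `neuron_output` is a name chosen for this sketch.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a constant bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Hypothetical neuron with three inputs
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]
bias = 0.5

output = neuron_output(inputs, weights, bias)  # 0.5 - 2.0 + 6.0 + 0.5
```

Stacking such neurons without anything in between still gives a linear function of the inputs, which is why a non-linear step is needed.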
Activation Functions for Neuron
Add a non-linear function to the neuron's output
• to help the network learn complex patterns
• Example: Rectified Linear Unit (ReLU)
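ReLU passes positive values through unchanged and replaces negative values with zero. A minimal sketch, with hypothetical input values:

```python
def relu(value):
    """Rectified Linear Unit: max(0, value)."""
    return max(0.0, value)


# Hypothetical outputs of a neuron's linear transformation
linear_outputs = [-2.5, -0.1, 0.0, 0.7, 3.0]

activated = [relu(v) for v in linear_outputs]
```

Negative outputs are clipped to 0.0, while 0.7 and 3.0 are unchanged.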

(More on this later)

(Source – TinyML Cookbook)

Various terms used in Model Training
At the beginning
• weights are assigned random values
• biases typically start with a value of 0

During training
• data are fed in batches into the model
• the batch size is the number of data samples processed before the parameters are updated

Adjustment of the parameters is done by an algorithm called Backpropagation

Epochs is the number of complete passes that the learning algorithm makes through the entire training dataset.
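The relationship between batches, epochs and parameter updates can be illustrated with some simple bookkeeping. The dataset size, batch size and epoch count below are hypothetical, and the sketch assumes one parameter update per batch.

```python
import math


def make_batches(samples, batch_size):
    """Split the samples into consecutive batches; the last one may be smaller."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


samples = list(range(100))  # hypothetical dataset of 100 samples
batch_size = 32
epochs = 5

batches = make_batches(samples, batch_size)
updates_per_epoch = len(batches)            # one update after each batch
total_updates = updates_per_epoch * epochs  # updates over the whole training run
```

Here each epoch consists of four batches (32, 32, 32 and 4 samples), so five epochs give 20 parameter updates.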

ML Training and Inference

[Figure: training loop in which the error is propagated backward through the model]
How many Epochs should we use?
We stop when the model’s performance stops improving

Metrics used to evaluate the model performance
• Loss and Accuracy

Loss
• How far the model is from predicting the expected output

Accuracy
• Percentage of the time the model makes the right prediction

If the model output begins to make accurate predictions against the expected output
• convergence has occurred
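Both metrics can be computed by hand for a small two-class example. The sketch below makes several assumptions: mean squared error is used as the loss (other loss functions are common), predictions are probabilities of the "abnormal" class, a threshold of 0.5 decides the predicted class, and all numbers are invented.

```python
def mean_squared_error(predicted, expected):
    """Loss: average squared distance between predicted and expected outputs."""
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(expected)


def accuracy(predicted, expected, threshold=0.5):
    """Fraction of predictions that land on the correct side of the threshold."""
    correct = sum(1 for p, e in zip(predicted, expected) if (p >= threshold) == (e == 1))
    return correct / len(expected)


# Hypothetical model outputs and labels (1 = abnormal, 0 = normal)
predicted = [0.9, 0.2, 0.6, 0.4]
expected = [1, 0, 0, 1]

loss = mean_squared_error(predicted, expected)
acc = accuracy(predicted, expected)
```

For these values the loss is 0.1925 and the accuracy is 0.5, since two of the four predictions fall on the wrong side of the threshold.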
Loss and Accuracy Graph for Convergence

To improve the model’s performance
• change the various values used to set up the model
• change the number of layers
• change (increase) the number of epochs

Source: “TinyML” by Pete Warden & Daniel Situnayake
Underfitting and Overfitting
Two most common reasons a model fails to converge
• Overfitting and Underfitting
Underfit
• model has not yet been able to learn to make a good prediction
• typically due to an architecture that is too small (i.e. too simple a model) to capture the complexity of the system
• Try to increase the complexity of the model architecture

Overfit
• model can predict perfectly based on its training data
• but is not able to perform well on data not previously seen
• can be due to memorizing, or a ‘short cut’ present in the training data
• E.g. recognising text that is embedded in the training data
• Try to get a larger and more varied dataset
Overfitting

Overfitting pattern:
the loss on the validation data increases during training while the loss on the training data keeps decreasing
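One way to act on this pattern is to watch the validation loss and stop once it has stopped improving. The sketch below is an assumption-laden illustration, not a method prescribed by the slides: the loss values are invented, `best_epoch` is a hypothetical helper, and the `patience` of 2 epochs is an arbitrary choice.

```python
def best_epoch(validation_losses, patience=2):
    """Index of the lowest validation loss seen before training would stop.

    Training is considered stopped once the loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_index = 0
    for index, loss in enumerate(validation_losses):
        if loss < validation_losses[best_index]:
            best_index = index            # new best epoch
        elif index - best_index >= patience:
            break                         # no improvement for `patience` epochs
    return best_index


# Hypothetical validation loss per epoch: falls, then rises as overfitting sets in
validation_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]

stop_at = best_epoch(validation_losses)
```

For these values the best epoch is index 3 (loss 0.40); the later epochs only increase the validation loss.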

Source: “TinyML” by Pete Warden & Daniel Situnayake
Training, Validation and Testing
The dataset is split into 3 groups during training to check the performance of the model against overfitting
1. Training Data (e.g. 60%)
2. Validation Data (e.g. 20%)
3. Testing Data (e.g. 20%)

Training is done using the Training data
• validation data is fed through the model periodically to calculate its loss
• provides a better representation of the performance because the model has not been trained on this data
• adjust the model to improve validation performance

Once the validation performance is acceptable
• use the test data to do the final confirmation
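A 60/20/20 split can be sketched as follows. The helper `split_dataset` and the fixed seed are assumptions made for illustration; shuffling before splitting is also an assumption, and for time series data a split by time period may be more appropriate.

```python
import random


def split_dataset(samples, train_fraction=0.6, validation_fraction=0.2, seed=0):
    """Shuffle the samples and split them into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test


# Hypothetical dataset of 100 samples, identified by index
train, validation, test = split_dataset(range(100))
```

Every sample ends up in exactly one of the three groups, so the test data stays unseen until the final confirmation.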
Converting the Model for Microcontroller
Normally the model can be deployed once the Training is completed
• the model is loaded into memory and executed by a TensorFlow interpreter

But the TensorFlow interpreter is too big for the microcontrollers used in TinyML
• need a lightweight version of the interpreter and tools
• TensorFlow Lite

Need to convert the TensorFlow based model to the TensorFlow Lite format
• using the TensorFlow Lite Converter tool
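In Python the conversion is typically a few lines using the converter's Python API. The sketch below assumes a trained Keras model and a TensorFlow 2.x installation; the function name `convert_to_tflite` and the output file name are hypothetical, and no optimization options are set.

```python
def convert_to_tflite(keras_model, output_path="model.tflite"):
    """Convert a trained Keras model to the TensorFlow Lite format and save it."""
    # Imported inside the function so the sketch can be loaded without TensorFlow.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    tflite_model = converter.convert()  # the converted model, as bytes

    with open(output_path, "wb") as model_file:
        model_file.write(tflite_model)
    return tflite_model
```

A call such as `convert_to_tflite(model)` would be made in the Jupyter Notebook after training, where `model` is the trained Keras model.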

Deployment - Running Inference
The model can now be integrated into the application programme
• using the TensorFlow Lite for Microcontrollers library
• written in the C++ programming language
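Because a bare-metal microcontroller has no file system, the converted model is commonly compiled into the C++ application as a byte array. The sketch below shows one possible way to generate such an array in Python; the helper `to_c_array`, the array name `g_model` and the eight placeholder bytes are assumptions for illustration, and a real model would be read from the converted file instead.

```python
def to_c_array(model_bytes, name="g_model"):
    """Render bytes as C source text defining an array and its length."""
    lines = [f"const unsigned char {name}[] = {{"]
    for i in range(0, len(model_bytes), 12):  # 12 bytes per source line
        chunk = model_bytes[i:i + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"const unsigned int {name}_len = {len(model_bytes)};")
    return "\n".join(lines)


# Placeholder bytes standing in for the contents of a converted model file
placeholder_model = bytes([0x1C, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4C, 0x33])

source = to_c_array(placeholder_model)
```

The generated text can be saved as a source file in the Arduino sketch, where the application passes the array to the library.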

Example: Predicting whether a factory machine is about to break down

Source: “TinyML” by Pete Warden & Daniel Situnayake
Field Run and Evaluation
Does it work to expectation?

If not, what are the possible causes?
• dataset used in training is not exactly representative of data in real operation (e.g. due to ambient temperature)
• collect new data when operating under the new condition
• overfitting
• collect more data for training
