Anomaly Detection
1. Density Estimation
I would like to give full credit to the respective authors, as these are my personal Python
notebooks taken from deep learning courses by Andrew Ng, Data School and Udemy :)
This is a simple Python notebook hosted generously through GitHub Pages from my
main personal notes repository at https://github.com/ritchieng/ritchieng.github.io. The notes are
meant for my personal review, but I have open-sourced the repository as a
lot of people found it useful.
o Density estimation
Other anomaly detection examples
o If you have too many false positives (flagging examples as anomalous when they are not), decrease ε
o The area under the probability density curve (red shaded area) must always equal 1
Parameter estimation
o Fit μ_j = (1/m) Σ x_j^(i) and σ_j² = (1/m) Σ (x_j^(i) - μ_j)²
o The variance normalizer m might instead be (m - 1) in statistics
In practice, it makes very little difference
In machine learning, most people typically use (1 / m)
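Fitting these parameters is a one-liner per feature in NumPy; a minimal sketch (the toy data below is made up purely for illustration):

```python
import numpy as np

def estimate_gaussian(X):
    """Estimate the per-feature mean and variance of a Gaussian.

    Uses the (1/m) normalization from the notes; np.var defaults
    to ddof=0, which matches.
    """
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # (1/m) * sum((x - mu)^2)
    return mu, sigma2

# toy data: 5 examples, 2 features (invented for illustration)
X = np.array([[1.0, 10.0],
              [2.0, 11.0],
              [3.0,  9.0],
              [2.0, 10.0],
              [2.0, 10.0]])
mu, sigma2 = estimate_gaussian(X)
print(mu)      # per-feature means -> [2. 10.]
print(sigma2)  # per-feature variances -> [0.4 0.4]
```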
Now we will use the Gaussian distribution to develop an anomaly detection algorithm
1c. Algorithm
Density estimation
Anomaly detection algorithm
o Compute p(x) for new examples and flag those with p(x) < ε, as they're anomalous
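The full algorithm can be sketched as follows: model p(x) as a product of per-feature Gaussians and flag examples whose density falls below ε. The parameter values and threshold here are illustrative, not from the course:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    # univariate Gaussian density, evaluated element-wise per feature
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def p(x, mu, sigma2):
    # density estimate: product over features of p(x_j; mu_j, sigma_j^2)
    return np.prod(gaussian_pdf(x, mu, sigma2))

mu = np.array([2.0, 10.0])     # fitted means (illustrative values)
sigma2 = np.array([0.4, 0.4])  # fitted variances (illustrative values)
epsilon = 1e-3                 # threshold, normally chosen on a CV set

x_normal = np.array([2.1, 10.2])
x_weird = np.array([5.0, 14.0])
print(p(x_normal, mu, sigma2) < epsilon)  # False -> not flagged
print(p(x_weird, mu, sigma2) < epsilon)   # True  -> flagged as anomalous
```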
o When developing a learning algorithm (choosing features etc.), making
decisions is much easier if we have a way of evaluating our learning
algorithm
o Assume we have some labeled data of anomalous and non-anomalous
examples
y = 0 if normal
y = 1 if anomalous
o Training set: x^(1), x^(2), …, x^(m)
Assume these are normal examples, not anomalous
o Cross validation set: (x_cv^(1), y_cv^(1)), …, (x_cv^(m_cv), y_cv^(m_cv))
o Test set: (x_test^(1), y_test^(1)), …, (x_test^(m_test), y_test^(m_test))
Aircraft engines example
o 10,000 good (normal) engines
o 20 flawed (anomalous) engines
Training set: 6000 good engines
This will be used to fit p(x)
CV: 2000 good engines (y = 0), 10 anomalous (y = 1)
Test: 2000 good engines (y = 0), 10 anomalous (y = 1)
Algorithm Evaluation
o Because y = 0 is far more common, the data set is skewed
Hence, classification accuracy is not an appropriate metric; use
precision/recall or the F1 score instead
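A minimal sketch of evaluating with the F1 score and using it to pick ε on the cross validation set (the densities, labels, and candidate thresholds below are invented for illustration):

```python
import numpy as np

def f1_for_threshold(p_cv, y_cv, epsilon):
    """F1 score when flagging examples with p(x) < epsilon as anomalies.

    Accuracy would be misleading here: predicting y = 0 everywhere
    already scores ~99.5% on 2000 good / 10 anomalous engines, so we
    score precision/recall on the rare y = 1 class instead.
    """
    pred = (p_cv < epsilon).astype(int)
    tp = np.sum((pred == 1) & (y_cv == 1))
    fp = np.sum((pred == 1) & (y_cv == 0))
    fn = np.sum((pred == 0) & (y_cv == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# tiny illustrative CV set: densities p(x) and true labels (made up)
p_cv = np.array([0.40, 0.35, 0.30, 0.0001, 0.0002])
y_cv = np.array([0,    0,    0,    1,      1])

# pick the candidate epsilon with the best F1 on the CV set
candidates = [1e-5, 1e-3, 0.5]
best = max(candidates, key=lambda e: f1_for_threshold(p_cv, y_cv, e))
print(best)  # -> 0.001
```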
| Anomaly Detection | Supervised Learning |
|-------------------|---------------------|
| Very small number of positive examples (y = 1, typically 0-20) | Large number of positive and negative examples |
| Large number of negative examples (y = 0) | |
| Many different types of anomalies. Hard for any algorithm to learn from positive examples what the anomalies look like; future anomalies may look nothing like any of the anomalous examples we have seen so far. | Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples are likely to be similar to ones in the training set. |
| Fraud detection | Email spam classification |
| Manufacturing | Weather prediction |
| Monitoring machines in a data center | Cancer classification |
How do we choose features?
o Choose features that might take on unusually large or small values in the
event of an anomaly
o Example: monitoring computers in a data center
Create a new feature, e.g. x5 = CPU load / network traffic
The new feature x5 would take a very large value when there is a
huge CPU load but low network traffic
This way you can catch anomalies
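A tiny sketch of this kind of feature engineering (the array values and the "stuck loop" scenario are invented for illustration):

```python
import numpy as np

# hypothetical monitoring data for 4 machines (values are made up)
cpu_load = np.array([0.5, 0.6, 0.55, 0.9])
network_traffic = np.array([100.0, 120.0, 110.0, 2.0])  # 4th machine: busy CPU, quiet network

# new feature: large when CPU is busy but network traffic is low,
# e.g. a machine stuck in an infinite loop
x5 = cpu_load / network_traffic
print(x5)  # the 4th machine stands out at 0.45; the others are ~0.005
```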
3. Multivariate Gaussian (Normal) Distribution
o Covariance matrix, Σ
The diagonal elements of Σ are the variances of the individual features
If you reduce a diagonal entry, the gaussian becomes sharper (more peaked) along that axis
If you increase it, the gaussian becomes wider
o Mean vector, μ
Varying the μ parameter shifts the center of the distribution
3b. Anomaly Detection using Multivariate Gaussian Distribution
Multivariate gaussian distribution
o The original model is actually a special case of the multivariate gaussian
model, with the off-diagonal entries of Σ constrained to be zero
o Try to get rid of features that are linearly dependent, and remove duplicated
features, before fitting the multivariate gaussian model; otherwise Σ is
non-invertible (you also need m > n for Σ to be invertible)
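A minimal sketch, using NumPy to fit μ and Σ and `scipy.stats.multivariate_normal` to evaluate the density (the data, correlation structure, and ε below are illustrative assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_multivariate_gaussian(X):
    """Fit mu and the full covariance matrix Sigma.

    Sigma must be invertible, so we need m > n and no linearly
    dependent (or duplicated) features, as noted above.
    """
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False, bias=True)  # bias=True -> (1/m) normalization
    return mu, Sigma

rng = np.random.default_rng(0)
# strongly correlated 2-D data; the original per-feature model cannot
# capture this correlation, but the multivariate model can
X = rng.multivariate_normal([2.0, 10.0], [[1.0, 0.9], [0.9, 1.0]], size=500)
mu, Sigma = fit_multivariate_gaussian(X)

p = multivariate_normal(mean=mu, cov=Sigma).pdf
epsilon = 1e-4  # illustrative threshold

# a point that looks normal feature-by-feature but violates the correlation
x = np.array([0.5, 11.5])
print(p(x) < epsilon)  # True -> flagged as an anomaly
```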