
Developing Machine Translation Systems for Nigerian Languages:

The Federal University of Technology, Akure

Basics of Machine Learning

Adetunmbi, Adebayo Olusola (Ph.D., MCPN, MIEEE)


aoadetunmbi@futa.edu.ng
Presentation Outline
• Introduction

• Data Everywhere

• Basic Machine Learning Concepts

• Conclusion

• References
Introduction
• Machine learning has come to stay: it touches our daily lives in many
ways and is often used without our knowing it.

• Application areas:
- Effective web search
- Automatic translation of documents
- Image recognition
- Information security (e.g. access control)
- Number and Speech recognition
- Collaborative filtering
Introduction

• Classical modelling and analysis
  - based on first-principles mathematical modelling to estimate parameters
    that are difficult to measure.
  - sometimes the system under study is too complex to model mathematically.

• Learning from data – i.e. Machine Learning (ML)
  - ML can be used to discover knowledge from data.
  - ML emanates from Artificial Intelligence.
Data Everywhere
 Agriculture
 Banks
 Drivers License
 Government Parastatals
 Higher Institutions – Universities, Polytechnics, Colleges, etc
 Hospitals
 Industries
 Primary and Secondary Schools
 National ID card
 Social Media – Twitter, facebook, etc

What is the essence of generating Big or Huge Data that is never used?


Machine Learning
ML is a product of many disciplines (Figure 1).

Figure 1: Disciplines of ML
Machine Learning
 The primary aim of machine learning is to allow computers to learn
automatically, without human intervention or assistance, and to adjust their actions
accordingly.

 Modern machine learning is a statistical process that starts with a body of data and
tries to derive a rule or procedure that explains the data or can predict future data.

This approach—learning from data—contrasts with the older “expert system”


approach to AI, in which programmers sit down with human domain experts to learn
the rules and criteria used to make decisions, and translate those rules into software
code.
Traditional Programming

  Data + Program  →  Computer  →  Output

Machine Learning

  Data + Output  →  Computer  →  Program

Coursera: Data Science and Machine Learning


Machine Learning

 Hundreds of new ML algorithms are proposed every year.

 Every machine learning algorithm has three components:
• Representation
• Evaluation
• Optimization
Representation
 Decision trees
 Sets of rules / Logic programs
 Instances
 Graphical models (Bayes/Markov nets)
 Neural networks
 Support vector machines
 Model ensembles
 Etc.
Evaluation

 Accuracy
 Precision and recall
 Squared error
 Likelihood
 Posterior probability
 Cost / Utility
 Margin
 Entropy
 K-L divergence
 Etc.
Optimization
Combinatorial optimization
 E.g.: Greedy search
Convex optimization
 E.g.: Gradient descent
Constrained optimization
 E.g.: Linear programming
Types of Learning
Supervised (inductive) learning
 Training data includes desired outputs
Unsupervised learning
 Training data does not include desired outputs
Semi-supervised learning
 Training data includes a few desired outputs
Reinforcement learning
 Rewards from sequence of actions
Inductive Learning

 Given examples of a function (X, F(X))


 Predict function F(X) for new examples X
 Discrete F(X): Classification
 Continuous F(X): Regression
 F(X) = Probability(X): Probability estimation
What We’ll Cover
 Preprocessing

 Feature Selection

 Supervised learning

 Unsupervised learning
ML in Practice

• Understanding the domain, prior knowledge, and goals
• Data Integration & Selection: obtain data from various sources.
• Preprocessing: cleanse the data.
• Transformation: convert the data to a common format; transform it to a new
  format where required.
• Learning models: obtain the desired results.
• Interpreting results: present results to the user in a meaningful manner.
• Consolidating and deploying the discovered knowledge
Preprocessing (outliers)
(a) Outlier detection (and removal): These are unusual data values that are not
consistent with most observations.
• Causes: measurement errors, coding and recording errors, and abnormal values.
Strategies for dealing with outliers:
(i) Detect and remove outliers, or
(ii) Develop robust modeling methods that are insensitive to outliers.
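As an illustration of strategy (i), here is a minimal Python sketch of z-score screening; the sample readings and the three-standard-deviation threshold are illustrative assumptions, not values from the slides.

import numpy as np

def remove_outliers_zscore(values, threshold=3.0):
    # Keep only values lying within `threshold` standard deviations of the mean.
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return values[np.abs(z) < threshold]

readings = [10, 12, 11, 13, 12, 11, 10, 13, 12, 11, 12, 900]   # one abnormal measurement
print(remove_outliers_zscore(readings))                        # 900 is detected and dropped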
Preprocessing (Missing Data)
• Missing data refers to absent data items that hide some unknown information about
the data which may be important.
• Missing data has a negative impact on the performance of classification algorithms.
• Peng and Lei (2004) opined that:
  - 1% missing data is considered trivial;
  - 1-5% missing data is manageable in a dataset;
  - 5-15% requires sophisticated methods to handle;
  - above 15% missing data may severely impact the real representation and
    interpretation of the entire dataset.
Missing data in a dataset could be caused by different factors, such as:
• incorrect measurements as a result of faulty equipment
• values missed due to human errors during the data collation stage
Missing Data Treatment Methods
 Case Deletion Approach – ignores cases with missing data and performs analysis on the remaining cases.

 Mean/Mode Imputation Approach – replaces missing data with the mean (numeric attribute) or mode
(nominal attribute) of all observed cases for that attribute, or with the mean or mode of the known values of
the attribute in the class to which the instance with missing data belongs.

 All Possible Values Imputation – replaces the missing data for a given attribute with every possible value of
that attribute, or with every possible value of the attribute within the class.

 Regression Imputation Approach – replaces missing data with estimated values: the other variables are used
to make a prediction, and the predicted value is substituted as if it were an actual observed value.

 Forward fill or backfill – propagates the previous value forward, or the next value backward, to replace a
missing value.

 Missing data treatment with C4.5

 Hot deck imputation approach

 K-Nearest Neighbour Imputation (kNN)
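A minimal Python/pandas sketch of the simpler treatments above (case deletion, forward/backward fill, and mean/mode imputation); the small data frame and its column names are illustrative assumptions.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "duration": [0.11, np.nan, 0.05, 0.07, np.nan],   # numeric attribute with missing values
    "state":    ["INT", "ACC", None, "INT", "INT"],   # nominal attribute with a missing value
})

complete_cases  = df.dropna()   # case deletion: analyse only the fully observed cases
filled_forward  = df.ffill()    # forward fill propagates the previous value
filled_backward = df.bfill()    # backfill propagates the next value

# Mean imputation for the numeric attribute, mode imputation for the nominal one.
df["duration"] = df["duration"].fillna(df["duration"].mean())
df["state"]    = df["state"].fillna(df["state"].mode()[0])
print(df)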


Pre-Processing (Data Discretization/Normalization)
• Data discretization converts nominal and non-nominal attributes to discrete values.

• Decimal Scaling:
  V'(i) = V(i) / 10^k                                          (1)
  for the smallest k such that max(|V'(i)|) < 1

• Min-Max Normalization:
  V'(i) = (V(i) - min(V(i))) / (max(V(i)) - min(V(i)))         (2)

• Standard Deviation Normalization:
  V'(i) = (V(i) - mean(V)) / Sd(V)                             (3)
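A minimal Python sketch of equations (1)-(3); the sample values are the first entries of the rate column of Table 1, and the helper names are assumptions.

import numpy as np

def decimal_scaling(v):
    # Eq. (1): divide by 10^k for the smallest k with max(|v'|) < 1.
    k = int(np.floor(np.log10(np.abs(v).max()))) + 1
    return v / (10 ** k)

def min_max(v):
    # Eq. (2): rescale to the [0, 1] range.
    return (v - v.min()) / (v.max() - v.min())

def z_score(v):
    # Eq. (3): centre on the mean and divide by the standard deviation.
    return (v - v.mean()) / v.std()

rate = np.array([90909.09, 125000, 200000, 166666.7, 100000])   # first rate values of Table 1
print(decimal_scaling(rate), min_max(rate).round(3), z_score(rate).round(3))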
Pre-Processing (Data Discretization/Normalization)
• Other methods include:
  • Equal-width discretization
  • Entropy-based discretization
• Binarization is implemented by assigning a threshold value, derived from the mean of
  the attribute values within each feature, to obtain a Boolean value.

• Av = (Σ x) / n                                               (4)
  where Av is the average,
  n is the number of items, and
  Σ x is the total sum of all the values.
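A minimal Python sketch of mean-threshold binarization, with equation (4) supplying the threshold; the sample values are the first sbytes entries of Table 1.

import numpy as np

def binarize_by_mean(column):
    # Eq. (4) gives the threshold: values above the column average map to 1, others to 0.
    return (column > column.mean()).astype(int)

sbytes = np.array([496, 1762, 1068, 900, 2126, 784])   # first sbytes values of Table 1
print(binarize_by_mean(sbytes))                         # average ~ 1189 -> [0 1 0 0 1 0]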
Table 1: Training-Sample Dataset
Id  duration  state  dwin  trans_depth  spkts  dpkts  sbytes  dbytes  rate  Output (Label2)
1 0.000011 INT 0 0 2 0 496 0 90909.09 Normal
2 0.000008 INT 0 0 2 0 1762 0 125000 Attack
3 0.000005 INT 0 0 2 0 1068 0 200000 Attack
4 0.000006 INT 0 0 2 0 900 0 166666.7 Attack
5 0.00001 ACC 1 1 2 0 2126 0 100000 Attack
6 0.000003 INT 0 0 2 0 784 0 333333.3 Attack
7 0.000006 INT 0 0 2 0 1960 0 166666.7 Attack
8 0.000028 ACC 1 0 2 0 1384 0 35714.29 Attack
9 0 ACC 1 1 1 0 46 0 0 Attack
10 0 ACC 1 0 1 0 46 0 0 Attack
11 0 ACC 1 0 1 0 46 0 0 Attack
12 0 ACC 1 0 1 0 46 0 0 Attack
13 0.000004 INT 0 0 2 0 1454 0 250000 Normal
14 0.000007 INT 0 0 2 0 2062 0 142857.1 Normal
15 0.000011 INT 0 0 2 0 2040 0 90909.09 Normal
16 0.000004 INT 0 0 2 0 1052 0 250000 Normal
17 0.000003 INT 0 0 2 0 314 0 333333.3 Normal
18 0.00001 INT 0 0 2 0 1774 0 100000 Normal
19 0.000002 INT 0 0 2 0 1568 0 500000 Attack
20 0.000005 INT 0 0 2 0 1658 0 500050 Attack
Table 2 Binarization Formation of Training-Sample Dataset
BINARIZATION
Attributes Minimum Average Maximum 0 1
dur 0 1.006756146 59.999989 below 1 above 1

proto 131 unique discretized values


service 13 unique discretized values
state 7 unique discretized values
spkts 1 18.66647233 10646 below 19 above 19

dpkts 0 17.54593597 11018 below 18 above 18

sbytes 24 7993.908165 14355774 below 7994 above 7994

dbytes 0 13233.78556 14657531 below 13234 above 13234

rate 0 82410.88674 1000000.003 below 82411 above 82411


Table 3: Discretized Training-Sample
Id  dur  state  dwin  trans_depth  spkts  dpkts  sbytes  dbytes  rate  Output (Label2)
1 0 0 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 0 1 1
3 0 0 0 0 0 0 0 0 1 1
4 0 0 0 0 0 0 0 0 1 1
5 0 1 1 1 0 0 0 0 1 1
6 0 0 0 0 0 0 0 0 1 1
7 0 0 0 0 0 0 0 0 1 1
8 0 1 1 0 0 0 0 0 1 1
9 0 1 1 1 0 0 0 0 0 1
10 0 1 1 0 0 0 0 0 0 1
11 0 1 1 0 0 0 0 0 0 1
12 0 1 1 0 0 0 0 0 0 1
13 0 0 0 0 0 0 0 0 1 0
14 0 0 0 0 0 0 0 0 1 0
15 0 0 0 0 0 0 0 0 1 0
16 0 0 0 0 0 0 0 0 1 0
17 0 0 0 0 0 0 0 0 1 0
18 0 0 0 0 0 0 0 0 1 0
19 0 0 0 0 0 0 0 0 1 1
20 0 0 0 0 0 0 0 0 1 1
Feature Selection
Feature Selection (FS) is a method of identifying the relevant features in a data set.

It removes noise (redundant features) and reduces computational time.

There are different feature selection methods; the discussion here is limited to entropy
information gain and correlation-based feature selection (CFS).

The entropy information gain per attribute in Table 3 is computed as stated in equations (5)
and (6):

Info(D) = H(S) = - Σ_{i=1}^{n} P_i log2(P_i)                          (5)

Gain(S, A) = H(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) H(S_v)            (6)

where v ranges over the possible values of A, S_v is the subset of S in which the value of
A = v, and P_i is the proportion of instances with class i; a class is a category to which an
instance may belong with a certain probability.
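A minimal Python sketch of equations (5) and (6); the two arrays transcribe the output (1 = attack, 0 = normal) and trans_depth columns of Table 3, and exact numeric results depend on rounding.

import numpy as np
from collections import Counter

def entropy(labels):
    # Eq. (5): H(S) = -sum_i p_i log2(p_i) over the class proportions.
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def info_gain(feature, labels):
    # Eq. (6): Gain(S, A) = H(S) - sum_v (|S_v| / |S|) H(S_v).
    remainder = 0.0
    for v in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

output      = [0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1]
trans_depth = [0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0]
print(round(entropy(output), 3), round(info_gain(trans_depth, output), 3))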
Feature Selection
The split information is calculated using equation (7), while the Gain Ratio is computed using
equation (8):

Split info(S, A) = Σ_{i=1}^{r} (|S_i| / |S|) log2(|S_i| / |S|)        (7)

Gain Ratio = Gain(S, A) / Split info(S, A)                            (8)

From Table 3, the description of the subset grouping of the dataset is generated for the
features state, dwin, trans_depth, and output.

Computation of Information Gain, Split info and Gain Ratio:


Table 4: Attribute information of Training-Sample Dataset

S/No Feature Attribute set


1 State 2
2 Dwin 2
3 trans_depth 2
4 Output 2
Using the entropy information model of equation (5):

From Table 4, the number of records with output = 1 is 13 and the total number of records is 20, so

Info(D) = -(13/20) log2(13/20) - (7/20) log2(7/20) = 0.403 + 0.522 = 0.925

To further calculate the gain value for each feature using equation (6), we have:

1. Info(State)   = 14/20 [(7/14) log2(7/14) - (7/14) log2(7/14)] + 6/20 [(6/6) log2(6/6) - (0/6) log2(0/6)] = 0
   Gain(State)   = Info(D) - Info(State) = 0.925 - 0 = 0.925

2. Info(Dwin)    = 14/20 [(7/14) log2(7/14) - (7/14) log2(7/14)] + 6/20 [(6/6) log2(6/6) - (0/6) log2(0/6)] = 0
   Gain(Dwin)    = Info(D) - Info(Dwin) = 0.925 - 0 = 0.925

3. Info(Trans_d) = 18/20 [-(11/18) log2(11/18) - (7/18) log2(7/18)] + 2/20 [-(2/2) log2(2/2) - (0/2) log2(0/2)] = 0.8681
   Gain(Trans_d) = Info(D) - Info(Trans_d) = 0.925 - 0.8681 = 0.0569
To calculate the split info for each feature we use equation (7), and the Gain Ratio is then obtained
from equation (8):

1. Split info(State)   = (7/14) log2(7/14) + (6/6) log2(6/6) = 0.5 × (-1) + 0 = -0.5
   Gain Ratio(State)   = Gain(State) / Split info(State) = 0.925 / -0.5 = -1.85

2. Split info(Dwin)    = (7/14) log2(7/14) + (6/6) log2(6/6) = 0.5 × (-1) + 0 = -0.5
   Gain Ratio(Dwin)    = Gain(Dwin) / Split info(Dwin) = 0.925 / -0.5 = -1.85

3. Split info(Trans_d) = (11/18) log2(11/18) + (2/2) log2(2/2) = 0.4345
   Gain Ratio(Trans_d) = Gain(Trans_d) / Split info(Trans_d) = 0.0569 / 0.4345 = 0.1310
Table 6 gives the result; Trans_d is the most valuable feature.

Table 6: Rank of Training-Sample Dataset Features

Entropy information model, Info(D) = 0.925

              State    Dwin     Trans_d
Info          0        0        0.8681
Gain          0.925    0.925    0.0569
Split info    -0.5     -0.5     0.4345
Gain Ratio    -1.85    -1.85    0.1310


Correlation Coefficient Feature Selection
The Pearson correlation coefficient given in equation (9) is computed to determine the
correlation between each pair of features:

CFS = (N ΣXY - (ΣX)(ΣY)) / sqrt[(N ΣX² - (ΣX)²)(N ΣY² - (ΣY)²)]        (9)

where N is the number of feature values (pairs of scores),
X is parameter 1, and
Y is parameter 2.

Using the sample dataset of Table 3 to illustrate the Pearson correlation coefficient of the
state and dwin features:

CFS = (20(6) - (6)(6)) / sqrt[(20(6) - (6)²)(20(6) - (6)²)] = 84 / 84 = 1

The CFS application shows a perfect correlation between the two features.
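A minimal Python sketch of equation (9), applied to the state and dwin columns of Table 3; it reproduces the perfect correlation computed above.

import numpy as np

def pearson_cfs(x, y):
    # Eq. (9): Pearson correlation between two feature columns.
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    num = n * (x * y).sum() - x.sum() * y.sum()
    den = np.sqrt((n * (x**2).sum() - x.sum()**2) * (n * (y**2).sum() - y.sum()**2))
    return num / den

state = [0,0,0,0,1,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0]   # state column of Table 3
dwin  = [0,0,0,0,1,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0]   # dwin column of Table 3
print(pearson_cfs(state, dwin))                      # 1.0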


Supervised learning
Predicting a class label using naïve Bayesian

Table 7: Sample data 2

S/N Protocol Service Flag Category


s1 Tcp http SF normal
s2 Tcp http REJ intrusion
s3 Tcp telnet REJ intrusion
s4 Udp ftp REJ normal
s5 Udp telnet SF normal
s6 Tcp ftp REJ normal
Bayesian Classifier
 Instances to be classified are described by attribute vectors x = (x1, ..., xn).

 An instance is assigned the most probable, or maximum a posteriori (MAP),
classification from a finite set C of classes:

   c_MAP = argmax_{c ∈ C} P(x | c) P(c)

 argmax_{c ∈ C} means that we choose the c ∈ C for which the corresponding
probability is maximal, and C = {c1, ..., cm} is the set of class labels.
Bayesian Classifier
• Under the naïve Bayes assumption that the attribute values x1, ..., xn are
conditionally independent given the classification, we obtain

   P(x1, ..., xn | cj) = Π_{i=1}^{n} P(xi | cj)

and substituting this into the former equation gives the naïve Bayes classifier

   c = argmax_{cj ∈ C} P(cj) Π_{i=1}^{n} P(xi | cj)

• The naïve Bayes classifier is trained on a set of labelled training data presented to it in
relational form.
Bayesian Classifier
• The training data in Table 7 will be used as an illustration. The class label attribute
(category) takes the values normal and intrusion. Let C1 correspond to category = normal and
C2 to category = intrusion. Assume we wish to classify the tuple

X = (Protocol = Tcp, Service = ftp, Flag = SF).

• We need to maximize P(X|Ci)P(Ci) for i = 1, 2. The prior probability of each class
P(Ci) is computed from the training tuples:

P(Category = normal) = 4/6 = 0.667

P(Category = intrusion) = 2/6 = 0.333

Bayesian Classifier
• To compute P(X|Ci) for i = 1,2, the following conditional probabilities are computed

• P(Protocol = tcp | Category = normal) = 0.5

• P(Protocol = tcp | Category = intrusion) = 0.997

• P(Protocol = udp | Category = normal) = 0.5

• P(Protocol = udp | Category = intrusion) = 0.003

• P(Service = http | Category = normal) = 0.250

• P(Service = http | Category = intrusion) = 0.499

• P(Service = telnet | Category = normal) = 0.250


Bayesian Classifier
P(Service = telnet | Category = intrusion) = 0.499

P(Service = ftp | Category = normal) = 0.499

P(Service = ftp | Category = intrusion) = 0.003

P(Flag = SF | Category = normal) = 0.50

P(Flag = SF | Category = intrusion) = 0.003

P(Flag = REJ| Category = normal) = 0.50

P(Flag = REJ | Category = intrusion) = 0.997

It should be noted here that a Laplace adjustment is used in the computation of the
conditional probabilities, with ni(xk) set to 0.0001.
Bayesian Classifier
Using the above probabilities, we obtain

P(X | Category = normal) = P(Protocol = Tcp | Category = normal)
                           × P(Service = ftp | Category = normal)
                           × P(Flag = SF | Category = normal)
                         = 0.5 × 0.499 × 0.5 = 0.125

Similarly,

P(X | Category = intrusion) = 0.997 × 0.003 × 0.003 = 8.773 × 10^-6

To find the class Ci that maximizes P(X|Ci)P(Ci), we compute

P(X | Category = normal) P(Category = normal) = 0.125 × 0.667 = 0.083

P(X | Category = intrusion) P(Category = intrusion) = 8.773 × 10^-6 × 0.333 = 2.921 × 10^-6

Therefore, the naïve Bayesian classifier predicts Category = normal for tuple X.
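A minimal Python sketch of the naïve Bayes computation on Table 7. The slides use a Laplace adjustment with ni(xk) = 0.0001, whereas this sketch uses ordinary add-one smoothing, so the intermediate probabilities differ slightly; the predicted class for X is the same (normal).

from collections import Counter

# Table 7, with the tuple X = (Tcp, ftp, SF) to classify.
train = [("Tcp", "http",   "SF",  "normal"),
         ("Tcp", "http",   "REJ", "intrusion"),
         ("Tcp", "telnet", "REJ", "intrusion"),
         ("Udp", "ftp",    "REJ", "normal"),
         ("Udp", "telnet", "SF",  "normal"),
         ("Tcp", "ftp",    "REJ", "normal")]
X = ("Tcp", "ftp", "SF")

priors = Counter(row[-1] for row in train)            # class counts: normal 4, intrusion 2
scores = {}
for c, count_c in priors.items():
    score = count_c / len(train)                      # P(Ci)
    for i, value in enumerate(X):
        match = sum(1 for row in train if row[-1] == c and row[i] == value)
        distinct = len(set(row[i] for row in train))  # number of values of attribute i
        score *= (match + 1) / (count_c + distinct)   # add-one smoothed P(xi | Ci)
    scores[c] = score

print(max(scores, key=scores.get), scores)            # predicts 'normal' for X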
Rough Set-Based Approach

• Rough set theory (RST) is a useful mathematical tool for dealing with imprecise and
insufficient knowledge, finding hidden patterns in data, and reducing dataset size.

• Rough sets are used for evaluating the significance of data and for easy interpretation of results.

• RST contributes immensely through the concept of reducts. A reduct is a minimal subset of
attributes that preserves the predictive power of the full attribute set.

• RST is very effective in removing redundant features from discrete data sets.

• Rough sets generate explainable rules.


RS - Theoretical Background
Central to RS is indiscernibility. An indiscernibility relation holds between two elements
when they cannot be differentiated.

Let S = (U, A, V, f) be an information system, where U is a universe containing a finite
set of N objects {x1, x2, ..., xN}.

A is a non-empty finite set of attributes used in the description of objects.
RS - Theoretical Background

V describes the values of all attributes, that is, V = ∪_{a ∈ A} V_a, where V_a is the set of
values of attribute a. f : U × A → V is the total decision function such that f(x, a) ∈ V_a
for every a ∈ A and x ∈ U.

An information system is referred to as a decision table (DT) if the attributes in S are
divided into two disjoint sets called the condition (C) and decision (D) attributes,
where A = C ∪ D and C ∩ D = ∅.
RS - Theoretical Background
DT = (U, C ∪ D, V, f)                                                 (10)

A subset of attributes B ⊆ A defines an equivalence relation on U, denoted IND(B):

IND(B) = {(x, y) ∈ U × U | f(x, b) = f(y, b) for all b ∈ B}           (11)

The equivalence classes of the B-indiscernibility relation are denoted [x]_B:

[x]_B = {y ∈ U | (x, y) ∈ IND(B)}                                     (12)


RS - Theoretical Background

The indiscernibility (IND) relation is illustrated using Table 7. The non-empty subsets of
the conditional attributes are {Protocol}, {Service}, {Flag}, {Protocol, Service},
{Protocol, Flag}, {Service, Flag}, and {Protocol, Service, Flag}.

IND({Protocol}) = {{s1,s2,s3,s6},{s4,s5}}
IND({Service}) = {{s1,s2},{s3,s5},{s4,s6}}
IND({Flag}) = {{s1,s5},{s2,s3,s4,s6}}

RS - Theoretical Background
IND({Protocol, Service}) = {{s1,s2},{s3},{s4},{s5},{s6}}
IND({Protocol, Flag}) = {{s1},{s2,s3,s6},{s4},{s5}}
IND({Service, Flag}) = {{s1},{s2},{s3},{s4,s6},{s5}}
IND({Protocol, Service, Flag}) = {{s1},{s2},{s3},{s4},{s5},{s6}}
IND({Category}) = {{s1,s4,s5,s6},{s2,s3}}

Given B ⊆ A and X ⊆ U, X can be approximated using only the information contained in B
by constructing the B-lower and B-upper approximations of X, defined as:

B-lower(X) = {x ∈ U | [x]_B ⊆ X}
B-upper(X) = {x ∈ U | [x]_B ∩ X ≠ ∅}                                  (13)
RS - Theoretical Background
Given attributes A = C ∪ D with C ∩ D = ∅, the positive POS_C(D), negative NEG_C(D)
and boundary BND_C(D) regions for a set of condition attributes C with respect to IND(D)
are defined as

POS_C(D) = ∪_{X ∈ D*} C-lower(X)

NEG_C(D) = U − ∪_{X ∈ D*} C-upper(X)                                  (14)

BND_C(D) = ∪_{X ∈ D*} C-upper(X) − ∪_{X ∈ D*} C-lower(X)

where D* denotes the family of equivalence classes defined by the relation IND(D).
POS_C(D) contains all objects of U that can be classified correctly into the distinct
classes defined by IND(D).


RS - Theoretical Background

The boundary region, BND_C(D), is the set of objects that can possibly, but not certainly,
be classified in this way. The negative region, NEG_C(D), is the set of objects that cannot
be classified into the classes of U/D.

The indiscernibility relation provides a means of generating rules. For example, from the
IND sets {s4,s6} and {s1,s5} for the attributes Service and Flag respectively, the following
rules are generated:

Rule 1. (Service, ftp) → (Category, normal)

Rule 2. (Flag, SF) → (Category, normal)

The number of consistent rules in a DT can be used as the consistency factor of the DT,
denoted γ(C, D), where C and D are the condition and decision attributes respectively.
RS - Theoretical Background

In Table 7, γ(C, D) = 6/6 = 1, which shows that the DT is consistent.

Decision rules are then presented in the form of "if ... then ..." rules. For example,
Rule 1 above can be presented as follows:

If (Service, ftp) then (Category, normal)

A set of decision rules is called a decision algorithm.


Dependency of Attributes

In Table 7, for instance, there are no total dependencies on any single attribute:

γ(Protocol, Category) = 2/6
γ(Service, Category) = 2/6
γ(Flag, Category) = 2/6
γ({Protocol, Service}, Category) = 4/6
γ({Protocol, Flag}, Category) = 3/6
γ({Service, Flag}, Category) = 6/6
γ({Protocol, Service, Flag}, Category) = 6/6

The degree of dependency for Table 7 shows that the data is consistent.
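A minimal Python sketch of the indiscernibility classes of equation (11) and the degree of dependency γ on Table 7; the function names ind and gamma are assumptions used here for illustration.

from collections import defaultdict

# Table 7 as a decision table: three condition attributes plus the decision attribute.
rows = {
    "s1": {"Protocol": "Tcp", "Service": "http",   "Flag": "SF",  "Category": "normal"},
    "s2": {"Protocol": "Tcp", "Service": "http",   "Flag": "REJ", "Category": "intrusion"},
    "s3": {"Protocol": "Tcp", "Service": "telnet", "Flag": "REJ", "Category": "intrusion"},
    "s4": {"Protocol": "Udp", "Service": "ftp",    "Flag": "REJ", "Category": "normal"},
    "s5": {"Protocol": "Udp", "Service": "telnet", "Flag": "SF",  "Category": "normal"},
    "s6": {"Protocol": "Tcp", "Service": "ftp",    "Flag": "REJ", "Category": "normal"},
}

def ind(attrs):
    # Equivalence classes of the indiscernibility relation IND(attrs), equation (11).
    classes = defaultdict(set)
    for obj, vals in rows.items():
        classes[tuple(vals[a] for a in attrs)].add(obj)
    return list(classes.values())

def gamma(cond, dec):
    # Degree of dependency: |POS_C(D)| / |U|.
    decision_classes = ind(dec)
    pos = set()
    for eq in ind(cond):
        if any(eq <= d for d in decision_classes):   # eq lies wholly inside one decision class
            pos |= eq
    return len(pos) / len(rows)

print(ind(["Protocol"]))                             # {{s1,s2,s3,s6},{s4,s5}}, as above
print(gamma(["Protocol"], ["Category"]))             # 2/6
print(gamma(["Service", "Flag"], ["Category"]))      # 6/6 = 1.0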


Nearest Neighbour Classifiers
• kNN is a supervised learning algorithm in which a new query instance is classified
based on the majority category of its k nearest neighbours.

• Closeness is defined in terms of a distance metric, such as the Euclidean distance.
The Euclidean distance between two points or tuples, say X1 = (x11, x12, ..., x1n) and
X2 = (x21, x22, ..., x2n), is

dist(X1, X2) = sqrt( Σ_{i=1}^{n} (x1i − x2i)² )
Given this example to classify:

(Protocol = Tcp, Service = ftp, Flag = SF, Category = ?) based on Table 7.

The steps involved in the kNN algorithm are:

- Determine the parameter k, the number of nearest neighbours.
- Calculate the distance between the query instance and all the training tuples.
- Sort the distances and determine the nearest neighbours based on the k-th minimum distance.
- Gather the categories of the nearest neighbours.
- Use the simple majority of the categories of the nearest neighbours as the prediction.

Applying the steps:

- Determine the parameter k: suppose we use k = 3.
- Calculate the distance between the query instance and all the training samples using the
  Euclidean distance (Table 8).

Table 8: Distance between the query instance and all the training samples

S/N  Protocol  Service  Flag  Category   Euclidean distance to query instance
S1   Tcp       http     SF    normal     √(0² + 1² + 0²) = √1 = 1
S2   Tcp       http     REJ   intrusion  √(0² + 1² + 1²) = √2 = 1.414
S3   Tcp       telnet   REJ   intrusion  √(0² + 1² + 1²) = √2 = 1.414
S4   Udp       ftp      REJ   normal     √(1² + 0² + 1²) = √2 = 1.414
S5   Udp       telnet   SF    normal     √(1² + 1² + 0²) = √2 = 1.414
S6   Tcp       ftp      REJ   normal     √(0² + 0² + 1²) = √1 = 1

- Sort the distances and determine the nearest neighbours based on the k-th minimum distance.

Table 9: Sorted distances based on the k-th minimum distance

S/N  Protocol  Service  Flag  Category   Euclidean distance to query instance
S1   Tcp       http     SF    normal     √(0² + 1² + 0²) = √1 = 1
S6   Tcp       ftp      REJ   normal     √(0² + 0² + 1²) = √1 = 1
S2   Tcp       http     REJ   intrusion  √(0² + 1² + 1²) = √2 = 1.414
- Gather the categories of the nearest neighbours. Note that Table 9 shows only the first 3
sorted records.

- Use the simple majority of the categories of the nearest neighbours as the prediction for
the query instance. There are 2 normal and 1 intrusion; since 2 > 1, the new sample
(Protocol = Tcp, Service = ftp, Flag = SF) is placed in the normal category.

The kNN algorithm is easy to implement but computationally intensive, especially when
given a large training set.
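A minimal Python kNN sketch over Table 7, using the 0/1 attribute-mismatch encoding of Table 8; it reproduces the prediction above.

import math
from collections import Counter

train = [(("Tcp", "http",   "SF"),  "normal"),
         (("Tcp", "http",   "REJ"), "intrusion"),
         (("Tcp", "telnet", "REJ"), "intrusion"),
         (("Udp", "ftp",    "REJ"), "normal"),
         (("Udp", "telnet", "SF"),  "normal"),
         (("Tcp", "ftp",    "REJ"), "normal")]

def distance(a, b):
    # Euclidean distance over 0/1 mismatch indicators (reproduces Table 8).
    return math.sqrt(sum(int(x != y) for x, y in zip(a, b)))

def knn_predict(query, k=3):
    neighbours = sorted(train, key=lambda t: distance(query, t[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

print(knn_predict(("Tcp", "ftp", "SF")))   # 'normal': 2 of the 3 nearest neighbours are normal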
Clustering Techniques
• Clustering is the process of grouping a set of physical or abstract objects into classes
of similar objects.

• Object clustering is used across scientific disciplines, from mathematics and statistics
to biology and genetics.

• Clustering techniques discussed here include k-means, fuzzy c-means and fuzzy rough
c-means.
K-means Clustering Technique

k-means clustering(X, K):
  X, the instance set X = {x1, x2, ..., xn}
  K, the number of clusters
  Mk, the K cluster centres (means)

  repeat until there are no changes in the K cluster centres:
    determine the centroid coordinates
    determine the distance of each object to the centroids
    group the objects based on minimum distance
  end
Table 10: A sample relational data set

Objects/Attributes   X   Y
A1                   1   1
A2                   2   1
A3                   4   3
A4                   5   4

Initial value of the centroids: objects A1 and A2 are chosen as c1 and c2, the coordinates
of the centroids, so c1 = (1,1) and c2 = (2,1).

Compute the object-centroid distances using the Euclidean distance to obtain the distance
matrix

D⁰ = | 0   1   3.61   5    |   c1 = (1,1)  group 1
     | 1   0   2.83   4.24 |   c2 = (2,1)  group 2

Each column in the distance matrix symbolizes the object. The first row

of the distance corresponds to the distance of each object to the first

centroid and the second row in each object to the second centroid.

Object Clustering – Each object is assigned based on the minimum


distance. Thus object A1 is assigned to group1, objects A2,A3 and A4
are assigned to group2. The element of group matrix (G0) is 1 if and
only if the object is assigned to that group.
1 0 0 0 group1
G 
0

0 1 1 1 group2

Iteration 1, determine centroids: knowing the members of each group, new centroids are
computed from the new membership of each group. Group 1 has only one member, so its
centroid remains c1 = (1,1). Group 2 has three members, so its centroid becomes the
average coordinate of the three members:

c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3)
Iteration 1, object-centroid distances: compute the distances to the new centroids to
obtain the distance matrix D¹:

D¹ = | 0      1      3.61   5    |   c1 = (1,1)        group 1
     | 3.14   2.36   0.47   1.89 |   c2 = (11/3, 8/3)  group 2

Iteration 1, object clustering: as in step 3, assign each object based on the minimum
distance. Based on the new distance matrix D¹, object A2 is moved to group 1:

G¹ = | 1   1   0   0 |   group 1
     | 0   0   1   1 |   group 2
Iteration 2, determine centroids: repeat step 4 to compute the new centroids based on the
clustering of the previous iteration. The new centroids are

c1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1)  and  c2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5)

Iteration 2, object-centroid distances: repeating step 2, we obtain the new distance
matrix D²:

D² = | 0.5    0.5    3.20   4.61 |   c1 = (1.5, 1)    group 1
     | 4.30   3.54   0.71   0.71 |   c2 = (4.5, 3.5)  group 2

Iteration 2, object clustering: again, assign each object based on the minimum distance:

G² = | 1   1   0   0 |   group 1
     | 0   0   1   1 |   group 2

We obtain G² = G¹. Comparing the grouping of the last iteration with this iteration, no
object has moved from or to any group. The k-means clustering has therefore reached
stability and no further iteration is needed.
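A minimal NumPy sketch of the loop above, reproducing the worked example on Table 10 with A1 and A2 as the initial centroids.

import numpy as np

def kmeans(X, centroids, max_iter=100):
    # Plain k-means: assign each point to its nearest centroid, then recompute the means.
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        groups = dists.argmin(axis=1)
        new_centroids = np.array([X[groups == k].mean(axis=0) for k in range(len(centroids))])
        if np.allclose(new_centroids, centroids):   # no centroid moved: converged
            break
        centroids = new_centroids
    return groups, centroids

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)   # Table 10
groups, centres = kmeans(X, X[:2].copy())                     # A1 and A2 as initial centroids
print(groups, centres)   # groups [0 0 1 1]; centres (1.5, 1) and (4.5, 3.5), as in the example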
Fuzzy C-means Algorithm
Step 1: Input the fuzzy coefficient m, the stopping threshold ε and the number of clusters c;
randomly initialize the membership matrix U subject to the condition in equation 5.9.

Step 2: Compute the centroids based on

c_j = ( Σ_{i=1}^{N} u_ij^m · x_i ) / ( Σ_{i=1}^{N} u_ij^m )

Step 3: Compute the objective function

J_m = Σ_{i=1}^{N} Σ_{j=1}^{c} u_ij^m ||x_i − c_j||²

Step 4: If ||J(k+1) − J(k)|| < ε then stop; otherwise continue with Step 5.

Step 5: Compute u_ij based on equation 5.12 and go to Step 2:

u_ij = 1 / Σ_{k=1}^{c} ( ||x_i − c_j|| / ||x_i − c_k|| )^(2/(m−1)),   i = 1, 2, ..., N;  j = 1, 2, ..., c
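A minimal NumPy sketch of Steps 2-5 above; the random initialisation, fuzziness m = 2 and tolerance are assumptions, and the sample points reuse Table 10.

import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, eps=1e-5, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                 # memberships of each point sum to 1
    J_prev = np.inf
    for _ in range(max_iter):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]                  # Step 2
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        J = float((W * d ** 2).sum())                                   # Step 3: objective J_m
        if abs(J_prev - J) < eps:                                       # Step 4: stopping test
            break
        J_prev = J
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)   # Step 5
    return U, centroids

U, centres = fuzzy_c_means(np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float))
print(U.round(2), centres.round(2))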
The Ensemble Classifier
An ensemble classifier uses a combination of a set of models or classifiers, each of which
solves the same original task, in order to obtain a better composite global classifier with
more accurate and reliable estimates or decisions than a single classifier (Ali and Pazzani,
1996).

Figure 2: Multiple classifiers M1, ..., Mk are trained on the data set and combined
(e.g. by bagging or boosting) to predict labels for unlabelled tuples, increasing model
accuracy.
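A minimal scikit-learn sketch comparing a bagging and a boosting ensemble, assuming scikit-learn is available; the synthetic data set stands in for the labelled training data and is an assumption.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic two-class data standing in for the labelled training set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

bagging = BaggingClassifier(n_estimators=25, random_state=0)    # M1..Mk fitted on bootstrap samples
boosting = AdaBoostClassifier(n_estimators=25, random_state=0)  # sequentially re-weighted learners

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))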


Conclusion
Machine Learning has been used to address the taxonomy of AI problems spelt out by
the National Science and Technology Council Committee on Technology (2016):

(1) systems that think like humans (e.g., cognitive architectures and neural networks);

(2) systems that act like humans (e.g., pass the Turing test via natural language processing,
knowledge representation, automated reasoning, and learning);

(3) systems that think rationally (e.g., logic solvers, inference, and optimization); and

(4) systems that act rationally (e.g., intelligent software agents and embodied robots that
achieve goals via perception, planning, reasoning, learning, communicating, decision-
making, and acting).
References
• Adetunmbi, A.O., Falaki, S.O., Adewale, O.S. and Alese, B.K. (2007) A Rough Set
Approach for Detecting Known and Novel Network Intrusions. Second International
Conference on Application of Information and Communication Technologies to Teaching,
Research and Administrations (AICTTRA 2007), Ife, pp. 190–200.
• Adetunmbi, A.O. (2008) Intrusion Detection Based on Machine Learning Techniques.
PhD Thesis, Federal University of Technology, Akure.
• Diksha, S. (2018) Decision Making: Meaning, Process and Factors.
http://www.businessmanagementideas.com/decision-making/decision-making-meaning-process-and-factors/3422, retrieved 8th Dec. 2018.
• Pawlak, Z. (1991) Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer
Academic Publishers, Dordrecht.
• Elston, S. and Rudin, C. (2017) Data Science and Machine Learning Essentials (video),
Coursera.
References
• Ayogu, I.I. (2008) Development of a Machine Translation System for English, Igbo and
Yoruba Languages. PhD Thesis, Department of Computer Science, Federal University of
Technology, Akure.
• National Science and Technology Council Committee on Technology (2016) Preparing
for the Future of Artificial Intelligence.
Thank you for Listening
