0% found this document useful (0 votes)

12 views

Data Mining Lab Manual Student_copy_for_print

The document is a lab manual for a Data Mining and Warehousing course at Mahakal Institute of Technology, detailing various experiments using the Weka mining tool. It includes objectives, procedures, expected outcomes, and viva questions for each experiment, focusing on tasks such as creating decision trees and understanding association rules. The manual serves as a guide for students in the Computer Science and Engineering department to learn practical applications of data mining techniques.

Uploaded by

ninjanotes7

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views

Data Mining Lab Manual Student_copy_for_print

Uploaded by

ninjanotes7

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 24

MAHAKAL INSTITUTE OF TECHNOLOGY, UJJAIN

Approved By: All India Council of Technical Education (New

Delhi)

DEPARTMENT OF
COMPUTER SCIENCE AND ENGINEERING

LAB MANUAL

Name of Student : …………………..

Name of Lab : Lab Open Elective (Data Mining and

Warehousing)

Subject Code : CS- 705

Branch : Computer Science and Engineering

Year/Sem : IV/VII.

Affiliated to Rajiv Gandhi Prodyogiki Vishwavidyalaya, Bhopal (MP)

INDEX
S. No. Name of Experiment Date Sign Remark

1. To list all the categorical(or nominal) attributes and the real

valued attributes using Weka mining tool.

2. To identify the rules with some of the important attributes by a)

manually and b) Using Weka .

3. To create a Decision tree by training data set using Weka mining

tool.

4. To find the percentage of examples that are classified correctly

by using the above created decision tree model? ie.. Testing on
the training set.
5 To “Is testing a good idea”.

6 To create a Decision tree by cross validation training data set

using Weka mining tool.

7 Delete one attribute from GUI Explorer and see the effect using
Weka mining tool.
List of Experiments
Lab Elective V (Data Mining and Warehousing)
(CS- 703 B)

S. No. Name of Experiment

1. To list all the categorical(or nominal) attributes and the real valued attributes using
Weka mining tool.

2. To identify the rules with some of the important attributes by a) manually and b)
Using Weka .

3. To create a Decision tree by training data set using Weka mining tool.

4. To find the percentage of examples that are classified correctly by using the above
created decision tree model? ie.. Testing on the training set.

5 To “Is testing a good idea”.

6 To create a Decision tree by cross validation training data set using Weka mining
tool.

7 Delete one attribute from GUI Explorer and see the effect using Weka mining tool.
EXPERIMENT-1

OBJECTIVE:
To list all the categorical(or nominal) attributes and the real valued attributes using Weka mining
tool.

PROCEDURE:

INPUT SET:

OUTPUT SET:
EXPECTED VIVA QUESTIONS:
1. What is data warehouse?
2. What is the benefits of data warehouse?
3. What is Fact?

NAME OF FACULTY:
SIGNATURE:
DATE:
EXPERIMENT-2
OBJECTIVE:
To identify the rules with some of the important attributes by a) manually and b) Using Weka .

THEORY:
Association rule mining is defined as: Let be a set of n binary attributes called items. Let be a set of
transactions called the database. Each transaction in D has a unique transaction ID and contains a
subset of the items in I. A rule is defined as an implication of the form X=>Y where X,Y C I and X
Π Y=Φ . The sets of items (for short itemsets) X and Y are called antecedent (left hand side or
LHS) and consequent (righthandside or RHS) of the rule respectively.
To illustrate the concepts, we use a small example from the supermarket domain.
The set of items is I = {milk,bread,butter,beer} and a small database containing the items (1 codes
presence and 0 absence of an item in a transaction) is shown in the table to the right. An example
rule for the supermarket could be meaning that if milk and bread is bought, customers also buy
butter.
Note: this example is extremely small. In practical applications, a rule needs a support of several
hundred transactions before it can be considered statistically significant, and datasets often contain
thousands or millions of transactions.
To select interesting rules from the set of all possible rules, constraints on various measures of
significance and interest can be used. The best known constraints are minimum thresholds on
support and confidence. The support supp(X) of an itemset X is defined as the proportion of
transactions in the data set which contain the itemset. In the example database, the itemset
{milk,bread} has a support of 2 / 5 = 0.4 since it occurs in 40% of all transactions (2 out of 5
transactions).
The confidence of a rule is defined . For example, the rule has a confidence of 0.2 / 0.4 = 0.5 in the
database, which means that for 50% of the transactions containing milk and bread the rule is
correct. Confidence can be interpreted as an estimate of the probability P(Y | X), the probability of
finding the RHS of the rule in transactions under the condition that these transactions also contain
the LHS .

ALGORITHM:
Association rule mining is to find out association rules that satisfy the predefined minimum support
and confidence from a given database. The problem is usually decomposed into two subproblems.
One is to find those itemsets whose occurrences exceed a predefined threshold in the database;
those itemsets are called frequent or large itemsets. The second problem is to generate association
rules from those large itemsets with the constraints of minimal confidence.
Suppose one of the large itemsets is Lk, Lk = {I1, I2, … , Ik}, association rules with this itemsets
are generated in the following way: the first rule is {I1, I2, … , Ik1} and {Ik}, by checking the
confidence this rule can be determined as interesting or not. Then other rule are generated by
deleting the last items in the antecedent and inserting it to the consequent, further the confidences of
the new rules are checked to determine the interestingness of them. Those processes iterated until
the antecedent becomes empty. Since the second subproblem is quite straight forward, most of the
researches focus on the first subproblem. The Apriori algorithm finds the frequent sets L In
Database D.
· Find frequent set Lk − 1.
· Join Step.
o Ck is generated by joining Lk − 1with itself
· Prune Step.
o Any (k − 1) itemset that is not frequent cannot be a subset of
a frequent k itemset, hence should be removed.
Where · (Ck: Candidate itemset of size k)
· (Lk: frequent itemset of size k)

Apriori Pseudocode
Apriori (T,£)
L<{ Large 1itemsets that appear in more than transactions }
K<2
while L(k1)≠ Φ
C(k)<Generate( Lk − 1)
for transactions t € T
C(t)Subset(Ck,t)
for candidates c € C(t)
count[c]<count[ c]+1
L(k)<{ c € C(k)| count[c] ≥
£
K<K+ 1
return Ụ L(k) k

PROCEDURE:

INPUT SET:

OUTPUT SET:
EXPECTED VIVA QUESTIONS:
1. What is support and confidence?
2. What is association rule?
3. How apriori algorithm is used?

NAME OF FACULTY:
SIGNATURE:
DATE:
EXPERIMENT-3
OBJECTIVE:
To create a Decision tree by training data set using Weka mining tool.

THEORY:
Classification is a data mining function that assigns items in a collection to target categories or
classes. The goal of classification is to accurately predict the target class for each case in the data.
For example, a classification model could be used to identify loan applicants as low, medium, or
high credit risks.
A classification task begins with a data set in which the class assignments are known. For example,
a classification model that predicts credit risk could be developed based on observed data for many
loan applicants over a period of time.
In addition to the historical credit rating, the data might track employment history, home ownership
or rental, years of residence, number and type of investments, and so on. Credit rating would be the
target, the other attributes would be the predictors, and the data for each customer would constitute
a case.
Classifications are discrete and do not imply order. Continuous, floating point values would
indicate a numerical, rather than a categorical, target. A predictive model with a numerical target
uses a regression algorithm, not a classification algorithm.
The simplest type of classification problem is binary classification. In binary classification, the
target attribute has only two possible values: for example, high credit rating or low credit rating.
Multiclass targets have more than two values: for example, low, medium, high, or unknown credit
rating.
In the model build (training) process, a classification algorithm finds relationships between the
values of the predictors and the values of the target. Different classification algorithms use different
techniques for finding relationships. These relationships are summarized in a model, which can then
be applied to a different data set in which the class assignments are unknown.
Classification models are tested by comparing the predicted values to known target values in a set
of test data. The historical data for a classification project is typically divided into two data sets: one
for building the model; the other for testing the model.
Scoring a classification model results in class assignments and probabilities for each case. For
example, a model that classifies customers as low, medium, or high value would also predict the
probability of each classification for each customer.
Classification has many applications in customer segmentation, business modeling, marketing,
credit analysis, and biomedical and drug response modeling.

PROCEDURE:

INPUT SET:

OUTPUT SET:
The decision tree constructed by using the implemented C4.5 algorithm

EXPECTED VIVA QUESTIONS:

1. What is decision tree?
2. What is classification?
3. How decision tree helps in data classification?

NAME OF FACULTY:
SIGNATURE:
DATE:
EXPERIMENT-4

OBJECTIVE:
To find the percentage of examples that are classified correctly by using the above created decision
tree model? ie.. Testing on the training set.

THEORY:
Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is
unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered
to be an apple if it is red, round, and about 4" in diameter. Even though these features depend on the
existence of the other features, a naive Bayes classifier considers all of these properties to
independently contribute to the probability that this fruit is an apple.
An advantage of the naive Bayes classifier is that it requires a small amount of training data to
estimate the parameters (means and variances of the variables) necessary for classification. Because
independent variables are assumed, only the variances of the variables for each class need to be
determined and not the entirecovariance matrix The naive Bayes probabilistic model :
The probability model for a classifier is a conditional model P(C|F1..................Fn) over a dependent
class variable C with a small number of outcomes or classes, conditional on several feature
variables F1 through Fn. The problem is that if the number of features n is large or when a feature
can take on a large number of values, then basing such a model on probability tables is infeasible.
We therefore reformulate the model to make it more tractable.
Using Bayes' theorem, we write P(C|F1...............Fn)=[{p(C)p(F1..................Fn|C)}/p(F1,. Fn)]

In plain English the above equation can be written

as Posterior= [(prior *likehood)/evidence]
In practice we are only interested in the numerator of that fraction, since the denominator does not
depend on C and the values of the features Fi are given, so that the denominator is effectively
constant. The numerator is equivalent to the joint probability model p(C,F1. Fn) which can be
rewritten as follows, using repeated applications of the definition of conditional probability:
p(C,F1........Fn) =p(C) p(F1............Fn|C) =p(C)p(F1|C) p(F2........Fn|C,F1,F2)
=p(C)p(F1|C) p(F2|C,F1)p(F3........Fn|C,F1,F2)
= p(C)p(F1|C) p(F2|C,F1)p(F3.........Fn|C,F1,F2)......p(Fn|C,F1,F2,F3........Fn1)
Now the "naive" conditional independence assumptions come into play: assume that each feature Fi
is conditionally independent of every other feature Fj for j≠i .
This means that p(Fi|C,Fj)=p(Fi|C)
and so the joint model can be expressed as p(C,F1,.......Fn)=p(C)p(F1|C)p(F2|C)...........
=p(C)π p(Fi|C)
This means that under the above independence assumptions, the conditional distribution over the
class variable C can be expressed like this:
p(C|F1.........Fn)= p(C) πp(Fi|C)
Z
where Z is a scaling factor dependent only on F1. Fn, i.e., a constant if the values of the feature
variables are known.
Models of this form are much more manageable, since they factor into a so called class prior p(C)
and independent probability distributions p(Fi|C). If there are k classes and if a model for eachp(Fi|
C=c) can be expressed in terms of r parameters, then the corresponding naive Bayes model has (k −
1) + n r k parameters. In practice, often k = 2 (binary classification) and r = 1 (Bernoulli variables as
features) are common, and so the total number of parameters of the naive Bayes model is 2n + 1,
where n is the number of binary features used for prediction
P(h/D)= P(D/h) P(h) P(D)
• P(h) : Prior probability of hypothesis h
• P(D) : Prior probability of training data D
• P(h/D) : Probability of h given D
• P(D/h) : Probability of D given h
Naïve Bayes Classifier :
Derivation
• D : Set of tuples
– Each Tuple is an ‘n’ dimensional attribute vector
– X : (x1,x2,x3,…. xn)
• Let there me ‘m’ Classes : C1,C2,C3…Cm
• NB classifier predicts X belongs to Class Ci iff
– P (Ci/X) > P(Cj/X) for 1<= j <= m , j <> i
• Maximum Posteriori Hypothesis
– P(Ci/X) = P(X/Ci) P(Ci) / P(X)
– Maximize P(X/Ci) P(Ci) as P(X) is constant
Naïve Bayes Classifier : Derivation
• With many attributes, it is computationally expensive to evaluate P(X/Ci)
• Naïve Assumption of “class conditional independence”
• P(X/Ci) = n P( xk/ Ci)
k=1
• P(X/Ci) = P(x1/Ci) * P(x2/Ci) *…* P(xn/ Ci)

PROCEDURE:

INPUT SET:

OUTPUT SET:
=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances 554 92.3333 %

Incorrectly Classified Instances 46 7.6667 %
Kappa statistic 0.845
Mean absolute error 0.1389
Root mean squared error 0.2636
Relative absolute error 27.9979 %
Root relative squared error 52.9137 %
Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

0.894 0.052 0.935 0.894 0.914 0.936 YES
0.948 0.106 0.914 0.948 0.931 0.936 NO
Weighted Avg. 0.923 0.081 0.924 0.923 0.923 0.936

=== Confusion Matrix ===

a b <-- classified as
245 29 | a = YES
17 309 | b = NO

EXPECTED VIVA QUESTIONS:

1. What is naïve bayes classifier?
2. What are the main advantages of bayes classifier?

NAME OF FACULTY:
SIGNATURE:
DATE:
EXPERIMENT-5
OBJECTIVE:
To Is testing a good idea”.

PROCEDURE:
INPUT SET
OUTPUT SET:
This can be experienced by the different problem solutions while doing practice.

The important numbers to focus on here are the numbers next to the "Correctly Classified
Instances" (92.3 percent) and the "Incorrectly Classified Instances" (7.6 percent). Other important
numbers are in the "ROC Area" column, in the first row (the 0.936); Finally, in the "Confusion
Matrix," it shows the number of false positives and false negatives. The false positives are 29, and
the false negatives are 17 in this matrix. Based on our accuracy rate of 92.3 percent, we say that
upon initial analysis, this is a good model.One final step to validating our classification tree, which
is to run our test set through the model and ensure that accuracy of the model

Comparing the "Correctly Classified Instances" from this test set with the "Correctly Classified Instances"
from the training set, we see the accuracy of the model , which indicates that the model will not break down
with unknown data, or when future data is applied to it.

EXPECTED VIVA QUESTIONS:

1. What is classification accuracy?

NAME OF FACULTY:
SIGNATURE:
DATE:
EXPERIMENT-6
OBJECTIVE:
To create a Decision tree by cross validation training data set using Weka mining tool.

THEORY:
Decision tree learning, used in data mining and machine learning, uses a decision tree as a
predictive model which maps observations about an item to conclusions about the item's target
value In these tree structures, leaves represent classifications and branches represent conjunctions of
features that lead to those classifications. In decision analysis, a decision tree can be used to
visually and explicitly represent decisions and decision making. In data mining, a decision tree
describes data but not decisions; rather the resulting classification tree can be an input for decision
making. This page deals with decision trees in data mining.
Decision tree learning is a common method used in data mining. The goal is to create a model that
predicts the value of a target variable based on several input variables. Each interior node
corresponds to one of the input variables; there are edges to children for each of the possible values
of that input variable. Each leaf represents a value of the target variable given the values of the
input variables represented by the path from the root to the leaf.
A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This
process is repeated on each derived subset in a recursive manner called recursive partitioning. The
recursion is completed when the subset at a node all has the same value of the target variable, or
when splitting no longer adds value to the predictions.
In data mining, trees can be described also as the combination of mathematical and computational
techniques to aid the description, categorization and generalization of a given set of data.
Data comes in records of the form:
(x, y) = (x1, x2, x3..., xk, y)
The dependent variable, Y, is the target variable that we are trying to understand, classify or
generalize. The vector x is comprised of the input variables, x1, x2, x3 etc., that are used for that
task.
PROCEDURE:

INPUT SET:
OUTPUT SET:

=== Stratified cross-validation

===
=== Summary ===

Correctly Classified Instances 539 89.8333 %

Incorrectly Classified Instances 61 10.1667 %
Kappa statistic 0.7942
Mean absolute error 0.167
Root mean squared error 0.305
Relative absolute error 33.6511 %
Root relative squared error 61.2344 %
Total Number of Instances 600

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.861 0.071 0.911 0.861 0.886 0.883 YES
0.929 0.139 0.889 0.929 0.909 0.883 NO
Weighted Avg. 0.898 0.108 0.899 0.898 0.898 0.883

=== Confusion Matrix ===

a b <-- classified as
236 38 | a = YES
23 303 | b = NO

EXPECTED VIVA QUESTIONS:

1. What is training data set?
2. What is testing data set?

NAME OF FACULTY:
SIGNATURE:
DATE:
EXPERIMENT-7

OBJECTIVE:
Delete one attribute from GUI Explorer and see the effect using Weka mining tool.

PROCEDURE:
INPUT SET:
OUTPUT SET
EXPECTED VIVA QUESTIONS:
1. What is nominal and numeric attributes?
2. Which type of test are performed using weka tool?

NAME OF FACULTY:
SIGNATURE:
DATE:

Tiktok Growth PDF
100% (1)
Tiktok Growth PDF
9 pages
Programming With OpenFOAM
No ratings yet
Programming With OpenFOAM
27 pages
Data Mining - Classification Using Frequent Pattern
No ratings yet
Data Mining - Classification Using Frequent Pattern
8 pages
Data Mining Lab Manual
100% (1)
Data Mining Lab Manual
41 pages
DBMS
No ratings yet
DBMS
51 pages
Tutorial
No ratings yet
Tutorial
52 pages
DMBI Sample Questions
No ratings yet
DMBI Sample Questions
7 pages
Data Mining: Budi Santosa, PHD 2008 Lab Komputasi Dan Optimasi Industri Teknik Industri Its
No ratings yet
Data Mining: Budi Santosa, PHD 2008 Lab Komputasi Dan Optimasi Industri Teknik Industri Its
42 pages
DWDM Unit Wise Question Bank
No ratings yet
DWDM Unit Wise Question Bank
8 pages
ATC - Lecture - Notes - Data Mining Techniques - 2021
No ratings yet
ATC - Lecture - Notes - Data Mining Techniques - 2021
77 pages
dwm exp5 a49
No ratings yet
dwm exp5 a49
8 pages
Data Mining Lab Manual
No ratings yet
Data Mining Lab Manual
34 pages
DMML Unit 2
No ratings yet
DMML Unit 2
64 pages
DM Chapter 4
No ratings yet
DM Chapter 4
47 pages
Top 9 Data Science Algorithms
No ratings yet
Top 9 Data Science Algorithms
152 pages
Datamining Fifth Lecture
No ratings yet
Datamining Fifth Lecture
65 pages
Unit-3 DWDM
No ratings yet
Unit-3 DWDM
11 pages
Latest Data Mining Lab Manual
No ratings yet
Latest Data Mining Lab Manual
74 pages
Data Mining UNIT-III R20 Syllabus
No ratings yet
Data Mining UNIT-III R20 Syllabus
50 pages
Intro To Ai ML
No ratings yet
Intro To Ai ML
21 pages
Chapter 4
No ratings yet
Chapter 4
9 pages
BML answer key
No ratings yet
BML answer key
21 pages
Unit III Data Mining Techniques
No ratings yet
Unit III Data Mining Techniques
17 pages
DA_LabFile
No ratings yet
DA_LabFile
63 pages
APznzaYKXa5YwGceeu2-5Hb2cWsN90NIV1g8I9DxBLLoKwuE7P4qjOfEGWd6pCzfmwSqKnWNBm5euXlo07JZKRKi-UcpBSTEjg7UTMzxCaVnPn0Jb2VsTE_sqVGq7R0pvAGyLrtvL4jK7B1dY1fgM9rEecJTtpRn5WSkJB__vFz_Re2xK6z3uN9DfvIaFgXRVYH8z-mJcY-z6Q8hhRFSOd
No ratings yet
APznzaYKXa5YwGceeu2-5Hb2cWsN90NIV1g8I9DxBLLoKwuE7P4qjOfEGWd6pCzfmwSqKnWNBm5euXlo07JZKRKi-UcpBSTEjg7UTMzxCaVnPn0Jb2VsTE_sqVGq7R0pvAGyLrtvL4jK7B1dY1fgM9rEecJTtpRn5WSkJB__vFz_Re2xK6z3uN9DfvIaFgXRVYH8z-mJcY-z6Q8hhRFSOd
174 pages
Feature Extraction and Reduction by using ModifiedApriori algorithm (1)
No ratings yet
Feature Extraction and Reduction by using ModifiedApriori algorithm (1)
9 pages
DMDW Chapter 4
No ratings yet
DMDW Chapter 4
29 pages
Data Minning Unit 2-1
No ratings yet
Data Minning Unit 2-1
10 pages
DM Lab Manual
No ratings yet
DM Lab Manual
32 pages
FALLSEM2022-23 SWE2009 ETH VL2022230101117 Reference Material I 25-08-2022 Frequent Pattern Mining
No ratings yet
FALLSEM2022-23 SWE2009 ETH VL2022230101117 Reference Material I 25-08-2022 Frequent Pattern Mining
42 pages
Ex 9 DWM Aryant
No ratings yet
Ex 9 DWM Aryant
9 pages
Dwdm-Unit-3 R16
No ratings yet
Dwdm-Unit-3 R16
14 pages
DWM Lab Manual
No ratings yet
DWM Lab Manual
92 pages
DM Guidelines 14jan2022
No ratings yet
DM Guidelines 14jan2022
5 pages
Goal: Provide An Overview of Basic
No ratings yet
Goal: Provide An Overview of Basic
82 pages
DmUnit 3
No ratings yet
DmUnit 3
42 pages
Introduction To Big Data and Data Mining
No ratings yet
Introduction To Big Data and Data Mining
130 pages
Apriori Algorithm
No ratings yet
Apriori Algorithm
28 pages
On Unit-3
No ratings yet
On Unit-3
30 pages
CH 8 Data Mining
No ratings yet
CH 8 Data Mining
30 pages
Performance Analysis of Distributed Association Rule Mining With Apriori Algorithm
No ratings yet
Performance Analysis of Distributed Association Rule Mining With Apriori Algorithm
5 pages
DWM Unit 2
No ratings yet
DWM Unit 2
4 pages
_
No ratings yet
_
90 pages
Clustering: " Are There Clusters of Similar Cells?"
No ratings yet
Clustering: " Are There Clusters of Similar Cells?"
24 pages
Decision Tree Analysis On J48 Algorithm PDF
No ratings yet
Decision Tree Analysis On J48 Algorithm PDF
6 pages
Jalali@mshdiua - Ac.ir Jalali - Mshdiau.ac - Ir: Data Mining
No ratings yet
Jalali@mshdiua - Ac.ir Jalali - Mshdiau.ac - Ir: Data Mining
50 pages
Analysis of Various Decision Tree Algorithms For Classification in Data Mining PDF
No ratings yet
Analysis of Various Decision Tree Algorithms For Classification in Data Mining PDF
5 pages
Assoc 1
No ratings yet
Assoc 1
26 pages
41 j48 Naive Bayes Weka
No ratings yet
41 j48 Naive Bayes Weka
5 pages
08 Class Basic
No ratings yet
08 Class Basic
103 pages
UNIT-5 DWDM (Data Warehousing and Data Mining) Association Analysis
No ratings yet
UNIT-5 DWDM (Data Warehousing and Data Mining) Association Analysis
7 pages
Data Mining in Bioinformatics
No ratings yet
Data Mining in Bioinformatics
21 pages
Association Rule Mining2
No ratings yet
Association Rule Mining2
37 pages
Chapter 4
No ratings yet
Chapter 4
31 pages
CIS664-Knowledge Discovery and Data Mining
No ratings yet
CIS664-Knowledge Discovery and Data Mining
74 pages
6 الى13 داتا ماينق
No ratings yet
6 الى13 داتا ماينق
19 pages
P-3 1 5-Association
No ratings yet
P-3 1 5-Association
46 pages
Apriori
No ratings yet
Apriori
27 pages
Data Mining and Warehousing Lab
No ratings yet
Data Mining and Warehousing Lab
4 pages
DMDW Chapter 4
No ratings yet
DMDW Chapter 4
28 pages
Data Mining Models: Techniques and Applications
From Everand
Data Mining Models: Techniques and Applications
Ravi Deshpande
No ratings yet
DATA MINING and MACHINE LEARNING. PREDICTIVE TECHNIQUES: ENSEMBLE METHODS, BOOSTING, BAGGING, RANDOM FOREST, DECISION TREES and REGRESSION TREES.: Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING. PREDICTIVE TECHNIQUES: ENSEMBLE METHODS, BOOSTING, BAGGING, RANDOM FOREST, DECISION TREES and REGRESSION TREES.: Examples with MATLAB
César Pérez López
No ratings yet
Cosmos: Integrating 3D Mechanical Design and Analysis With Physical Testing
No ratings yet
Cosmos: Integrating 3D Mechanical Design and Analysis With Physical Testing
16 pages
An FPGA-Based Reconfigurable CNN Accelerator For YOLO
No ratings yet
An FPGA-Based Reconfigurable CNN Accelerator For YOLO
5 pages
Open Educational Resources (OER) : Policy Perspectives and National Initiatives
No ratings yet
Open Educational Resources (OER) : Policy Perspectives and National Initiatives
8 pages
Application Software Installation Guides
No ratings yet
Application Software Installation Guides
6 pages
ADMT Solutions
No ratings yet
ADMT Solutions
18 pages
Techlord: Please Visit Exetools With Https in The Future
No ratings yet
Techlord: Please Visit Exetools With Https in The Future
7 pages
UniFi Network Application 8
No ratings yet
UniFi Network Application 8
9 pages
Uji Normalitas
No ratings yet
Uji Normalitas
12 pages
Full Download Globalization and Its Discontents Revisited Anti Globalization in The Era of Trump Joseph E. Stiglitz PDF
100% (7)
Full Download Globalization and Its Discontents Revisited Anti Globalization in The Era of Trump Joseph E. Stiglitz PDF
34 pages
Certified Linux Practitioner - CyberTalents
No ratings yet
Certified Linux Practitioner - CyberTalents
16 pages
Manuals Wisenet SSM 240529 en Console Client-Admin-V2.16.00
No ratings yet
Manuals Wisenet SSM 240529 en Console Client-Admin-V2.16.00
132 pages
1001
No ratings yet
1001
14 pages
E4825 P5P41D
No ratings yet
E4825 P5P41D
58 pages
ABAP ON HANA Syllabus
No ratings yet
ABAP ON HANA Syllabus
4 pages
Final-Exam CIVI390-2020-NO1-1
No ratings yet
Final-Exam CIVI390-2020-NO1-1
7 pages
UNIT III - OPCON - LECTURE 22 - Stability Compensation
No ratings yet
UNIT III - OPCON - LECTURE 22 - Stability Compensation
14 pages
Argumentative Paper: Artificial Intelligence: A Solution More Than A Threat
No ratings yet
Argumentative Paper: Artificial Intelligence: A Solution More Than A Threat
3 pages
Baterias: Marca Modelo $
No ratings yet
Baterias: Marca Modelo $
8 pages
What Is Linux
No ratings yet
What Is Linux
17 pages
Quiz For PMP 2
100% (1)
Quiz For PMP 2
376 pages
Pepper: Chapter 3 - Agile Software Development
No ratings yet
Pepper: Chapter 3 - Agile Software Development
27 pages
B BGP CG 74x ncs540
No ratings yet
B BGP CG 74x ncs540
186 pages
Kavitha Kanagaraj Cobol Dev TechM Chennai
No ratings yet
Kavitha Kanagaraj Cobol Dev TechM Chennai
4 pages
Computer Networks Question Bank With Answers
100% (1)
Computer Networks Question Bank With Answers
17 pages
DSBDAL_Assignment No 6
No ratings yet
DSBDAL_Assignment No 6
4 pages
Airline Project
No ratings yet
Airline Project
58 pages
Computer Systems Servicing Learning Module K To 12
No ratings yet
Computer Systems Servicing Learning Module K To 12
136 pages
Pss U2 Ef 8di: Control System PSS U2 Remote I/O System PSS U2
No ratings yet
Pss U2 Ef 8di: Control System PSS U2 Remote I/O System PSS U2
34 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Data Mining Lab Manual Student_copy_for_print

Uploaded by

Data Mining Lab Manual Student_copy_for_print

Uploaded by

MAHAKAL INSTITUTE OF TECHNOLOGY, UJJAIN

Approved By: All India Council of Technical Education (New

Name of Student : …………………..

Name of Lab : Lab Open Elective (Data Mining and

Subject Code : CS- 705

Branch : Computer Science and Engineering

Affiliated to Rajiv Gandhi Prodyogiki Vishwavidyalaya, Bhopal (MP)

1. To list all the categorical(or nominal) attributes and the real

2. To identify the rules with some of the important attributes by a)

3. To create a Decision tree by training data set using Weka mining

4. To find the percentage of examples that are classified correctly

6 To create a Decision tree by cross validation training data set

S. No. Name of Experiment

5 To “Is testing a good idea”.

EXPECTED VIVA QUESTIONS:

In plain English the above equation can be written

Correctly Classified Instances 554 92.3333 %

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

=== Confusion Matrix ===

EXPECTED VIVA QUESTIONS:

EXPECTED VIVA QUESTIONS:

=== Stratified cross-validation

Correctly Classified Instances 539 89.8333 %

=== Detailed Accuracy By Class ===

=== Confusion Matrix ===

EXPECTED VIVA QUESTIONS:

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.