0% found this document useful (0 votes)

60 views

Data Mining Techniques Using WEKA: Vinod Gupta School of Management, Iit Kharagpur

This document discusses using the WEKA data mining tool to analyze a dataset containing information about automobiles. It describes WEKA's capabilities for preprocessing data, performing clustering, linear regression, and evaluating models. It then demonstrates clustering and linear regression on a dataset containing specifications and attributes of different autos. The dataset includes information like engine size, fuel efficiency, weight, and risk ratings for each auto.

Uploaded by

Ajay Kumar

Available Formats

Download as DOC, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

60 views

Data Mining Techniques Using WEKA: Vinod Gupta School of Management, Iit Kharagpur

Uploaded by

Ajay Kumar

Available Formats

Download as DOC, PDF, TXT or read online on Scribd

You are on page 1/ 17

Data Mining Techniques using WEKA

VINOD GUPTA SCHOOL OF MANAGEMENT, IIT KHARAGPUR

In partial fulfillment Of the requirements for the degree of

MASTER OF BUSINESS ADMINISTRATION

SUBMITTED BY: Prashant Menon 10BM60061 VGSOM, IIT KHARAGPUR

Introduction to WEKA
WEKA is a collection of open source of many data mining and machine learning algorithms, including pre-processing on data Classification: clustering association rule extraction Created by researchers at the University of Waikato in New Zealand Java based (also open source). Main features of WEK 49 data preprocessing tools 76 classification/regression algorithms 8 clustering algorithms 15 attribute/subset evaluators + 10 search algorithms for feature selection. 3 algorithms for finding association rules 3 graphical user interfaces The Explorer (exploratory data analysis) The Experimenter (experimental environment) The KnowledgeFlow (new process model inspired interface) Weka: Download and Installation Download Weka (the stable version) from http://www.cs.waikato.ac.nz/ml/weka/ Choose a self-extracting executable (including Java VM)

(If one is interested in modifying/extending weka there is a developer version that includes the source code) After download is completed, run the self extracting file to install Weka, and use the default set-ups.

Weka Application Interfaces Explorer preprocessing, attribute selection, learning, visualiation Experimenter testing and evaluating machine learning algorithms Knowledge Flow visual design of KDD process Explorer Simple Command-line A simple interface for typing commands Weka Functions and Tools Preprocessing Filters Attribute selection Classification/Regression Clustering Association discovery Visualization Load data file and Preprocessing
Load

data file in formats: ARFF, CSV, C4.5, binary Import from URL or SQL database (using JDBC) Preprocessing filters Adding/removing attributes Attribute value substitution Discretization Time series filters (delta, shift) Sampling, randomization Missing value management Normalization and other numeric transformations

Feature Selection
Very flexible: arbitrary combination of search and evaluation methods Search methods

best-first genetic ranking ... Evaluation measures ReliefF information gain gain ratio

Classification
Predicted target must be categorical Implemented methods decision trees(J48, etc.) and rules Nave Bayes neural networks instance-based classifiers Evaluation methods test data set crossvalidation

Clustering
Implemented methods k-Means EM Cobweb X-means FarthestFirst Clusters can be visualized and compared to true clusters (if given)

Regression
Predicted target is continuous Methods Linear regression Simple Linear Regression Neural networks Regression trees

Weka: Pros and cons

Pros Open source, Free Extensible Can be integrated into other java packages GUIs (Graphic User Interfaces)

Relatively easier to use Features Run individual experiment, or Build KDD phases Cons Lack of proper and adequate documentations Systems are updated constantly (Kitchen Sink Syndrome)

3. WEKA data formats

Data can be imported from a file in various formats: ARFF (Attribute Relation File Format) has two sections: the Header information defines attribute name, type and relations. the Data section lists the data records. CSV: Comma Separated Values (text file) C4.5: A format used by a decision induction algorithm C4.5, requires two separated files Name file: defines the names of the attributes Date file: lists the records (samples) binary Data can also be read from a URL or from an SQL database (using JDBC)

This term paper will demonstrate the following two data mining techniques using WEKA: Clustering (Simple K Means) Linear regression

Clustering
Clustering allows a user to make groups of data to determine patterns from the data. Clustering has its advantages when the data set is defined and a general pattern needs to be determined from the data. One can create a specific number of groups, depending on business needs. One defining benefit of clustering over classification is that every attribute in the data set will be used to analyze the data. A major disadvantage of using clustering is that the user is required to know ahead of time how many groups he wants to create. For a user without any real knowledge of his data, this might be difficult. It might take several steps of trial and error to determine the ideal number of groups to create. However, for the average user, clustering can be the most useful data mining method one can use. It can quickly take the entire set of data and turn it into groups, from which one can quickly make some conclusions.

Data set for WEKA This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b)its assigned insurance risk rating, (c) its normalized losses in use as compared to other cars. The second rating corresponds to the degree to which the auto is more risky than its price indicates. Cars are initially assigned a risk factor symbol associated with its price. Then, if it is more risky (or less), this symbol is adjusted by moving it up (or down) the scale. A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe. The third factor is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/speciality, etc...), and represents the average loss per car per year.

A part of the saved arff file.

@relation autos @attribute normalized-losses real @attribute make { alfa-romero, audi, bmw, chevrolet, dodge, honda, isuzu, jaguar, mazda, mercedes-benz, mercury, mitsubishi, nissan, peugot, plymouth, porsche, renault, saab, subaru, toyota, volkswagen, volvo} @attribute fuel-type { diesel, gas} @attribute aspiration { std, turbo} @attribute num-of-doors { four, two} @attribute body-style { hardtop, wagon, sedan, hatchback, convertible} @attribute drive-wheels { 4wd, fwd, rwd}

@attribute engine-location { front, rear} @attribute wheel-base real @attribute length real @attribute width real @attribute height real @attribute curb-weight real @attribute engine-type { dohc, dohcv, l, ohc, ohcf, ohcv, rotor} @attribute num-of-cylinders { eight, five, four, six, three, twelve, two} @attribute engine-size real @attribute fuel-system { 1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi} @attribute bore real @attribute stroke real @attribute compression-ratio real @attribute horsepower real @attribute peak-rpm real @attribute city-mpg real @attribute highway-mpg real @attribute price real @attribute symboling { -3, -2, -1, 0, 1, 2, 3} @data ?,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9,111,5000 ,21,27,13495,3 ?,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9,111,5000 ,21,27,16500,3 ?,alfa-romero,gas,std,two,hatchback,rwd,front,94.5,171.2,65.5,52.4,2823,ohcv,six,152,mpfi,2.68,3.47,9,154,5000, 19,26,16500,1 164,audi,gas,std,four,sedan,fwd,front,99.8,176.6,66.2,54.3,2337,ohc,four,109,mpfi,3.19,3.4,10,102,5500,24,30, 13950,2 164,audi,gas,std,four,sedan,4wd,front,99.4,176.6,66.4,54.3,2824,ohc,five,136,mpfi,3.19,3.4,8,115,5500,18,22,17450 ,2 ?,audi,gas,std,two,sedan,fwd,front,99.8,177.3,66.3,53.1,2507,ohc,five,136,mpfi,3.19,3.4,8.5,110,5500,19,25,15250 ,2 158,audi,gas,std,four,sedan,fwd,front,105.8,192.7,71.4,55.7,2844,ohc,five,136,mpfi,3.19,3.4,8.5,110,5500,19,25, 17710,1 ?,audi,gas,std,four,wagon,fwd,front,105.8,192.7,71.4,55.7,2954,ohc,five,136,mpfi,3.19,3.4,8.5,110,5500,19,25, 18920,1 158,audi,gas,turbo,four,sedan,fwd,front,105.8,192.7,71.4,55.9,3086,ohc,five,131,mpfi,3.13,3.4,8.3,140,5500,17, 20,23875,1 ?,audi,gas,turbo,two,hatchback,4wd,front,99.5,178.2,67.9,52,3053,ohc,five,131,mpfi,3.13,3.4,7,160,5500,16,22, ?,0 192,bmw,gas,std,two,sedan,rwd,front,101.2,176.8,64.8,54.3,2395,ohc,four,108,mpfi,3.5,2.8,8.8,101,5800,23,29, 16430,2 192,bmw,gas,std,four,sedan,rwd,front,101.2,176.8,64.8,54.3,2395,ohc,four,108,mpfi,3.5,2.8,8.8,101,5800,23,29 ,16925,0 188,bmw,gas,std,two,sedan,rwd,front,101.2,176.8,64.8,54.3,2710,ohc,six,164,mpfi,3.31,3.19,9,121,4250,21,28, 20970,0 188,bmw,gas,std,four,sedan,rwd,front,101.2,176.8,64.8,54.3,2765,ohc,six,164,mpfi,3.31,3.19,9,121,4250,21,28, 21105,0 ?,bmw,gas,std,four,sedan,rwd,front,103.5,189,66.9,55.7,3055,ohc,six,164,mpfi,3.31,3.19,9,121,4250,20,25,24565, 1 ?,bmw,gas,std,four,sedan,rwd,front,103.5,189,66.9,55.7,3230,ohc,six,209,mpfi,3.62,3.39,8,182,5400,16,22,30760,0 ?,bmw,gas,std,two,sedan,rwd,front,103.5,193.8,67.9,53.7,3380,ohc,six,209,mpfi,3.62,3.39,8,182,5400,16,22,41315 ,0

?,bmw,gas,std,four,sedan,rwd,front,110,197,70.9,56.3,3505,ohc,six,209,mpfi,3.62,3.39,8,182,5400,15,20,36880,0 121,chevrolet,gas,std,two,hatchback,fwd,front,88.4,141.1,60.3,53.2,1488,l,three,61,2bbl,2.91,3.03,9.5,48,5100,47, 53,5151,2 98,chevrolet,gas,std,two,hatchback,fwd,front,94.5,155.9,63.6,52,1874,ohc,four,90,2bbl,3.03,3.11,9.6,70,5400,38,43, 6295,1 81,chevrolet,gas,std,four,sedan,fwd,front,94.5,158.8,63.6,52,1909,ohc,four,90,2bbl,3.03,3.11,9.6,70,5400,38,43, 6575,0 118,dodge,gas,std,two,hatchback,fwd,front,93.7,157.3,63.8,50.8,1876,ohc,four,90,2bbl,2.97,3.23,9.41,68,5500 ,37,41,5572,1 118,dodge,gas,std,two,hatchback,fwd,front,93.7,157.3,63.8,50.8,1876,ohc,four,90,2bbl,2.97,3.23,9.4,68,5500 ,31,38,6377,1 118,dodge,gas,turbo,two,hatchback,fwd,front,93.7,157.3,63.8,50.8,2128,ohc,four,98,mpfi,3.03,3.39,7.6,102,5500 ,24,30,7957,1 148,dodge,gas,std,four,hatchback,fwd,front,93.7,157.3,63.8,50.6,1967,ohc,four,90,2bbl,2.97,3.23,9.4,68,5500, 31,38,6229,1 148,dodge,gas,std,four,sedan,fwd,front,93.7,157.3,63.8,50.6,1989,ohc,four,90,2bbl,2.97,3.23,9.4,68,5500,31, 38,6692,1

Clustering in WEKA

Load the data file AUTOS.arff into WEKA using the same steps we used to load data into the Preprocess tab. Take a few minutes to look around the data in this tab. Look at the columns, the attribute data, the distribution of the columns, etc. The screen should look like the figure shown below after loading the data.

With this data set, we are looking to create clusters, so instead of clicking on the Classify tab, click on the Cluster tab. Click Choose and select SimpleKMeans from the choices that appear (this will be our preferred method of clustering for this article). WEKA Explorer window should look like the following figure at this point.

Finally, we want to adjust the attributes of our cluster algorithm by clicking SimpleKMeans (not the best UI design here, but go with it). The only attribute of the algorithm we are interested in adjusting here is the numClusters field, which tells us how many clusters we want to create. (Remember, one need to know this before start.) Let's change the default value of 2 to 4 for now, but keep these steps in mind later if one wants to adjust the number of clusters created. WEKA Explorer should look like the following at this point. Click OK to accept these values.

At this point, we are ready to run the clustering algorithm. Remember that this much rows of data with four data clusters would likely take a few hours of computation with a spreadsheet, but WEKA can spit out the answer in less than a second. The output should look like the figure shown below.

Time taken to build model (full training data) : 0.02 seconds === Model and evaluation on training set === Clustered Instances 0 1 2 3 60 ( 29%) 33 ( 16%) 55 ( 27%) 57 ( 28%)

Based on the values of cluster centroids as shown in the above figure, we can state the characteristics of each of the clusters. For explaniantion we are taking Cluster 1 and Cluster 2 Cluster 2 This group will always look for the premium segment car Peugot . Has the larget wheel base, length, height, curb weight, engine size. As the engine size is inversely proportional to the mileage , it has the lowest city and high way mileage. It has the highest number of cylinders. Compression ratio , horse power, peak rpm all have the highest value which make it a highest priced Car. Cluster 1 This group will always look for the VALUE FOR MONEY car. It belongs to the mass segment. As the engine power is inversely proportional to the mileage , we can see it has

the highest highway and city mileage and low compression ration, horse power and RPM. For this segment price is one of the important criteria before buying the car. The Cluster analysis will help the car company which segment it should target before start of the new product development/bringing the car into the market. Visualization of Clustering Results A more intuitive way to go through the results is to visualize them in the graphical form. To do so: Right click the result in the Result list panel Select Visualize cluster assignments By setting X-axis variable as Cluster, Y-axis variable as Instance_number and Color as aspiration, we get the following output:

Here we can see all the clusters (segments) have mixed response to the aspiration.

Similarly we can change the variables in X-axis, Y-axis and color to visualize other aspects of result. Note that WEKA has generated an extra variable named Cluster (not present in original data) which signifies the cluster membership of various instances. We can save the output as an arff file by clicking on the save button. The output file contains an additional attribute cluster for each instance. Thus besides the value of twenty six attributes for any instance, the output also specifies the cluster membership for that instance.

Creating the regression model with WEKA

To create the model, click on the Classify tab. The first step is to select the model we want to build, so WEKA knows how to work with the data, and how to create the appropriate model: 1. Click the Choose button, then expand the functions branch. 2. Select the LinearRegression leaf. This tells WEKA that we want to build a regression model. As one can see from the other choices, though, there are lots of possible models to build.This should give a good indication of how we are only touching the surface of this subject. There is another choice called SimpleLinearRegression in the same branch. Do not choose this because simple regression only looks at one variable, and we have six. The used attributes are as follows:
The used attributes are : MYCT: machine cycle time in nanoseconds (integer) MMIN: minimum main memory in kilobytes (integer) MMAX: maximum main memory in kilobytes (integer) CACH: cache memory in kilobytes (integer) CHMIN: minimum channels in units (integer) CHMAX: maximum channels in units (integer) PRP: published relative performance (integer) (target variable)

A part of the data file is as follows:

@relation machine_cpu @attribute MYCT numeric @attribute MMIN numeric @attribute MMAX numeric @attribute CACH numeric @attribute CHMIN numeric @attribute CHMAX numeric @attribute class numeric @data 125,256,6000,256,16,128,198 29,8000,32000,32,8,32,269 29,8000,32000,32,8,32,220 29,8000,32000,32,8,32,172 29,8000,16000,32,8,16,132 26,8000,32000,64,8,32,318 23,16000,32000,64,16,32,367 23,16000,32000,64,16,32,489 23,16000,64000,64,16,32,636

When we've selected the right model, WEKA Explorer should look like the following figure.

Now that the desired model has been chosen, we have to tell WEKA where the data is that it should use to build the model. Though it may be obvious to us that we want to use the data we supplied in the ARFF file, there are actually different options, some more advanced than what we'll be using. The other three choices are Supplied test set, where one can supply a different set of data to build the model; Cross-validation, which lets WEKA build a model based on subsets of the supplied data and then average them out to create a final model; and Percentage split, where WEKA takes a percentile subset of the supplied data to build a final model. These other choices are useful with different models, which we'll see in future articles. With regression, we can simply choose Use training set. This tells WEKA that to build our desired model, we can simply use the data set we supplied in our ARFF file. Finally, the last step to creating our model is to choose the dependent variable (the column we are looking to predict). We know this should be the Class, since that's what we're trying to determine. Right below the test options, there's a combo box that lets us choose the dependent variable. The column Class should be selected by default. If it's not, please select it. Now we are ready to create our model. Click Start. The following figure shows what the output should look like.
=== Run information ===

Scheme:

weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8

Relation: Instances:

machine_cpu 209

Attributes: 7 MYCT MMIN MMAX CACH CHMIN CHMAX class Test mode: evaluate on training data

=== Classifier model (full training set) ===

Linear Regression Model

class =

0.0491 * MYCT + 0.0152 * MMIN + 0.0056 * MMAX + 0.6298 * CACH + 1.4599 * CHMAX + -56.075

Time taken to build model: 0 seconds

=== Evaluation on training set === === Summary ===

Correlation coefficient Mean absolute error Root mean squared error Relative absolute error Root relative squared error Total Number of Instances

0.93 37.9748 58.9899 39.592 % 36.7663 % 209

Here class represents PRP(Published Relative Performance)

Interpreting the regression model

It puts the regression model right there in the output, as shown in Listing below: Class(PRP) = 0.0491 * MYCT + 0.0152 * MMIN + 0.0056 * MMAX + 0.6298 * CACH + 1.4599 * CHMAX -56.075

Listing 2 shows the results, plugging in the values for my house.

Listing 2. PRP value using regression model Class(PRP)= 0.0491 * 29 + 0.0152 * 8000 + .0056 * 32000 + .6298 * 32 + 1.4599 * 32 56.075 PRP= 1.4239 + 121.6 + 179.2 + 20.1536 + 46.7168-56.075

PRP = 313.0193

However, looking back to the top of the article, data mining isn't just about outputting a single number: It's about identifying patterns and rules. It's not strictly used to produce an absolute number but rather to create a model that lets us detect patterns, predict output, and come up with conclusions backed by the data

Minimum channel units doesn't matter WEKA will only use columns that statistically contribute to the accuracy of the model (measured in R-squared). It will throw out and ignore columns that don't help in creating a good model. So this regression model is telling us that minimum channels doesn't affect PRP. We can also visualize the classifier error i.e. those instances which are wrongly predicted by regression equation by right clinking on the result set in the Result list panel and selecting Visualize classifier errors.

The X-axis has Price (actual) and the Y-axis has Predicted Price.

Learning Informatica PowerCenter 9.x
From Everand
Learning Informatica PowerCenter 9.x
Rahul Malewar
3/5 (4)
Machine Learning with SAS Viya
From Everand
Machine Learning with SAS Viya
SAS Institute Inc.
No ratings yet
Data Mining Lab File
No ratings yet
Data Mining Lab File
20 pages
DMW_LabFile_0901CS243D11_swastik
No ratings yet
DMW_LabFile_0901CS243D11_swastik
25 pages
Priyadarshini J. L. College of Engineering, Nagpur: Session 2022-23 Semester-V
No ratings yet
Priyadarshini J. L. College of Engineering, Nagpur: Session 2022-23 Semester-V
31 pages
DMDV_210
No ratings yet
DMDV_210
63 pages
DWM1 Riya
No ratings yet
DWM1 Riya
16 pages
DWM1
No ratings yet
DWM1
19 pages
Data Mining Term Project Machine Learning With WEKA: Weka Explorer Tutorial For Version 3.4.3
No ratings yet
Data Mining Term Project Machine Learning With WEKA: Weka Explorer Tutorial For Version 3.4.3
42 pages
DMDV_210
No ratings yet
DMDV_210
61 pages
DWDM Lab Manual
No ratings yet
DWDM Lab Manual
47 pages
Hemant STRT
No ratings yet
Hemant STRT
18 pages
Data Base Management Key Points
No ratings yet
Data Base Management Key Points
8 pages
STRT Abhay
No ratings yet
STRT Abhay
14 pages
WEKA Intro
No ratings yet
WEKA Intro
17 pages
Journal Data Mining
No ratings yet
Journal Data Mining
31 pages
Weka Data Miningvsem
No ratings yet
Weka Data Miningvsem
7 pages
Data Mining in Bioinformatics
No ratings yet
Data Mining in Bioinformatics
21 pages
Appendix Weka
No ratings yet
Appendix Weka
17 pages
Experiment No: 01 Data Exploration & Data Preprocessing
No ratings yet
Experiment No: 01 Data Exploration & Data Preprocessing
54 pages
BBA CA Semester III Manisha Madam
No ratings yet
BBA CA Semester III Manisha Madam
32 pages
An Introduction To WEKA: Contributed by Yizhou Sun 2008
No ratings yet
An Introduction To WEKA: Contributed by Yizhou Sun 2008
85 pages
Rintro Wekacomplete
No ratings yet
Rintro Wekacomplete
135 pages
Wekappt
No ratings yet
Wekappt
58 pages
Data Mining Lab Manual: Aurora's PG College Moosarambagh Mca Department
No ratings yet
Data Mining Lab Manual: Aurora's PG College Moosarambagh Mca Department
42 pages
JSPM'S Bhivarabai Sawant Institute of Technology & Research: Mini Project Report On
No ratings yet
JSPM'S Bhivarabai Sawant Institute of Technology & Research: Mini Project Report On
33 pages
DWDM Lab File
No ratings yet
DWDM Lab File
29 pages
DM Lab Material
No ratings yet
DM Lab Material
88 pages
Weka
No ratings yet
Weka
15 pages
Data-Mining-Lab-Manual Cs 703b
No ratings yet
Data-Mining-Lab-Manual Cs 703b
41 pages
Syllabus - Data Mining Solution With Weka
No ratings yet
Syllabus - Data Mining Solution With Weka
5 pages
DWDM Lab Manual Using Weka-For MIC
No ratings yet
DWDM Lab Manual Using Weka-For MIC
42 pages
DWBI Lab Manual 2023-24 Final
No ratings yet
DWBI Lab Manual 2023-24 Final
40 pages
Weka-: Data Warehousing and Data Mining Lab Manual-Week 9
100% (1)
Weka-: Data Warehousing and Data Mining Lab Manual-Week 9
8 pages
dm-lab-manualiii-i-1-mrits
No ratings yet
dm-lab-manualiii-i-1-mrits
39 pages
Weka: A Tool For Data Preprocessing, Classification, Ensemble, Clustering and Association Rule Mining
No ratings yet
Weka: A Tool For Data Preprocessing, Classification, Ensemble, Clustering and Association Rule Mining
4 pages
Data Mining - Lab - Manual
No ratings yet
Data Mining - Lab - Manual
20 pages
An Introduction To WEKA: Contributed by Yizhou Sun 2008
No ratings yet
An Introduction To WEKA: Contributed by Yizhou Sun 2008
85 pages
Adama Science and Technology University-Dm-Lab
No ratings yet
Adama Science and Technology University-Dm-Lab
47 pages
DWDM LAB MANUAL
No ratings yet
DWDM LAB MANUAL
55 pages
dwdm_file-final_ver3.pdf_20241230_172003_0000
No ratings yet
dwdm_file-final_ver3.pdf_20241230_172003_0000
54 pages
Data Mining Lab Manual
No ratings yet
Data Mining Lab Manual
40 pages
Data Warehousing Laboratory
0% (1)
Data Warehousing Laboratory
28 pages
WEKA
No ratings yet
WEKA
50 pages
MC0717 Lab Manual
No ratings yet
MC0717 Lab Manual
42 pages
Weka Tutorial
No ratings yet
Weka Tutorial
8 pages
Dwh Manual Merged
No ratings yet
Dwh Manual Merged
47 pages
Workshop 1
No ratings yet
Workshop 1
16 pages
weka u5
No ratings yet
weka u5
3 pages
Weka Software Manuala
No ratings yet
Weka Software Manuala
20 pages
An Introduction To WEKA Explorer: in Part From: Yizhou Sun 2008
No ratings yet
An Introduction To WEKA Explorer: in Part From: Yizhou Sun 2008
104 pages
CS-703 (B) Data Warehousing and Data Mining Lab
No ratings yet
CS-703 (B) Data Warehousing and Data Mining Lab
50 pages
DMLB 1
No ratings yet
DMLB 1
3 pages
WEKA A Machine Learning Workbench for Data Mining
No ratings yet
WEKA A Machine Learning Workbench for Data Mining
11 pages
WEKA Lab Session
No ratings yet
WEKA Lab Session
88 pages
Data Mining Lab Manual
33% (3)
Data Mining Lab Manual
44 pages
Learning Oracle 12c: A PL/SQL Approach
From Everand
Learning Oracle 12c: A PL/SQL Approach
Prof. Sham Tickoo
No ratings yet
SAS Interview Questions You'll Most Likely Be Asked
From Everand
SAS Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
Oracle Modernization Solutions
From Everand
Oracle Modernization Solutions
Tom Laszewski
No ratings yet
Oracle 11g Streams Implementer's Guide
From Everand
Oracle 11g Streams Implementer's Guide
Ann L. R. McKinnell
No ratings yet
Stock Market Movement Direction Prediction Using Tree Algorithms
No ratings yet
Stock Market Movement Direction Prediction Using Tree Algorithms
6 pages
Dwdm-Unit-3 R16
No ratings yet
Dwdm-Unit-3 R16
14 pages
Link For Google Colab Note Book: Pa Ge
No ratings yet
Link For Google Colab Note Book: Pa Ge
17 pages
Reducing Power Consumption of Digital Predistortion For RF Power Amplifiers Using Real-Time Model Switching
No ratings yet
Reducing Power Consumption of Digital Predistortion For RF Power Amplifiers Using Real-Time Model Switching
9 pages
ID3 Algorithm: Abbas Rizvi CS157 B Spring 2010
No ratings yet
ID3 Algorithm: Abbas Rizvi CS157 B Spring 2010
19 pages
Data Mining Functionalities:: - Characterization and Discrimination
No ratings yet
Data Mining Functionalities:: - Characterization and Discrimination
21 pages
Predictive
No ratings yet
Predictive
61 pages
56.eye Ball Cursor Movement Using Opencv
75% (4)
56.eye Ball Cursor Movement Using Opencv
47 pages
Data Mining
100% (1)
Data Mining
40 pages
Information Gain - Towards Data Science
No ratings yet
Information Gain - Towards Data Science
8 pages
Similarities in Fuzzy Data Mining From a Cognitive View to Real World Applications 1st editon by Bernadette Bouchon Meunier, Maria Rifqi, Marie Jeanne Lesot ISBN 3540688587 9783540688587 pdf download
100% (1)
Similarities in Fuzzy Data Mining From a Cognitive View to Real World Applications 1st editon by Bernadette Bouchon Meunier, Maria Rifqi, Marie Jeanne Lesot ISBN 3540688587 9783540688587 pdf download
55 pages
Forecasting Bank Loans Loss-Given-Default: Jo Ao A. Bastos
No ratings yet
Forecasting Bank Loans Loss-Given-Default: Jo Ao A. Bastos
16 pages
Neural Networks and M5 Model Trees in Modelling Water Level-Discharge Relationship
No ratings yet
Neural Networks and M5 Model Trees in Modelling Water Level-Discharge Relationship
16 pages
Final Main Report 1
No ratings yet
Final Main Report 1
68 pages
Prarthana VR - CV - Commented
100% (1)
Prarthana VR - CV - Commented
2 pages
DWM - Classification-Unit7
No ratings yet
DWM - Classification-Unit7
44 pages
Students Placement Prediction System
No ratings yet
Students Placement Prediction System
5 pages
Project Management Decision Trees
100% (1)
Project Management Decision Trees
38 pages
Sent-Machine Learning For Data Science
100% (1)
Sent-Machine Learning For Data Science
463 pages
3 - Decision trees
No ratings yet
3 - Decision trees
16 pages
CC Unit - 4 Imp Questions
No ratings yet
CC Unit - 4 Imp Questions
4 pages
ML ASSIGNMENT-01
No ratings yet
ML ASSIGNMENT-01
7 pages
Decision Tree
No ratings yet
Decision Tree
13 pages
Mudau FN Thesis
100% (2)
Mudau FN Thesis
7 pages
Word Sense Disambiguation and Its Approaches: Vimal Dixit, Kamlesh Dutta and Pardeep Singh
No ratings yet
Word Sense Disambiguation and Its Approaches: Vimal Dixit, Kamlesh Dutta and Pardeep Singh
5 pages
Tree-Based Ensemble Methods: Predicting Asphalt Mixture Dynamic Modulus For Flexible Pavement Design
No ratings yet
Tree-Based Ensemble Methods: Predicting Asphalt Mixture Dynamic Modulus For Flexible Pavement Design
9 pages
Estimating Warehouse Rental Price Using Machine Le
No ratings yet
Estimating Warehouse Rental Price Using Machine Le
16 pages
Machine Learning Revision Notes
No ratings yet
Machine Learning Revision Notes
6 pages
UNIT 2 AAM notes (1)
No ratings yet
UNIT 2 AAM notes (1)
38 pages
Building Credit Scoring Models
No ratings yet
Building Credit Scoring Models
11 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Data Mining Techniques Using WEKA: Vinod Gupta School of Management, Iit Kharagpur

Uploaded by

Data Mining Techniques Using WEKA: Vinod Gupta School of Management, Iit Kharagpur

Uploaded by

Data Mining Techniques using WEKA

VINOD GUPTA SCHOOL OF MANAGEMENT, IIT KHARAGPUR

MASTER OF BUSINESS ADMINISTRATION

SUBMITTED BY: Prashant Menon 10BM60061 VGSOM, IIT KHARAGPUR

Weka: Pros and cons

3. WEKA data formats

A part of the saved arff file.

Creating the regression model with WEKA

A part of the data file is as follows:

=== Classifier model (full training set) ===

Linear Regression Model

Time taken to build model: 0 seconds

=== Evaluation on training set === === Summary ===

0.93 37.9748 58.9899 39.592 % 36.7663 % 209

Here class represents PRP(Published Relative Performance)

Interpreting the regression model

Listing 2 shows the results, plugging in the values for my house.

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.