DWDM_Lab_Tasks(1)
Aim: (the question itself)
Description: (if any)
Tool/apparatus: Weka mining tool.
Procedure:
1a)List all the categorical (or nominal) attributes and the real-valued attributes separately.
1) Open the Weka GUI Chooser.
2) Select EXPLORER present in Applications.
3) Select Preprocess Tab.
4) Go to OPEN file and browse to the file already stored on the system, "credit-g.arff".
5) Clicking on any attribute in the left panel will show the basic statistics on that selected attribute.
—------------------------------------------------------------------------------------------------------------------------------
1b)What attributes do you think might be crucial in making the credit assessment? Come up with
some simple rules in plain English using your selected attributes.
Output:
The following attributes may be crucial in making the credit risk assessment:
a) CHECKING
b) FOREIGN
c) OTHER
d) HOUSING
e) HISTORY
f) Duration
Simple Rules for credit assessment:
1) If an applicant is a foreign worker, OTHER is none (i.e., the applicant has no other credits) and
CHECKING status is no, then the applicant is treated as GOOD.
2) If an applicant is a foreign worker and CHECKING status is none, then the applicant is treated as
GOOD.
3) If OTHER is none and CHECKING status is none, then the applicant is treated as GOOD.
4) If OTHER is none and HOUSING is own, then the applicant is treated as GOOD.
5) If CHECKING is none, then the applicant can be treated as GOOD.
6) If HISTORY is critical or existing and OTHER is none, then the applicant is treated as GOOD.
7) If DURATION is short and HISTORY is critical or existing, then the applicant is treated as GOOD.
Based on the above attributes, we can make a decision whether to give credit or not.
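As a sketch, the plain-English rules above could be encoded as a toy function. The dictionary keys and value strings here are illustrative shorthand, not the exact nominal labels used in credit-g.arff:

```python
def assess_credit(applicant):
    """Toy encoding of the plain-English rules above; `applicant` is a dict
    with illustrative keys. Returns 'good' if any rule fires, else 'unknown'."""
    a = applicant
    if a.get("checking") == "none":                               # rules 1-3, 5
        return "good"
    if a.get("other") == "none" and a.get("housing") == "own":    # rule 4
        return "good"
    if a.get("other") == "none" and a.get("history") in ("critical", "existing"):  # rule 6
        return "good"
    if a.get("duration", 99) < 12 and a.get("history") in ("critical", "existing"):  # rule 7
        return "good"
    return "unknown"
```

An applicant matching none of the rules falls through to "unknown" rather than "bad", since the rules above only describe when to grant credit.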
—------------------------------------------------------------------------------------------------------------------------------
1c)One type of model that you can create is a Decision Tree - train a Decision Tree using the complete
dataset as the training data. Report the model obtained after training.
1. Open the Weka GUI Chooser.
2. Select EXPLORER under Applications.
3. In the Preprocess tab, open credit-g.arff from the system.
4. Go to the Classify tab and choose the J48 algorithm (under trees).
5. Set test options to Use training set and, optionally, select attributes.
6. Click Start to run; view the output in the Classifier output panel.
7. Right-click the entry in the result list and select Visualize Tree to view the tree.
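J48 is Weka's implementation of C4.5. As a rough illustration of what "training a decision tree on the complete dataset" builds, here is a much-simplified ID3-style sketch; the toy rows, labels, and default class are invented for illustration:

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    """ID3-style tree: recursively split on the attribute with the
    highest information gain; return a class label at pure nodes."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    def gain(a):
        groups = {}
        for r, y in zip(rows, labels):
            groups.setdefault(r[a], []).append(y)
        return entropy(labels) - sum(
            len(ys) / len(labels) * entropy(ys) for ys in groups.values())
    best = max(attrs, key=gain)
    branches = {}
    for r, y in zip(rows, labels):
        branches.setdefault(r[best], []).append((r, y))
    rest = [a for a in attrs if a != best]
    return (best, {v: build_tree([r for r, _ in g], [y for _, y in g], rest)
                   for v, g in branches.items()})

def classify(tree, row, default="good"):
    """Walk the tree; fall back to a default label for unseen attribute values."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches.get(row[attr], default)
    return tree

# Toy training set: (checking, housing) -> credit class.
rows = [("none", "own"), ("low", "rent"), ("none", "rent"), ("high", "own")]
labels = ["good", "bad", "good", "bad"]
tree = build_tree(rows, labels, [0, 1])

# "Testing on the training set": every training row is classified again.
train_acc = sum(classify(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
```

On this tiny, noise-free set the tree splits once on the checking attribute and reaches 100% training accuracy; on the real 1000-instance dataset, noisy and irrelevant attributes keep training accuracy below 100%.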
—------------------------------------------------------------------------------------------------------------------------------
2a)Suppose you use your above model trained on the complete dataset, and classify credit good/bad
for each of the examples in the dataset. What % of examples can you classify correctly? (This is also
called testing on the training set.) Why do you think you cannot get 100% training accuracy?
ANS:
In this task, we trained the model on the entire dataset and then classified each example as good or
bad credit. The model classified 85.5% of the examples correctly, while 14.5% were misclassified.
Training accuracy is not 100% because some of the 20 attributes are irrelevant or noisy, so the tree
cannot perfectly separate the two classes.
—------------------------------------------------------------------------------------------------------------------------------
2b)Check to see if the data shows a bias against "foreign workers" (attribute 20), or "personal-status"
(attribute 9). Did removing these attributes have any significant effect? Discuss.
ANS:
The analysis shows the impact of two attributes, foreign-workers (attribute 20) and
personal-status (attribute 9), on the model's accuracy:
1. Foreign-workers:
○ Removing this attribute slightly increases accuracy from 85.5% to 85.9% (+0.4%).
○ However, the attribute is considered important for loan-approval decisions and is retained.
2. Personal-status:
○ Removing this attribute improves accuracy from 85.5% to 86.6% (+1.1%).
○ This indicates the attribute is unnecessary and can be excluded from the analysis.
—------------------------------------------------------------------------------------------------------------------------------
3)One approach for solving the problem encountered in the previous question is using
cross-validation. Describe what cross-validation is briefly. Train a Decision Tree again using
cross-validation and report your results. Does your accuracy increase/decrease? Why?
Cross Validation:- In cross-validation you decide on a fixed number of folds, or partitions, of the data.
With three folds, for example, two-thirds of the data are used for training and one-third for testing, and
the procedure is repeated three times so that, in the end, every instance has been used exactly once
for testing. If each fold also preserves the class proportions of the full dataset, this is called stratified
cross-validation. Compared with testing on the training set, accuracy under cross-validation decreases,
because the model is evaluated on instances it has not seen during training.
Procedure:-
1) Open the Weka GUI Chooser.
2) Select EXPLORER present in Applications.
3) Select the Preprocess tab.
4) Go to OPEN file and browse to the file already stored on the system, "credit-g.arff".
5) Go to the Classify tab.
6) Choose the classifier under "trees".
7) Select J48.
8) Under Test options, select "Cross-validation".
9) Set "Folds" (e.g., 10).
10) If needed, select attributes.
11) Click Start.
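The mechanics of k-fold cross-validation (what Weka runs when Folds is set to 10) can be sketched as follows; the fold splitter here is a plain contiguous split, whereas Weka also shuffles and stratifies by class, and the majority-class baseline stands in for J48 purely for illustration:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(train_fn, predict_fn, X, y, k=10):
    """Average accuracy over k train/test splits; every instance is
    used exactly once for testing."""
    correct = 0
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        Xtr = [x for i, x in enumerate(X) if i not in held_out]
        ytr = [t for i, t in enumerate(y) if i not in held_out]
        model = train_fn(Xtr, ytr)
        correct += sum(predict_fn(model, X[i]) == y[i] for i in fold)
    return correct / len(X)

# Usage with a majority-class baseline standing in for J48:
train = lambda X, y: max(set(y), key=y.count)
predict = lambda model, x: model
X = [[0]] * 10
y = ["good"] * 7 + ["bad"] * 3
acc = cross_validate(train, predict, X, y, k=5)
```

Because each fold's model never sees its own test instances, this accuracy estimate is more honest, and usually lower, than testing on the training set.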
—------------------------------------------------------------------------------------------------------------------------------
4)Another question might be, do you really need to input so many attributes to get good results? Maybe
only a few would do. For example, you could try just having attributes 2, 3, 5, 7, 10, 17 (and 21, the
class attribute (naturally)). Try out some combinations. Train your decision tree again and report the
decision tree and cross-validation results.
1. Open the Weka GUI Chooser, select EXPLORER, and load "credit-g.arff" from the Preprocess
tab.
2. Remove the unnecessary attributes (as given in the question) to retain only those relevant for
classification.
3. Go to the Classify tab, choose trees, and select the J48 classifier.
4. Set test options to Use training set and, optionally, select attributes.
5. Click Start to run and view the results in the Classifier output panel.
6. Right-click the entry in the result list, select Visualize Tree, and compare the output with the tree
obtained when all the attributes were selected.
7. Analyze the effect of attribute removal on model accuracy.
8. Now, under Test options, select Cross-validation.
9. Set Folds to 10.
10. Click Start and check the output in the Classifier output panel.
11. Compare the accuracies.
—------------------------------------------------------------------------------------------------------------------------------
5a)Do you think it is a good idea to prefer simple decision trees instead of having long complex
decision trees? How does the complexity of a Decision Tree relate to the bias of the model?
Ans: Yes. A long, complex decision tree usually includes many unnecessary attributes and fits noise in
the training data: it has low bias but high variance, so it overfits and its accuracy on new data suffers.
A simple decision tree uses fewer attributes and accepts slightly higher bias in exchange for much
lower variance, which generally gives more accurate results on unseen data. So it is a good idea to
prefer simple decision trees over long, complex ones.
—------------------------------------------------------------------------------------------------------------------------------
5b)You can make your Decision Trees simpler by pruning the nodes. One approach is to use Reduced
Error Pruning - Explain this idea briefly. Try reduced error pruning for training your Decision Trees using
cross-validation (you can do this in Weka) and report the Decision Tree you obtain? Also, report your
accuracy using the pruned model. Does your accuracy increase?
Theory:
Reduced Error Pruning: hold out part of the training data as a pruning (validation) set; then, working
bottom-up, replace a subtree with a leaf labelled with the subtree's majority class whenever doing so
does not increase the error on the pruning set. In Weka this is enabled through J48's
reducedErrorPruning option.
J48: Weka's implementation of the C4.5 decision-tree learner, which supports both subtree pruning
and reduced-error pruning.
OneR: a simple baseline classifier that builds a one-level rule set from the single most predictive
attribute.
PART: a rule learner that repeatedly builds partial C4.5 trees and turns the best leaf into a rule.
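A minimal sketch of reduced error pruning on a hand-built tree may help; the (attribute, branches, default) tuple representation, the example tree, and the pruning set are all invented for illustration:

```python
def classify(tree, row):
    """Walk a (attribute, branches, default) tree down to a leaf label."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        tree = branches.get(row[attr], default)
    return tree

def rep_prune(tree, rows, labels):
    """Reduced Error Pruning: bottom-up, replace a subtree with its
    majority-class leaf whenever the leaf makes no more errors on the
    held-out pruning set (the rows/labels reaching this node) than the
    subtree does."""
    if not isinstance(tree, tuple):          # already a leaf
        return tree
    attr, branches, default = tree
    pruned = {}
    for val, sub in branches.items():        # prune children first (bottom-up)
        idx = [i for i, r in enumerate(rows) if r[attr] == val]
        pruned[val] = rep_prune(sub, [rows[i] for i in idx], [labels[i] for i in idx])
    tree = (attr, pruned, default)
    if not labels:                           # no pruning data reaches this node
        return tree
    subtree_errors = sum(classify(tree, r) != y for r, y in zip(rows, labels))
    leaf = max(set(labels), key=labels.count)
    leaf_errors = sum(y != leaf for y in labels)
    return leaf if leaf_errors <= subtree_errors else tree

# A tree whose right-hand subtree overfits; the pruning set says the
# whole "b" branch should simply predict "good".
tree = (0, {"a": "bad", "b": (1, {"x": "good", "y": "bad"}, "good")}, "good")
prune_rows = [("a", "p"), ("a", "p"), ("b", "y"), ("b", "y"), ("b", "y")]
prune_labels = ["bad", "bad", "good", "good", "good"]
pruned = rep_prune(tree, prune_rows, prune_labels)
# The subtree under "b" collapses to the leaf "good"; the root split survives
# because collapsing the whole tree would cost errors on the "a" rows.
```

The pruned tree is smaller and, because pruning decisions are made against held-out data, its cross-validation accuracy often improves over the unpruned tree.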
—------------------------------------------------------------------------------------------------------------------------------
8a)Create a data set Employee.arff by adding required data fields
Description:
We need to create an Employee table with a training data set that includes attributes like age, salary,
and performance.
Type the following training data set in Notepad for the Employee table.
@relation employee
@attribute age {25, 27, 28, 29, 30, 35, 48}
@attribute salary {10k,15k,17k,20k,25k,30k,35k,32k}
@attribute performance {good, avg, poor}
@data
25, 10k, poor
27, 15k, poor
27, 17k, poor
28, 17k, poor
29, 20k, avg
30, 25k, avg
29, 25k, avg
30, 20k, avg
35, 32k, good
48, 35k, good
48, 32k, good
1. Save the file as Employee.arff.
2. Start the Weka GUI.
3. Click Explorer.
4. Click Open file and select Employee.arff.
5. Select the Edit tab to view the data in table format.
—------------------------------------------------------------------------------------------------------------------------------
8b)Apply Association rule mining on dataset Employee.arff (Use Apriori Algorithm)
Description:In data mining, association rule learning is a popular and well researched method for
discovering interesting relations between variables in large databases. It can be described as analyzing
and presenting strong rules discovered in databases using different measures of interestingness. In
market basket analysis association rules are used and they are also employed in many application
areas including Web usage mining, intrusion detection and bioinformatics.
1. Start the Weka GUI.
2. Click Explorer.
3. Click Open file and select Employee.arff.
4. Click on the Associate tab.
5. The default algorithm is set to Apriori.
6. Click Start; the output will appear in the output panel.
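The idea behind Apriori can be sketched compactly: grow candidate itemsets level by level and keep only those frequent enough. The toy "baskets" below are invented from Employee.arff-style values, and this is a bare-bones illustration rather than Weka's implementation:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining: keep itemsets whose support
    (fraction of transactions containing them) meets min_support, and
    build (k+1)-item candidates only from frequent k-itemsets."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    def support(items):
        return sum(items <= b for b in baskets) / n
    level = [frozenset([i]) for i in sorted({i for b in baskets for i in b})]
    frequent = {}
    while level:
        level = [c for c in level if support(c) >= min_support]
        for c in level:
            frequent[c] = support(c)
        level = list({a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1})
    return frequent

# Toy transactions built from employee rows (each row as a set of values):
tx = [["avg", "25k"], ["avg", "20k"], ["good", "32k"], ["avg", "25k"]]
freq = apriori(tx, min_support=0.5)
# Confidence of the candidate rule {avg} => {25k}:
conf = freq[frozenset({"avg", "25k"})] / freq[frozenset({"avg"})]
```

Support measures how often an itemset occurs; confidence of a rule A => B is support(A and B) / support(A), which is the "interestingness" measure Apriori reports alongside each rule.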
—------------------------------------------------------------------------------------------------------------------------------
9a)Create a data set Weather.arff with required fields
Type the following training data set with the help of Notepad for Weather Table.
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
1. Save the file as Weather.arff.
2. Start the Weka GUI.
3. Click Explorer.
4. Click Open file and select Weather.arff.
5. Select the Edit tab to view the data in table format.
—------------------------------------------------------------------------------------------------------------------------------
9b)Apply preprocessing techniques on dataset Weather.arff and normalize Weather table data using
Knowledge flow.
Description:
The Knowledge Flow in Weka offers a graphical interface as an alternative to the Explorer for using
Weka's algorithms. While it is still a work in progress, it provides a dataflow-based approach where
users can select Weka components from a toolbar, place them on a canvas, and connect them to
create a workflow for data processing and analysis. Some functionalities available in the Explorer are
not yet present in Knowledge Flow, but it also offers capabilities not found in Explorer.
1. Open Weka Knowledge Flow from Start Menu.
2. Add Arff Loader, Attribute Selection, Normalize, and Arff Saver components to the canvas.
3. Configure Arff Loader to load Weather.arff.
4. Link Arff Loader to Attribute Selection, then link to Normalize, and finally to Arff Saver.
5. Configure Attribute Selection to choose the best attributes.
6. Configure Arff Saver to set the save path for normalized data (.arff).
7. Start the process by right-clicking Arff Loader and selecting "Start Loading."
8. Verify the output file in the specified path and rename it to a.arff.
9. Open a.arff in MS Excel for review.
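The Normalize step in the flow above performs min-max scaling of the numeric columns into [0, 1] (Weka's unsupervised Normalize filter does this by default). A small sketch of the same transformation, using temperature and humidity from the first three Weather.arff rows:

```python
def normalize(rows):
    """Min-max scale every numeric column to [0, 1]: each value v in a
    column becomes (v - min) / (max - min)."""
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1            # a constant column maps to all zeros
        scaled_cols.append([(v - lo) / span for v in col])
    return [list(r) for r in zip(*scaled_cols)]

# Temperature and humidity columns from the first three Weather.arff rows:
data = [[85, 85], [80, 90], [83, 86]]
scaled = normalize(data)
```

After scaling, the minimum of each column is 0 and the maximum is 1, so attributes with different ranges (temperature vs. humidity) contribute comparably to distance-based methods.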
—------------------------------------------------------------------------------------------------------------------------------
10a)Demonstration of classification algorithm on dataset student.arff using j48 algorithm
Type the following training data set with the help of Notepad for Student Table.
@relation student
@attribute age {<30,30-40,>40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}
@data
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no
1. Start Weka.
2. Click Explorer and go to the Preprocess tab.
3. Click Open file and select the student.arff file.
4. Go to the Classify tab, click Choose, and select J48.
5. Under Test options, select Cross-validation and set the number of folds to 10.
6. Click Start; the output will appear in the Classifier output panel.
7. Right-click the entry in the result list to visualize the tree.
—------------------------------------------------------------------------------------------------------------------------------
10b)Demonstration of classification rule process on dataset employee.arff using naïve bayes algorithm
@relation employee
@attribute age {25, 27, 28, 29, 30, 35, 48}
@attribute salary {10k,15k,17k,20k,25k,30k,35k,32k}
@attribute performance {good, avg, poor}
@data
25, 10k, poor
27, 15k, poor
27, 17k, poor
28, 17k, poor
29, 20k, avg
30, 25k, avg
29, 25k, avg
30, 20k, avg
35, 32k, good
48, 35k, good
48, 32k, good
1. Start Weka.
2. Open Explorer and go to the Preprocess tab.
3. Click Open file and select Employee.arff.
4. Go to the Classify tab, click Choose, and select NaiveBayes under bayes.
5. Under Test options, select Cross-validation.
6. Set the number of folds to 10.
7. Click Start.
8. The output is visible in the Classifier output panel.
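For nominal attributes, naive Bayes simply counts how often each attribute value occurs in each class and multiplies smoothed conditional probabilities by the class prior. A minimal sketch of that counting scheme, on toy (age, salary) rows in the spirit of Employee.arff:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count-based training for naive Bayes over nominal attributes."""
    class_counts = Counter(labels)
    # counts[class][attribute index][value] -> occurrence count
    counts = {c: defaultdict(Counter) for c in class_counts}
    for r, y in zip(rows, labels):
        for i, v in enumerate(r):
            counts[y][i][v] += 1
    n_values = [len({r[i] for r in rows}) for i in range(len(rows[0]))]
    return class_counts, counts, n_values, len(labels)

def predict_nb(model, row):
    """Pick the class maximizing P(c) * prod_i P(v_i | c), with Laplace
    smoothing so unseen values never zero out the product."""
    class_counts, counts, n_values, n = model
    best, best_p = None, -1.0
    for c, nc in class_counts.items():
        p = nc / n                                   # prior P(c)
        for i, v in enumerate(row):
            p *= (counts[c][i][v] + 1) / (nc + n_values[i])
        if p > best_p:
            best, best_p = c, p
    return best

# Toy (age, salary) rows in the spirit of Employee.arff:
rows = [("25", "10k"), ("27", "15k"), ("29", "20k"),
        ("30", "25k"), ("35", "32k"), ("48", "35k")]
labels = ["poor", "poor", "avg", "avg", "good", "good"]
model = train_nb(rows, labels)
```

The "naive" part is the product over attributes: each attribute is assumed conditionally independent of the others given the class, which is what makes the counting so cheap.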
—------------------------------------------------------------------------------------------------------------------------------
11a)Create a dataset customer.arff with required fields
Type the following training data set with the help of Notepad for Customer Table.
@relation customer
@attribute name {x,y,z,u,v,l,w,q,r,n}
@attribute age {youth,middle,senior}
@attribute income {high,medium,low}
@attribute class {A,B}
@data
x,youth,high,A
y,youth,low,B
z,middle,high,A
u,middle,low,B
v,senior,high,A
l,senior,low,B
w,youth,high,A
q,youth,low,B
r,middle,high,A
n,senior,high,A
—------------------------------------------------------------------------------------------------------------------------------
11b)Write a procedure for Clustering Customer data using simple K-Means algorithm
1. Start Weka.
2. Click Explorer and go to the Preprocess tab.
3. Click Open file and select customer.arff.
4. Go to the Cluster tab, click Choose, and select SimpleKMeans.
5. Click the text box next to the Choose button to see the properties.
6. Set the number of clusters to 2.
7. Leave the seed value at its default.
8. Under Cluster mode, select Use training set and click Start.
9. The result window shows the clusters and the centroid of each cluster.
10. Right-click the entry in the result list and click Visualize cluster assignments.
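The k-means loop that SimpleKMeans runs can be sketched on numeric data as below. The one-dimensional points are invented for illustration (SimpleKMeans additionally handles nominal attributes like those in customer.arff by using modes instead of means):

```python
import random

def kmeans(points, k, iters=20, seed=10):
    """Plain k-means on numeric vectors: assign each point to its nearest
    centroid (squared Euclidean distance), then move each centroid to the
    mean of its cluster; repeat."""
    rnd = random.Random(seed)
    centroids = rnd.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep an
        # empty cluster's centroid where it was.
        centroids = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids, clusters

# Two well-separated groups of one-dimensional points:
pts = [[1.0], [1.2], [0.8], [8.0], [8.2], [7.8]]
centroids, clusters = kmeans(pts, k=2)
```

The seed controls the random choice of initial centroids, which is why Weka exposes it as a property: different seeds can converge to different clusterings on harder data.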
—------------------------------------------------------------------------------------------------------------------------------
12)Demonstration of clustering rule process on dataset student.arff using simple k-means
@relation student
@attribute age {<30,30-40,>40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}
@data
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no
1. Start Weka.
2. Open Explorer and go to the Preprocess tab.
3. Click Open file and select student.arff.
4. Go to the Cluster tab and select SimpleKMeans.
5. Click the text box next to the Choose button to see the properties.
6. Set the number of clusters to 2.
7. Leave the seed value at its default.
8. Under Cluster mode, select Use training set.
9. Click Start; the results will appear in the output panel.
10. Right-click the entry in the result list to visualize.