Assignment2 - anuradhasrinivas

St. Francis Institute of Technology
ASSIGNMENT-2
DATA WAREHOUSING AND DATA MINING
1. Apply ID3 to the following training dataset from the AllElectronics customer database and extract
the classification rules from the tree.
Age     Income  Student  Credit-rating  Class: buys_computer
<=30    High    No       Fair           No
<=30    High    No       Excellent      No
31..40  High    No       Fair           Yes
>40     Medium  No       Fair           Yes
>40     Low     Yes      Fair           Yes
>40     Low     Yes      Excellent      No
31..40  Low     Yes      Excellent      Yes
<=30    Medium  No       Fair           No
<=30    Low     Yes      Fair           Yes
>40     Medium  Yes      Fair           Yes
<=30    Medium  Yes      Excellent      Yes
31..40  Medium  No       Excellent      Yes
31..40  High    Yes      Fair           Yes
>40     Medium  No       Excellent      No
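
As a starting point for question 1, ID3 chooses as the root the attribute with the highest information gain, Gain(A) = Info(D) - sum over values v of (|Dv|/|D|) * Info(Dv). The sketch below is only an illustration of that first step, not a full solution: it re-enters the 14 training tuples by hand and computes the gain of each attribute. The names ROWS, entropy and info_gain are illustrative choices, not part of the assignment.

```python
from collections import Counter
from math import log2

ATTRS = ["Age", "Income", "Student", "Credit-rating"]

# The 14 training tuples; the last field is the class buys_computer.
ROWS = [
    ("<=30",   "High",   "No",  "Fair",      "No"),
    ("<=30",   "High",   "No",  "Excellent", "No"),
    ("31..40", "High",   "No",  "Fair",      "Yes"),
    (">40",    "Medium", "No",  "Fair",      "Yes"),
    (">40",    "Low",    "Yes", "Fair",      "Yes"),
    (">40",    "Low",    "Yes", "Excellent", "No"),
    ("31..40", "Low",    "Yes", "Excellent", "Yes"),
    ("<=30",   "Medium", "No",  "Fair",      "No"),
    ("<=30",   "Low",    "Yes", "Fair",      "Yes"),
    (">40",    "Medium", "Yes", "Fair",      "Yes"),
    ("<=30",   "Medium", "Yes", "Excellent", "Yes"),
    ("31..40", "Medium", "No",  "Excellent", "Yes"),
    ("31..40", "High",   "Yes", "Fair",      "Yes"),
    (">40",    "Medium", "No",  "Excellent", "No"),
]


def entropy(labels):
    """Info(D): expected number of bits needed to identify the class."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(rows, col):
    """Gain(A) = Info(D) minus the weighted entropy of the partitions on A."""
    base = entropy([r[-1] for r in rows])
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


gains = {name: info_gain(ROWS, i) for i, name in enumerate(ATTRS)}
root = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.3f}")
print("Root attribute:", root)
```

On this data Age has the largest gain (about 0.247), so it would become the root; the same computation is then repeated on the tuples of each branch.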
2. Suppose we want ID3 to decide whether the weather is suitable for playing baseball. The target
classification is "should we play baseball?", which can be yes or no.
Day  Outlook   Temperature  Humidity  Wind    Play Ball
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
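
For question 2, the whole ID3 procedure can be sketched as a short recursion: pick the attribute with the highest information gain, split the tuples on its values, and recurse until a branch is pure or no attributes remain. The code below is a minimal sketch under those assumptions (no pruning, ties broken by attribute order); the function names id3 and rules and the nested-dictionary tree format are illustrative choices, not something the assignment prescribes.

```python
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

# Days D1..D14 in order; the last field is the class Play Ball.
ROWS = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]


def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(n / len(rows) * log2(n / len(rows)) for n in counts.values())


def split(rows, col):
    """Group rows by their value in column `col`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r)
    return groups


def id3(rows, cols):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = Counter(r[-1] for r in rows)
    if len(labels) == 1 or not cols:
        return labels.most_common(1)[0][0]  # pure node, or majority vote

    def gain(col):
        parts = split(rows, col).values()
        return entropy(rows) - sum(len(p) / len(rows) * entropy(p) for p in parts)

    best = max(cols, key=gain)
    rest = [c for c in cols if c != best]
    return {ATTRS[best]: {v: id3(g, rest) for v, g in split(rows, best).items()}}


def rules(tree, conditions=()):
    """Yield one IF-THEN rule per root-to-leaf path."""
    if isinstance(tree, str):
        yield f"IF {' AND '.join(conditions)} THEN Play Ball = {tree}"
        return
    (attr, branches), = tree.items()
    for value, subtree in branches.items():
        yield from rules(subtree, conditions + (f"{attr} = {value}",))


tree = id3(ROWS, list(range(len(ATTRS))))
for rule in rules(tree):
    print(rule)
```

On this data the sketch places Outlook at the root, tests Humidity under Sunny and Wind under Rain, and labels Overcast directly as Yes, which gives five rules.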
3. Consider the following dataset, which is used to predict the RISK of a loan application from the
applicant's CREDIT HISTORY, DEBT and INCOME. Predict the risk for the unseen tuple X = <unknown, high,
over35, moderate>. Write down the rule used by Naïve Bayes to classify instances and apply it to the
following instance: <Credit History = Bad; Debt = Low; Income = 15 to 35>. Which class will be returned
by Naïve Bayes?
CREDIT HISTORY  DEBT  INCOME    RISK
Bad             Low   0 to 15   high
Bad             High  15 to 35  high
Bad             Low   0 to 15   high
Unknown         High  15 to 35  high
Unknown         High  0 to 15   high
Good            High  0 to 15   high
Bad             Low   over35    moderate
Unknown         Low   15 to 35  moderate
Good            High  15 to 35  moderate
Unknown         Low   over35    low
Unknown         Low   over35    low
Good            Low   over35    low
Good            High  over35    low
Good            High  over35    low
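
For question 3, the Naïve Bayes rule is to return the class C that maximises P(C) * P(x1 | C) * ... * P(xn | C), with every probability estimated as a relative frequency in the training data. The sketch below applies that rule to the instance <Credit History = Bad; Debt = Low; Income = 15 to 35>. It rests on two assumptions that should be checked against the original sheet: the rows are transcribed by reading each column of the table top to bottom and aligning the values by position, and no Laplace smoothing is applied. The names ROWS and naive_bayes_scores are illustrative.

```python
from collections import Counter

# (credit history, debt, income, risk); row alignment is an assumption,
# obtained by pairing the table's columns value by value.
ROWS = [
    ("Bad",     "Low",  "0 to 15",  "high"),
    ("Bad",     "High", "15 to 35", "high"),
    ("Bad",     "Low",  "0 to 15",  "high"),
    ("Unknown", "High", "15 to 35", "high"),
    ("Unknown", "High", "0 to 15",  "high"),
    ("Good",    "High", "0 to 15",  "high"),
    ("Bad",     "Low",  "over35",   "moderate"),
    ("Unknown", "Low",  "15 to 35", "moderate"),
    ("Good",    "High", "15 to 35", "moderate"),
    ("Unknown", "Low",  "over35",   "low"),
    ("Unknown", "Low",  "over35",   "low"),
    ("Good",    "Low",  "over35",   "low"),
    ("Good",    "High", "over35",   "low"),
    ("Good",    "High", "over35",   "low"),
]


def naive_bayes_scores(rows, instance):
    """Return P(C) * product of P(x_i | C) for every class C (unnormalised)."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(rows)  # prior P(C)
        for col, value in enumerate(instance):
            matches = sum(1 for r in rows if r[-1] == cls and r[col] == value)
            score *= matches / n_cls  # P(x_i | C), no smoothing
        scores[cls] = score
    return scores


scores = naive_bayes_scores(ROWS, ("Bad", "Low", "15 to 35"))
prediction = max(scores, key=scores.get)

for cls, score in scores.items():
    print(f"{cls}: {score:.4f}")
print("Predicted risk:", prediction)
```

With the rows as transcribed here, the scores are 1/42 for high, 2/63 for moderate and 0 for low (no low-risk tuple has a bad credit history), so the sketch returns moderate. A different alignment of the table would change these counts.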
4. Apply a statistical-based classification algorithm to obtain the actual probabilities of each event
for a new tuple. Hence show that <Adam, M, 1.95m> is classified as Tall.
Person ID  Name      Gender  Height  Class
1          Kristina  Female  1.6m    Short
2          Jim       Male    2m      Tall
3          Maggie    Female  1.9m    Medium
4          Martha    Female  1.85m   Medium
5          John      Male    2.8m    Tall
6          Bob       Male    1.7m    Short
7          Clinton   Male    1.8m    Medium
8          Nyssa     Female  1.6m    Short
9          Kathy     Female  1.65m   Short
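
For question 4, one common statistical approach is Naïve Bayes with the continuous Height attribute discretised into ranges. The sketch below follows that idea and then divides each class score by P(X) to obtain the actual (normalised) probabilities. The height ranges are an assumption, since the assignment does not state them: (0, 1.6], (1.6, 1.7], (1.7, 1.8], (1.8, 1.9], (1.9, 2.0] and (2.0, infinity). The names EDGES, height_bin and posteriors are illustrative.

```python
from bisect import bisect_left
from collections import Counter

# Assumed upper edges of the height ranges, in metres (not given in the
# assignment); heights above 2.0 fall into a final open-ended range.
EDGES = [1.6, 1.7, 1.8, 1.9, 2.0]

# (gender, height in metres, class) for Person ID 1..9.
PEOPLE = [
    ("Female", 1.60, "Short"),
    ("Male",   2.00, "Tall"),
    ("Female", 1.90, "Medium"),
    ("Female", 1.85, "Medium"),
    ("Male",   2.80, "Tall"),
    ("Male",   1.70, "Short"),
    ("Male",   1.80, "Medium"),
    ("Female", 1.60, "Short"),
    ("Female", 1.65, "Short"),
]


def height_bin(height):
    """Index of the range (lower, upper] that contains `height`."""
    return bisect_left(EDGES, height)


def posteriors(people, gender, height):
    """Return P(class | gender, height range) for every class."""
    target_bin = height_bin(height)
    class_counts = Counter(cls for _, _, cls in people)
    joint = {}
    for cls, n_cls in class_counts.items():
        members = [(g, height_bin(h)) for g, h, c in people if c == cls]
        p_gender = sum(g == gender for g, _ in members) / n_cls
        p_height = sum(b == target_bin for _, b in members) / n_cls
        joint[cls] = (n_cls / len(people)) * p_gender * p_height
    evidence = sum(joint.values())  # P(X)
    if evidence == 0:
        return joint  # no class matches the tuple; nothing to normalise
    return {cls: p / evidence for cls, p in joint.items()}


probs = posteriors(PEOPLE, "Male", 1.95)  # the new tuple <Adam, M, 1.95m>
prediction = max(probs, key=probs.get)

for cls, p in probs.items():
    print(f"P({cls} | Adam) = {p:.3f}")
print("Predicted class:", prediction)
```

Under these assumed ranges only the Tall class has a male in the (1.9, 2.0] range, so the Short and Medium scores are zero and the tuple is classified as Tall. With so few training tuples the result is sensitive to the choice of ranges.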