DWDM Model Set II - With Solution

Tribhuvan University

Institute of Science and Technology

Model Set

Bachelor Level / Seventh Semester / Science

Computer Science and Information Technology (CSC410)

Data Warehousing and Data Mining

Full Marks: 60 + 20 + 20

Pass Marks: 24 + 8 + 8

Time: 3 Hours

Candidates are required to give their answers in their own words as far as practicable.

The figures in the margin indicate full marks.

Group A

Attempt any TWO questions.

What is the Apriori principle? How is it used by the Apriori algorithm for frequent pattern mining? What are the limitations of the Apriori approach? Use the Apriori algorithm to generate strong association rules from the following transaction database. Use min_sup = 40% and min_confidence = 75%.

Transaction ID	Items Purchased
T1	Bread, Milk, Eggs, Butter
T2	Bread, Milk, Cheese
T3	Milk, Eggs, Cheese, Yogurt
T4	Bread, Butter, Cheese
T5	Bread, Milk, Butter, Yogurt
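
The numerical part of this question can be checked with a short program. The following is a minimal Python sketch of level-wise Apriori (join, prune, count) followed by rule generation, not an official solution. The function names apriori and strong_rules are illustrative, and thresholds are compared with integer arithmetic (40% of 5 transactions means a support count of at least 2) to avoid rounding issues.

```python
from itertools import combinations

# Transaction database from the question.
transactions = [
    {"Bread", "Milk", "Eggs", "Butter"},    # T1
    {"Bread", "Milk", "Cheese"},            # T2
    {"Milk", "Eggs", "Cheese", "Yogurt"},   # T3
    {"Bread", "Butter", "Cheese"},          # T4
    {"Bread", "Milk", "Butter", "Yogurt"},  # T5
]

def apriori(transactions, min_sup_pct):
    """Return {frozenset: support count} for every frequent itemset."""
    n = len(transactions)

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def is_frequent(cnt):
        return cnt * 100 >= min_sup_pct * n

    items = sorted({i for t in transactions for i in t})
    level = {frozenset([i]): count(frozenset([i])) for i in items}
    level = {s: c for s, c in level.items() if is_frequent(c)}

    frequent, k = {}, 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: count(c) for c in candidates}
        level = {s: c for s, c in level.items() if is_frequent(c)}
    return frequent

def strong_rules(frequent, min_conf_pct):
    """Return (antecedent, consequent, confidence) for every strong rule."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence = sup(itemset) / sup(lhs)
                if sup * 100 >= min_conf_pct * frequent[lhs]:
                    rules.append((lhs, itemset - lhs, sup / frequent[lhs]))
    return rules

frequent = apriori(transactions, min_sup_pct=40)
rules = strong_rules(frequent, min_conf_pct=75)

for lhs, rhs, conf in sorted(rules, key=lambda x: (sorted(x[0]), sorted(x[1]))):
    print(sorted(lhs), "->", sorted(rhs), f"confidence = {conf:.0%}")
```

For this database the only frequent 3-itemset is {Bread, Milk, Butter}, with a support count of 2, so the level-wise search stops after the third pass.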

What is a rule-based classifier? How are rules extracted from a decision tree? What is overfitting? How can overfitting be detected? Explain how the overfitting problem can be solved. Train an ID3 classifier using the dataset given below. Then predict the class label for the new data sample [Weather=Sunny, Temperature=Hot, Humidity=Normal, Wind=Strong].

Weather	Temperature	Humidity	Wind	Play Tennis
Sunny	Hot	High	Weak	No
Sunny	Hot	High	Strong	No
Overcast	Hot	High	Weak	Yes
Rainy	Mild	High	Weak	Yes
Rainy	Cool	Normal	Weak	Yes
Rainy	Cool	Normal	Strong	No
Overcast	Cool	Normal	Strong	Yes
Sunny	Mild	High	Weak	No
Sunny	Cool	Normal	Weak	Yes
Rainy	Mild	Normal	Weak	Yes
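
The ID3 part of this question can be checked with a short program. The following is a minimal Python sketch that picks the attribute with the highest information gain at each node, not an official solution. The names entropy, gain, id3 and predict are illustrative. One caveat matters for this dataset: in the Sunny branch, Temperature and Humidity both split the four rows into pure subsets, so their gains are tied. This sketch breaks ties by attribute order, which is an assumption, and the predicted label for the new sample depends on that choice.

```python
from collections import Counter
from math import log2

ATTRS = ["Weather", "Temperature", "Humidity", "Wind"]
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "Normal", "Weak", "Yes"),
]
DATA = [dict(zip(ATTRS + ["Play"], row)) for row in ROWS]

def entropy(rows):
    """Shannon entropy of the class label over the given rows."""
    counts = Counter(r["Play"] for r in rows)
    return sum(-(c / len(rows)) * log2(c / len(rows)) for c in counts.values())

def gain(rows, attr):
    """Information gain obtained by splitting rows on attr."""
    remainder = 0.0
    for value in sorted({r[attr] for r in rows}):
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, attrs):
    """Build a tree as nested dicts: {attribute: {value: subtree or label}}."""
    labels = [r["Play"] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # max() returns the first maximum, so ties follow the order of attrs.
    best = max(attrs, key=lambda a: gain(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest)
                   for v in sorted({r[best] for r in rows})}}

def predict(tree, sample):
    """Follow the branches matching the sample until a leaf label is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][sample[attr]]
    return tree

sample = {"Weather": "Sunny", "Temperature": "Hot",
          "Humidity": "Normal", "Wind": "Strong"}

tree = id3(DATA, ATTRS)
print(tree)
print("Tie broken towards Temperature:", predict(tree, sample))

# Listing Humidity before Temperature resolves the tie the other way.
alt_tree = id3(DATA, ["Weather", "Humidity", "Temperature", "Wind"])
print("Tie broken towards Humidity:", predict(alt_tree, sample))
```

With either ordering the root is Weather. When the tie in the Sunny branch goes to Temperature, the sample is classified as No; when it goes to Humidity, it is classified as Yes. A written answer should therefore state which attribute was chosen in the Sunny branch and why.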

What is centroid-based clustering? Why is k-means clustering called a centroid-based clustering algorithm? Cluster the following instances using the k-means algorithm (take K = 2 and use the first and last data points as the initial centroids):

Instance	X	Y
P1	2	3
P2	3	4
P3	6	8
P4	7	9
P5	8	10
P6	9	11
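
The clustering part of this question can be checked with a short program. The following is a minimal Python sketch of k-means with Euclidean distance, not an official solution. It alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its members, and stops when the assignment no longer changes. The function name kmeans and the max_iter safeguard are illustrative; the question does not specify a distance measure, so Euclidean distance is an assumption.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

points = {
    "P1": (2, 3), "P2": (3, 4), "P3": (6, 8),
    "P4": (7, 9), "P5": (8, 10), "P6": (9, 11),
}

def kmeans(points, centroids, max_iter=100):
    """Return (assignment, centroids) once the assignment stops changing."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = {
            name: min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its member points.
        for k in range(len(centroids)):
            members = [points[n] for n, c in assignment.items() if c == k]
            if members:
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids

# Initial centroids: first and last data points, as the question requires.
assignment, centroids = kmeans(points, [points["P1"], points["P6"]])

for k, c in enumerate(centroids):
    members = [n for n, a in assignment.items() if a == k]
    print(f"Cluster {k + 1}: {members}, centroid = {c}")
```

On this data the assignment is stable after the first update: P1 and P2 form one cluster with centroid (2.5, 3.5), and P3 to P6 form the other with centroid (7.5, 9.5).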

Group B

Attempt any EIGHT questions.

What is a data warehouse? How is it different from a database? What is a data mart?

What is KDD? Explain with a suitable block diagram.

What is data integration? What is data reduction? Why is data preprocessing important?

What is cube materialization? Define full cube, iceberg cube, closed cube, and shell cube.

What is a frequent pattern? What is market basket analysis? Explain it with suitable examples.

What is a confusion matrix? Explain the importance of the confusion matrix in measuring the performance of classification models.

What is clustering? How is it different from supervised classification? What is the DBSCAN algorithm?

Define social network analysis. What is the motivation behind link mining?