Tribhuvan University
Institute of Science and Technology
Model Set
Bachelor Level / Seventh Semester / Science
Computer Science and Information Technology (CSC410)
Data Warehousing and Data Mining
Full Marks: 60 + 20 + 20
Pass Marks: 24 + 8 + 8
Time: 3 Hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.
Group A
Attempt any TWO questions.
What is the Apriori principle? How is it used by the Apriori algorithm for frequent pattern mining? What are the limitations of the Apriori approach? Use the Apriori algorithm to generate strong association rules from the following transaction database, using min_sup = 40% and min_confidence = 75%.
| Transaction ID | Items Purchased |
| --- | --- |
| T1 | Bread, Milk, Eggs, Butter |
| T2 | Bread, Milk, Cheese |
| T3 | Milk, Eggs, Cheese, Yogurt |
| T4 | Bread, Butter, Cheese |
| T5 | Bread, Milk, Butter, Yogurt |
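As a cross-check for the hand computation, the following minimal Python sketch runs a textbook-style Apriori over the five transactions above. It assumes that min_sup = 40% means a support count of at least 2 out of 5 transactions. It uses a simplified join (uniting any two frequent (k-1)-itemsets whose union has k items) followed by the usual subset-pruning step, and it uses exact fractions so that a confidence of exactly 75% is not lost to rounding. The names `apriori`, `strong_rules`, and `support_count` are illustrative and do not come from any library.

```python
from fractions import Fraction
from itertools import combinations

# Transaction database from the question (T1 to T5).
TRANSACTIONS = [
    {"Bread", "Milk", "Eggs", "Butter"},
    {"Bread", "Milk", "Cheese"},
    {"Milk", "Eggs", "Cheese", "Yogurt"},
    {"Bread", "Butter", "Cheese"},
    {"Bread", "Milk", "Butter", "Yogurt"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support count} for every frequent itemset."""
    min_count = min_sup * len(transactions)  # 40% of 5 transactions -> 2
    items = sorted({i for t in transactions for i in t})
    # L1: frequent 1-itemsets.
    current = {}
    for i in items:
        cnt = support_count(frozenset([i]), transactions)
        if cnt >= min_count:
            current[frozenset([i])] = cnt
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step (simplified): unite pairs of frequent (k-1)-itemsets
        # whenever the union has exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step (Apriori principle): drop a candidate if any of its
        # (k-1)-subsets is infrequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the support of the surviving candidates.
        current = {}
        for c in candidates:
            cnt = support_count(c, transactions)
            if cnt >= min_count:
                current[c] = cnt
        frequent.update(current)
        k += 1
    return frequent


def strong_rules(frequent, min_conf):
    """Rules X -> Y from each frequent itemset with confidence >= min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # conf(X -> Y) = support_count(X u Y) / support_count(X)
                conf = Fraction(count, frequent[lhs])
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


freq = apriori(TRANSACTIONS, Fraction(40, 100))
for lhs, rhs, conf in strong_rules(freq, Fraction(75, 100)):
    print(sorted(lhs), "->", sorted(rhs), f"confidence = {float(conf):.2f}")
```

Under these assumptions the sketch finds 14 frequent itemsets (the largest is {Bread, Milk, Butter} with a support count of 2; the candidate {Bread, Milk, Cheese} survives pruning but has a count of only 1) and seven strong rules: Bread → Milk, Milk → Bread, Bread → Butter, Butter → Bread, Eggs → Milk, Yogurt → Milk, and {Milk, Butter} → Bread.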
What is a rule-based classifier? How can rules be extracted from a decision tree? What is overfitting? How can overfitting be detected? Explain how the overfitting problem can be addressed. Train an ID3 classifier on the dataset given below, then predict the class label for the new data sample [Weather=Sunny, Temperature=Hot, Humidity=Normal, Wind=Strong].
| Weather | Temperature | Humidity | Wind | Play Tennis |
| --- | --- | --- | --- | --- |
| Sunny | Hot | High | Weak | No |
| Sunny | Hot | High | Strong | No |
| Overcast | Hot | High | Weak | Yes |
| Rainy | Mild | High | Weak | Yes |
| Rainy | Cool | Normal | Weak | Yes |
| Rainy | Cool | Normal | Strong | No |
| Overcast | Cool | Normal | Strong | Yes |
| Sunny | Mild | High | Weak | No |
| Sunny | Cool | Normal | Weak | Yes |
| Rainy | Mild | Normal | Weak | Yes |
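A minimal Python sketch of ID3 (entropy, information gain, and recursive splitting) can be used to check a hand-built tree for this dataset. It assumes that ties in information gain are broken by column order, and this assumption matters here: on the Sunny branch, Temperature and Humidity both separate the four rows perfectly (gain ≈ 0.811 each), so the sketch picks Temperature and predicts No for the new sample, whereas choosing Humidity would route the sample through Humidity = Normal and predict Yes. The function names are illustrative, and the sketch does not handle attribute values that never appear in the training data.

```python
from collections import Counter
from math import log2

# Attribute names in column order; the class label is the last field.
ATTRS = ["Weather", "Temperature", "Humidity", "Wind"]
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "Normal", "Weak", "Yes"),
]


def entropy(rows):
    """Shannon entropy (bits) of the class labels in `rows`."""
    n = len(rows)
    counts = Counter(r[-1] for r in rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def info_gain(rows, idx):
    """Information gain of splitting `rows` on the attribute at column `idx`."""
    n = len(rows)
    remainder = 0.0
    for value in {r[idx] for r in rows}:
        subset = [r for r in rows if r[idx] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder


def id3(rows, attrs):
    """A leaf is a label; an inner node is (attribute, {value: subtree})."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority vote
    # max() returns the first attribute with the highest gain, so ties
    # are broken by column order (an assumption of this sketch).
    best = max(attrs, key=lambda a: info_gain(rows, ATTRS.index(a)))
    idx = ATTRS.index(best)
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({r[idx] for r in rows}):
        subset = [r for r in rows if r[idx] == value]
        branches[value] = id3(subset, remaining)
    return (best, branches)


def predict(tree, sample):
    """Follow the branches matching `sample` down to a leaf label."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[sample[attr]]
    return tree


tree = id3(DATA, ATTRS)
sample = {"Weather": "Sunny", "Temperature": "Hot",
          "Humidity": "Normal", "Wind": "Strong"}
print(tree)
print(predict(tree, sample))
```

With column-order tie-breaking, the root split is Weather (gain ≈ 0.322), Overcast becomes a pure Yes leaf, the Rainy branch splits on Wind (Weak → Yes, Strong → No), and the Sunny branch splits on Temperature (Hot → No, Mild → No, Cool → Yes).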
What is centroid-based clustering? Why is k-means clustering called a centroid-based clustering algorithm? Cluster the following instances using the k-means algorithm (take K = 2, and use the first and last data points as the initial centroids):
| Instance | X | Y |
| --- | --- | --- |
| P1 | 2 | 3 |
| P2 | 3 | 4 |
| P3 | 6 | 8 |
| P4 | 7 | 9 |
| P5 | 8 | 10 |
| P6 | 9 | 11 |
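The k-means iterations can be checked with a short Python sketch that uses Euclidean distance (`math.dist`, available from Python 3.8). It starts from P1 and P6 as the initial centroids, as the question specifies, and stops when no point changes cluster. It assumes that no cluster becomes empty, which holds for this data; the `kmeans` function is illustrative rather than a library API.

```python
from math import dist

# Data points from the question.
POINTS = {
    "P1": (2, 3),
    "P2": (3, 4),
    "P3": (6, 8),
    "P4": (7, 9),
    "P5": (8, 10),
    "P6": (9, 11),
}


def kmeans(points, centroids, max_iter=100):
    """Plain k-means with Euclidean distance.

    Returns (assignment, centroids), where assignment maps each point name
    to the index of its cluster. Assumes no cluster ever becomes empty.
    """
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = {
            name: min(range(len(centroids)),
                      key=lambda k: dist(p, centroids[k]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [points[n] for n in points if assignment[n] == k]
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
            )
        centroids = new_centroids
    return assignment, centroids


labels, final = kmeans(POINTS, [POINTS["P1"], POINTS["P6"]])
for k, c in enumerate(final):
    members = [n for n in POINTS if labels[n] == k]
    print(f"Cluster {k + 1}: {members}, centroid = {c}")
```

Here the first pass assigns P1 and P2 to the first cluster and P3, P4, P5, P6 to the second, moving the centroids to (2.5, 3.5) and (7.5, 9.5); the second pass changes no assignment, so the algorithm converges.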
Group B
Attempt any EIGHT questions.
What is a data warehouse? How is it different from a database? What is a data mart?
What is KDD? Explain the KDD process with a suitable block diagram.
What is data integration? What is data reduction? Why is data preprocessing important?
What is cube materialization? Define full cube, iceberg cube, closed cube, and shell cube.
What is a frequent pattern? What is market basket analysis? Explain it with suitable examples.
What is a confusion matrix? Explain the importance of the confusion matrix in measuring the performance of classification models.
What is clustering? How is it different from supervised classification? What is the DBSCAN algorithm?
Define social network analysis. What is the motivation behind link mining?