Tribhuvan University
Institute of Science and Technology
2081
Bachelor Level / Seventh Semester / Science
Computer Science and Information Technology (CSC410)
Data Warehousing and Data Mining
Full Marks: 60 + 20 + 20
Pass Marks: 24 + 8 + 8
Time: 3 Hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.
Section A
When is the trimmed mean preferred for the statistical description of data? Justify with an example. Describe the multidimensional data model and the conceptual modeling of a data warehouse.
How do you generate strong association rules? From the following dataset, find the frequent itemsets using the FP-growth algorithm with a minimum support count of 3.
| Transaction ID | Items |
|---|---|
| T1 | {K, E, M, O, Y} |
| T2 | {K, E, O, Y} |
| T3 | {K, E, M} |
| T4 | {K, M, Y} |
| T5 | {K, E, O} |
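
The first pass of FP-growth over the transactions above can be sketched as follows. This is only an illustrative check of the support counts and item ordering, not a full FP-tree construction; the alphabetical tie-break between items of equal support is an assumption, since the question does not specify one.

```python
from collections import Counter

# Transactions copied from the table above.
transactions = {
    "T1": {"K", "E", "M", "O", "Y"},
    "T2": {"K", "E", "O", "Y"},
    "T3": {"K", "E", "M"},
    "T4": {"K", "M", "Y"},
    "T5": {"K", "E", "O"},
}
MIN_SUPPORT = 3  # minimum support count given in the question

# Pass 1: count how many transactions contain each item.
counts = Counter(item for items in transactions.values() for item in items)

# Keep only items that meet the minimum support.
frequent = {item: c for item, c in counts.items() if c >= MIN_SUPPORT}

# Order items by descending support; ties are broken alphabetically (assumption).
order = sorted(frequent, key=lambda item: (-frequent[item], item))

# Rewrite each transaction in that order, ready for insertion into an FP-tree.
ordered = {
    tid: [item for item in order if item in items]
    for tid, items in transactions.items()
}

for tid, items in ordered.items():
    print(tid, items)
```

Under these assumptions every item meets the threshold, so no item is pruned before the tree is built.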
Define overfitting and underfitting. Train a decision tree classifier using the ID3 algorithm on the following training data.
| TID | Age | Car Type | Class |
|---|---|---|---|
| 1 | ≤30 | Family | High |
| 2 | ≤30 | Sports | High |
| 3 | >30 | Sports | High |
| 4 | >30 | Family | Low |
| 5 | >30 | Truck | Low |
| 6 | ≤30 | Family | High |
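
The attribute-selection step of ID3 on the training data above can be checked with a short sketch like the one below. It computes entropy and information gain only for the root split; the function names `entropy` and `information_gain` are illustrative, not part of any library.

```python
from collections import Counter
from math import log2

# Training records copied from the table above: (Age, Car Type, Class).
rows = [
    ("≤30", "Family", "High"),
    ("≤30", "Sports", "High"),
    (">30", "Sports", "High"),
    (">30", "Family", "Low"),
    (">30", "Truck", "Low"),
    ("≤30", "Family", "High"),
]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the class minus the weighted entropy after splitting."""
    labels = [r[-1] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

gain_age = information_gain(rows, 0)
gain_car = information_gain(rows, 1)
print(f"Gain(Age) = {gain_age:.4f}, Gain(Car Type) = {gain_car:.4f}")
```

On this data the two gains come out equal, so the choice of root attribute depends on the tie-breaking rule adopted.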
Section B
Describe any two methods of handling noisy data.
Using the k-means++ algorithm and Euclidean distance, find the initial 3 cluster centroids from A1 = (3, 11), A2 = (3, 6), A3 = (9, 5), A4 = (6, 9), A6 = (7, 5), A7 = (2, 3), A8 = (5, 10). Choose (3, 11) as one of the initial centroids.
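
The seeding step for the points above can be sketched as follows. Standard k-means++ samples each new centroid with probability proportional to its squared distance from the nearest centroid already chosen. This sketch instead uses the deterministic simplification of always taking the point with the largest such distance, which is an assumption made so the result is reproducible by hand. The helper names are illustrative.

```python
# Points copied from the question (there is no A5 in the source).
points = {
    "A1": (3, 11), "A2": (3, 6), "A3": (9, 5), "A4": (6, 9),
    "A6": (7, 5), "A7": (2, 3), "A8": (5, 10),
}

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def nearest_sq_dist(name, chosen):
    """Squared distance from a point to its closest chosen centroid."""
    return min(sq_dist(points[name], points[c]) for c in chosen)

def seed_farthest_first(first, k):
    """Pick k seeds, each time taking the point farthest from the chosen set."""
    chosen = [first]
    while len(chosen) < k:
        d2 = {n: nearest_sq_dist(n, chosen) for n in points if n not in chosen}
        total = sum(d2.values())
        # d2[n] / total is the sampling probability in randomized k-means++.
        probs = {n: d / total for n, d in d2.items()}
        print("D^2:", d2, "| max prob:", round(max(probs.values()), 3))
        chosen.append(max(d2, key=d2.get))
    return chosen

centroids = seed_farthest_first("A1", 3)
print(centroids)
```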
Explain the general strategies for cube computation.
Distinguish between data characterization and data discrimination. What are the challenges of multimedia mining?
Define graph mining. Discuss the conflict between the theory of balance and the theory of status.
What is a support vector? How do you evaluate the accuracy of a classifier? Describe.
Differentiate between the k-means and k-medoids clustering algorithms.
List any two OLAP operations with examples. How do you compute rule coverage and rule accuracy?
Define link mining. What are the roles of epsilon and MinPts in DBSCAN?