Tribhuvan University
Institute of Science and Technology
2078
Bachelor Level / Seventh Semester / Science
Computer Science and Information Technology (CSC410)
Data Warehousing and Data Mining
Full Marks: 60 + 20 + 20
Pass Marks: 24 + 8 + 8
Time: 3 Hours
Candidates are required to give their answers in their own words as far as practicable.
The figures in the margin indicate full marks.
Section A
Attempt any two questions.
Write down any one advantage and one disadvantage of MOLAP over ROLAP. Define a signed network. How do you check whether it is balanced? How does beam search reduce space complexity? Illustrate with an example.
How is a concept hierarchy used in extracting information? Generate the frequent patterns from the following data set using FP-growth, where minimum support = 3.
How do you compare two classifiers? Given the points A(3,7), B(4,6), C(5,5), D(6,4), E(7,3), F(6,2), G(7,2) and H(8,4), find the core points, border points and outliers using DBSCAN. Take Eps = 2.5 and MinPts = 3.
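One way to check a hand-worked answer to the DBSCAN question above is a short script. The following is a minimal sketch, not a model answer: it assumes Euclidean distance and assumes that a point counts as a member of its own Eps-neighbourhood, which is a common convention but one the question does not state. The function name `classify_points` is hypothetical.

```python
from math import dist

# Points and parameters as given in the question.
points = {
    "A": (3, 7), "B": (4, 6), "C": (5, 5), "D": (6, 4),
    "E": (7, 3), "F": (6, 2), "G": (7, 2), "H": (8, 4),
}
EPS = 2.5
MIN_PTS = 3


def classify_points(points, eps, min_pts):
    """Label each point as core, border or noise (outlier).

    Assumption: the Eps-neighbourhood of a point includes the point itself.
    """
    neighbours = {
        name: [other for other, q in points.items() if dist(p, q) <= eps]
        for name, p in points.items()
    }
    # Core: at least min_pts points (itself included) within eps.
    core = {name for name, nb in neighbours.items() if len(nb) >= min_pts}
    # Border: not core, but within eps of at least one core point.
    border = {
        name for name, nb in neighbours.items()
        if name not in core and any(other in core for other in nb)
    }
    # Noise: everything that is neither core nor border.
    noise = set(points) - core - border
    return core, border, noise


core, border, noise = classify_points(points, EPS, MIN_PTS)
print("core:  ", sorted(core))
print("border:", sorted(border))
print("noise: ", sorted(noise))
```

If the neighbourhood is instead taken to exclude the point itself, the core set can shrink, so the convention should be stated alongside any answer.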
Section B
Attempt any eight questions.
When is a pattern said to be interesting? List the issues in data mining.
Define data discretization. Describe the tasks for data preprocessing.
Define spatial data mining. What are the challenges of multimedia mining? Describe with an example.
Consider the following data set.
Confident | Studied | Sick | Result
Yes       | No      | No   | Fail
Yes       | No      | No   | Pass
No        | Yes     | Yes  | Fail
No        | Yes     | Yes  | Pass
Yes       | Yes     | Yes  | Pass
Using Bayesian classification, determine whether an object with attributes Confident = Yes and Sick = No will Pass or Fail.
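The arithmetic for the question above can be cross-checked with a small naive Bayes calculation. This is a sketch under stated assumptions: attributes are treated as conditionally independent given the class, no Laplace smoothing is applied, and the unspecified attribute Studied is simply left out of the product. The function name `naive_bayes_scores` is hypothetical.

```python
from fractions import Fraction

# Training rows copied from the table in the question.
rows = [
    {"Confident": "Yes", "Studied": "No",  "Sick": "No",  "Result": "Fail"},
    {"Confident": "Yes", "Studied": "No",  "Sick": "No",  "Result": "Pass"},
    {"Confident": "No",  "Studied": "Yes", "Sick": "Yes", "Result": "Fail"},
    {"Confident": "No",  "Studied": "Yes", "Sick": "Yes", "Result": "Pass"},
    {"Confident": "Yes", "Studied": "Yes", "Sick": "Yes", "Result": "Pass"},
]
query = {"Confident": "Yes", "Sick": "No"}


def naive_bayes_scores(rows, query, target="Result"):
    """Return the unnormalised score P(class) * prod P(attr = value | class)."""
    scores = {}
    for label in sorted({row[target] for row in rows}):
        subset = [row for row in rows if row[target] == label]
        score = Fraction(len(subset), len(rows))  # prior P(class)
        for attr, value in query.items():
            matches = sum(1 for row in subset if row[attr] == value)
            score *= Fraction(matches, len(subset))  # likelihood, no smoothing
        scores[label] = score
    return scores


scores = naive_bayes_scores(rows, query)
prediction = max(scores, key=scores.get)
for label, score in scores.items():
    print(label, score, float(score))
print("predicted class:", prediction)
```

Exact fractions are used so the printed scores can be compared term by term with a hand calculation.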
What are the choices for data cube materialization? Explain the strategies for cube computation.
Show the conflict between the theory of balance and the theory of status. How can the Apriori algorithm be improved?
Differentiate between star schema and snowflake schema. List any two methods for data normalization.
How do you evaluate the accuracy of a classifier? Discuss the advantages of using k-fold cross-validation.
Apply the K-means algorithm (K = 2) to the data (185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77) for up to two iterations and show the clusters. Choose the first two objects as the initial centroids.
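The two iterations asked for in the K-means question above can be traced with a short script. This is a minimal sketch that assumes Euclidean distance and the usual assign-then-update loop. The function name `k_means` and its handling of an empty cluster (keeping the old centroid) are illustrative choices, not part of the question.

```python
from math import dist

# Data and initial centroids as given in the question.
data = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
initial_centroids = [data[0], data[1]]


def k_means(data, centroids, iterations):
    """Run a fixed number of K-means iterations and return clusters and centroids."""
    clusters = []
    for step in range(1, iterations + 1):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for point in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: each centroid becomes the mean of its cluster.
        # Assumption: an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        print(f"iteration {step}: clusters={clusters} centroids={centroids}")
    return clusters, centroids


clusters, centroids = k_means(data, initial_centroids, iterations=2)
```

Printing the clusters and centroids after each iteration makes it easy to compare every step with a hand-computed distance table.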