Tribhuvan University

Institute of Science and Technology

2078

Bachelor Level / Seventh Semester / Science

Computer Science and Information Technology (CSC410)

Data Warehousing and Data Mining

Full Marks: 60 + 20 + 20

Pass Marks: 24 + 8 + 8

Time: 3 Hours

Candidates are required to give their answers in their own words as far as practicable.

The figures in the margin indicate full marks.

Section A

Attempt any two questions.

1

Write down any one advantage and one disadvantage of MOLAP over ROLAP. Define a signed network. How do you check whether it is balanced? How does beam search reduce space complexity? Illustrate with an example.

2

How is a concept hierarchy used in extracting information? Generate the frequent patterns from the following data set using FP-growth, where minimum support = 3.

3

How do you compare two classifiers? Given the points A(3,7), B(4,6), C(5,5), D(6,4), E(7,3), F(6,2), G(7,2) and H(8,4), find the core points, border points and outliers using DBSCAN. Take Eps = 2.5 and MinPts = 3.
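A hand-worked DBSCAN answer can be checked with a minimal sketch like the one below. It assumes Euclidean distance and assumes that a point counts itself among its Eps-neighbours when compared against MinPts. Neither assumption is stated in the question, and a different counting convention changes the result.

```python
from math import dist

# Points from the question.
points = {
    "A": (3, 7), "B": (4, 6), "C": (5, 5), "D": (6, 4),
    "E": (7, 3), "F": (6, 2), "G": (7, 2), "H": (8, 4),
}
EPS = 2.5
MIN_PTS = 3  # assumption: the point itself is included in the count

def eps_neighbourhood(name):
    """Return the names of all points within EPS of the given point (itself included)."""
    return {q for q, xy in points.items() if dist(points[name], xy) <= EPS}

neighbours = {p: eps_neighbourhood(p) for p in points}

# Core: at least MIN_PTS points in the Eps-neighbourhood.
core = {p for p, n in neighbours.items() if len(n) >= MIN_PTS}
# Border: not core, but within EPS of at least one core point.
border = {p for p in points if p not in core and neighbours[p] & core}
# Outliers (noise): everything else.
noise = set(points) - core - border

print("core:", sorted(core))
print("border:", sorted(border))
print("outliers:", sorted(noise))
```

Under these assumptions, B through H come out as core points, A is a border point (it is within Eps of B only), and there are no outliers.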

Section B

Attempt any eight questions.

4

When is a pattern said to be interesting? List the issues in data mining.

5

Define data discretization. Describe the tasks involved in data preprocessing.

6

Define spatial data mining. What are the challenges of multimedia mining? Describe with an example.

7

Consider the following data set.

Confident   Studied   Sick   Result
Yes         No        No     Fail
Yes         No        No     Pass
No          Yes       Yes    Fail
No          Yes       Yes    Pass
Yes         Yes       Yes    Pass

Using Bayesian classification, find out whether an object with attributes Confident = Yes and Sick = No will Fail or Pass.
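The arithmetic for this question can be verified with a short sketch such as the following. It assumes a naive Bayes classifier with plain relative-frequency estimates and no Laplace smoothing, and it ignores the Studied attribute because the query does not give a value for it. These are assumptions, not requirements stated in the question.

```python
from fractions import Fraction

# Training data from the table in the question.
rows = [
    {"Confident": "Yes", "Studied": "No",  "Sick": "No",  "Result": "Fail"},
    {"Confident": "Yes", "Studied": "No",  "Sick": "No",  "Result": "Pass"},
    {"Confident": "No",  "Studied": "Yes", "Sick": "Yes", "Result": "Fail"},
    {"Confident": "No",  "Studied": "Yes", "Sick": "Yes", "Result": "Pass"},
    {"Confident": "Yes", "Studied": "Yes", "Sick": "Yes", "Result": "Pass"},
]
query = {"Confident": "Yes", "Sick": "No"}

def score(label):
    """Return P(label) * product of P(attribute = value | label) over the query."""
    in_class = [r for r in rows if r["Result"] == label]
    s = Fraction(len(in_class), len(rows))  # class prior
    for attr, value in query.items():
        matches = sum(1 for r in in_class if r[attr] == value)
        s *= Fraction(matches, len(in_class))  # conditional likelihood
    return s

scores = {label: score(label) for label in ("Pass", "Fail")}
prediction = max(scores, key=scores.get)

for label, s in scores.items():
    print(label, s, float(s))
print("prediction:", prediction)
```

With these estimates the unnormalised score is 3/5 × 2/3 × 1/3 = 2/15 for Pass and 2/5 × 1/2 × 1/2 = 1/10 for Fail, so the sketch predicts Pass.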

8

What are the choices for data cube materialization? Explain the strategies for cube computation.

9

Show the conflict between the theory of balance and the theory of status. How do you improve the Apriori algorithm?

10

Differentiate between star schema and snowflake schema. List any two methods for data normalization.

11

How do you evaluate the accuracy of a classifier? Discuss the advantages of using k-fold cross-validation.

12

Apply the K-Means algorithm (K = 2) to the data (185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77) for up to two iterations and show the clusters. Choose the first two objects as the initial centroids.
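The two iterations can be traced with a minimal sketch like the one below. It assumes Euclidean distance and the batch variant of the algorithm, in which centroids are recomputed only after all points have been assigned. The helper names `assign` and `mean` are illustrative only, and the sketch does not handle empty clusters, which do not arise for this data.

```python
from math import dist

# Data and initial centroids (the first two objects) from the question.
data = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
centroids = [data[0], data[1]]

def assign(points, centers):
    """Put each point into the cluster of its nearest centre."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def mean(cluster):
    """Return the coordinate-wise mean of a non-empty cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

history = []
for iteration in range(1, 3):  # two iterations, as the question asks
    clusters = assign(data, centroids)
    centroids = [mean(c) for c in clusters]
    history.append((clusters, centroids))
    print(f"iteration {iteration}: clusters={clusters} centroids={centroids}")
```

Under these assumptions both iterations give the same grouping: {(185, 72), (179, 68), (182, 72), (188, 77)} with centroid (183.5, 72.25), and {(170, 56), (168, 60)} with centroid (169, 58).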