Data Warehousing and Data Mining 2078 - With Solution

Exam Year

Tribhuvan University

Institute of Science and Technology

2078

Bachelor Level / seventh-semester / Science

Computer Science and Information Technology (CSC410)

Data Warehousing and Data Mining

Full Marks: 60 + 20 + 20

Pass Marks: 24 + 8 + 8

Time: 3 Hours

Candidates are required to give their answers in their own words as far as practicable.

The figures in the margin indicate full marks.

Section A

Attempt any two questions.

Write down any one advantage and one disadvantage of MOLAP over ROLAP. Define a signed network. How do you check whether it is balanced? How does beam search reduce space complexity? Illustrate with an example.

How is a concept hierarchy used in extracting information? Generate the frequent patterns from the following data set using FP-growth, where minimum support = 3.

How do you compare two classifiers? Given the points A(3,7), B(4,6), C(5,5), D(6,4), E(7,3), F(6,2), G(7,2) and H(8,4), find the core points, border points and outliers using DBSCAN. Take Eps = 2.5 and MinPts = 3.
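The DBSCAN part of this question can be checked with a short Python sketch. It assumes Euclidean distance and the common convention that a point counts as a member of its own Eps-neighbourhood; under a convention that excludes the point itself, the counts and therefore the labels may differ. The function name `classify` is illustrative only.

```python
from math import dist

points = {
    "A": (3, 7), "B": (4, 6), "C": (5, 5), "D": (6, 4),
    "E": (7, 3), "F": (6, 2), "G": (7, 2), "H": (8, 4),
}
EPS = 2.5
MIN_PTS = 3

def classify(points, eps, min_pts):
    """Label each point as core, border, or noise (outlier)."""
    # Eps-neighbourhood of every point; assumption: the point itself is included.
    neighbours = {
        name: {other for other, q in points.items() if dist(p, q) <= eps}
        for name, p in points.items()
    }
    # Core: at least min_pts points inside the neighbourhood.
    core = {n for n, nb in neighbours.items() if len(nb) >= min_pts}
    # Border: not core, but inside the neighbourhood of some core point.
    border = {n for n, nb in neighbours.items() if n not in core and nb & core}
    # Noise: everything else.
    noise = set(points) - core - border
    return core, border, noise

core, border, noise = classify(points, EPS, MIN_PTS)
print("core:", sorted(core))
print("border:", sorted(border))
print("noise:", sorted(noise))
```

Under these assumptions, B through H come out as core points, A is a border point (its only neighbour within Eps is the core point B), and no point is an outlier.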

Section B

Attempt any eight questions.

When is a pattern said to be interesting? List the issues of data mining.

Define data discretization. Describe the tasks for data preprocessing.

Define spatial data mining. What are the challenges of multimedia mining? Describe with an example.

Consider the following data set.

Confident	Studied	Sick	Result
Yes	No	No	Fail
Yes	No	No	Pass
No	Yes	Yes	Fail
No	Yes	Yes	Pass
Yes	Yes	Yes	Pass

Find out whether the object with attributes Confident = Yes and Sick = No will Fail or Pass using Bayesian classification.
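One way to verify the arithmetic for this question is the following Python sketch of a naive Bayes calculation. It assumes conditional independence of the attributes given Result and uses raw frequency estimates with no Laplace smoothing; the function name `naive_bayes_scores` is illustrative only. Exact fractions are used so the hand calculation can be compared term by term.

```python
from fractions import Fraction

# The five training rows from the table above.
rows = [
    {"Confident": "Yes", "Studied": "No",  "Sick": "No",  "Result": "Fail"},
    {"Confident": "Yes", "Studied": "No",  "Sick": "No",  "Result": "Pass"},
    {"Confident": "No",  "Studied": "Yes", "Sick": "Yes", "Result": "Fail"},
    {"Confident": "No",  "Studied": "Yes", "Sick": "Yes", "Result": "Pass"},
    {"Confident": "Yes", "Studied": "Yes", "Sick": "Yes", "Result": "Pass"},
]
query = {"Confident": "Yes", "Sick": "No"}

def naive_bayes_scores(rows, query, target="Result"):
    """Return P(class) * product of P(attribute = value | class) per class."""
    scores = {}
    for label in sorted({r[target] for r in rows}):
        subset = [r for r in rows if r[target] == label]
        score = Fraction(len(subset), len(rows))  # prior P(class)
        for attr, value in query.items():
            matches = sum(1 for r in subset if r[attr] == value)
            score *= Fraction(matches, len(subset))  # likelihood, unsmoothed
        scores[label] = score
    return scores

scores = naive_bayes_scores(rows, query)
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Under these assumptions the scores are P(Pass) x P(Confident = Yes | Pass) x P(Sick = No | Pass) = 3/5 x 2/3 x 1/3 = 2/15 and P(Fail) x P(Confident = Yes | Fail) x P(Sick = No | Fail) = 2/5 x 1/2 x 1/2 = 1/10, so the object is classified as Pass.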

What are the choices for data cube materialization? Explain the strategies for cube computation.

Show the conflict between the theory of balance and the theory of status. How do you improve Apriori?

Differentiate between star schema and snowflake schema. List any two methods for data normalization.

How do you evaluate the accuracy of a classifier? Discuss the advantages of using K-fold cross-validation.

Apply the K-Means algorithm (K = 2) to the data (185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77) for up to two iterations and show the clusters. Choose the first two objects as the initial centroids.
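The two iterations asked for here can be traced with a small Python sketch. It assumes Euclidean distance and the batch form of K-Means, in which all points are assigned first and the centroids are recomputed afterwards; a variant that updates centroids after each single assignment could give different intermediate values. The function name `k_means` is illustrative, and the sketch does not handle an empty cluster, which does not occur with this data.

```python
from math import dist

data = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
initial_centroids = [data[0], data[1]]  # first two objects, as the question states

def k_means(data, centroids, iterations):
    """Run a fixed number of assign-then-update passes."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            for cluster in clusters
        ]
    return clusters, centroids

clusters, centroids = k_means(data, initial_centroids, iterations=2)
for cluster, centroid in zip(clusters, centroids):
    print(centroid, cluster)
```

Under these assumptions the first iteration yields the clusters {(185, 72), (179, 68), (182, 72), (188, 77)} and {(170, 56), (168, 60)} with centroids (183.5, 72.25) and (169, 58). The second iteration leaves the assignments unchanged, so the clustering has already converged.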