Tribhuvan University

Institute of Science and Technology

2079

Bachelor Level / Seventh Semester / Science

Computer Science and Information Technology (CSC410)

Data Warehousing and Data Mining

Full Marks: 60 + 20 + 20

Pass Marks: 24 + 8 + 8

Time: 3 Hours

Candidates are required to give their answers in their own words as far as practicable.

The figures in the margin indicate full marks.

Section A

Attempt any two questions.

1

Discuss any two drawbacks of the Apriori algorithm. Find the frequent itemsets and association rules from the transaction database given below using the FP-growth algorithm. Assume that the minimum support is 50% and the minimum confidence is 60%. (Unit 5)

Transaction_ID   Items purchased
1                Sausage, Peanut, Beer
2                Peanut, Beer, Apple
3                Apple, Milk
4                Sausage, Peanut, Apple
5                Sausage, Peanut, Beer, Milk
6                Sausage, Peanut, Beer, Apple
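
Note: the first pass of FP-growth over the table above can be sketched as follows. This is only a minimal illustration of the support-counting step, not a full FP-tree construction; the alphabetical tie-break used for ordering items with equal counts is an assumption, since the question does not specify one.

```python
from collections import Counter

# Transactions copied from the table in the question.
transactions = [
    {"Sausage", "Peanut", "Beer"},
    {"Peanut", "Beer", "Apple"},
    {"Apple", "Milk"},
    {"Sausage", "Peanut", "Apple"},
    {"Sausage", "Peanut", "Beer", "Milk"},
    {"Sausage", "Peanut", "Beer", "Apple"},
]

min_support = 0.5
min_count = min_support * len(transactions)  # 50% of 6 transactions

# Scan 1: count how many transactions contain each item.
item_counts = Counter(item for t in transactions for item in t)

# Keep only items that meet the minimum support.
frequent_items = {i: c for i, c in item_counts.items() if c >= min_count}

# Order by descending count; ties broken alphabetically (an assumption).
ordered = sorted(frequent_items, key=lambda i: (-frequent_items[i], i))

print(item_counts)
print(ordered)
```

Each transaction would then be filtered and re-sorted in this order before being inserted into the FP-tree.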
2

When is a multilayer perceptron a better choice than other classification algorithms? Consider the multilayer feed-forward neural network given below. Let the learning rate be 0.5. Assume the initial values of the weights and biases given in the table below. Train the network for the training tuples (1, 1, 0) and (0, 1, 1), where the last number is the target output. Show the weight and bias updates using the back-propagation algorithm. Assume that the sigmoid activation function is used in the network. (10)

w13    w14    w23    w24    w35    w45    b3     b4     b5
0.5    0.2    -0.3   0.5    0.1    0.3    0.6    -0.4   0.8
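
Note: one back-propagation update per training tuple can be sketched as below. The network figure is not reproduced here, so the topology is an assumption inferred from the weight names: input units 1 and 2, hidden units 3 and 4, and output unit 5. The function name `train_step` is hypothetical, and the sketch updates the weights after each tuple rather than after a full epoch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(p, x1, x2, target, lr=0.5):
    """One update for a single tuple (assumed 2-2-1 topology)."""
    # Forward pass: hidden units 3 and 4, then output unit 5.
    o3 = sigmoid(p["w13"] * x1 + p["w23"] * x2 + p["b3"])
    o4 = sigmoid(p["w14"] * x1 + p["w24"] * x2 + p["b4"])
    o5 = sigmoid(p["w35"] * o3 + p["w45"] * o4 + p["b5"])
    # Error terms: output unit first, then propagated back to hidden units.
    err5 = o5 * (1 - o5) * (target - o5)
    err3 = o3 * (1 - o3) * err5 * p["w35"]
    err4 = o4 * (1 - o4) * err5 * p["w45"]
    # Weight update: w_ij += lr * err_j * o_i; bias update: b_j += lr * err_j.
    updated = {
        "w13": p["w13"] + lr * err3 * x1,
        "w14": p["w14"] + lr * err4 * x1,
        "w23": p["w23"] + lr * err3 * x2,
        "w24": p["w24"] + lr * err4 * x2,
        "w35": p["w35"] + lr * err5 * o3,
        "w45": p["w45"] + lr * err5 * o4,
        "b3": p["b3"] + lr * err3,
        "b4": p["b4"] + lr * err4,
        "b5": p["b5"] + lr * err5,
    }
    return updated, o5

# Initial values from the table in the question.
initial = {"w13": 0.5, "w14": 0.2, "w23": -0.3, "w24": 0.5,
           "w35": 0.1, "w45": 0.3, "b3": 0.6, "b4": -0.4, "b5": 0.8}

params = dict(initial)
for x1, x2, t in [(1, 1, 0), (0, 1, 1)]:
    params, output = train_step(params, x1, x2, t)
    print(f"tuple=({x1}, {x2}, {t}) output={output:.4f}")
    print({k: round(v, 4) for k, v in params.items()})
```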
3

Why are OLAP operations used? Discuss the various OLAP operations with a suitable example of each. (Unit 1)

Section B

Attempt any eight questions.

4

Suppose that we have 5-dimensional data. What will be the total number of cuboids generated? If each dimension has 5 levels, what will be the number of cuboids generated?
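
Note: the usual counting rule is that an n-dimensional cube has (L1 + 1) x (L2 + 1) x ... x (Ln + 1) cuboids, where Li is the number of hierarchy levels of dimension i and the extra 1 accounts for the top level "all". A small sketch of that rule follows; the function name `cuboid_count` is hypothetical, and it assumes the "5 levels" in the question exclude the virtual "all" level.

```python
from math import prod

def cuboid_count(levels_per_dimension):
    """Total cuboids = product of (levels + 1) over all dimensions."""
    return prod(levels + 1 for levels in levels_per_dimension)

# No concept hierarchy: each dimension has a single level, giving 2^n.
print(cuboid_count([1] * 5))

# Five levels per dimension (excluding "all", by assumption).
print(cuboid_count([5] * 5))
```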

5

Discuss the different types of attributes with a suitable example of each. (Unit 2)

6

Why is data normalization important in data mining? Explain the min-max and z-score normalization approaches. (Unit 3)
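
Note: the two approaches can be illustrated with a short sketch. The sample values are made up for illustration, the function names are hypothetical, and the z-score version uses the population standard deviation, which is an assumption (some texts use the sample standard deviation instead).

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Centre values on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("z-score normalization needs non-constant values")
    return [(v - mu) / sigma for v in values]

incomes = [10, 20, 30]  # hypothetical attribute values
print(min_max(incomes))
print(z_score(incomes))
```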

7

What are the two categories of hierarchical clustering? Divide the following data points into two clusters using agglomerative clustering. (Unit 7)
{(2,10), (2,5), (8,4), (5,8), (7,5), (6,4)}
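
Note: a minimal sketch of bottom-up merging on these points is given below. The question does not name a linkage criterion or distance measure, so single linkage with Euclidean distance is assumed here; complete or average linkage may produce different clusters. The function name is hypothetical.

```python
from itertools import combinations
from math import dist

def single_link_agglomerative(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        # Single linkage: cluster distance is the closest pair of members.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4)]
for cluster in single_link_agglomerative(points, 2):
    print(cluster)
```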

 

8

Discuss the concepts of the K-means++ and Mini-batch K-means algorithms. (Unit 7)

9

What is a confusion matrix? Discuss the various classification measures along with their mathematical formulae. (Unit 6)
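
Note: for a two-class problem, the common measures follow directly from the four cells of the confusion matrix. The sketch below uses made-up counts and a hypothetical function name, and covers only a few of the measures a full answer might discuss.

```python
def classification_measures(tp, fp, fn, tn):
    """Common measures derived from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
for name, value in classification_measures(tp=40, fp=10, fn=20, tn=30).items():
    print(f"{name}: {value:.4f}")
```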

10

What are the application areas of graph mining? Explain the concept behind inductive logic programming with a suitable demonstration. (Unit 8)

11

Discuss the concept of text mining with its practical implications. (Unit 9)

12

Write short notes on:

a. Data Mart (Unit 1)

b. Market Basket Analysis (Unit 5)