2 Marks

 

Learning Outcome(s):

Demonstrate a wide range of clustering, estimation, prediction, and classification algorithms to solve a specific problem or application.

Question One

Using the cosine similarity formula, find the similarity between Document 1 (A) and Document 2 (B), whose vectors are given as follows:

Document 1: [1, 1, 1, 1, 1, 0] (referred to as A)

Document 2: [1, 1, 1, 1, 0, 1] (referred to as B)

A and B are two vectors in a 6-dimensional vector space.

[Given formula: cosine similarity (CS) = (A · B) / (||A|| ||B||)]
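The arithmetic can be checked with a short Python sketch. The helper name cosine_similarity is our own illustration, not part of the question; it computes the dot product A · B and the two Euclidean norms directly from the given formula. For these vectors A · B = 4 and ||A|| = ||B|| = √5, so CS = 4 / 5 = 0.8.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity CS = (A . B) / (||A|| ||B||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))     # A . B
    norm_a = math.sqrt(sum(x * x for x in a))  # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))  # ||B||
    return dot / (norm_a * norm_b)

A = [1, 1, 1, 1, 1, 0]  # Document 1
B = [1, 1, 1, 1, 0, 1]  # Document 2
cs = cosine_similarity(A, B)
print(round(cs, 4))  # 0.8
```

Because both vectors contain only 0s and 1s, the dot product simply counts the positions where both documents have a 1.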

3 Marks

 

Learning Outcome(s):

Demonstrate a wide range of clustering, estimation, prediction, and classification algorithms to solve a specific problem or application.

Question Two

1000 people (350 aged 20 or younger, and 650 older than 20) were asked, “Which take-out food do you prefer: junk food or healthy food?”

The results were:

           | Junk food | Healthy food
Ages <= 20 | 225       | 125
Ages > 20  | 350       | 300

 

Calculate the chi-square statistic.

Note:

The expected value E for each cell is calculated using the following equation:

$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
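A minimal Python sketch of the calculation, assuming the standard Pearson statistic χ² = Σ (O − E)² / E summed over all four cells, with each expected count E = (row total × column total) / grand total and no Yates continuity correction. The helper name chi_square is hypothetical. With row totals 350 and 650, column totals 575 and 425, and a grand total of 1000, the expected counts are 201.25, 148.75, 373.75 and 276.25, which gives χ² ≈ 10.146 with (2 − 1)(2 − 1) = 1 degree of freedom.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    Returns (chi2, expected), where expected[i][j] = row_i total * col_j total / grand total.
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(observed):
        exp_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count for cell (i, j)
            exp_row.append(e)
            chi2 += (o - e) ** 2 / e                          # add this cell's contribution
        expected.append(exp_row)
    return chi2, expected

observed = [
    [225, 125],  # Ages <= 20: junk food, healthy food
    [350, 300],  # Ages > 20:  junk food, healthy food
]
chi2, expected = chi_square(observed)
print(expected)        # [[201.25, 148.75], [373.75, 276.25]]
print(round(chi2, 3))  # 10.146
```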

2.5 Marks

 

Learning Outcome(s):

Demonstrate a wide range of clustering, estimation, prediction, and classification algorithms to solve a specific problem or application.

Question Three

What is the Manhattan distance between each pair of the points shown below? Fill in the table with the appropriate Manhattan distances. As an example, the distance between points A and C (13) has already been entered in the corresponding table cell.

  | A  | B  | C  | D
A |    |    |    |
B |    |    |    |
C | 13 |    |    |
D |    |    |    |
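The Manhattan distance between two points is the sum of the absolute differences of their coordinates, for example |x1 − x2| + |y1 − y2| in two dimensions. The coordinates of A, B, C and D are not reproduced in this text, so the Python sketch below uses hypothetical points P, Q and R purely for illustration; substitute the real coordinates to fill in the table. The helper names manhattan and distance_table are our own.

```python
def manhattan(p, q):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return sum(abs(x - y) for x, y in zip(p, q))

def distance_table(points):
    """Return {(name1, name2): distance} for every ordered pair of named points."""
    return {
        (a, b): manhattan(points[a], points[b])
        for a in points
        for b in points
    }

# HYPOTHETICAL coordinates for illustration only; replace them with the
# coordinates of A, B, C and D from the question's figure.
points = {"P": (1, 2), "Q": (4, 6), "R": (0, 0)}
table = distance_table(points)

# Print the symmetric distance matrix with row and column labels.
names = list(points)
print("   " + "".join(f"{n:>4}" for n in names))
for r in names:
    print(f"{r:>3}" + "".join(f"{table[(r, c)]:>4}" for c in names))
```

The resulting matrix is symmetric with zeros on the diagonal, so only the cells below (or above) the diagonal need to be computed by hand.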

 

0.5 Marks

 

Learning Outcome(s):

Employ data mining and data warehousing techniques to solve real-world problems.

Question Four

Apply the discretization filter to the iris dataset. (Note: the iris dataset can be loaded directly into WEKA from the “C:\Program Files\Weka-3-8\data” folder.) After applying the discretization filter, list all the features (attributes