2 Marks
Learning Outcome(s):
Demonstrate a wide range of clustering, estimation, prediction, and classification algorithms to solve a specific program or application.
Question One
Using the cosine similarity formula, find the similarity between Document 1 (A) and Document 2 (B), where the vectors A and B are given as follows:
Document 1: [1, 1, 1, 1, 1, 0] let’s refer to this as A
Document 2: [1, 1, 1, 1, 0, 1] let’s refer to this as B
Above we have two vectors (A and B) in a 6-dimensional vector space.
[Given formula: Cosine similarity (CS) = (A . B) / (||A|| ||B||)]
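The given formula can be checked numerically. The following is a minimal sketch in plain Python; the function name `cosine_similarity` is an illustrative choice and not part of the question. It computes the dot product A . B and the two Euclidean norms, then divides.

```python
import math

def cosine_similarity(a, b):
    """Return (a . b) / (||a|| * ||b||) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))      # A . B
    norm_a = math.sqrt(sum(x * x for x in a))   # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))   # ||B||
    return dot / (norm_a * norm_b)

A = [1, 1, 1, 1, 1, 0]  # Document 1
B = [1, 1, 1, 1, 0, 1]  # Document 2

similarity = cosine_similarity(A, B)
print(round(similarity, 4))  # 4 / (sqrt(5) * sqrt(5)) = 0.8
```

By hand: A . B = 4, ||A|| = ||B|| = sqrt(5), so CS = 4 / 5 = 0.8.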
3 Marks
Learning Outcome(s):
Demonstrate a wide range of clustering, estimation, prediction, and classification algorithms to solve a specific program or application.
Question Two
1000 people (350 aged 20 or younger, and 650 older than 20) were asked, “Which take-out food do you prefer – junk food or healthy food?”
The results were:
Age group | Junk food | Healthy food |
Ages <= 20 | 225 | 125 |
Ages > 20 | 350 | 300 |
Calculate the chi-square statistic.
Note:
The expected value of each cell is calculated using the following equation:
Expected value = (row total × column total) / grand total
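As a way to check the arithmetic, here is a minimal sketch in plain Python. It assumes the usual Pearson statistic, the sum over all cells of (observed - expected)^2 / expected, with no continuity correction, and it takes each expected count as (row total × column total) / grand total. The variable names are illustrative.

```python
# Observed counts: rows are age groups, columns are (junk food, healthy food).
observed = [
    [225, 125],  # Ages <= 20
    [350, 300],  # Ages > 20
]

row_totals = [sum(row) for row in observed]        # [350, 650]
col_totals = [sum(col) for col in zip(*observed)]  # [575, 425]
grand_total = sum(row_totals)                      # 1000

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence of age group and food preference.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

print(round(chi_square, 3))
```

The expected counts come out as 201.25, 148.75, 373.75, and 276.25, so every cell deviates by 23.75 and the statistic is approximately 10.146.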
2.5 Marks
Learning Outcome(s):
Demonstrate a wide range of clustering, estimation, prediction, and classification algorithms to solve a specific program or application.
Question Three
What is the Manhattan distance between each pair of the points shown below? Fill the table with the appropriate Manhattan distances. As an example, the distance between points A and C is already entered in the corresponding table cell.
Point | A | B | C | D |
A | | | | |
B | | | | |
C | 13 | | | |
D | | | | |
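The Manhattan distance between two points is the sum of the absolute differences of their coordinates, |x1 - x2| + |y1 - y2| in two dimensions. The sketch below shows how such a distance table could be built in plain Python. The coordinates of A, B, C, and D are not reproduced in this text, so the points P, Q, and R are hypothetical placeholders chosen only for illustration; the helper names `manhattan` and `distance_table` are likewise assumptions.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences between points p and q."""
    return sum(abs(a - b) for a, b in zip(p, q))

def distance_table(points):
    """Return a nested dict: table[name1][name2] = Manhattan distance."""
    return {
        n1: {n2: manhattan(c1, c2) for n2, c2 in points.items()}
        for n1, c1 in points.items()
    }

# Hypothetical coordinates, NOT the points from the question.
example_points = {"P": (0, 0), "Q": (3, 4), "R": (-2, 5)}
table = distance_table(example_points)

for name, row in table.items():
    print(name, row)
```

Replacing `example_points` with the actual coordinates of A, B, C, and D would produce the values needed for the table; the result is symmetric with zeros on the diagonal.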
0.5 Marks
Learning Outcome(s):
Employ data mining and data warehousing techniques to solve real-world problems.
Question Four
Apply the discretization filter to the iris dataset. (Note: the iris dataset can be loaded into WEKA directly from the “C:\Program Files\Weka-3-8\data” folder). After applying the discretization filter, list all the features (attributes