Data mining and data warehousing

Question One

Define Frequent Pattern Analysis and cite its applications. What are the different methods used?

Question Two

Explain in your own words the difference between “supervised learning” and “unsupervised learning”. Give some examples of how each one is used.

Question Three

Given the testing dataset and the constructed decision tree below, calculate the accuracy, error rate, sensitivity, specificity, precision, and recall. The model predicts whether a person will buy a computer based on their information.

 

age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            no
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       no
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no
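
For reference, all six requested measures derive from the four confusion-matrix counts. The sketch below gives the standard definitions, assuming buys_computer = yes is the positive class (the question does not name one). The function name and the counts passed to it are hypothetical placeholders, not values read from the decision tree.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard evaluation measures from confusion-matrix counts.

    tp: actual yes, predicted yes    fn: actual yes, predicted no
    fp: actual no,  predicted yes    tn: actual no,  predicted no
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,   # equals 1 - accuracy
        "sensitivity": tp / (tp + fn),     # true positive rate
        "specificity": tn / (tn + fp),     # true negative rate
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),          # same quantity as sensitivity
    }


# Hypothetical counts for illustration only; replace them with the counts
# obtained by running each test tuple through the decision tree.
metrics = classification_metrics(tp=6, fn=1, fp=2, tn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and recall are two names for the same ratio, so they always take the same value.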

Question Four

Explain the k-means algorithm. Cite a software program (other than Weka) or an online tool that provides the k-means algorithm (a screenshot is required).
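
For reference, a minimal sketch of the usual k-means iteration (assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster, until nothing changes). The function name, the sample points, and the choice of the first k points as initial centroids are assumptions made for illustration. Practical tools typically use random or smarter initialization.

```python
def kmeans(points, k, max_iter=100):
    """Cluster a list of equal-length numeric tuples into k groups."""
    # Assumption: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest
        # centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two visually separate groups of 2-D points.
sample = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, groups = kmeans(sample, k=2)
print(centers)
print(groups)
```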

Question Five

Consider the database of transaction data shown in the table below. Apply the Apriori algorithm and find the frequent itemsets, where min-sup = 2.

 

TID   Items Bought
1     Diaper, Bread, Juice
2     Eggs, Bread, Pasta
3     Diaper, Eggs, Bread, Pasta
4     Eggs, Pasta
5     Rice
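
For reference, a minimal sketch of the level-wise Apriori procedure applied to the five transactions above. It treats min-sup as an absolute support count, which is an assumption (with five transactions, a value of 2 reads most naturally as a count). The function name is hypothetical, and the code is one straightforward way to write the join and prune steps rather than a tuned implementation.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_sup}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Keep only the candidates that meet the minimum support count.
        current = {c for c in candidates if support(c) >= min_sup}
    return frequent


transactions = [
    {"Diaper", "Bread", "Juice"},
    {"Eggs", "Bread", "Pasta"},
    {"Diaper", "Eggs", "Bread", "Pasta"},
    {"Eggs", "Pasta"},
    {"Rice"},
]

result = apriori(transactions, min_sup=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```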