Identify the datasets to be used for your data mining analysis. The project should use at least two publicly available datasets that have not been used in any other assignment in the class.
Submit the following items for this assignment:
A short description of the datasets
Links to the datasets
An explanation of how those datasets can be studied together and how this study will contribute to a business or to society.
Report accomplishments on each of the following steps of your project.
Identify the best technology to conduct data conversion, data cleaning, and data munging. Apply those techniques to your selected datasets and produce a single merged dataset for further analysis.
Identify the research question or a broader goal, and the characteristics (variables) you will need to study.
Identify the need, or potential need, for distributed computing in order to store, manipulate, or analyze the data.
Conduct the preliminary analysis by running one of the data mining techniques (e.g., clustering or regression).
Interpret and report the preliminary results of the analysis. Use an appropriate format (e.g., tables, charts) to report the results; the write-up must include a results-based response to the research question.
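As an illustration only, the cleaning, merging, and preliminary-analysis steps above might look like the following minimal sketch. It uses only the Python standard library; the two inline CSV samples, their column names (region, median_income, avg_spending), and the helper functions load_rows, merge, and fit_line are all hypothetical placeholders, not required data or a required method. Simple linear regression is chosen here only as one example of a data mining technique; your own datasets and tools may call for something different.

```python
import csv
import io

# Hypothetical sample data standing in for two public datasets (made-up values).
INCOME_CSV = """region,median_income
North, 52000
south,48000
East,
West,61000
"""

SPENDING_CSV = """Region,avg_spending
north,31000
South,29500
east,30000
west,36000
"""


def load_rows(text, key, value):
    """Parse CSV text into {normalized_key: float}, skipping rows with missing values."""
    cleaned = {}
    for row in csv.DictReader(io.StringIO(text)):
        raw = (row.get(value) or "").strip()
        if not raw:
            continue  # data cleaning: drop incomplete records
        # data munging: normalize the join key (trim whitespace, lowercase)
        cleaned[row[key].strip().lower()] = float(raw)
    return cleaned


def merge(left, right):
    """Inner join on the shared key, producing one merged record per key."""
    return [
        {"region": k, "median_income": left[k], "avg_spending": right[k]}
        for k in sorted(left.keys() & right.keys())
    ]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


income = load_rows(INCOME_CSV, "region", "median_income")
spending = load_rows(SPENDING_CSV, "Region", "avg_spending")
merged = merge(income, spending)

slope, intercept = fit_line(
    [r["median_income"] for r in merged],
    [r["avg_spending"] for r in merged],
)
print(f"{len(merged)} merged rows; slope={slope:.3f}, intercept={intercept:.1f}")
```

In this sketch the row with a missing value is dropped before the join, so only keys present in both cleaned sources reach the merged dataset. For datasets too large for one machine, the same clean, join, and fit sequence is where a distributed framework would come into consideration.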