Data Collection
Teams can use any dataset for their final project. The following are two possible sources of data:
- UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets.php
- Kaggle: https://www.kaggle.com/datasets
Data Analysis: Teams can use any software (e.g., MS Excel, R, Python, IBM SPSS, etc.) to analyze their data. Any analysis method (e.g., regression, classification, time-series analysis, etc.) that is appropriate for the dataset and the problem is acceptable.
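As one illustration of what an analysis method might look like in practice, the following is a minimal Python sketch of a simple linear regression fitted by ordinary least squares. It is only an example under stated assumptions: the function name `fit_line` and the variables `hours` and `scores` are hypothetical, and the values are made up for illustration rather than taken from any real dataset. Teams are free to use any other tool or method.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up illustrative values (hypothetical), not from a real dataset.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A fitted slope of this kind would then need to be interpreted in terms of the real-world problem, which is what the report sections below ask for.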
Final Project Report: Each team will submit a report that discusses the aspects mentioned below.
The report should use Letter-size pages (8.5" x 11") with 1" margins on all sides and single-spaced 12 pt. Times New Roman font. The maximum length of the report is 8 pages. Tables, graphs, and visualizations should be numbered and placed near the text that describes them. The following are some of the aspects each team needs to include in the report:
I. Introduction to the problem: A brief discussion of the specific real-world problem the team is trying to solve.
II. Introduction to the dataset: A brief description of the data source and the data collection process. An explanation of how the dataset relates to the real-world problem. An explanation of any steps taken to prepare the data for analysis.
III. Explanation of the variables: Definitions and discussions on each variable in the dataset.
IV. Analysis method: Description of the method used to analyze the data. Justification of the method used to analyze the data (i.e., how the chosen method solves the real-world problem).
V. Descriptive statistics: Discussion on the descriptive statistics for each variable.
VI. Results of the analysis: Discussion on the results of the analysis.
VII. Decisions/recommendations: What decisions/recommendations can be made based on the results of the analysis? How does the analysis solve the real-world problem?
VIII. Conclusion: Concluding remarks.
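For the descriptive statistics in item V, teams working in Python might start from something like the sketch below, which uses only the standard library. It is one possible approach, not a required one. The column names (`age`, `income`), the inline sample data, and the helper names `describe` and `describe_csv` are all hypothetical; in a real project the text would be read from the team's own dataset file, and every column is assumed here to be numeric.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a team's dataset file.
SAMPLE_CSV = """age,income
23,41000
35,58000
41,62000
29,47000
52,75000
"""


def describe(column):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "n": len(column),
        "mean": statistics.mean(column),
        "median": statistics.median(column),
        "stdev": statistics.stdev(column),  # sample standard deviation
        "min": min(column),
        "max": max(column),
    }


def describe_csv(text):
    """Summarize every column of CSV text, assuming all columns are numeric."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return {name: describe([float(r[name]) for r in rows]) for name in rows[0]}


summary = describe_csv(SAMPLE_CSV)
for name, stats in summary.items():
    print(name, stats)
```

The resulting values could be placed in a numbered table in the report and discussed variable by variable.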