Machine learning
Your Task:
You must develop logistic regression, decision tree and neural network models that will identify whether stores will perform well or poorly. You can use Orange, Python, R, or any data mining package of your choice. The data for the assignment is in a file storedata.csv, which you can download from the same place you found this document. The data dictionary is given at the end of this document. You must follow the correct methodology when using the data to build and test your models.
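One common way to apply a train / validate / test methodology is sketched below in Python, using only the standard library. It is a minimal sketch that assumes the rows of storedata.csv have already been loaded into a list (for example with `csv.DictReader`). The function name `train_validate_test_split`, the 60/20/20 proportions and the fixed seed are illustrative assumptions, not requirements of the assignment.

```python
import random


def train_validate_test_split(rows, train=0.6, validate=0.2, seed=42):
    """Shuffle rows and split them into train / validate / test partitions.

    The default 60/20/20 proportions are an illustrative assumption;
    whatever is left after train + validate becomes the test set.
    """
    if not 0 < train + validate < 1:
        raise ValueError("train + validate must leave room for a test set")
    shuffled = list(rows)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_validate = int(len(shuffled) * validate)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validate],
        shuffled[n_train + n_validate:],
    )


if __name__ == "__main__":
    # Stand-in rows; in practice these would be the records from storedata.csv.
    train_rows, validate_rows, test_rows = train_validate_test_split(range(100))
    print(len(train_rows), len(validate_rows), len(test_rows))  # 60 20 20
```

In the usual approach, models are fitted on the training partition, hyperparameter settings are compared on the validation partition, and the test partition is used only once, to evaluate the final model. If one performance class is much rarer than the other, a stratified split (shuffling and splitting each class separately) keeps the class proportions similar across the three partitions.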
What to Submit:
You must submit a single-page infographic poster showing the results of your analysis. Create the poster using a word processor (such as Word) or a presentation package (such as PowerPoint). Set the page size to A3 and use a 12pt font. You can choose the layout, but you must include:
1. Your student number (not your name) at the top of the poster;
2. A list of the steps you took to carry out the project, including details of the train / validate / test split that you chose;
3. A table showing which variables you used and whether your model treats them as numeric (continuous or discrete) or categorical. Explain one consequence of your choices;
4. A single example of how you used a histogram to detect an error in the data and what you did to fix that error. State the data cleaning operation you carried out;
5. A table showing the different models you trained and their hyperparameter settings, along with the correct metric for each, and a sentence explaining how you chose the hyperparameter values;
6. A justification of your choice of final model, and a confusion matrix showing its results.
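The confusion matrix asked for in item 6 can be tabulated directly from the final model's predictions and the actual labels. The following is a minimal standard-library sketch; the class labels `"good"` and `"poor"` are hypothetical placeholders (the real target values are defined in the data dictionary), and accuracy is computed only as an example of a figure that can be read off the matrix, not as the metric the assignment requires.

```python
from collections import Counter

# Hypothetical class labels; replace with the target values from the data dictionary.
LABELS = ("good", "poor")


def confusion_matrix(actual, predicted, labels=LABELS):
    """Count (actual, predicted) pairs; rows are actual classes, columns predicted."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}


def accuracy(matrix):
    """Share of cases on the diagonal (actual class == predicted class)."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


def format_matrix(matrix):
    """Render the matrix as aligned plain-text rows, e.g. for copying into a poster table."""
    labels = list(matrix)
    lines = ["actual \\ predicted " + " ".join(f"{p:>6}" for p in labels)]
    for a in labels:
        lines.append(f"{a:>18} " + " ".join(f"{matrix[a][p]:>6}" for p in labels))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up predictions purely to show the output format.
    actual = ["good", "good", "poor", "poor", "poor"]
    predicted = ["good", "poor", "poor", "poor", "good"]
    m = confusion_matrix(actual, predicted)
    print(format_matrix(m))
    print(accuracy(m))  # 0.6
```

Because the rows are actual classes and the columns are predicted classes, the off-diagonal cells show the two kinds of error separately: stores that performed well but were predicted to perform poorly, and the reverse.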