===== Project =====

==== Task ====

Choose an existing dataset or collect your own data, and carry out data analyses to explain the relationships among the variables involved. You can also compare the performance of different statistical tools for prediction or classification on the chosen dataset.

==== Teams ====

For this project you are expected to work with one or two other students and submit a joint report. If you cannot find a team member, one will be assigned to you. (This is not recommended. Try to find a student who is compatible with your work style.)

==== Grading policies ====

Team members will receive the same grade for the project, and it is up to you to make sure that the work is shared equitably. The project is worth 100 points in total, divided into three parts:
  - Project proposal (20 pts)
  - Presentation (40 pts): each team will give a 20-minute presentation of the project (dates to be assigned)
  - Final report (40 pts)

==== Schedule ====

See syllabus.

/*
  * Project proposal and declaration of team members: due by October 30
  * Preliminary report: due by November 20
  * Project presentations: December 1 - December 8
  * Final report: due by December 8
*/

==== Methods ====

You can use methods we covered in class or other statistical learning methods, with preference for more advanced methods.

==== Data ====

Find your own dataset online; plenty are available.

Popular collections of publicly available datasets:
  * [[https://archive.ics.uci.edu/ml/index.php | UCI Machine Learning Repository]]
  * [[https://www.kaggle.com/datasets | Kaggle]]
  * [[https://academictorrents.com/browse.php?cat=6 | Academic Torrents]] (shares large datasets via BitTorrent)
  * [[http://lib.stat.cmu.edu/datasets/ | StatLib]] (an older collection of data that is no longer updated)
Some institutional collections of data:
  * [[https://data.worldbank.org | World Bank]]
  * [[https://www.who.int/data/collections | World Health Organization]]

==== Guidelines ====

  * The proposal should give information about the team, a description of the data, potential research questions, and possible methods to use. The proposal should not exceed one page. The proposal, the preliminary report, and the final report should be uploaded via Google form: [[https://forms.gle/fQoNw817KDPyZzgDA | Google Form for Project files]].
  * The preliminary report is a draft of the final report.
  * The final report should not exceed 6 pages, including figures and tables, and must begin with an appropriate title highlighting your choice of topic and analysis.
  * The final report should include:
    * Description of the research questions / issues and the significance of the problems.
    * Description of the data.
    * Preliminary studies: data visualization, dimension reduction, feature extraction, feature selection, etc.
    * Statistical analysis:
      - Methods: what analyses were done and why. If the analysis posed any challenges, describe how you tackled them.
      - Results: a small number of well-designed and tailored tables and graphics may be appropriate.
    * Discussion/Conclusion: convey your findings to a broad audience.
  * Try to avoid including too much software output. Supporting code can be kept in an appendix or a separate document.
  * The final report will be evaluated on the basis of the following criteria:
    - How interesting the dataset and the research idea are.
    - How well the dataset was prepared for analysis.
    - Quality of the statistical analysis.
    - Quality of the presentation of the material (grammar, spelling, and style matter).
  * The presentation in class can be given by a single team member or by all team members, as you prefer. It should include a clear description of the data, the research question(s), and the findings. The evaluation criteria for the presentation are similar to the criteria for the final report, with emphasis on the quality of presentation.