Dataset link
Source on Kaggle: Salary Prediction Classification
Checklist of completed tasks:
Task 1
- Select a real-world classification problem
- Select a suitable dataset for the chosen problem (must have 1200 samples after pre-processing)
- Select more than one appropriate ML algorithm
- Evaluate the created models on the selected data
- Tune the models to achieve better performance
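
The selection, evaluation and tuning steps in Task 1 could be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: it assumes scikit-learn is available, uses a synthetic dataset from `make_classification` as a stand-in for the pre-processed salary data, and the two models and their parameter grids are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic stand-in for the pre-processed dataset (1200 samples, binary target).
X, y = make_classification(n_samples=1200, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: two example algorithms with small, illustrative hyperparameter grids.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [5, None]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Tune each model with 5-fold cross-validated grid search on the training split.
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    # Evaluate the refitted best estimator on the held-out test split.
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": accuracy_score(y_test, search.predict(X_test)),
    }

for name, summary in results.items():
    print(name, summary)
```

Keeping the test split out of the grid search means the reported test accuracy is not influenced by the tuning itself.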
Task 2
- 5-minute demo video showing the execution of all stages and their output
- Demo must describe what is happening and give clear reasoning
- Ensure text, tables, and graphs are clearly visible: reasonable font size, no noise, no blur
- Must use Jupyter Notebook
Task 3
- Analysing and pre-processing the data
- Applying different algorithms and methods to build learning models
- Making appropriate adjustments to improve the models’ performances
- Evaluating the models (metrics, cross validation, confusion matrices, etc.)
- Comparing the approaches and results of other existing pieces of work on the same problem
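
The evaluation items in Task 3 (metrics, cross validation, confusion matrices) might look something like the sketch below. It assumes scikit-learn, uses synthetic imbalanced data in place of the real dataset, and picks a scaled logistic regression pipeline purely as an example model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic, mildly imbalanced stand-in for the pre-processed data.
X, y = make_classification(
    n_samples=1200, n_features=10, n_informative=5, weights=[0.75, 0.25], random_state=0
)

# Putting the scaler inside the pipeline means it is refitted on each training fold,
# so no information from the validation fold leaks into pre-processing.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Per-fold F1 scores for the positive class.
scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
print("F1 per fold:", scores.round(3), "mean:", scores.mean().round(3))

# Out-of-fold predictions give one prediction per sample for a single confusion matrix.
y_pred = cross_val_predict(model, X, y, cv=cv)
cm = confusion_matrix(y, y_pred)
print(cm)
print(classification_report(y, y_pred))
```

Rows of the confusion matrix are true classes and columns are predicted classes, which is the scikit-learn convention.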
Report:
- Problem statement
- Existing approaches or methods and their results
- Similarities and differences between your work and the existing work
- Analysis and Evaluation
- Conclusion
Presentation:
- Logical structure with clear and appropriate sections and subsections
- Appropriate and consistent format and presentation
- Correct references (datasets, models, figures, etc) and in-text citations
- Good scientific/academic writing
- Complete source code as text in Appendix B
Notes:
- Your reports should focus on how algorithms/methods/techniques are actually applied or developments that are novel and specific to your work rather than how they work theoretically
- Your report should include appropriate outcomes such as data analysis diagrams, outcomes from the models, code snippets, etc. to support your text.
- Include all your source code as text in Appendix B at the end of the report. Do not use screenshots of your code in Appendix B. Your code must be presented as text (see coursework template).
- A coursework template is provided as a guide in the “Assessment” section on Aula
- The 2000-word limit is the absolute maximum word count for the whole report. Reports that are more than 10% over the word limit will have their marks reduced by 10%, e.g., a mark of 60% will be reduced by 6% to 54%. The word limit includes quotations, but excludes the (GitHub, datasets, OneDrive) URLs, bibliography, reference list, and appendices (see coursework template).
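
The word-limit penalty can be checked with a small calculation. The sketch below reflects one reading of the rule, namely that the reduction applies only when the count is strictly more than 10% over the limit; the function name `penalised_mark` is hypothetical.

```python
def penalised_mark(mark, word_count, limit=2000):
    """Return the mark after the word-limit penalty (assumed reading of the rule)."""
    # Integer comparison avoids floating-point rounding: "more than 10% over" the limit.
    if word_count * 10 > limit * 11:
        # Reduce by 10% of the mark itself, e.g. 60 loses 6.
        return mark - mark / 10
    return mark


print(penalised_mark(60, 2300))  # over 2200 words, so the penalty applies
print(penalised_mark(60, 2100))  # within 10% of the limit, so the mark is unchanged
```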
Task 4
- One submission file
- Must contain GitHub link at the beginning of the report
- Repo must be accessible by examiners
- README present with a link to the dataset used
- Source code with appropriate comments and annotations
- Demo video