How to do a master’s project
This document is aimed at students attempting a master’s dissertation in Data Science or a similar discipline. The guidance here is specifically for students being supervised at the Centre for Computational Science and Mathematical Methods. Our overall aim for such students is to prepare them to enter a PhD programme should they wish to. For this reason, the recommendations here may differ somewhat from those given to the rest of the faculty.
Doing a master’s project is similar to doing a PhD, but whereas a PhD dissertation is required to be novel, a master’s dissertation is not. The work must still be rigorous and thorough. Great projects demonstrate mastery of both the topic at hand and the skills and techniques used to research that topic.
It can be tempting to think that a few more marks will be available to the student who produces novel research. However, this is a false hope. In fact, many students are led astray in their attempt to ‘push the envelope’: they are so convinced that they need to show they have produced something better than past work that they fail to compare their own work thoroughly with it.
So, at the very start, consider this a reminder that the master’s project does not need to have a novel element.
Nonetheless, there may be new knowledge generated. For example, it may be that the student discovers that some past work is incorrect. If this can be done, this is a significant finding. However, some care is also needed here to ensure that there really is a fault with the past work.
Aim to replicate an existing work
Our current scientific environment tends to emphasise publication quantity over quality. Meta-studies of scientific literature have recently found that only a small percentage of existing published research is actually correct. There are many reasons for this but one obvious failure is that there is little incentive for researchers to try to reproduce each other’s work. Thus, in the master’s programme, we actively encourage students to try to reproduce the results in an existing paper rather than produce novel science of their own.
This approach has two benefits. First, there is the obvious benefit of increasing the number of replication studies and thus improving the chance that bad science will be spotted and weeded out. Second, there is a benefit to the student: the existing paper acts as a roadmap showing what sort of graphs and tables should be produced for the work to be successful.
How to select a worthwhile paper
When looking for a topic, you will need to search for a research paper that explores that topic. One way to do this is with https://scholar.google.com. This provides a citation count, which is an indication of the quality of the paper (but by no means a guarantee).
To judge whether the paper is worthwhile, go through the following checks:
- Is it readable and understandable? If the writing is unclear (or not in correct English) move on to the next paper. It is also worthwhile looking at the quality of the diagrams and figures and making sure that they look professional as this is a good indicator of quality. Similarly, if it involves a topic that you don’t understand, you should probably not invest too much time trying to understand before looking elsewhere.
- Does it contain graphs and tables of results that you could potentially reproduce? A purely theoretical paper may not be appropriate unless you are that way inclined. If you cannot imagine how to reproduce the results then this may be a stumbling block too.
- Check that source data is available. If the paper relies on private or confidential data-sets, it will be hard (or impossible) to reproduce.
- If you are unsure about a paper, ask your supervisor.
Once you have selected the paper, remember that your aim is to reproduce the work rather than to better it. At every stage, you must make sure your results are comparable with the results in the paper. When you discuss the aim of the project in the introduction to your thesis, state reproducing the paper as your primary aim. You can also discuss the paper’s own aim there.
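One way to keep your results comparable with the paper is to check them against the published numbers automatically. The following is only a minimal sketch: the metric names, the values and the 2% relative tolerance are hypothetical placeholders, not taken from any particular paper, and the right tolerance depends on your topic.

```python
import math

# Hypothetical values: replace them with the numbers printed in your chosen
# paper and the numbers produced by your own code.
published = {"accuracy": 0.912, "f1": 0.874}
reproduced = {"accuracy": 0.908, "f1": 0.801}


def compare(published, reproduced, rel_tol=0.02):
    """Return one row per metric: (name, published, reproduced, within tolerance)."""
    rows = []
    for name, target in published.items():
        value = reproduced[name]
        # rel_tol is an assumed tolerance; choose one that suits your results.
        ok = math.isclose(value, target, rel_tol=rel_tol)
        rows.append((name, target, value, ok))
    return rows


for name, target, value, ok in compare(published, reproduced):
    status = "match" if ok else "DIFFERS"
    print(f"{name:10s} paper={target:.3f} mine={value:.3f} {status}")
```

A result flagged as different is not necessarily a fault in the paper. Check your own code and data first, and discuss any remaining difference with your supervisor.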
Write using appropriate tools
In this discipline, the appropriate document tool is LaTeX and not MS Word. Please do not submit your thesis or even draft versions using MS Word.
It is not necessary for you to learn how to use LaTeX directly, however. You can instead use some other tools that are simpler to learn but still produce LaTeX as an intermediary step. For example, this document is written using Org-mode, which is available in GNU Emacs. However, another easy option is to use RStudio and Markdown.
Select an appropriate methodology
For some reason, many students seem to think that the ‘waterfall’ methodology is an appropriate one. The main advantage of this approach to the student seems to be that it is well aligned with ‘putting things off until the last minute’.
The recommended approach for the master’s project is to use a form of agile methodology similar to Feature-driven development. Specifically, you should start by writing a list of features. For example, given your selected paper, one ‘feature’ might be to produce one line in a table of results. You then develop all parts of that feature. That is, you should:
- Write about any background literature that you needed to reference in your ‘background’ chapter.
- Describe the method that you are going to use in your ‘methods’ chapter.
- Write code that produces results.
- Get those results (graphs or tables) into your ‘results’ chapter along with any associated description.
Only when one feature is finished should you move on to the next feature. At every stage, you can check the other parts of your document to make sure you have something that is ready to be submitted. You may also need to update your plan of features to ensure that it makes sense.
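If it helps, the plan of features can be kept as a small piece of data alongside your code. The sketch below is just one possible way of doing this. The `Feature` class, the part names and the example features are all hypothetical, chosen to mirror the four steps listed above.

```python
from dataclasses import dataclass, field

# The four parts of every feature, mirroring the steps described above.
PARTS = ("background", "methods", "code", "results")


@dataclass
class Feature:
    name: str
    done: set = field(default_factory=set)

    def finish(self, part):
        if part not in PARTS:
            raise ValueError(f"unknown part: {part}")
        self.done.add(part)

    def is_finished(self):
        # A feature only counts as finished when all four parts are done.
        return all(part in self.done for part in PARTS)


def next_feature(plan):
    """Return the first unfinished feature, or None when the plan is complete."""
    for feature in plan:
        if not feature.is_finished():
            return feature
    return None


# Hypothetical plan for a paper with one table and one figure to reproduce.
plan = [Feature("Table 2, row 1"), Feature("Figure 3")]
for part in PARTS:
    plan[0].finish(part)

print(next_feature(plan).name)
```

Here the first feature has all four parts done, so the next one to work on is the second. A plain text list in your notes works just as well. The point is that a feature is not finished until the writing, the code and the results are all in place.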
Don’t wait until all the results are in and all the code is developed before you start writing. You can actually start writing a draft from the beginning of the very first meeting. Of course, you won’t have much content yet—just a few headings and perhaps a preliminary title. But that is enough to get started and to make sure you understand how to use the document tools. Bring the PDF, in whatever state it is in, to each meeting.
Draw your own figures
Someone else has done a great job of drawing a diagram of a network with some wireless motes or a neural network or whatever. Why on earth do we want you to do your own version? The first reason is that it is a violation of copyright to copy others’ diagrams and put them in your own document. It may seem like it helps that you add a citation but unless you received authorisation from the publisher, this is always illegal. Unlike downloading “Forrest Gump” though, the illegal activity is happening in public with your name on it. So it is not just illegal, it is also stupid.
The second reason that you should create your own version is that your work is being marked by a professor who understands about copyright. They can’t give you marks for cutting and pasting an image from someone else’s work. More likely they will take some marks off because you wasted their time working out that you just cut and pasted an image from somewhere. Citing the source is an improvement over using an image without citation but, unless you obtain permission, does not make your reuse of someone else’s image legal.
What about the “everyone is doing it” defence? Perhaps lots of uninformed undergraduate students do it. But in the publishing world, copyright is a big deal. In fact, copyright is the basis for open source licenses. For this reason, publishing houses, such as the IEEE, have a web-based way of requesting permission to reproduce an image.
If you want to better understand plagiarism, I can highly recommend the article by Paintedfrog, which describes in detail a PhD thesis containing plagiarism.
Develop a pipeline
A common problem when doing research is that results need to be revised as problems with code are resolved. For example, it may be that initially results were generated in grams but later this was changed to kilograms (the kilogram is the SI base unit of mass). Perhaps this change requires replotting a graph or regenerating a table. Thus, it can help if you automate the process of getting your results into the document (are you starting to see why non-MS-Word tools are preferred?). For more information on methods of doing this, see the presentation on making your work replicable.
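As an illustration of such automation, a short script can turn results into a LaTeX table that the thesis then includes. This is a minimal sketch under stated assumptions: the sample labels, the values and the file name `results_table.tex` are hypothetical, and in a real project the raw results would come from your experiment code.

```python
from pathlib import Path

# Hypothetical raw results in grams; in a real project these would be
# produced by your experiment code rather than typed in by hand.
raw_results_g = {"sample A": 1250.0, "sample B": 980.0}


def to_kilograms(grams):
    return grams / 1000.0


def latex_table(results_kg):
    """Build a small LaTeX tabular from a {label: mass in kg} mapping."""
    lines = [r"\begin{tabular}{lr}", r"Sample & Mass (kg) \\", r"\hline"]
    for label, mass in results_kg.items():
        lines.append(rf"{label} & {mass:.3f} \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines) + "\n"


def write_table(path, raw_g):
    # The unit conversion happens in one place, so changing it only
    # requires rerunning the script to regenerate the table.
    results_kg = {label: to_kilograms(g) for label, g in raw_g.items()}
    Path(path).write_text(latex_table(results_kg), encoding="utf-8")


if __name__ == "__main__":
    write_table("results_table.tex", raw_results_g)
```

The thesis source can then pull in the generated file with `\input{results_table.tex}`, so rerunning the script and rebuilding the PDF updates the table without any manual copying.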
Get with the GitHub
The university provides access to a Git repository that allows you to share code and draft versions of your thesis with your supervisor. Make sure you understand how to use it effectively so that you can keep track of versions of your code (and thesis) throughout the running of your project. It works best if you install Git onto your computer rather than transfer files via the web interface.
The aim of this document is to help you produce a great master’s dissertation. As with any project, expect the unexpected—difficulties can arise and some parts may take longer than expected. Thus it is important to adopt an approach that produces a document that is ready to submit as early as possible and then, through iteration, gradually improves and deepens the work.