This repository contains a troll detection algorithm for my 6001CEM Final Year Project.
Social media has become part of daily life for a substantial number of people, and many now use internet-based and social media sources as their primary source of information. However, in recent years, anecdotal evidence has emerged linking state-sponsored actors to attempts to manipulate public opinion. Disinformation transmitters such as trolls are often used to share prepared information or to manipulate individuals' decisions. Abuse of disinformation transmitters leads to one of the biggest issues of social media influence: the manipulation of public opinion. This paper attempts to introduce a new solution for identifying trolling accounts on the popular Facebook platform. The solution implements various machine learning techniques and analyses a group of features based on the most common approaches along with new ideas. As a result, the best model achieved a high accuracy of 95% and a high F1 score of 83%. However, the model suffered from high variance, which may be addressed in future research. Based on the analysed features, it can be concluded that features derived from accounts' profile content and profile bio information bring significant information gain in identifying trolling accounts. The project includes a brief analysis of the collected data, which were gathered between February 2022 and March 2022 from publicly accessible pages on Facebook. Analysis of the acquired data implies that trolling accounts tend to have significantly more politically related content on their profile pages and tend to post more frequently than legitimate accounts. Furthermore, trolls tend to share more posts from external sources, most of which are picture-based.
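Accuracy and F1 score are reported separately above because they can diverge when trolling accounts are a minority of the data. The following is a minimal sketch of how the two metrics are computed from binary labels. The function names and the label lists are made up for illustration; they are not taken from this repository's code or from the collected data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all accounts that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: 1 = troll, 0 = legitimate account.
# 20 trolls and 80 legitimate accounts (illustrative numbers only).
y_true = [1] * 20 + [0] * 80
y_pred = [1] * 15 + [0] * 5 + [1] * 2 + [0] * 78

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy={acc:.2f} f1={f1:.2f}")  # accuracy=0.93 f1=0.81
```

In this made-up example the many correctly classified legitimate accounts raise accuracy, while F1 depends only on how well the troll class is handled, so it comes out lower.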
Note that the collected data are not included in this repository due to GDPR regulations. The data are accessible only to specific people through the following link: https://livecoventryac-my.sharepoint.com/:f:/r/personal/hampacha_uni_coventry_ac_uk/Documents/6001CEM?csf=1&web=1&e=Bu1HYh
The code would welcome some additional love in terms of optimisation. Feel free to update and play around with the provided code.