Language Identification (LID) of spoken audio using Convolutional Neural Networks (CNNs) on Mel-Spectrograms of the audio clips.
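As a rough illustration of the mel scale behind the Mel-Spectrogram inputs, the sketch below converts between Hz and mels and spaces band edges evenly along the mel axis. It assumes the common HTK-style formula. The function names, the 64-band count and the 0 to 8000 Hz range are illustrative assumptions, not values taken from this repository, and the project's own feature extraction may differ.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Return n_mels + 2 frequencies (Hz) evenly spaced on the mel axis.

    Each run of three consecutive points defines one triangular filter
    of a mel filter bank.
    """
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_mels + 1)
    return [mel_to_hz(mel_min + i * step) for i in range(n_mels + 2)]


if __name__ == "__main__":
    # 64 bands over 0-8000 Hz are assumed values, for illustration only.
    edges = mel_band_edges(n_mels=64, f_min=0.0, f_max=8000.0)
    print(f"{len(edges)} band edges from {edges[0]:.1f} Hz to {edges[-1]:.1f} Hz")
```

The band edges are packed more densely at low frequencies, which is why a Mel-Spectrogram gives finer resolution to the range where most speech energy lies.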
The original raw dataset was pruned to 2000 audio clips per language because the full datasets are too large to upload. Download the full datasets from: https://commonvoice.mozilla.org/en/datasets
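A pruning step of this kind could be sketched as follows. This is a minimal example, assuming the clips are chosen at random with a fixed seed. The README does not say how the 2000 clips were actually selected, and the function name `prune_clips` and the file names are hypothetical.

```python
import random
from typing import Dict, List


def prune_clips(clips_by_language: Dict[str, List[str]],
                max_clips: int = 2000,
                seed: int = 0) -> Dict[str, List[str]]:
    """Keep at most max_clips randomly chosen clips for each language."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    pruned = {}
    for language, clips in clips_by_language.items():
        if len(clips) <= max_clips:
            # Fewer clips than the cap: keep them all, in their original order.
            pruned[language] = list(clips)
        else:
            # Random selection is an assumption, not the documented method.
            pruned[language] = rng.sample(clips, max_clips)
    return pruned


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    demo = {
        "en": [f"en_{i}.mp3" for i in range(5000)],
        "de": [f"de_{i}.mp3" for i in range(1500)],
    }
    pruned = prune_clips(demo)
    print({language: len(clips) for language, clips in pruned.items()})
```

Capping every language at the same count keeps the classes balanced, which matters when the CNN is trained as a classifier over languages.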
The models are not included because they are ~700MB. Find them at: https://livecoventryac-my.sharepoint.com/:f:/g/personal/shawc15_uni_coventry_ac_uk/Es74pTfbWIpBjVkXBASkYe0BB-Bid47fntVuYAPMLKyXFw?e=kNiJqP (only Coventry University Outlook accounts have access). Add the models to the "7088_Spoken_LID_CNN/models/" directory.
The FFMPEG executable files, used for converting MP3 files to WAV files, are not included in the commit. Find them at: https://ffmpeg.org/ Put the executables (ffmpeg.exe, ffprobe.exe and ffplay.exe) in the "7088_Spoken_LID_CNN/datasets/" and "7088_Spoken_LID_CNN/prototyping/outsider_dataset/" directories, or find another way to convert the MP3 audio files to WAV format.
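One way to run the MP3 to WAV conversion is to call FFMPEG from Python, roughly as sketched below. This is not the repository's own conversion script. The helper names `build_ffmpeg_command` and `convert_folder`, the mono downmix and the 16000 Hz sample rate are assumptions for illustration. Only the standard FFMPEG options `-y`, `-i`, `-ac` and `-ar` are used.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(mp3_path: Path, wav_path: Path,
                         ffmpeg_exe: str = "ffmpeg",
                         sample_rate: int = 16000) -> List[str]:
    """Build the FFMPEG argument list for one MP3 -> WAV conversion."""
    return [
        ffmpeg_exe,
        "-y",                     # overwrite the output file if it exists
        "-i", str(mp3_path),      # input MP3 file
        "-ac", "1",               # downmix to mono (assumed setting)
        "-ar", str(sample_rate),  # resample to sample_rate Hz (assumed setting)
        str(wav_path),            # output format is inferred from ".wav"
    ]


def convert_folder(folder: Path, ffmpeg_exe: str = "ffmpeg") -> List[Path]:
    """Convert every .mp3 file in folder to a .wav file beside it."""
    converted = []
    for mp3_path in sorted(folder.glob("*.mp3")):
        wav_path = mp3_path.with_suffix(".wav")
        command = build_ffmpeg_command(mp3_path, wav_path, ffmpeg_exe)
        subprocess.run(command, check=True)  # raises if FFMPEG fails
        converted.append(wav_path)
    return converted
```

With the executables placed as described above, `ffmpeg_exe` can point at the local copy, for example `convert_folder(Path("7088_Spoken_LID_CNN/datasets"), ffmpeg_exe="7088_Spoken_LID_CNN/datasets/ffmpeg.exe")`. If FFMPEG is already on the system PATH, the default value is enough.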
This repository is submitted in conjunction with the project report. Refer to the report and Appendix 1 if any confusion arises.