shawc15/7088_Spoken_LID_CNN

7088_Spoken_LID_CNN

Language Identification (LID) of spoken audio using Convolutional Neural Networks (CNNs) on Mel-Spectrograms of the audio clips.

The curated datasets in this repository contain 2000 audio clips per language; the original raw dataset was pruned because of its upload size. Download the full datasets from: https://commonvoice.mozilla.org/en/datasets
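
This README does not say how the clips were selected, so the following is only a minimal sketch of capping a dataset at a fixed number of clips per language. The function name `prune_per_language`, the `(language, path)` input layout, the random sampling and the fixed seed are all assumptions made for illustration, not the procedure used in this project.

```python
import random
from collections import defaultdict


def prune_per_language(clips, limit=2000, seed=0):
    """Keep at most `limit` clips for each language.

    `clips` is an iterable of (language, path) pairs. The pair layout,
    the random sampling and the seed are illustrative assumptions.
    """
    # Group clip paths by their language label.
    by_language = defaultdict(list)
    for language, path in clips:
        by_language[language].append(path)

    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    pruned = {}
    for language, paths in by_language.items():
        if len(paths) > limit:
            # Sample without replacement when a language has too many clips.
            paths = rng.sample(paths, limit)
        pruned[language] = sorted(paths)
    return pruned
```

Languages with fewer clips than the limit are returned unchanged, so the result can be unbalanced if a language is under-represented in the raw download.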

The models are not included because they are ~700MB. Find them at: https://livecoventryac-my.sharepoint.com/:f:/g/personal/shawc15_uni_coventry_ac_uk/Es74pTfbWIpBjVkXBASkYe0BB-Bid47fntVuYAPMLKyXFw?e=kNiJqP (only Coventry University Outlook accounts have access). Add the models to the "7088_Spoken_LID_CNN/models/" directory.

The FFMPEG executable files, used for converting MP3 files to WAV files, are not included in the commit. Find them at: https://ffmpeg.org/ Put the FFMPEG executables (ffmpeg.exe, ffprobe.exe and ffplay.exe) in the "7088_Spoken_LID_CNN/datasets/" and "7088_Spoken_LID_CNN/prototyping/outsider_dataset/" directories. Alternatively, convert the MP3 audio files to WAV format by another method.
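
As one possible way to run the MP3 to WAV conversion, the sketch below calls the FFMPEG executable from Python. The helper names `build_ffmpeg_command` and `convert_directory` are hypothetical, and the sketch assumes the MP3 files sit directly inside the given directory and that `ffmpeg` is on the PATH (or that a path to ffmpeg.exe is passed in). It is not necessarily how the scripts in this repository perform the conversion.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp3_path, wav_path, ffmpeg="ffmpeg"):
    """Return the argument list for one MP3 -> WAV conversion."""
    # -y overwrites an existing output file; -i names the input file.
    # FFMPEG infers the WAV output format from the .wav extension.
    return [str(ffmpeg), "-y", "-i", str(mp3_path), str(wav_path)]


def convert_directory(source_dir, ffmpeg="ffmpeg"):
    """Convert every .mp3 directly inside source_dir to a .wav beside it."""
    converted = []
    for mp3_path in sorted(Path(source_dir).glob("*.mp3")):
        wav_path = mp3_path.with_suffix(".wav")
        # check=True raises CalledProcessError if FFMPEG exits with an error.
        subprocess.run(build_ffmpeg_command(mp3_path, wav_path, ffmpeg), check=True)
        converted.append(wav_path)
    return converted
```

For example, `convert_directory("7088_Spoken_LID_CNN/datasets", ffmpeg="7088_Spoken_LID_CNN/datasets/ffmpeg.exe")` would use the executable placed in the datasets directory, assuming the MP3 files are stored at the top level of that directory.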

This repository is submitted in conjunction with the project report. Refer to the report and Appendix 1 if anything is unclear.
