diff --git a/README.md b/README.md
index 6ca2d3d..9c66880 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,16 @@
 -
-**Datasets have been pruned due to high upload sizes**
+**Original, raw dataset was pruned to 2000 audio clips per language. Download the full datasets from: https://commonvoice.mozilla.org/en/datasets
+**Curated datasets have been pruned due to high upload sizes**
 **Models haven't been included due to them being ~700MB**
+Find the models at: https://livecoventryac-my.sharepoint.com/:f:/g/personal/shawc15_uni_coventry_ac_uk/Es74pTfbWIpBjVkXBASkYe0BB-Bid47fntVuYAPMLKyXFw?e=kNiJqP
+(Only Coventry University Outlook accounts have access.)
+Add the models to the "_7088_Spoken_LID_CNN/models/_" directory.
 **FFMPEG Executable files, used for converting MP3 files to WAV files haven't been included in the commit. Find them at: https://ffmpeg.org/**
+Put these FFMPEG executable files (ffmpeg.exe, ffprobe.exe & ffplay.exe) in the "_7088_Spoken_LID_CNN/datasets/_" & "_7088_Spoken_LID_CNN/prototyping/outsider_dataset/_" directories. Or, find another way to convert the MP3 audio files to WAV format.
 -
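
Note on the MP3 to WAV step mentioned in the README change above: the repository's own conversion code is not part of this diff, so the following is only a minimal sketch of one way to do it, assuming an `ffmpeg` executable is reachable (on the PATH, or via an explicit path to `ffmpeg.exe`). The function names `build_ffmpeg_command` and `convert_directory` are hypothetical and do not come from the project.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp3_path, wav_path, ffmpeg="ffmpeg"):
    """Return the argument list for converting one MP3 file to WAV.

    Hypothetical helper, not taken from the repository.
    """
    # -y overwrites an existing output file; -i names the input file.
    # ffmpeg picks the WAV container from the output file extension.
    return [str(ffmpeg), "-y", "-i", str(mp3_path), str(wav_path)]


def convert_directory(src_dir, dst_dir, ffmpeg="ffmpeg"):
    """Convert every *.mp3 in src_dir to a .wav of the same name in dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for mp3_path in sorted(src_dir.glob("*.mp3")):
        wav_path = dst_dir / (mp3_path.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_ffmpeg_command(mp3_path, wav_path, ffmpeg), check=True)
        converted.append(wav_path)
    return converted
```

If the executables are placed in a dataset directory as the README describes, the `ffmpeg` argument can be pointed at that copy of `ffmpeg.exe` instead of relying on the PATH.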