# 6001CEM-Final_Year_Project
Note:
The model weight files produced by training are too large to upload to GitHub, so this repository contains only the training and prediction code. The full project, including the model weight files, has been uploaded to Coventry University OneDrive: https://livecoventryac-my.sharepoint.com/:f:/r/personal/liy312_uni_coventry_ac_uk/Documents/6001CEM-Final_Year_Project?csf=1&web=1&e=dkMp6V
## ABSA E2E
Sentiment analysis is a highly active research field in natural language processing. Traditional sentiment analysis treats a whole text as a single expression of one sentiment, yielding only a coarse classification of polarity. In practice, however, sentiment is far more complex, and coarse-grained analysis alone is no longer sufficient. This project addresses the issue by comparing the results of different classification layers combined with different pre-trained models to find the best pairing. Ultimately, the RoBERTa model is chosen for aspect-level sentiment analysis of a Twitter comment dataset, and an interface is designed to connect the front-end application and deliver practical value. Agile methods are applied throughout the development process. Finally, the report summarizes and reflects on the project results, offers recommendations, and discusses prospects for fine-grained sentiment analysis research.
End-to-end ABSA on SemEval 2014 Task 4 and SemEval 2016 Task 5.
```
.
├── checkout
│   ├── data_processing_log.txt
│   ├── state_dict                  // saved models
│   ├── test_log.txt
│   └── training_log.txt
├── config
│   └── config.py
├── data
│   ├── elmo                        // ELMo pretrained models
│   │   ├── elmo_2x4096_512_2048cnn_2xhighway_options.json
│   │   └── elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5
│   ├── glove                       // GloVe pretrained embeddings
│   ├── Semeval2014
│   │   ├── processed               // processed files
│   │   │   ├── Restaurants_dev_v2.csv
│   │   │   ├── Restaurants_test_v2.csv
│   │   │   └── Restaurants_Train_v2.csv
│   │   └── raw                     // raw SemEval xml data files
│   │       ├── Laptops_Train.xml
│   │       ├── Laptop_Train_v2.xml
│   │       ├── Restaurants_Train_v2.xml
│   │       └── Restaurants_Train.xml
│   ├── Semeval2016
│   │   ├── processed
│   │   └── raw
│   └── stopwords.txt
├── models
│   ├── downstream.py               // Linear, LSTM, Self-Attention, CRF
│   └── pretrain_model.py
├── README.md
├── requirements.txt
├── results
│   ├── total_train_log.csv         // 194 training records
│   └── readme.md                   // more results
├── test.py
├── train.py
├── train.sh
└── utils
    ├── data_utils.py
    ├── metrics.py
    ├── processer.py
    └── result_helper.py
```
## Experiment
Main results report the following metrics:
- CE (Co-Extraction) F1: macro F1 over 4 classes at test time (not-aspect, aspect-pos, aspect-neg, aspect-neu).
- AE (Aspect Extraction) F1: macro F1 over 2 classes at test time (not aspect term, aspect term).
- PC (Polarity Classification) F1: macro F1 over 3 classes at test time (aspect-pos, aspect-neg, aspect-neu).
- BP (Broken Predictions): number of predicted aspect terms with inconsistent polarity across the span, e.g. B-neg, I-pos, E-pos.
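The BP metric above can be illustrated with a short sketch. The function below (a hypothetical helper, not the project's `metrics.py` implementation) scans a BIOES-style tag sequence and counts aspect spans whose polarity suffix changes mid-span:

```python
def count_broken_predictions(tags):
    """Count aspect spans whose tokens carry inconsistent polarities.

    `tags` is a sequence of BIOES-style labels such as "B-pos", "I-pos",
    "E-neg" or "O". A span is "broken" when the polarity suffix changes
    within the span, e.g. B-neg, I-pos, E-pos.
    """
    broken = 0
    span_polarities = []
    for tag in list(tags) + ["O"]:  # trailing "O" sentinel flushes the last span
        # A new span ("B-"/"S-") or an "O" closes the previous span.
        if tag == "O" or tag.startswith(("B-", "S-")):
            if len(set(span_polarities)) > 1:
                broken += 1
            span_polarities = []
        if tag != "O":
            span_polarities.append(tag.split("-", 1)[1])
    return broken
```

For example, `["B-neg", "I-pos", "E-pos"]` counts as one broken prediction, while `["B-pos", "I-pos", "E-pos"]` is consistent.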
## To Run
Step 1: Process the raw data
```
cd utils
python processer.py --model_name "bert" --seed 6 --max_seq_len 128
```
- `--model_name`: "bert", "elmo", "roberta" or "glove".
- `--seed`: the random seed.
- `--split_ratio 0.8 0.1 0.1`: split ratio for the train, dev and test sets.
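The raw files under `data/*/raw` are SemEval XML. As a rough sketch of what the processing step consumes (the real `processer.py` additionally tokenizes, builds the tag sequences, and writes the train/dev/test CSVs), the SemEval 2014 Task 4 format can be read with the standard library:

```python
import xml.etree.ElementTree as ET

def parse_semeval2014(xml_string):
    """Extract (sentence, [(term, polarity, from, to), ...]) pairs from
    SemEval 2014 Task 4 XML. Simplified illustration only."""
    root = ET.fromstring(xml_string)
    examples = []
    for sentence in root.iter("sentence"):
        text = sentence.findtext("text")
        aspects = [
            (a.get("term"), a.get("polarity"), int(a.get("from")), int(a.get("to")))
            for a in sentence.iter("aspectTerm")
        ]
        examples.append((text, aspects))
    return examples
```

Each aspect term carries its polarity plus character offsets into the sentence, which is what allows the processor to align terms with tokens.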
Step 2: Train a model (from the E2E_ABSA folder)
```
python train.py --mode "res14" --downstream "san" --model_name "bert" --seed 6
```
- `--mode`: res14, res16 or lap14 — the SemEval task to train on.
- `--downstream`: linear, lstm, crf, lstm-crf or san — the downstream model.
- `--model_name`: "bert", "elmo", "roberta" or "glove", same as step 1.
- `--seed`: seed recorded in the training log, same as step 1.

Some other default settings:
```
--lr 5e-5 --batch_size 32 --loss "focal" --gamma 2 --alpha 0.75 --max_seq_len 128 --optimizer "adamw" --warmup_steps 300 --max_steps 3000
```
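The `--loss "focal" --gamma 2 --alpha 0.75` defaults refer to focal loss, which down-weights easy examples so training focuses on hard ones. A minimal NumPy sketch for a single token (the project's actual implementation may apply `alpha` as a per-class weight and work on batched tensors):

```python
import numpy as np

def focal_loss(logits, target, gamma=2.0, alpha=0.75):
    """Multi-class focal loss for one prediction.

    The modulating factor (1 - p_t)^gamma shrinks the loss for
    well-classified examples; `alpha` is a scaling/balancing weight.
    With gamma=0 and alpha=1 this reduces to plain cross-entropy.
    """
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()
    p_t = probs[target]                     # probability of the true class
    return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)
```

With `gamma > 0`, a confidently correct prediction contributes much less loss than it would under cross-entropy, which helps with the heavy class imbalance between "O" and aspect tags.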
The training log is written to `./checkout/training_log.txt`.
Step 3: Run the web application
Unzip `static.rar`, then run `python main.py` (which relies on `db.py` and `Tw.py`).
## Reference
SemEval official:
- [2016 Task 5](https://alt.qcri.org/semeval2016/task5/index.php?id=data-and-tools)
- [2014 Task 4](https://alt.qcri.org/semeval2014/task4/index.php?id=data-and-tools)

Pretrained ELMo files:
- [weights](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5)
- [options](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json)