# 6001CEM-Final_Year_Project

Note:

Because the model weight file produced by training is too large to upload to GitHub, this repository contains only the training and prediction code. The full project, including the model weight file, has been uploaded to Coventry University OneDrive: https://livecoventryac-my.sharepoint.com/:f:/r/personal/liy312_uni_coventry_ac_uk/Documents/6001CEM-Final_Year_Project?csf=1&web=1&e=dkMp6V

## ABSA E2E

Sentiment analysis is a highly active research field in natural language processing. Traditional sentiment analysis assigns a single polarity to an entire text, which yields only a coarse classification of sentiment. In practice, sentiment is far more complex, and coarse-grained analysis alone is no longer sufficient. This project addresses the issue by comparing different classification layers combined with different pre-trained models to find the best pairing. Ultimately, the RoBERTa model is chosen for aspect-level sentiment analysis of a Twitter comment dataset, and an interface is designed to connect the front-end application and deliver practical value. Agile methods are applied throughout the development process. Finally, the report summarises and reflects on the project results, offers suggestions, and explores prospects for fine-grained sentiment analysis research.

End-to-end ABSA on SemEval 2014 Task 4 and SemEval 2016 Task 5.

```
.
├── checkout
│   ├── data_processing_log.txt
│   ├── state_dict                // saved models
│   ├── test_log.txt
│   └── training_log.txt
├── config
│   └── config.py
├── data
│   ├── elmo                      // ELMo pretrained models
│   │   ├── elmo_2x4096_512_2048cnn_2xhighway_options.json
│   │   └── elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5
│   ├── glove                     // GloVe pretrained embeddings
│   ├── Semeval2014
│   │   ├── processed             // processed files
│   │   │   ├── Restaurants_dev_v2.csv
│   │   │   ├── Restaurants_test_v2.csv
│   │   │   └── Restaurants_Train_v2.csv
│   │   └── raw                   // raw SemEval XML data files
│   │       ├── Laptops_Train.xml
│   │       ├── Laptop_Train_v2.xml
│   │       ├── Restaurants_Train.xml
│   │       └── Restaurants_Train_v2.xml
│   ├── Semeval2016
│   │   ├── processed
│   │   └── raw
│   └── stopwords.txt
├── models
│   ├── downstream.py             // Linear, LSTM, Self-Attention, CRF
│   └── pretrain_model.py
├── README.md
├── requirements.txt
├── results
│   ├── total_train_log.csv       // 194 training records
│   └── readme.md                 // more results
├── test.py
├── train.py
├── train.sh
└── utils
    ├── data_utils.py
    ├── metrics.py
    ├── processer.py
    └── result_helper.py
```

## Experiment

Main results use the following metrics:

- CE (Co-Extract) F1: macro-F1 over 4 classes at test time (not-aspect, aspect-pos, aspect-neg, aspect-neu).
- AE (Aspect Extract) F1: macro-F1 over 2 classes at test time (not an aspect term, aspect term).
- PC (Polarity Classify) F1: macro-F1 over 3 classes at test time (aspect-pos, aspect-neg, aspect-neu).
- BP (Broken Prediction): the number of predictions whose polarity is inconsistent within a single target aspect term, e.g. B-neg, I-pos, E-pos.
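
As a rough illustration, here is a minimal sketch of how these scores can be computed from BIO-style tag sequences. The function names are illustrative only; the project's actual implementation lives in `utils/metrics.py`.

```python
# Minimal sketch of the evaluation metrics over BIO-style tag sequences such
# as ["O", "B-pos", "I-pos", "B-neg", ...]. Function names are illustrative;
# the project's real implementation is in utils/metrics.py.
from sklearn.metrics import f1_score

def ce_f1(y_true, y_pred):
    """CE F1: macro-F1 over 4 per-token classes (O, pos, neg, neu)."""
    collapse = lambda t: "O" if t == "O" else t.split("-")[1]  # drop B/I/E prefix
    return f1_score([collapse(t) for t in y_true],
                    [collapse(t) for t in y_pred], average="macro")

def ae_f1(y_true, y_pred):
    """AE F1: macro-F1 over 2 classes (aspect token vs. non-aspect token)."""
    return f1_score([t != "O" for t in y_true],
                    [t != "O" for t in y_pred], average="macro")

def broken_predictions(pred_sequences):
    """BP: count predicted aspect spans with mixed polarity, e.g. B-neg I-pos E-pos."""
    broken = 0
    for seq in pred_sequences:
        polarities = []
        for tag in seq + ["O"]:                  # trailing sentinel flushes the last span
            if tag == "O" or tag.startswith("B-"):
                broken += len(set(polarities)) > 1
                polarities = []
            if tag != "O":
                polarities.append(tag.split("-")[1])
    return broken
```

PC F1 follows the same pattern as CE F1, restricted to tokens inside aspect terms.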

## To Run

### Step 1: Process the raw data

```bash
cd utils
python processer.py --model_name "bert" --seed 6 --max_seq_len 128
```

- `--model_name`: "bert", "elmo", "Roberta", or "glove".
- `--seed`: the random seed.
- `--split_ratio 0.8 0.1 0.1`: split ratio for the train, dev, and test sets.
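
For orientation, here is a minimal sketch of the kind of conversion this step performs, parsing a SemEval 2014 XML file into a flat CSV. The column names and output layout are assumptions; the real logic (including tokenisation and tagging for each `--model_name`) is in `utils/processer.py`.

```python
# Sketch: flatten raw SemEval 2014 XML into a CSV of (sentence, aspect) rows.
# Column names here are assumed, not the exact format used by processer.py.
import csv
import xml.etree.ElementTree as ET

def semeval2014_to_csv(xml_path, csv_path):
    root = ET.parse(xml_path).getroot()
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["sentence", "aspect_term", "from", "to", "polarity"])
        for sentence in root.iter("sentence"):
            text = sentence.findtext("text")
            for term in sentence.iter("aspectTerm"):
                writer.writerow([text,
                                 term.get("term"),
                                 term.get("from"),
                                 term.get("to"),
                                 term.get("polarity")])

semeval2014_to_csv("data/Semeval2014/raw/Restaurants_Train_v2.xml",
                   "data/Semeval2014/processed/Restaurants_Train_v2.csv")
```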

### Step 2: Train the model (from the E2E_ABSA folder)

```bash
python train.py --mode "res14" --downstream "san" --model_name "bert" --seed 6
```

- `--mode`: "res14", "res16", or "lap14"; the SemEval task to train on.
- `--downstream`: "linear", "lstm", "crf", "lstm-crf", or "san"; the downstream model (sketched below).
- `--model_name`: "bert", "elmo", "Roberta", or "glove"; same as step 1.
- `--seed`: the seed recorded in the training log; same as step 1.
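
Architecturally, each run pairs a pretrained encoder with a token-level downstream head. Below is a minimal sketch of that pattern with BERT and the "linear" head, assuming the Hugging Face `transformers` API; the project's real code is in `models/pretrain_model.py` and `models/downstream.py`, and the class name here is hypothetical.

```python
# Sketch of the encoder + downstream-head pattern with BERT and a linear head.
# Illustrative only; see models/pretrain_model.py and models/downstream.py.
import torch.nn as nn
from transformers import BertModel

class E2EABSATagger(nn.Module):  # hypothetical name
    def __init__(self, num_tags, pretrained="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(pretrained)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_tags)

    def forward(self, input_ids, attention_mask):
        # Encode the sentence, then score every token against each tag.
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.head(hidden)  # (batch, seq_len, num_tags) tag logits
```

The LSTM, CRF, and self-attention (san) options replace the linear head with the corresponding module from `models/downstream.py`.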

Some other default settings:

```
--lr 5e-5 --batch_size 32 --loss "focal" --gamma 2 --alpha 0.75 --max_seq_len 128 --optimizer "adamw" --warmup_steps 300 --max_steps 3000
```
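
`--loss "focal"` with `--gamma 2 --alpha 0.75` refers to focal loss. A minimal sketch of the standard formulation for token classification follows; this is not necessarily the project's exact implementation.

```python
# Sketch of the standard focal loss named by --loss "focal" (gamma=2, alpha=0.75).
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.75, ignore_index=-100):
    """logits: (N, C) per-token scores; targets: (N,) class ids."""
    mask = targets != ignore_index            # drop padding positions
    logits, targets = logits[mask], targets[mask]
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of true class
    pt = log_pt.exp()
    # Down-weight easy examples (pt near 1) so training focuses on hard tokens.
    return (-alpha * (1 - pt) ** gamma * log_pt).mean()
```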

Training logs are written to `./checkout/training_log.txt`.

### Step 3: Run the front-end application

Unzip static.rar, then run main.py, db.py, and Tw.py.



## Reference

SemEval official:

- [2016 Task 5](https://alt.qcri.org/semeval2016/task5/index.php?id=data-and-tools)
- [2014 Task 4](https://alt.qcri.org/semeval2014/task4/index.php?id=data-and-tools)

Pretrained ELMo files:

- [weights](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5)
- [options](https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json)
