Effects of data balancing and augmentation on accuracy

Author: Tereso del Río

Date: October 2022

This repository contains evidence of the impact that data balancing and data augmentation have on the accuracy of machine learning models.

It also contains an installable package called dataset_manipulation that can balance and augment polynomial data. Installing it is optional.
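
The following is a minimal sketch of what balancing and augmenting such data could look like. It does not use the dataset_manipulation API. It assumes that each sample is a tuple of per-variable features together with a label giving the preferred variable ordering, and that new samples are obtained by renaming (permuting) the variables. The function names `permute_sample`, `augment` and `balance` are hypothetical and chosen only for illustration.

```python
from itertools import permutations
import random


def permute_sample(features, label, perm):
    """Rename the variables of one sample (assumed representation).

    features: tuple with one feature value per variable.
    label:    tuple of variable indices, the preferred ordering.
    perm:     tuple where new variable j is old variable perm[j].
    """
    new_features = tuple(features[old] for old in perm)
    # Each old index in the label is replaced by its new position.
    new_label = tuple(perm.index(old) for old in label)
    return new_features, new_label


def augment(samples):
    """Return every variable renaming of every sample."""
    augmented = []
    for features, label in samples:
        for perm in permutations(range(len(features))):
            augmented.append(permute_sample(features, label, perm))
    return augmented


def balance(samples, seed=0):
    """Apply one random renaming per sample, so that no ordering is
    systematically favoured among the labels."""
    rng = random.Random(seed)
    balanced = []
    for features, label in samples:
        perm = list(range(len(features)))
        rng.shuffle(perm)
        balanced.append(permute_sample(features, label, tuple(perm)))
    return balanced


samples = [((10, 20, 30), (2, 0, 1))]
augmented = augment(samples)
print(len(augmented))  # 6
```

Under these assumptions, one sample with three variables yields six augmented samples, one for each possible label, so the augmented dataset is balanced by construction.
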
Running the module main.py generates the files ml_tested_in_normal.csv and ml_tested_in_normal.csv (also included in the repository), which compare a variety of models trained on unmanipulated data, on balanced data and on augmented data.
- For some models the difference is small, but keep in mind that these models performed no better than random guessing (accuracy close to 0.167) when using the hyperparameters found by Florescu in [1].
- However, random forest and k-nearest-neighbours improve substantially, with accuracy increasing by up to 50% when the data is augmented.
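
The 0.167 baseline quoted above matches uniform guessing among six classes. The sketch below assumes that the six classes are the 3! orderings of three variables, which the text does not state. It also shows why an unbalanced dataset can make a trivial model look better than chance. The label list is made up for illustration and is not data from this repository.

```python
from collections import Counter
from math import factorial


def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing on balanced classes."""
    return 1 / n_classes


def majority_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Assumption: three variables, hence 3! = 6 possible orderings.
n_orderings = factorial(3)
print(round(chance_accuracy(n_orderings), 3))  # 0.167

# Made-up unbalanced labels: always guessing "a" already scores 0.6.
toy_labels = ["a"] * 6 + ["b"] * 2 + ["c"] * 2
print(majority_accuracy(toy_labels))  # 0.6
```
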

[1] Florescu, D., England, M. (2020). A Machine Learning Based Software Pipeline to Pick the Variable Ordering for Algorithms with Polynomial Inputs. In: Bigatti, A., Carette, J., Davenport, J., Joswig, M., de Wolff, T. (eds) Mathematical Software, ICMS 2020. Lecture Notes in Computer Science, vol 12097. Springer, Cham. https://doi.org/10.1007/978-3-030-52200-1_30
