lopesoll/Traffic_Accident_Severity_Classification_with_Pyspark

Traffic_accident_Severity_Classification_with_Pyspark

  1. Create_undersampled_data
    • Data pre-processing
      • Handles null values (filling missing values, dropping rows, and dropping columns)
      • Randomizes the dataset and splits by severity into 4 CSV files.
      • These are imported in WorkFinal_balanced with a limit of 5,000 rows per file, so the four severity labels are equally represented.
      • Shuffling before the split matters because it keeps each sample diverse in date, city, state, and other fields.
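The shuffle-then-cap idea in step 1 can be illustrated with a minimal plain-Python sketch. It is not the notebook's PySpark code: the function name `undersample_by_label`, the toy rows, and the small limit of 5 (standing in for the 5,000 limit) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def undersample_by_label(rows, label_key, limit, seed=42):
    """Shuffle the rows, then keep at most `limit` rows per label value."""
    rng = random.Random(seed)
    shuffled = list(rows)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)   # randomize BEFORE splitting by label
    per_label = defaultdict(list)
    for row in shuffled:
        bucket = per_label[row[label_key]]
        if len(bucket) < limit:
            bucket.append(row)
    return dict(per_label)


# Hypothetical toy data in which severity 2 dominates the other labels.
rows = (
    [{"Severity": 2, "City": f"city_{i % 7}"} for i in range(50)]
    + [{"Severity": s, "City": f"city_{i % 7}"} for s in (1, 3, 4) for i in range(8)]
)

balanced = undersample_by_label(rows, "Severity", limit=5)
counts = {label: len(group) for label, group in sorted(balanced.items())}
print(counts)  # {1: 5, 2: 5, 3: 5, 4: 5}
```

Because the rows are shuffled before the cap is applied, the rows kept for each label are a random draw rather than the first ones in file order, which is what preserves variety in the other columns.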
  2. WorkFinal_balanced
    • Data pre-processing
      • Instantiate a label indexer and a OneHotEncoder from pyspark.ml.feature to turn categorical features into numerical representations.
      • Instantiate a VectorAssembler from pyspark.ml.feature to combine the feature columns into a single vector column.
      • Build a pipeline with the label indexer, OneHotEncoder, and VectorAssembler.
      • Fit the pipeline on the data, then use it to transform the data.
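To make the three pre-processing stages concrete, here is a small plain-Python sketch of what they compute conceptually: index each category, one-hot encode the index, then concatenate everything into one feature vector. It does not use the pyspark.ml.feature classes, which differ in detail (for example, they work on DataFrame columns and emit sparse vectors). The column names `Weather` and `Temperature` and the helper names are hypothetical.

```python
from collections import Counter


def fit_string_index(values):
    """Map each category to an index, most frequent first (ties broken alphabetically)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: idx for idx, value in enumerate(ordered)}


def one_hot(index, size):
    """Return a dense one-hot vector of length `size` with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


# Hypothetical rows with one categorical and one numeric column.
rows = [
    {"Weather": "Clear", "Temperature": 21.0},
    {"Weather": "Rain", "Temperature": 12.5},
    {"Weather": "Clear", "Temperature": 18.0},
    {"Weather": "Snow", "Temperature": -2.0},
]

# "Fit": learn the category-to-index mapping from the data.
weather_index = fit_string_index(r["Weather"] for r in rows)

# "Transform": encode the category and append the numeric column,
# giving one assembled feature vector per row.
features = [
    one_hot(weather_index[r["Weather"]], len(weather_index)) + [r["Temperature"]]
    for r in rows
]

print(weather_index)  # {'Clear': 0, 'Rain': 1, 'Snow': 2}
print(features[0])    # [1.0, 0.0, 0.0, 21.0]
```

Separating the fit step from the transform step is what lets a fitted pipeline apply the same category mapping to new data.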
    • Machine Learning models
      • Train a logistic regression model, decision tree classifier, and random forest classifier
      • Perform hyperparameter tuning for the logistic regression and decision tree classifiers
    • Evaluate the models
      • Evaluate each model per label using true positive rate, false positive rate, and F-measure
      • Evaluate each model overall using F1 score, true positive rate, false positive rate, precision, recall, and Hamming loss
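As a worked example of the per-label metrics listed above, the following plain-Python sketch computes true positive rate, false positive rate, and F-measure for each label, plus Hamming loss, from two lists of labels. It illustrates the definitions only and is not the notebook's evaluation code; the function names and the toy predictions are assumptions.

```python
def per_label_metrics(y_true, y_pred):
    """Return TPR, FPR, and F-measure for every label, treating it one-vs-rest."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        tn = len(pairs) - tp - fp - fn
        tpr = tp / (tp + fn) if tp + fn else 0.0        # recall for this label
        fpr = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f_measure = (
            2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
        )
        metrics[label] = {"tpr": tpr, "fpr": fpr, "f_measure": f_measure}
    return metrics


def hamming_loss(y_true, y_pred):
    """Fraction of predictions that differ from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical severity labels (1 to 4) and predictions.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 2, 2, 2, 3, 4, 4, 4]

metrics = per_label_metrics(y_true, y_pred)
loss = hamming_loss(y_true, y_pred)
print(metrics[1]["tpr"])  # 0.5
print(loss)               # 0.25
```

With exactly one label per row, Hamming loss equals one minus accuracy, so the 0.25 here corresponds to 6 of 8 correct predictions.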
  3. Visualizations accidents 2016-2021
    • Tableau visualizations of the data
      • Top 5 cities with most accidents
      • Total number of accidents per Year, Month, and Weekday
      • Number of accidents per weather condition and temperature in Celsius
      • Impact of road elements on the number of accidents
      • Dynamic time series visualization of number of accidents per month in each state

Original dataset: US Accidents (2016 - 2021).
