Traffic_accident_Severity_Classification_with_Pyspark

  1. Create_undersampled_data
    • Data pre-processing
      • Handles null values (filling missing values, dropping rows, and dropping columns)
      • Shuffles the dataset and splits it by severity into 4 CSV files.
      • These files are imported in Work_final_balanced with a limit of 5,000 rows each, so the labels are equally balanced.
      • Shuffling before repartitioning is important so that each file keeps a diverse mix of dates, cities, states, ...
  2. WorkFinal_balanced
    • Data pre-processing
      • Instantiate the label indexer and OneHotEncoder classes from pyspark.ml.feature (these transform categorical features into numerical representations)
      • Instantiate a VectorAssembler from pyspark.ml.feature (this combines the feature columns into a single vector column)
      • Build a pipeline from the label indexer, OneHotEncoder, and VectorAssembler.
      • Fit the pipeline on the data, then transform the data.
    • Machine Learning models
      • Train a logistic regression model, decision tree classifier, and random forest classifier
      • Perform hyperparameter tuning for the logistic regression and decision tree classifiers
    • Evaluate the models
      • Evaluate each model per label using the true positive rate, false positive rate, and F-measure
      • Evaluate each model overall using the F1 score, true positive rate, false positive rate, precision, recall, and Hamming loss
  3. Visualizations accidents 2016-2021
    • Tableau visualizations of the data
      • Top 5 cities with most accidents
      • Total number of accidents per Year, Month, and Weekday
      • Number of accidents per weather condition and temperature in Celsius
      • Impact of road elements on the number of accidents
      • Dynamic time series visualization of number of accidents per month in each state
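
The balancing idea from Create_undersampled_data (shuffle first, then keep the same capped number of rows per severity level) can be sketched in a few lines. This is only an illustration of the logic in plain Python, so that it runs without a Spark session; it is not the notebook's actual PySpark code. The function name `balanced_undersample`, the column names `Severity` and `State`, and the toy rows are all assumptions made for the example.

```python
import random
from collections import defaultdict


def balanced_undersample(rows, label_key="Severity", limit=5000, seed=42):
    """Shuffle the rows, then keep at most `limit` rows per label value."""
    shuffled = list(rows)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # randomize before splitting by label

    per_label = defaultdict(list)
    for row in shuffled:
        bucket = per_label[row[label_key]]
        if len(bucket) < limit:             # cap every severity class at `limit`
            bucket.append(row)
    return dict(per_label)


# Hypothetical toy data: severity 2 dominates, mimicking an imbalanced dataset.
toy_rows = (
    [{"Severity": 2, "State": "CA"}] * 8
    + [{"Severity": 1, "State": "TX"}] * 3
    + [{"Severity": 3, "State": "NY"}] * 5
    + [{"Severity": 4, "State": "FL"}] * 3
)

balanced = balanced_undersample(toy_rows, limit=3)
print({label: len(rows) for label, rows in sorted(balanced.items())})
# {1: 3, 2: 3, 3: 3, 4: 3}
```

Because the shuffle happens before the cap is applied, the rows that survive for each severity are a random sample rather than the first rows in file order, which is what keeps dates, cities, and states varied.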

Link to the original Dataset: US Accidents (2016 - 2021).
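
For reference, the per-label metrics named under "Evaluate the models" (true positive rate, false positive rate, F-measure) and the Hamming loss can be computed from true and predicted labels as sketched below. This is a minimal plain-Python sketch of the definitions, not the evaluation code used in WorkFinal_balanced; the function names and the toy label lists are assumptions. For single-label multiclass predictions, the Hamming loss reduces to the fraction of misclassified rows.

```python
def per_label_metrics(y_true, y_pred):
    """Return TPR, FPR, and F-measure for each label (one-vs-rest counts)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        tn = len(pairs) - tp - fp - fn

        tpr = tp / (tp + fn) if tp + fn else 0.0        # recall for this label
        fpr = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f_measure = (
            2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
        )
        metrics[label] = {"tpr": tpr, "fpr": fpr, "f_measure": f_measure}
    return metrics


def hamming_loss(y_true, y_pred):
    """Fraction of rows whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical severity labels (1 to 4) and predictions.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 2, 2, 2, 3, 4, 4, 4]

for label, values in per_label_metrics(y_true, y_pred).items():
    print(label, {name: round(v, 3) for name, v in values.items()})
print(hamming_loss(y_true, y_pred))  # 0.25
```

Reporting the rates per label matters for a severity classifier because a single aggregate score can hide poor performance on one severity level.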