{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Is the weather good to play outside?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The folder `datasets` contains two files:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"- `weather.numeric.csv`:\n",
"\n",
"```\n",
"temperature,humidity,windy,play\n",
"85,85,0,no\n",
"80,90,1,no\n",
"83,86,0,yes\n",
"70,96,0,yes\n",
"68,80,0,yes\n",
"65,70,1,no\n",
"64,65,1,yes\n",
"72,95,0,no\n",
"69,70,0,yes\n",
"75,80,0,yes\n",
"75,70,1,yes\n",
"72,90,1,yes\n",
"81,75,0,yes\n",
"71,91,1,no\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- `weather.nominal.csv`:\n",
"\n",
"```\n",
"outlook,temperature,humidity,windy,play\n",
"sunny,hot,high,FALSE,no\n",
"sunny,hot,high,TRUE,no\n",
"overcast,hot,high,FALSE,yes\n",
"rainy,mild,high,FALSE,yes\n",
"rainy,cool,normal,FALSE,yes\n",
"rainy,cool,normal,TRUE,no\n",
"overcast,cool,normal,TRUE,yes\n",
"sunny,mild,high,FALSE,no\n",
"sunny,cool,normal,FALSE,yes\n",
"rainy,mild,normal,FALSE,yes\n",
"sunny,mild,normal,TRUE,yes\n",
"overcast,mild,high,TRUE,yes\n",
"overcast,hot,normal,FALSE,yes\n",
"rainy,mild,high,TRUE,no\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use Decision Trees from the `scikit-learn` library to create accurate models, first for the numerical dataset, then for the nominal dataset.\n",
"\n",
"Explain your reasoning, and justify any choices of the hyperparameters (and/or run experiments to find the optimal ones).\n",
"\n",
"Use the provided datasets for training, and create testing datasets based on your experience.\n",
"\n",
"Evaluate your models, and use visualisation to show the trees and any relevant plots."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Marking scheme\n",
"\n",
"|Item|Mark|\n",
"|:----|---:|\n",
"|**Numerical dataset**:||\n",
"|Explanation, Justification|/4|\n",
"|DT model|/3|\n",
"|Evaluation|/3|\n",
"|**Nominal dataset**:||\n",
"|Explanation, Justification|/4|\n",
"|DT model|/3|\n",
"|Evaluation|/3|\n",
"|||\n",
"|**Total**: |/20|\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2022-10-25T11:37:52.790405Z",
"start_time": "2022-10-25T11:37:50.972952Z"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn import tree\n",
"from sklearn.metrics import accuracy_score, classification_report, confusion_matrix\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2022-10-25T11:37:52.837819Z",
"start_time": "2022-10-25T11:37:52.790405Z"
}
},
"outputs": [],
"source": [
"data1 = pd.read_csv('datasets/weather.numeric.csv')\n",
"data2 = pd.read_csv('datasets/weather.nominal.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2022-10-25T11:37:52.890705Z",
"start_time": "2022-10-25T11:37:52.843562Z"
}
},
"outputs": [],
"source": [
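"# Numeric dataset (data1)\n",
"from sklearn.metrics import classification_report, confusion_matrix\n",
"\n",
"# Features are every column except the target 'play'\n",
"X = data1.drop(columns=['play'])\n",
"y = data1['play']\n",
"\n",
"# Hold out 30% of the 14 rows (5 rows) for testing; the split is random,\n",
"# so the scores change from run to run unless random_state is set\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\n",
"\n",
"# Fit a decision tree with scikit-learn's default hyperparameters\n",
"d1tree = DecisionTreeClassifier()\n",
"d1tree.fit(X_train, y_train)\n",
"predictions = d1tree.predict(X_test)\n",
"\n",
"# Draw the fitted tree with readable feature and class labels\n",
"tree.plot_tree(d1tree, feature_names=list(X.columns), class_names=list(d1tree.classes_))\n",
"plt.show()\n",
"\n",
"print(\"Accuracy:\", accuracy_score(y_test, predictions))\n",
"print(confusion_matrix(y_test, predictions))\n",
"print('\\n')\n",
"print(classification_report(y_test, predictions))"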
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
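"Decision tree:\n",
"\n",
"![title](numeric.png)\n",
"\n",
"Accuracy: 0.6\n",
"\n",
"Confusion matrix (rows: actual class, columns: predicted class):\n",
"\n",
"|Actual|Predicted no|Predicted yes|\n",
"|:----|---:|---:|\n",
"|no|0|2|\n",
"|yes|0|3|\n",
"\n",
"Classification report:\n",
"\n",
"|Class|Precision|Recall|F1-score|Support|\n",
"|:----|---:|---:|---:|---:|\n",
"|no|0.00|0.00|0.00|2|\n",
"|yes|0.60|1.00|0.75|3|\n",
"|accuracy|||0.60|5|\n",
"|macro avg|0.30|0.50|0.37|5|\n",
"|weighted avg|0.36|0.60|0.45|5|"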
]
},
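{
"cell_type": "markdown",
"metadata": {},
"source": [
"With only five test rows, a single random split gives a noisy accuracy estimate. One way to compare hyperparameters on such a small dataset is leave-one-out cross-validation. The cell below is a minimal sketch under stated assumptions: the names `numeric`, `X_num`, `y_num`, the candidate `max_depth` values and `random_state=0` are illustrative choices, not tuned settings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import LeaveOneOut, cross_val_score\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Reload the numeric dataset so this cell does not depend on earlier state\n",
"numeric = pd.read_csv('datasets/weather.numeric.csv')\n",
"X_num = numeric.drop(columns=['play'])\n",
"y_num = numeric['play']\n",
"\n",
"# Leave-one-out: train on 13 rows, test on the remaining row, 14 times over.\n",
"# Each fold scores 0 or 1, so the mean is the fraction of rows predicted correctly.\n",
"loo = LeaveOneOut()\n",
"\n",
"# Candidate depths are assumptions for illustration; None means no depth limit\n",
"for depth in [1, 2, 3, None]:\n",
"    model = DecisionTreeClassifier(max_depth=depth, random_state=0)\n",
"    scores = cross_val_score(model, X_num, y_num, cv=loo)\n",
"    print(f\"max_depth={depth}: mean accuracy = {scores.mean():.2f}\")"
]
},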
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2022-10-25T11:37:52.922669Z",
"start_time": "2022-10-25T11:37:52.895684Z"
}
},
"outputs": [],
"source": [
"# Nominal dataset (data2)\n",
"from sklearn import preprocessing\n",
"from sklearn.metrics import classification_report, confusion_matrix\n",
"\n",
"# LabelEncoder maps each column's values to integers in sorted order\n",
"# (for the target: no -> 0, yes -> 1); apply() refits it on every column\n",
"string_to_int = preprocessing.LabelEncoder()\n",
"data2 = data2.apply(string_to_int.fit_transform)\n",
"\n",
"# Split the columns into features (X) and target label (y)\n",
"feature_cols = ['outlook', 'temperature', 'humidity', 'windy']\n",
"X = data2[feature_cols]\n",
"y = data2['play']\n",
"\n",
"# Hold out 30% of the rows (5 of 14); no random_state, so results vary between runs\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)\n",
"\n",
"# Use entropy (information gain) as the split criterion\n",
"d2tree = DecisionTreeClassifier(criterion=\"entropy\", random_state=100)\n",
"d2tree.fit(X_train, y_train)\n",
"\n",
"y_pred = d2tree.predict(X_test)\n",
"\n",
"print(\"Accuracy:\", accuracy_score(y_test, y_pred))\n",
"print(confusion_matrix(y_test, y_pred))\n",
"print(classification_report(y_test, y_pred))\n",
"\n",
"# Draw the fitted tree with readable feature and class labels\n",
"tree.plot_tree(d2tree, feature_names=feature_cols, class_names=['no', 'yes'])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
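"Decision tree:\n",
"\n",
"![title](nominal.png)\n",
"\n",
"Accuracy: 0.4\n",
"\n",
"Confusion matrix (rows: actual class, columns: predicted class; 0 = no, 1 = yes):\n",
"\n",
"|Actual|Predicted 0|Predicted 1|\n",
"|:----|---:|---:|\n",
"|0|1|1|\n",
"|1|2|1|\n",
"\n",
"Classification report:\n",
"\n",
"|Class|Precision|Recall|F1-score|Support|\n",
"|:----|---:|---:|---:|---:|\n",
"|0|0.33|0.50|0.40|2|\n",
"|1|0.50|0.33|0.40|3|\n",
"|accuracy|||0.40|5|\n",
"|macro avg|0.42|0.42|0.40|5|\n",
"|weighted avg|0.43|0.40|0.40|5|"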
]
},
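{
"cell_type": "markdown",
"metadata": {},
"source": [
"`LabelEncoder` turns each nominal value into an integer, which lets the tree split on an ordering (for example `outlook <= 0.5`) that has no real meaning. A possible alternative is one-hot encoding, sketched below with `pandas.get_dummies`. The names `nominal`, `X_nom`, `y_nom` and the use of leave-one-out cross-validation for scoring are assumptions made for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import LeaveOneOut, cross_val_score\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Reload the raw file, because data2 was overwritten with integer codes earlier\n",
"nominal = pd.read_csv('datasets/weather.nominal.csv')\n",
"feature_cols = ['outlook', 'temperature', 'humidity', 'windy']\n",
"\n",
"# One indicator column per category value, so no artificial order is implied\n",
"X_nom = pd.get_dummies(nominal[feature_cols], columns=feature_cols)\n",
"y_nom = nominal['play']\n",
"print(list(X_nom.columns))\n",
"\n",
"# Same hyperparameters as d2tree, scored with leave-one-out cross-validation\n",
"model = DecisionTreeClassifier(criterion='entropy', random_state=100)\n",
"scores = cross_val_score(model, X_nom, y_nom, cv=LeaveOneOut())\n",
"print(f\"Leave-one-out accuracy: {scores.mean():.2f}\")"
]
},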
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conclusion\n",
"\n",
"Using scikit-learn, we built one decision tree per dataset. The features are `temperature`, `humidity` and `windy` for the numeric dataset, and `outlook`, `temperature`, `humidity` and `windy` for the nominal dataset; the target in both cases is `play`. First, we load the datasets into the variables `data1` and `data2`, then store the features in `X` and the target in `y`. We use the `train_test_split` function from scikit-learn to split each dataset into a training set and a test set. Next, we create the decision tree classifiers `d1tree` (default hyperparameters) and `d2tree` (`criterion=\"entropy\"`, `random_state=100`), train them with `fit`, and make predictions on the test set with `predict`. Finally, we plot each tree and print its accuracy score.\n",
"\n",
"From observation and experimentation, accuracy tended to increase as the training set grew. With only 14 rows per dataset and 5 of them held out for testing, however, a single split gives a noisy estimate: the runs reported above reached an accuracy of 0.6 on the numeric dataset and 0.4 on the nominal dataset.\n",
"\n",
"Please refer to DT.py for the Python code.\n"
]
},
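{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of training-set size can be checked more systematically by averaging over many random splits instead of relying on one. The cell below is one hedged way to do this for the numeric dataset; the names `numeric`, `X_num`, `y_num`, the three `test_size` values, `n_splits=200` and `random_state=0` are assumptions, and the printed scores will depend on them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"numeric = pd.read_csv('datasets/weather.numeric.csv')\n",
"X_num = numeric.drop(columns=['play'])\n",
"y_num = numeric['play']\n",
"\n",
"# A smaller test_size leaves more rows for training\n",
"for test_size in [0.5, 0.3, 0.2]:\n",
"    # Stratified splits keep the yes/no ratio similar in train and test sets\n",
"    splitter = StratifiedShuffleSplit(n_splits=200, test_size=test_size, random_state=0)\n",
"    model = DecisionTreeClassifier(random_state=0)\n",
"    scores = cross_val_score(model, X_num, y_num, cv=splitter)\n",
"    print(f\"test_size={test_size}: mean accuracy = {scores.mean():.2f} (std {scores.std():.2f})\")"
]
},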
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.11.0 ('venv': venv)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"vscode": {
"interpreter": {
"hash": "9646fcfabfca22912ce5fe7fa2239f453c97b6dafcc6a8d175371d4d5afbb8ca"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}