{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# <font color='orange'><center>Sentiment Analysis of IMDB reviews</center></font>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color='orange'>Table of Contents:</font> \n",
" 1. Importing the necessary libraries \n",
" 2. Importing the dataset \n",
" 3. Exploring the dataset \n",
" 4. Data preprocessing \n",
" 5. Train and test split \n",
" 6. Creating the model \n",
" 7. Training the model \n",
" 8. Evaluating the model "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color='orange'><center>1. Importing the necessary libraries</center></font>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
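"import numpy as np\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import warnings\n",
"%matplotlib inline\n",
"# silence library warnings to keep the notebook output readable\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"# deep-learning stack used to build and train the model\n",
"import keras\n",
"import tensorflow as tf\n",
"from keras import layers"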
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color='orange'><center>2. Importing the dataset</center></font>"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from keras.datasets import imdb\n",
"\n",
"(training_data, training_targets), (testing_data, testing_targets) = imdb.load_data(num_words=10000)\n",
"\n",
"# combine the predefined train/test splits into a single array of reviews (data) and a single array of labels (targets)\n",
"data = np.concatenate((training_data, testing_data), axis=0)\n",
"targets = np.concatenate((training_targets, testing_targets), axis=0)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color='orange'><center>3. Exploring the dataset</center></font>"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Categories: [0 1]\n",
"Number of unique words: 9998\n"
]
}
],
"source": [
"print(\"Categories:\", np.unique(targets))\n",
"print(\"Number of unique words:\", len(np.unique(np.hstack(data))))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>The reviews use 9,998 distinct word indices, just under the 10,000-word cap set by `num_words=10000`, and there are two label categories: 0 (negative review) and 1 (positive review).</font>"
]
},
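{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>Why 9,998 rather than 10,000? A minimal sketch, assuming the Keras IMDB loader's default index conventions (`start_char=1`, `oov_char=2`, `index_from=3`): index 0 is reserved for padding and never appears in the raw sequences, 1 marks the start of every review, 2 replaces any word whose index is `num_words` or higher, and 3 is left unused, so actual words occupy indices 4 to 9999. The helper `expected_unique_indices` is a hypothetical name used only for illustration.</font>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumption: default Keras IMDB offsets (start_char=1, oov_char=2, index_from=3)\n",
"def expected_unique_indices(num_words, start_char=1, oov_char=2, index_from=3):\n",
"    # Words are shifted by index_from, so the most frequent word gets index_from + 1\n",
"    first_word = index_from + 1\n",
"    # load_data keeps only indices strictly below num_words\n",
"    word_indices = set(range(first_word, num_words))\n",
"    # The start and out-of-vocabulary markers also occur; padding (0) does not\n",
"    return len(word_indices | {start_char, oov_char})\n",
"\n",
"print('Expected unique indices:', expected_unique_indices(10000))"
]
},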
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The Label has these values: [0 1]\n"
]
}
],
"source": [
"print(\"The Label has these values:\", np.unique(targets))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 0, 0, ..., 0, 0, 0])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"targets"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:matplotlib.legend:No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.\n"
]
},
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x7ff1f90c5f30>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"image/svg+xml": "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"424.010625pt\" height=\"297.190125pt\" viewBox=\"0 0 424.010625 297.190125\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n <metadata>\n <rdf:RDF xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n <cc:Work>\n <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\n <dc:date>2023-03-11T19:18:17.476621</dc:date>\n <dc:format>image/svg+xml</dc:format>\n <dc:creator>\n <cc:Agent>\n <dc:title>Matplotlib v3.6.1, https://matplotlib.org/</dc:title>\n </cc:Agent>\n </dc:creator>\n </cc:Work>\n </rdf:RDF>\n </metadata>\n <defs>\n <style type=\"text/css\">*{stroke-linejoin: round; stroke-linecap: butt}</style>\n </defs>\n <g id=\"figure_1\">\n <g id=\"patch_1\">\n <path d=\"M 0 297.190125 \nL 424.010625 297.190125 \nL 424.010625 0 \nL 0 0 \nz\n\"/>\n </g>\n <g id=\"axes_1\">\n <g id=\"patch_2\">\n <path d=\"M 59.690625 273.312 \nL 416.810625 273.312 \nL 416.810625 7.2 \nL 59.690625 7.2 \nz\n\"/>\n </g>\n <g id=\"patch_3\">\n <path d=\"M 77.546625 273.312 \nL 220.394625 273.312 \nL 220.394625 19.872 \nL 77.546625 19.872 \nz\n\" clip-path=\"url(#p3cf0b0ea68)\" style=\"fill: #72b6a1\"/>\n </g>\n <g id=\"patch_4\">\n <path d=\"M 256.106625 273.312 \nL 398.954625 273.312 \nL 398.954625 19.872 \nL 256.106625 19.872 \nz\n\" clip-path=\"url(#p3cf0b0ea68)\" style=\"fill: #e99675\"/>\n </g>\n <g id=\"matplotlib.axis_1\">\n <g id=\"xtick_1\">\n <g id=\"line2d_1\">\n <defs>\n <path id=\"m956d89f0c8\" d=\"M 0 0 \nL 0 3.5 \n\" style=\"stroke: #ffffff; stroke-width: 0.8\"/>\n </defs>\n <g>\n <use xlink:href=\"#m956d89f0c8\" x=\"148.970625\" y=\"273.312\" style=\"fill: #ffffff; stroke: #ffffff; stroke-width: 0.8\"/>\n </g>\n </g>\n 
<g id=\"text_1\">\n <!-- 0 -->\n <g style=\"fill: #ffffff\" transform=\"translate(145.789375 287.910437) scale(0.1 -0.1)\">\n <defs>\n <path id=\"DejaVuSans-30\" d=\"M 2034 4250 \nQ 1547 4250 1301 3770 \nQ 1056 3291 1056 2328 \nQ 1056 1369 1301 889 \nQ 1547 409 2034 409 \nQ 2525 409 2770 889 \nQ 3016 1369 3016 2328 \nQ 3016 3291 2770 3770 \nQ 2525 4250 2034 4250 \nz\nM 2034 4750 \nQ 2819 4750 3233 4129 \nQ 3647 3509 3647 2328 \nQ 3647 1150 3233 529 \nQ 2819 -91 2034 -91 \nQ 1250 -91 836 529 \nQ 422 1150 422 2328 \nQ 422 3509 836 4129 \nQ 1250 4750 2034 4750 \nz\n\" transform=\"scale(0.015625)\"/>\n </defs>\n <use xlink:href=\"#DejaVuSans-30\"/>\n </g>\n </g>\n </g>\n <g id=\"xtick_2\">\n <g id=\"line2d_2\">\n <g>\n <use xlink:href=\"#m956d89f0c8\" x=\"327.530625\" y=\"273.312\" style=\"fill: #ffffff; stroke: #ffffff; stroke-width: 0.8\"/>\n </g>\n </g>\n <g id=\"text_2\">\n <!-- 1 -->\n <g style=\"fill: #ffffff\" transform=\"translate(324.349375 287.910437) scale(0.1 -0.1)\">\n <defs>\n <path id=\"DejaVuSans-31\" d=\"M 794 531 \nL 1825 531 \nL 1825 4091 \nL 703 3866 \nL 703 4441 \nL 1819 4666 \nL 2450 4666 \nL 2450 531 \nL 3481 531 \nL 3481 0 \nL 794 0 \nL 794 531 \nz\n\" transform=\"scale(0.015625)\"/>\n </defs>\n <use xlink:href=\"#DejaVuSans-31\"/>\n </g>\n </g>\n </g>\n </g>\n <g id=\"matplotlib.axis_2\">\n <g id=\"ytick_1\">\n <g id=\"line2d_3\">\n <defs>\n <path id=\"m47e3c76e60\" d=\"M 0 0 \nL -3.5 0 \n\" style=\"stroke: #ffffff; stroke-width: 0.8\"/>\n </defs>\n <g>\n <use xlink:href=\"#m47e3c76e60\" x=\"59.690625\" y=\"273.312\" style=\"fill: #ffffff; stroke: #ffffff; stroke-width: 0.8\"/>\n </g>\n </g>\n <g id=\"text_3\">\n <!-- 0 -->\n <g style=\"fill: #ffffff\" transform=\"translate(46.328125 277.111219) scale(0.1 -0.1)\">\n <use xlink:href=\"#DejaVuSans-30\"/>\n </g>\n </g>\n </g>\n <g id=\"ytick_2\">\n <g id=\"line2d_4\">\n <g>\n <use xlink:href=\"#m47e3c76e60\" x=\"59.690625\" y=\"222.624\" style=\"fill: #ffffff; stroke: #ffffff; stroke-width: 
0.8\"/>\n </g>\n </g>\n <g id=\"text_4\">\n <!-- 5000 -->\n <g style=\"fill: #ffffff\" transform=\"translate(27.240625 226.423219) scale(0.1 -0.1)\">\n <defs>\n <path id=\"DejaVuSans-35\" d=\"M 691 4666 \nL 3169 4666 \nL 3169 4134 \nL 1269 4134 \nL 1269 2991 \nQ 1406 3038 1543 3061 \nQ 1681 3084 1819 3084 \nQ 2600 3084 3056 2656 \nQ 3513 2228 3513 1497 \nQ 3513 744 3044 326 \nQ 2575 -91 1722 -91 \nQ 1428 -91 1123 -41 \nQ 819 9 494 109 \nL 494 744 \nQ 775 591 1075 516 \nQ 1375 441 1709 441 \nQ 2250 441 2565 725 \nQ 2881 1009 2881 1497 \nQ 2881 1984 2565 2268 \nQ 2250 2553 1709 2553 \nQ 1456 2553 1204 2497 \nQ 953 2441 691 2322 \nL 691 4666 \nz\n\" transform=\"scale(0.015625)\"/>\n </defs>\n <use xlink:href=\"#DejaVuSans-35\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"63.623047\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"127.246094\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"190.869141\"/>\n </g>\n </g>\n </g>\n <g id=\"ytick_3\">\n <g id=\"line2d_5\">\n <g>\n <use xlink:href=\"#m47e3c76e60\" x=\"59.690625\" y=\"171.936\" style=\"fill: #ffffff; stroke: #ffffff; stroke-width: 0.8\"/>\n </g>\n </g>\n <g id=\"text_5\">\n <!-- 10000 -->\n <g style=\"fill: #ffffff\" transform=\"translate(20.878125 175.735219) scale(0.1 -0.1)\">\n <use xlink:href=\"#DejaVuSans-31\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"63.623047\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"127.246094\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"190.869141\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"254.492188\"/>\n </g>\n </g>\n </g>\n <g id=\"ytick_4\">\n <g id=\"line2d_6\">\n <g>\n <use xlink:href=\"#m47e3c76e60\" x=\"59.690625\" y=\"121.248\" style=\"fill: #ffffff; stroke: #ffffff; stroke-width: 0.8\"/>\n </g>\n </g>\n <g id=\"text_6\">\n <!-- 15000 -->\n <g style=\"fill: #ffffff\" transform=\"translate(20.878125 125.047219) scale(0.1 -0.1)\">\n <use xlink:href=\"#DejaVuSans-31\"/>\n <use xlink:href=\"#DejaVuSans-35\" x=\"63.623047\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"127.246094\"/>\n 
<use xlink:href=\"#DejaVuSans-30\" x=\"190.869141\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"254.492188\"/>\n </g>\n </g>\n </g>\n <g id=\"ytick_5\">\n <g id=\"line2d_7\">\n <g>\n <use xlink:href=\"#m47e3c76e60\" x=\"59.690625\" y=\"70.56\" style=\"fill: #ffffff; stroke: #ffffff; stroke-width: 0.8\"/>\n </g>\n </g>\n <g id=\"text_7\">\n <!-- 20000 -->\n <g style=\"fill: #ffffff\" transform=\"translate(20.878125 74.359219) scale(0.1 -0.1)\">\n <defs>\n <path id=\"DejaVuSans-32\" d=\"M 1228 531 \nL 3431 531 \nL 3431 0 \nL 469 0 \nL 469 531 \nQ 828 903 1448 1529 \nQ 2069 2156 2228 2338 \nQ 2531 2678 2651 2914 \nQ 2772 3150 2772 3378 \nQ 2772 3750 2511 3984 \nQ 2250 4219 1831 4219 \nQ 1534 4219 1204 4116 \nQ 875 4013 500 3803 \nL 500 4441 \nQ 881 4594 1212 4672 \nQ 1544 4750 1819 4750 \nQ 2544 4750 2975 4387 \nQ 3406 4025 3406 3419 \nQ 3406 3131 3298 2873 \nQ 3191 2616 2906 2266 \nQ 2828 2175 2409 1742 \nQ 1991 1309 1228 531 \nz\n\" transform=\"scale(0.015625)\"/>\n </defs>\n <use xlink:href=\"#DejaVuSans-32\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"63.623047\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"127.246094\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"190.869141\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"254.492188\"/>\n </g>\n </g>\n </g>\n <g id=\"ytick_6\">\n <g id=\"line2d_8\">\n <g>\n <use xlink:href=\"#m47e3c76e60\" x=\"59.690625\" y=\"19.872\" style=\"fill: #ffffff; stroke: #ffffff; stroke-width: 0.8\"/>\n </g>\n </g>\n <g id=\"text_8\">\n <!-- 25000 -->\n <g style=\"fill: #ffffff\" transform=\"translate(20.878125 23.671219) scale(0.1 -0.1)\">\n <use xlink:href=\"#DejaVuSans-32\"/>\n <use xlink:href=\"#DejaVuSans-35\" x=\"63.623047\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"127.246094\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"190.869141\"/>\n <use xlink:href=\"#DejaVuSans-30\" x=\"254.492188\"/>\n </g>\n </g>\n </g>\n <g id=\"text_9\">\n <!-- count -->\n <g style=\"fill: #ffffff\" transform=\"translate(14.798438 154.36225) rotate(-90) 
scale(0.1 -0.1)\">\n <defs>\n <path id=\"DejaVuSans-63\" d=\"M 3122 3366 \nL 3122 2828 \nQ 2878 2963 2633 3030 \nQ 2388 3097 2138 3097 \nQ 1578 3097 1268 2742 \nQ 959 2388 959 1747 \nQ 959 1106 1268 751 \nQ 1578 397 2138 397 \nQ 2388 397 2633 464 \nQ 2878 531 3122 666 \nL 3122 134 \nQ 2881 22 2623 -34 \nQ 2366 -91 2075 -91 \nQ 1284 -91 818 406 \nQ 353 903 353 1747 \nQ 353 2603 823 3093 \nQ 1294 3584 2113 3584 \nQ 2378 3584 2631 3529 \nQ 2884 3475 3122 3366 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-6f\" d=\"M 1959 3097 \nQ 1497 3097 1228 2736 \nQ 959 2375 959 1747 \nQ 959 1119 1226 758 \nQ 1494 397 1959 397 \nQ 2419 397 2687 759 \nQ 2956 1122 2956 1747 \nQ 2956 2369 2687 2733 \nQ 2419 3097 1959 3097 \nz\nM 1959 3584 \nQ 2709 3584 3137 3096 \nQ 3566 2609 3566 1747 \nQ 3566 888 3137 398 \nQ 2709 -91 1959 -91 \nQ 1206 -91 779 398 \nQ 353 888 353 1747 \nQ 353 2609 779 3096 \nQ 1206 3584 1959 3584 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-75\" d=\"M 544 1381 \nL 544 3500 \nL 1119 3500 \nL 1119 1403 \nQ 1119 906 1312 657 \nQ 1506 409 1894 409 \nQ 2359 409 2629 706 \nQ 2900 1003 2900 1516 \nL 2900 3500 \nL 3475 3500 \nL 3475 0 \nL 2900 0 \nL 2900 538 \nQ 2691 219 2414 64 \nQ 2138 -91 1772 -91 \nQ 1169 -91 856 284 \nQ 544 659 544 1381 \nz\nM 1991 3584 \nL 1991 3584 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-6e\" d=\"M 3513 2113 \nL 3513 0 \nL 2938 0 \nL 2938 2094 \nQ 2938 2591 2744 2837 \nQ 2550 3084 2163 3084 \nQ 1697 3084 1428 2787 \nQ 1159 2491 1159 1978 \nL 1159 0 \nL 581 0 \nL 581 3500 \nL 1159 3500 \nL 1159 2956 \nQ 1366 3272 1645 3428 \nQ 1925 3584 2291 3584 \nQ 2894 3584 3203 3211 \nQ 3513 2838 3513 2113 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-74\" d=\"M 1172 4494 \nL 1172 3500 \nL 2356 3500 \nL 2356 3053 \nL 1172 3053 \nL 1172 1153 \nQ 1172 725 1289 603 \nQ 1406 481 1766 481 \nL 2356 481 \nL 2356 0 \nL 1766 0 \nQ 1100 0 847 248 \nQ 594 497 594 1153 \nL 594 3053 \nL 172 3053 \nL 
172 3500 \nL 594 3500 \nL 594 4494 \nL 1172 4494 \nz\n\" transform=\"scale(0.015625)\"/>\n </defs>\n <use xlink:href=\"#DejaVuSans-63\"/>\n <use xlink:href=\"#DejaVuSans-6f\" x=\"54.980469\"/>\n <use xlink:href=\"#DejaVuSans-75\" x=\"116.162109\"/>\n <use xlink:href=\"#DejaVuSans-6e\" x=\"179.541016\"/>\n <use xlink:href=\"#DejaVuSans-74\" x=\"242.919922\"/>\n </g>\n </g>\n </g>\n <g id=\"patch_5\">\n <path d=\"M 59.690625 273.312 \nL 59.690625 7.2 \n\" style=\"fill: none; stroke: #ffffff; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n </g>\n <g id=\"patch_6\">\n <path d=\"M 416.810625 273.312 \nL 416.810625 7.2 \n\" style=\"fill: none; stroke: #ffffff; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n </g>\n <g id=\"patch_7\">\n <path d=\"M 59.690625 273.312 \nL 416.810625 273.312 \n\" style=\"fill: none; stroke: #ffffff; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n </g>\n <g id=\"patch_8\">\n <path d=\"M 59.690625 7.2 \nL 416.810625 7.2 \n\" style=\"fill: none; stroke: #ffffff; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n </g>\n <g id=\"legend_1\">\n <g id=\"patch_9\">\n <path d=\"M 316.254375 34.878125 \nL 409.810625 34.878125 \nQ 411.810625 34.878125 411.810625 32.878125 \nL 411.810625 14.2 \nQ 411.810625 12.2 409.810625 12.2 \nL 316.254375 12.2 \nQ 314.254375 12.2 314.254375 14.2 \nL 314.254375 32.878125 \nQ 314.254375 34.878125 316.254375 34.878125 \nz\n\" style=\"opacity: 0.8; stroke: #cccccc; stroke-linejoin: miter\"/>\n </g>\n <g id=\"text_10\">\n <!-- Is The Label Bias? 
-->\n <g style=\"fill: #ffffff\" transform=\"translate(318.254375 23.798437) scale(0.1 -0.1)\">\n <defs>\n <path id=\"DejaVuSans-49\" d=\"M 628 4666 \nL 1259 4666 \nL 1259 0 \nL 628 0 \nL 628 4666 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-73\" d=\"M 2834 3397 \nL 2834 2853 \nQ 2591 2978 2328 3040 \nQ 2066 3103 1784 3103 \nQ 1356 3103 1142 2972 \nQ 928 2841 928 2578 \nQ 928 2378 1081 2264 \nQ 1234 2150 1697 2047 \nL 1894 2003 \nQ 2506 1872 2764 1633 \nQ 3022 1394 3022 966 \nQ 3022 478 2636 193 \nQ 2250 -91 1575 -91 \nQ 1294 -91 989 -36 \nQ 684 19 347 128 \nL 347 722 \nQ 666 556 975 473 \nQ 1284 391 1588 391 \nQ 1994 391 2212 530 \nQ 2431 669 2431 922 \nQ 2431 1156 2273 1281 \nQ 2116 1406 1581 1522 \nL 1381 1569 \nQ 847 1681 609 1914 \nQ 372 2147 372 2553 \nQ 372 3047 722 3315 \nQ 1072 3584 1716 3584 \nQ 2034 3584 2315 3537 \nQ 2597 3491 2834 3397 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-20\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-54\" d=\"M -19 4666 \nL 3928 4666 \nL 3928 4134 \nL 2272 4134 \nL 2272 0 \nL 1638 0 \nL 1638 4134 \nL -19 4134 \nL -19 4666 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-68\" d=\"M 3513 2113 \nL 3513 0 \nL 2938 0 \nL 2938 2094 \nQ 2938 2591 2744 2837 \nQ 2550 3084 2163 3084 \nQ 1697 3084 1428 2787 \nQ 1159 2491 1159 1978 \nL 1159 0 \nL 581 0 \nL 581 4863 \nL 1159 4863 \nL 1159 2956 \nQ 1366 3272 1645 3428 \nQ 1925 3584 2291 3584 \nQ 2894 3584 3203 3211 \nQ 3513 2838 3513 2113 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-65\" d=\"M 3597 1894 \nL 3597 1613 \nL 953 1613 \nQ 991 1019 1311 708 \nQ 1631 397 2203 397 \nQ 2534 397 2845 478 \nQ 3156 559 3463 722 \nL 3463 178 \nQ 3153 47 2828 -22 \nQ 2503 -91 2169 -91 \nQ 1331 -91 842 396 \nQ 353 884 353 1716 \nQ 353 2575 817 3079 \nQ 1281 3584 2069 3584 \nQ 2775 3584 3186 3129 \nQ 3597 2675 3597 1894 \nz\nM 3022 2063 \nQ 3016 2534 2758 2815 \nQ 2500 3097 2075 3097 \nQ 1594 3097 1305 2825 \nQ 1016 2553 
972 2059 \nL 3022 2063 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-4c\" d=\"M 628 4666 \nL 1259 4666 \nL 1259 531 \nL 3531 531 \nL 3531 0 \nL 628 0 \nL 628 4666 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-61\" d=\"M 2194 1759 \nQ 1497 1759 1228 1600 \nQ 959 1441 959 1056 \nQ 959 750 1161 570 \nQ 1363 391 1709 391 \nQ 2188 391 2477 730 \nQ 2766 1069 2766 1631 \nL 2766 1759 \nL 2194 1759 \nz\nM 3341 1997 \nL 3341 0 \nL 2766 0 \nL 2766 531 \nQ 2569 213 2275 61 \nQ 1981 -91 1556 -91 \nQ 1019 -91 701 211 \nQ 384 513 384 1019 \nQ 384 1609 779 1909 \nQ 1175 2209 1959 2209 \nL 2766 2209 \nL 2766 2266 \nQ 2766 2663 2505 2880 \nQ 2244 3097 1772 3097 \nQ 1472 3097 1187 3025 \nQ 903 2953 641 2809 \nL 641 3341 \nQ 956 3463 1253 3523 \nQ 1550 3584 1831 3584 \nQ 2591 3584 2966 3190 \nQ 3341 2797 3341 1997 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-62\" d=\"M 3116 1747 \nQ 3116 2381 2855 2742 \nQ 2594 3103 2138 3103 \nQ 1681 3103 1420 2742 \nQ 1159 2381 1159 1747 \nQ 1159 1113 1420 752 \nQ 1681 391 2138 391 \nQ 2594 391 2855 752 \nQ 3116 1113 3116 1747 \nz\nM 1159 2969 \nQ 1341 3281 1617 3432 \nQ 1894 3584 2278 3584 \nQ 2916 3584 3314 3078 \nQ 3713 2572 3713 1747 \nQ 3713 922 3314 415 \nQ 2916 -91 2278 -91 \nQ 1894 -91 1617 61 \nQ 1341 213 1159 525 \nL 1159 0 \nL 581 0 \nL 581 4863 \nL 1159 4863 \nL 1159 2969 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-6c\" d=\"M 603 4863 \nL 1178 4863 \nL 1178 0 \nL 603 0 \nL 603 4863 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-42\" d=\"M 1259 2228 \nL 1259 519 \nL 2272 519 \nQ 2781 519 3026 730 \nQ 3272 941 3272 1375 \nQ 3272 1813 3026 2020 \nQ 2781 2228 2272 2228 \nL 1259 2228 \nz\nM 1259 4147 \nL 1259 2741 \nL 2194 2741 \nQ 2656 2741 2882 2914 \nQ 3109 3088 3109 3444 \nQ 3109 3797 2882 3972 \nQ 2656 4147 2194 4147 \nL 1259 4147 \nz\nM 628 4666 \nL 2241 4666 \nQ 2963 4666 3353 4366 \nQ 3744 4066 3744 3513 \nQ 3744 3084 3544 2831 \nQ 3344 2578 
2956 2516 \nQ 3422 2416 3680 2098 \nQ 3938 1781 3938 1306 \nQ 3938 681 3513 340 \nQ 3088 0 2303 0 \nL 628 0 \nL 628 4666 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-69\" d=\"M 603 3500 \nL 1178 3500 \nL 1178 0 \nL 603 0 \nL 603 3500 \nz\nM 603 4863 \nL 1178 4863 \nL 1178 4134 \nL 603 4134 \nL 603 4863 \nz\n\" transform=\"scale(0.015625)\"/>\n <path id=\"DejaVuSans-3f\" d=\"M 1222 794 \nL 1856 794 \nL 1856 0 \nL 1222 0 \nL 1222 794 \nz\nM 1838 1253 \nL 1241 1253 \nL 1241 1734 \nQ 1241 2050 1328 2253 \nQ 1416 2456 1697 2725 \nL 1978 3003 \nQ 2156 3169 2236 3316 \nQ 2316 3463 2316 3616 \nQ 2316 3894 2111 4066 \nQ 1906 4238 1569 4238 \nQ 1322 4238 1042 4128 \nQ 763 4019 459 3809 \nL 459 4397 \nQ 753 4575 1054 4662 \nQ 1356 4750 1678 4750 \nQ 2253 4750 2601 4447 \nQ 2950 4144 2950 3647 \nQ 2950 3409 2837 3195 \nQ 2725 2981 2444 2713 \nL 2169 2444 \nQ 2022 2297 1961 2214 \nQ 1900 2131 1875 2053 \nQ 1856 1988 1847 1894 \nQ 1838 1800 1838 1638 \nL 1838 1253 \nz\n\" transform=\"scale(0.015625)\"/>\n </defs>\n <use xlink:href=\"#DejaVuSans-49\"/>\n <use xlink:href=\"#DejaVuSans-73\" x=\"29.492188\"/>\n <use xlink:href=\"#DejaVuSans-20\" x=\"81.591797\"/>\n <use xlink:href=\"#DejaVuSans-54\" x=\"113.378906\"/>\n <use xlink:href=\"#DejaVuSans-68\" x=\"174.462891\"/>\n <use xlink:href=\"#DejaVuSans-65\" x=\"237.841797\"/>\n <use xlink:href=\"#DejaVuSans-20\" x=\"299.365234\"/>\n <use xlink:href=\"#DejaVuSans-4c\" x=\"331.152344\"/>\n <use xlink:href=\"#DejaVuSans-61\" x=\"386.865234\"/>\n <use xlink:href=\"#DejaVuSans-62\" x=\"448.144531\"/>\n <use xlink:href=\"#DejaVuSans-65\" x=\"511.621094\"/>\n <use xlink:href=\"#DejaVuSans-6c\" x=\"573.144531\"/>\n <use xlink:href=\"#DejaVuSans-20\" x=\"600.927734\"/>\n <use xlink:href=\"#DejaVuSans-42\" x=\"632.714844\"/>\n <use xlink:href=\"#DejaVuSans-69\" x=\"701.318359\"/>\n <use xlink:href=\"#DejaVuSans-61\" x=\"729.101562\"/>\n <use xlink:href=\"#DejaVuSans-73\" x=\"790.380859\"/>\n <use 
xlink:href=\"#DejaVuSans-3f\" x=\"842.480469\"/>\n </g>\n </g>\n </g>\n </g>\n </g>\n <defs>\n <clipPath id=\"p3cf0b0ea68\">\n <rect x=\"59.690625\" y=\"7.2\" width=\"357.12\" height=\"266.112\"/>\n </clipPath>\n </defs>\n</svg>\n",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.countplot(x=targets, palette='Set2')\n",
"plt.legend( title = \"Is The Label Bias?\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>The two classes are perfectly balanced (25,000 reviews each), so the model will not be biased toward either label by class imbalance.</font>"
]
},
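{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>The bar chart can be confirmed with exact counts. A minimal sketch, assuming the labels are a 1-D array of non-negative integers such as `targets`; `class_balance` is a hypothetical helper and `toy_labels` is made-up stand-in data. On the full dataset, `class_balance(targets)` would be expected to report 25,000 reviews per label, matching the plot.</font>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def class_balance(labels):\n",
"    # Count how often each label value occurs (position i holds the count of label i)\n",
"    counts = np.bincount(labels)\n",
"    # Convert counts to the fraction of the dataset each label represents\n",
"    return counts, counts / counts.sum()\n",
"\n",
"# Made-up labels standing in for the real targets array\n",
"toy_labels = np.array([1, 0, 0, 1, 1, 0])\n",
"counts, shares = class_balance(toy_labels)\n",
"print('Counts per label:', counts)\n",
"print('Share per label:', shares)"
]
},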
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The reviews are in the following format: [1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 2, 4, 1153, 9, 194, 775, 7, 8255, 2, 349, 2637, 148, 605, 2, 8003, 15, 123, 125, 68, 2, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 2, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 2, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 2, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]\n"
]
}
],
"source": [
"print(\"The reviews are in the following format: \", data[1])"
]
},
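{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>Each review is therefore a list of integer word indices, not raw text. A minimal sketch of mapping such a list back to words, assuming a word-to-index dictionary like the one the dataset provides and the Keras loader's default offsets (word indices shifted by 3; 0 = padding, 1 = start of review, 2 = unknown word). `decode_review` and the tiny `toy_word_index` vocabulary are hypothetical stand-ins.</font>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumption: default Keras IMDB offsets, with words shifted by index_from=3\n",
"def decode_review(encoded, word_index, index_from=3):\n",
"    # Invert the word->index mapping, shifting each index by index_from\n",
"    index_to_word = {index + index_from: word for word, index in word_index.items()}\n",
"    # Reserved indices written by the loader\n",
"    index_to_word[0] = '<pad>'\n",
"    index_to_word[1] = '<start>'\n",
"    index_to_word[2] = '<unk>'\n",
"    # Any index without a word (e.g. the unused 3) is shown as '?'\n",
"    return ' '.join(index_to_word.get(i, '?') for i in encoded)\n",
"\n",
"# Hypothetical vocabulary: after shifting, 'the' -> 4, 'movie' -> 5, 'was' -> 6, 'great' -> 7\n",
"toy_word_index = {'the': 1, 'movie': 2, 'was': 3, 'great': 4}\n",
"print(decode_review([1, 4, 5, 6, 2, 7], toy_word_index))"
]
},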
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>The reviews are already encoded: each word has been replaced by an integer index. According to the dataset documentation, the mapping from words to indices can be retrieved as follows:</font>"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'fawn': 34701,\n",
" 'tsukino': 52006,\n",
" 'nunnery': 52007,\n",
" 'sonja': 16816,\n",
" 'vani': 63951,\n",
" 'woods': 1408,\n",
" 'spiders': 16115,\n",
" 'hanging': 2345,\n",
" 'woody': 2289,\n",
" 'trawling': 52008,\n",
" \"hold's\": 52009,\n",
" 'comically': 11307,\n",
" 'localized': 40830,\n",
" 'disobeying': 30568,\n",
" \"'royale\": 52010,\n",
" \"harpo's\": 40831,\n",
" 'canet': 52011,\n",
" 'aileen': 19313,\n",
" 'acurately': 52012,\n",
" \"diplomat's\": 52013,\n",
" 'rickman': 25242,\n",
" 'arranged': 6746,\n",
" 'rumbustious': 52014,\n",
" 'familiarness': 52015,\n",
" \"spider'\": 52016,\n",
" 'hahahah': 68804,\n",
" \"wood'\": 52017,\n",
" 'transvestism': 40833,\n",
" \"hangin'\": 34702,\n",
" 'bringing': 2338,\n",
" 'seamier': 40834,\n",
" 'wooded': 34703,\n",
" 'bravora': 52018,\n",
" 'grueling': 16817,\n",
" 'wooden': 1636,\n",
" 'wednesday': 16818,\n",
" \"'prix\": 52019,\n",
" 'altagracia': 34704,\n",
" 'circuitry': 52020,\n",
" 'crotch': 11585,\n",
" 'busybody': 57766,\n",
" \"tart'n'tangy\": 52021,\n",
" 'burgade': 14129,\n",
" 'thrace': 52023,\n",
" \"tom's\": 11038,\n",
" 'snuggles': 52025,\n",
" 'francesco': 29114,\n",
" 'complainers': 52027,\n",
" 'templarios': 52125,\n",
" '272': 40835,\n",
" '273': 52028,\n",
" 'zaniacs': 52130,\n",
" '275': 34706,\n",
" 'consenting': 27631,\n",
" 'snuggled': 40836,\n",
" 'inanimate': 15492,\n",
" 'uality': 52030,\n",
" 'bronte': 11926,\n",
" 'errors': 4010,\n",
" 'dialogs': 3230,\n",
" \"yomada's\": 52031,\n",
" \"madman's\": 34707,\n",
" 'dialoge': 30585,\n",
" 'usenet': 52033,\n",
" 'videodrome': 40837,\n",
" \"kid'\": 26338,\n",
" 'pawed': 52034,\n",
" \"'girlfriend'\": 30569,\n",
" \"'pleasure\": 52035,\n",
" \"'reloaded'\": 52036,\n",
" \"kazakos'\": 40839,\n",
" 'rocque': 52037,\n",
" 'mailings': 52038,\n",
" 'brainwashed': 11927,\n",
" 'mcanally': 16819,\n",
" \"tom''\": 52039,\n",
" 'kurupt': 25243,\n",
" 'affiliated': 21905,\n",
" 'babaganoosh': 52040,\n",
" \"noe's\": 40840,\n",
" 'quart': 40841,\n",
" 'kids': 359,\n",
" 'uplifting': 5034,\n",
" 'controversy': 7093,\n",
" 'kida': 21906,\n",
" 'kidd': 23379,\n",
" \"error'\": 52041,\n",
" 'neurologist': 52042,\n",
" 'spotty': 18510,\n",
" 'cobblers': 30570,\n",
" 'projection': 9878,\n",
" 'fastforwarding': 40842,\n",
" 'sters': 52043,\n",
" \"eggar's\": 52044,\n",
" 'etherything': 52045,\n",
" 'gateshead': 40843,\n",
" 'airball': 34708,\n",
" 'unsinkable': 25244,\n",
" 'stern': 7180,\n",
" \"cervi's\": 52046,\n",
" 'dnd': 40844,\n",
" 'dna': 11586,\n",
" 'insecurity': 20598,\n",
" \"'reboot'\": 52047,\n",
" 'trelkovsky': 11037,\n",
" 'jaekel': 52048,\n",
" 'sidebars': 52049,\n",
" \"sforza's\": 52050,\n",
" 'distortions': 17633,\n",
" 'mutinies': 52051,\n",
" 'sermons': 30602,\n",
" '7ft': 40846,\n",
" 'boobage': 52052,\n",
" \"o'bannon's\": 52053,\n",
" 'populations': 23380,\n",
" 'chulak': 52054,\n",
" 'mesmerize': 27633,\n",
" 'quinnell': 52055,\n",
" 'yahoo': 10307,\n",
" 'meteorologist': 52057,\n",
" 'beswick': 42577,\n",
" 'boorman': 15493,\n",
" 'voicework': 40847,\n",
" \"ster'\": 52058,\n",
" 'blustering': 22922,\n",
" 'hj': 52059,\n",
" 'intake': 27634,\n",
" 'morally': 5621,\n",
" 'jumbling': 40849,\n",
" 'bowersock': 52060,\n",
" \"'porky's'\": 52061,\n",
" 'gershon': 16821,\n",
" 'ludicrosity': 40850,\n",
" 'coprophilia': 52062,\n",
" 'expressively': 40851,\n",
" \"india's\": 19500,\n",
" \"post's\": 34710,\n",
" 'wana': 52063,\n",
" 'wang': 5283,\n",
" 'wand': 30571,\n",
" 'wane': 25245,\n",
" 'edgeways': 52321,\n",
" 'titanium': 34711,\n",
" 'pinta': 40852,\n",
" 'want': 178,\n",
" 'pinto': 30572,\n",
" 'whoopdedoodles': 52065,\n",
" 'tchaikovsky': 21908,\n",
" 'travel': 2103,\n",
" \"'victory'\": 52066,\n",
" 'copious': 11928,\n",
" 'gouge': 22433,\n",
" \"chapters'\": 52067,\n",
" 'barbra': 6702,\n",
" 'uselessness': 30573,\n",
" \"wan'\": 52068,\n",
" 'assimilated': 27635,\n",
" 'petiot': 16116,\n",
" 'most\\x85and': 52069,\n",
" 'dinosaurs': 3930,\n",
" 'wrong': 352,\n",
" 'seda': 52070,\n",
" 'stollen': 52071,\n",
" 'sentencing': 34712,\n",
" 'ouroboros': 40853,\n",
" 'assimilates': 40854,\n",
" 'colorfully': 40855,\n",
" 'glenne': 27636,\n",
" 'dongen': 52072,\n",
" 'subplots': 4760,\n",
" 'kiloton': 52073,\n",
" 'chandon': 23381,\n",
" \"effect'\": 34713,\n",
" 'snugly': 27637,\n",
" 'kuei': 40856,\n",
" 'welcomed': 9092,\n",
" 'dishonor': 30071,\n",
" 'concurrence': 52075,\n",
" 'stoicism': 23382,\n",
" \"guys'\": 14896,\n",
" \"beroemd'\": 52077,\n",
" 'butcher': 6703,\n",
" \"melfi's\": 40857,\n",
" 'aargh': 30623,\n",
" 'playhouse': 20599,\n",
" 'wickedly': 11308,\n",
" 'fit': 1180,\n",
" 'labratory': 52078,\n",
" 'lifeline': 40859,\n",
" 'screaming': 1927,\n",
" 'fix': 4287,\n",
" 'cineliterate': 52079,\n",
" 'fic': 52080,\n",
" 'fia': 52081,\n",
" 'fig': 34714,\n",
" 'fmvs': 52082,\n",
" 'fie': 52083,\n",
" 'reentered': 52084,\n",
" 'fin': 30574,\n",
" 'doctresses': 52085,\n",
" 'fil': 52086,\n",
" 'zucker': 12606,\n",
" 'ached': 31931,\n",
" 'counsil': 52088,\n",
" 'paterfamilias': 52089,\n",
" 'songwriter': 13885,\n",
" 'shivam': 34715,\n",
" 'hurting': 9654,\n",
" 'effects': 299,\n",
" 'slauther': 52090,\n",
" \"'flame'\": 52091,\n",
" 'sommerset': 52092,\n",
" 'interwhined': 52093,\n",
" 'whacking': 27638,\n",
" 'bartok': 52094,\n",
" 'barton': 8775,\n",
" 'frewer': 21909,\n",
" \"fi'\": 52095,\n",
" 'ingrid': 6192,\n",
" 'stribor': 30575,\n",
" 'approporiately': 52096,\n",
" 'wobblyhand': 52097,\n",
" 'tantalisingly': 52098,\n",
" 'ankylosaurus': 52099,\n",
" 'parasites': 17634,\n",
" 'childen': 52100,\n",
" \"jenkins'\": 52101,\n",
" 'metafiction': 52102,\n",
" 'golem': 17635,\n",
" 'indiscretion': 40860,\n",
" \"reeves'\": 23383,\n",
" \"inamorata's\": 57781,\n",
" 'brittannica': 52104,\n",
" 'adapt': 7916,\n",
" \"russo's\": 30576,\n",
" 'guitarists': 48246,\n",
" 'abbott': 10553,\n",
" 'abbots': 40861,\n",
" 'lanisha': 17649,\n",
" 'magickal': 40863,\n",
" 'mattter': 52105,\n",
" \"'willy\": 52106,\n",
" 'pumpkins': 34716,\n",
" 'stuntpeople': 52107,\n",
" 'estimate': 30577,\n",
" 'ugghhh': 40864,\n",
" 'gameplay': 11309,\n",
" \"wern't\": 52108,\n",
" \"n'sync\": 40865,\n",
" 'sickeningly': 16117,\n",
" 'chiara': 40866,\n",
" 'disturbed': 4011,\n",
" 'portmanteau': 40867,\n",
" 'ineffectively': 52109,\n",
" \"duchonvey's\": 82143,\n",
" \"nasty'\": 37519,\n",
" 'purpose': 1285,\n",
" 'lazers': 52112,\n",
" 'lightened': 28105,\n",
" 'kaliganj': 52113,\n",
" 'popularism': 52114,\n",
" \"damme's\": 18511,\n",
" 'stylistics': 30578,\n",
" 'mindgaming': 52115,\n",
" 'spoilerish': 46449,\n",
" \"'corny'\": 52117,\n",
" 'boerner': 34718,\n",
" 'olds': 6792,\n",
" 'bakelite': 52118,\n",
" 'renovated': 27639,\n",
" 'forrester': 27640,\n",
" \"lumiere's\": 52119,\n",
" 'gaskets': 52024,\n",
" 'needed': 884,\n",
" 'smight': 34719,\n",
" 'master': 1297,\n",
" \"edie's\": 25905,\n",
" 'seeber': 40868,\n",
" 'hiya': 52120,\n",
" 'fuzziness': 52121,\n",
" 'genesis': 14897,\n",
" 'rewards': 12607,\n",
" 'enthrall': 30579,\n",
" \"'about\": 40869,\n",
" \"recollection's\": 52122,\n",
" 'mutilated': 11039,\n",
" 'fatherlands': 52123,\n",
" \"fischer's\": 52124,\n",
" 'positively': 5399,\n",
" '270': 34705,\n",
" 'ahmed': 34720,\n",
" 'zatoichi': 9836,\n",
" 'bannister': 13886,\n",
" 'anniversaries': 52127,\n",
" \"helm's\": 30580,\n",
" \"'work'\": 52128,\n",
" 'exclaimed': 34721,\n",
" \"'unfunny'\": 52129,\n",
" '274': 52029,\n",
" 'feeling': 544,\n",
" \"wanda's\": 52131,\n",
" 'dolan': 33266,\n",
" '278': 52133,\n",
" 'peacoat': 52134,\n",
" 'brawny': 40870,\n",
" 'mishra': 40871,\n",
" 'worlders': 40872,\n",
" 'protags': 52135,\n",
" 'skullcap': 52136,\n",
" 'dastagir': 57596,\n",
" 'affairs': 5622,\n",
" 'wholesome': 7799,\n",
" 'hymen': 52137,\n",
" 'paramedics': 25246,\n",
" 'unpersons': 52138,\n",
" 'heavyarms': 52139,\n",
" 'affaire': 52140,\n",
" 'coulisses': 52141,\n",
" 'hymer': 40873,\n",
" 'kremlin': 52142,\n",
" 'shipments': 30581,\n",
" 'pixilated': 52143,\n",
" \"'00s\": 30582,\n",
" 'diminishing': 18512,\n",
" 'cinematic': 1357,\n",
" 'resonates': 14898,\n",
" 'simplify': 40874,\n",
" \"nature'\": 40875,\n",
" 'temptresses': 40876,\n",
" 'reverence': 16822,\n",
" 'resonated': 19502,\n",
" 'dailey': 34722,\n",
" '2\\x85': 52144,\n",
" 'treize': 27641,\n",
" 'majo': 52145,\n",
" 'kiya': 21910,\n",
" 'woolnough': 52146,\n",
" 'thanatos': 39797,\n",
" 'sandoval': 35731,\n",
" 'dorama': 40879,\n",
" \"o'shaughnessy\": 52147,\n",
" 'tech': 4988,\n",
" 'fugitives': 32018,\n",
" 'teck': 30583,\n",
" \"'e'\": 76125,\n",
" 'doesn’t': 40881,\n",
" 'purged': 52149,\n",
" 'saying': 657,\n",
" \"martians'\": 41095,\n",
" 'norliss': 23418,\n",
" 'dickey': 27642,\n",
" 'dicker': 52152,\n",
" \"'sependipity\": 52153,\n",
" 'padded': 8422,\n",
" 'ordell': 57792,\n",
" \"sturges'\": 40882,\n",
" 'independentcritics': 52154,\n",
" 'tempted': 5745,\n",
" \"atkinson's\": 34724,\n",
" 'hounded': 25247,\n",
" 'apace': 52155,\n",
" 'clicked': 15494,\n",
" \"'humor'\": 30584,\n",
" \"martino's\": 17177,\n",
" \"'supporting\": 52156,\n",
" 'warmongering': 52032,\n",
" \"zemeckis's\": 34725,\n",
" 'lube': 21911,\n",
" 'shocky': 52157,\n",
" 'plate': 7476,\n",
" 'plata': 40883,\n",
" 'sturgess': 40884,\n",
" \"nerds'\": 40885,\n",
" 'plato': 20600,\n",
" 'plath': 34726,\n",
" 'platt': 40886,\n",
" 'mcnab': 52159,\n",
" 'clumsiness': 27643,\n",
" 'altogether': 3899,\n",
" 'massacring': 42584,\n",
" 'bicenntinial': 52160,\n",
" 'skaal': 40887,\n",
" 'droning': 14360,\n",
" 'lds': 8776,\n",
" 'jaguar': 21912,\n",
" \"cale's\": 34727,\n",
" 'nicely': 1777,\n",
" 'mummy': 4588,\n",
" \"lot's\": 18513,\n",
" 'patch': 10086,\n",
" 'kerkhof': 50202,\n",
" \"leader's\": 52161,\n",
" \"'movie\": 27644,\n",
" 'uncomfirmed': 52162,\n",
" 'heirloom': 40888,\n",
" 'wrangle': 47360,\n",
" 'emotion\\x85': 52163,\n",
" \"'stargate'\": 52164,\n",
" 'pinoy': 40889,\n",
" 'conchatta': 40890,\n",
" 'broeke': 41128,\n",
" 'advisedly': 40891,\n",
" \"barker's\": 17636,\n",
" 'descours': 52166,\n",
" 'lots': 772,\n",
" 'lotr': 9259,\n",
" 'irs': 9879,\n",
" 'lott': 52167,\n",
" 'xvi': 40892,\n",
" 'irk': 34728,\n",
" 'irl': 52168,\n",
" 'ira': 6887,\n",
" 'belzer': 21913,\n",
" 'irc': 52169,\n",
" 'ire': 27645,\n",
" 'requisites': 40893,\n",
" 'discipline': 7693,\n",
" 'lyoko': 52961,\n",
" 'extend': 11310,\n",
" 'nature': 873,\n",
" \"'dickie'\": 52170,\n",
" 'optimist': 40894,\n",
" 'lapping': 30586,\n",
" 'superficial': 3900,\n",
" 'vestment': 52171,\n",
" 'extent': 2823,\n",
" 'tendons': 52172,\n",
" \"heller's\": 52173,\n",
" 'quagmires': 52174,\n",
" 'miyako': 52175,\n",
" 'moocow': 20601,\n",
" \"coles'\": 52176,\n",
" 'lookit': 40895,\n",
" 'ravenously': 52177,\n",
" 'levitating': 40896,\n",
" 'perfunctorily': 52178,\n",
" 'lookin': 30587,\n",
" \"lot'\": 40898,\n",
" 'lookie': 52179,\n",
" 'fearlessly': 34870,\n",
" 'libyan': 52181,\n",
" 'fondles': 40899,\n",
" 'gopher': 35714,\n",
" 'wearying': 40901,\n",
" \"nz's\": 52182,\n",
" 'minuses': 27646,\n",
" 'puposelessly': 52183,\n",
" 'shandling': 52184,\n",
" 'decapitates': 31268,\n",
" 'humming': 11929,\n",
" \"'nother\": 40902,\n",
" 'smackdown': 21914,\n",
" 'underdone': 30588,\n",
" 'frf': 40903,\n",
" 'triviality': 52185,\n",
" 'fro': 25248,\n",
" 'bothers': 8777,\n",
" \"'kensington\": 52186,\n",
" 'much': 73,\n",
" 'muco': 34730,\n",
" 'wiseguy': 22615,\n",
" \"richie's\": 27648,\n",
" 'tonino': 40904,\n",
" 'unleavened': 52187,\n",
" 'fry': 11587,\n",
" \"'tv'\": 40905,\n",
" 'toning': 40906,\n",
" 'obese': 14361,\n",
" 'sensationalized': 30589,\n",
" 'spiv': 40907,\n",
" 'spit': 6259,\n",
" 'arkin': 7364,\n",
" 'charleton': 21915,\n",
" 'jeon': 16823,\n",
" 'boardroom': 21916,\n",
" 'doubts': 4989,\n",
" 'spin': 3084,\n",
" 'hepo': 53083,\n",
" 'wildcat': 27649,\n",
" 'venoms': 10584,\n",
" 'misconstrues': 52191,\n",
" 'mesmerising': 18514,\n",
" 'misconstrued': 40908,\n",
" 'rescinds': 52192,\n",
" 'prostrate': 52193,\n",
" 'majid': 40909,\n",
" 'climbed': 16479,\n",
" 'canoeing': 34731,\n",
" 'majin': 52195,\n",
" 'animie': 57804,\n",
" 'sylke': 40910,\n",
" 'conditioned': 14899,\n",
" 'waddell': 40911,\n",
" '3\\x85': 52196,\n",
" 'hyperdrive': 41188,\n",
" 'conditioner': 34732,\n",
" 'bricklayer': 53153,\n",
" 'hong': 2576,\n",
" 'memoriam': 52198,\n",
" 'inventively': 30592,\n",
" \"levant's\": 25249,\n",
" 'portobello': 20638,\n",
" 'remand': 52200,\n",
" 'mummified': 19504,\n",
" 'honk': 27650,\n",
" 'spews': 19505,\n",
" 'visitations': 40912,\n",
" 'mummifies': 52201,\n",
" 'cavanaugh': 25250,\n",
" 'zeon': 23385,\n",
" \"jungle's\": 40913,\n",
" 'viertel': 34733,\n",
" 'frenchmen': 27651,\n",
" 'torpedoes': 52202,\n",
" 'schlessinger': 52203,\n",
" 'torpedoed': 34734,\n",
" 'blister': 69876,\n",
" 'cinefest': 52204,\n",
" 'furlough': 34735,\n",
" 'mainsequence': 52205,\n",
" 'mentors': 40914,\n",
" 'academic': 9094,\n",
" 'stillness': 20602,\n",
" 'academia': 40915,\n",
" 'lonelier': 52206,\n",
" 'nibby': 52207,\n",
" \"losers'\": 52208,\n",
" 'cineastes': 40916,\n",
" 'corporate': 4449,\n",
" 'massaging': 40917,\n",
" 'bellow': 30593,\n",
" 'absurdities': 19506,\n",
" 'expetations': 53241,\n",
" 'nyfiken': 40918,\n",
" 'mehras': 75638,\n",
" 'lasse': 52209,\n",
" 'visability': 52210,\n",
" 'militarily': 33946,\n",
" \"elder'\": 52211,\n",
" 'gainsbourg': 19023,\n",
" 'hah': 20603,\n",
" 'hai': 13420,\n",
" 'haj': 34736,\n",
" 'hak': 25251,\n",
" 'hal': 4311,\n",
" 'ham': 4892,\n",
" 'duffer': 53259,\n",
" 'haa': 52213,\n",
" 'had': 66,\n",
" 'advancement': 11930,\n",
" 'hag': 16825,\n",
" \"hand'\": 25252,\n",
" 'hay': 13421,\n",
" 'mcnamara': 20604,\n",
" \"mozart's\": 52214,\n",
" 'duffel': 30731,\n",
" 'haq': 30594,\n",
" 'har': 13887,\n",
" 'has': 44,\n",
" 'hat': 2401,\n",
" 'hav': 40919,\n",
" 'haw': 30595,\n",
" 'figtings': 52215,\n",
" 'elders': 15495,\n",
" 'underpanted': 52216,\n",
" 'pninson': 52217,\n",
" 'unequivocally': 27652,\n",
" \"barbara's\": 23673,\n",
" \"bello'\": 52219,\n",
" 'indicative': 12997,\n",
" 'yawnfest': 40920,\n",
" 'hexploitation': 52220,\n",
" \"loder's\": 52221,\n",
" 'sleuthing': 27653,\n",
" \"justin's\": 32622,\n",
" \"'ball\": 52222,\n",
" \"'summer\": 52223,\n",
" \"'demons'\": 34935,\n",
" \"mormon's\": 52225,\n",
" \"laughton's\": 34737,\n",
" 'debell': 52226,\n",
" 'shipyard': 39724,\n",
" 'unabashedly': 30597,\n",
" 'disks': 40401,\n",
" 'crowd': 2290,\n",
" 'crowe': 10087,\n",
" \"vancouver's\": 56434,\n",
" 'mosques': 34738,\n",
" 'crown': 6627,\n",
" 'culpas': 52227,\n",
" 'crows': 27654,\n",
" 'surrell': 53344,\n",
" 'flowless': 52229,\n",
" 'sheirk': 52230,\n",
" \"'three\": 40923,\n",
" \"peterson'\": 52231,\n",
" 'ooverall': 52232,\n",
" 'perchance': 40924,\n",
" 'bottom': 1321,\n",
" 'chabert': 53363,\n",
" 'sneha': 52233,\n",
" 'inhuman': 13888,\n",
" 'ichii': 52234,\n",
" 'ursla': 52235,\n",
" 'completly': 30598,\n",
" 'moviedom': 40925,\n",
" 'raddick': 52236,\n",
" 'brundage': 51995,\n",
" 'brigades': 40926,\n",
" 'starring': 1181,\n",
" \"'goal'\": 52237,\n",
" 'caskets': 52238,\n",
" 'willcock': 52239,\n",
" \"threesome's\": 52240,\n",
" \"mosque'\": 52241,\n",
" \"cover's\": 52242,\n",
" 'spaceships': 17637,\n",
" 'anomalous': 40927,\n",
" 'ptsd': 27655,\n",
" 'shirdan': 52243,\n",
" 'obscenity': 21962,\n",
" 'lemmings': 30599,\n",
" 'duccio': 30600,\n",
" \"levene's\": 52244,\n",
" \"'gorby'\": 52245,\n",
" \"teenager's\": 25255,\n",
" 'marshall': 5340,\n",
" 'honeymoon': 9095,\n",
" 'shoots': 3231,\n",
" 'despised': 12258,\n",
" 'okabasho': 52246,\n",
" 'fabric': 8289,\n",
" 'cannavale': 18515,\n",
" 'raped': 3537,\n",
" \"tutt's\": 52247,\n",
" 'grasping': 17638,\n",
" 'despises': 18516,\n",
" \"thief's\": 40928,\n",
" 'rapes': 8926,\n",
" 'raper': 52248,\n",
" \"eyre'\": 27656,\n",
" 'walchek': 52249,\n",
" \"elmo's\": 23386,\n",
" 'perfumes': 40929,\n",
" 'spurting': 21918,\n",
" \"exposition'\\x85\": 52250,\n",
" 'denoting': 52251,\n",
" 'thesaurus': 34740,\n",
" \"shoot'\": 40930,\n",
" 'bonejack': 49759,\n",
" 'simpsonian': 52253,\n",
" 'hebetude': 30601,\n",
" \"hallow's\": 34741,\n",
" 'desperation\\x85': 52254,\n",
" 'incinerator': 34742,\n",
" 'congratulations': 10308,\n",
" 'humbled': 52255,\n",
" \"else's\": 5924,\n",
" 'trelkovski': 40845,\n",
" \"rape'\": 52256,\n",
" \"'chapters'\": 59386,\n",
" '1600s': 52257,\n",
" 'martian': 7253,\n",
" 'nicest': 25256,\n",
" 'eyred': 52259,\n",
" 'passenger': 9457,\n",
" 'disgrace': 6041,\n",
" 'moderne': 52260,\n",
" 'barrymore': 5120,\n",
" 'yankovich': 52261,\n",
" 'moderns': 40931,\n",
" 'studliest': 52262,\n",
" 'bedsheet': 52263,\n",
" 'decapitation': 14900,\n",
" 'slurring': 52264,\n",
" \"'nunsploitation'\": 52265,\n",
" \"'character'\": 34743,\n",
" 'cambodia': 9880,\n",
" 'rebelious': 52266,\n",
" 'pasadena': 27657,\n",
" 'crowne': 40932,\n",
" \"'bedchamber\": 52267,\n",
" 'conjectural': 52268,\n",
" 'appologize': 52269,\n",
" 'halfassing': 52270,\n",
" 'paycheque': 57816,\n",
" 'palms': 20606,\n",
" \"'islands\": 52271,\n",
" 'hawked': 40933,\n",
" 'palme': 21919,\n",
" 'conservatively': 40934,\n",
" 'larp': 64007,\n",
" 'palma': 5558,\n",
" 'smelling': 21920,\n",
" 'aragorn': 12998,\n",
" 'hawker': 52272,\n",
" 'hawkes': 52273,\n",
" 'explosions': 3975,\n",
" 'loren': 8059,\n",
" \"pyle's\": 52274,\n",
" 'shootout': 6704,\n",
" \"mike's\": 18517,\n",
" \"driscoll's\": 52275,\n",
" 'cogsworth': 40935,\n",
" \"britian's\": 52276,\n",
" 'childs': 34744,\n",
" \"portrait's\": 52277,\n",
" 'chain': 3626,\n",
" 'whoever': 2497,\n",
" 'puttered': 52278,\n",
" 'childe': 52279,\n",
" 'maywether': 52280,\n",
" 'chair': 3036,\n",
" \"rance's\": 52281,\n",
" 'machu': 34745,\n",
" 'ballet': 4517,\n",
" 'grapples': 34746,\n",
" 'summerize': 76152,\n",
" 'freelance': 30603,\n",
" \"andrea's\": 52283,\n",
" '\\x91very': 52284,\n",
" 'coolidge': 45879,\n",
" 'mache': 18518,\n",
" 'balled': 52285,\n",
" 'grappled': 40937,\n",
" 'macha': 18519,\n",
" 'underlining': 21921,\n",
" 'macho': 5623,\n",
" 'oversight': 19507,\n",
" 'machi': 25257,\n",
" 'verbally': 11311,\n",
" 'tenacious': 21922,\n",
" 'windshields': 40938,\n",
" 'paychecks': 18557,\n",
" 'jerk': 3396,\n",
" \"good'\": 11931,\n",
" 'prancer': 34748,\n",
" 'prances': 21923,\n",
" 'olympus': 52286,\n",
" 'lark': 21924,\n",
" 'embark': 10785,\n",
" 'gloomy': 7365,\n",
" 'jehaan': 52287,\n",
" 'turaqui': 52288,\n",
" \"child'\": 20607,\n",
" 'locked': 2894,\n",
" 'pranced': 52289,\n",
" 'exact': 2588,\n",
" 'unattuned': 52290,\n",
" 'minute': 783,\n",
" 'skewed': 16118,\n",
" 'hodgins': 40940,\n",
" 'skewer': 34749,\n",
" 'think\\x85': 52291,\n",
" 'rosenstein': 38765,\n",
" 'helmit': 52292,\n",
" 'wrestlemanias': 34750,\n",
" 'hindered': 16826,\n",
" \"martha's\": 30604,\n",
" 'cheree': 52293,\n",
" \"pluckin'\": 52294,\n",
" 'ogles': 40941,\n",
" 'heavyweight': 11932,\n",
" 'aada': 82190,\n",
" 'chopping': 11312,\n",
" 'strongboy': 61534,\n",
" 'hegemonic': 41342,\n",
" 'adorns': 40942,\n",
" 'xxth': 41346,\n",
" 'nobuhiro': 34751,\n",
" 'capitães': 52298,\n",
" 'kavogianni': 52299,\n",
" 'antwerp': 13422,\n",
" 'celebrated': 6538,\n",
" 'roarke': 52300,\n",
" 'baggins': 40943,\n",
" 'cheeseburgers': 31270,\n",
" 'matras': 52301,\n",
" \"nineties'\": 52302,\n",
" \"'craig'\": 52303,\n",
" 'celebrates': 12999,\n",
" 'unintentionally': 3383,\n",
" 'drafted': 14362,\n",
" 'climby': 52304,\n",
" '303': 52305,\n",
" 'oldies': 18520,\n",
" 'climbs': 9096,\n",
" 'honour': 9655,\n",
" 'plucking': 34752,\n",
" '305': 30074,\n",
" 'address': 5514,\n",
" 'menjou': 40944,\n",
" \"'freak'\": 42592,\n",
" 'dwindling': 19508,\n",
" 'benson': 9458,\n",
" 'white’s': 52307,\n",
" 'shamelessness': 40945,\n",
" 'impacted': 21925,\n",
" 'upatz': 52308,\n",
" 'cusack': 3840,\n",
" \"flavia's\": 37567,\n",
" 'effette': 52309,\n",
" 'influx': 34753,\n",
" 'boooooooo': 52310,\n",
" 'dimitrova': 52311,\n",
" 'houseman': 13423,\n",
" 'bigas': 25259,\n",
" 'boylen': 52312,\n",
" 'phillipenes': 52313,\n",
" 'fakery': 40946,\n",
" \"grandpa's\": 27658,\n",
" 'darnell': 27659,\n",
" 'undergone': 19509,\n",
" 'handbags': 52315,\n",
" 'perished': 21926,\n",
" 'pooped': 37778,\n",
" 'vigour': 27660,\n",
" 'opposed': 3627,\n",
" 'etude': 52316,\n",
" \"caine's\": 11799,\n",
" 'doozers': 52317,\n",
" 'photojournals': 34754,\n",
" 'perishes': 52318,\n",
" 'constrains': 34755,\n",
" 'migenes': 40948,\n",
" 'consoled': 30605,\n",
" 'alastair': 16827,\n",
" 'wvs': 52319,\n",
" 'ooooooh': 52320,\n",
" 'approving': 34756,\n",
" 'consoles': 40949,\n",
" 'disparagement': 52064,\n",
" 'futureistic': 52322,\n",
" 'rebounding': 52323,\n",
" \"'date\": 52324,\n",
" 'gregoire': 52325,\n",
" 'rutherford': 21927,\n",
" 'americanised': 34757,\n",
" 'novikov': 82196,\n",
" 'following': 1042,\n",
" 'munroe': 34758,\n",
" \"morita'\": 52326,\n",
" 'christenssen': 52327,\n",
" 'oatmeal': 23106,\n",
" 'fossey': 25260,\n",
" 'livered': 40950,\n",
" 'listens': 13000,\n",
" \"'marci\": 76164,\n",
" \"otis's\": 52330,\n",
" 'thanking': 23387,\n",
" 'maude': 16019,\n",
" 'extensions': 34759,\n",
" 'ameteurish': 52332,\n",
" \"commender's\": 52333,\n",
" 'agricultural': 27661,\n",
" 'convincingly': 4518,\n",
" 'fueled': 17639,\n",
" 'mahattan': 54014,\n",
" \"paris's\": 40952,\n",
" 'vulkan': 52336,\n",
" 'stapes': 52337,\n",
" 'odysessy': 52338,\n",
" 'harmon': 12259,\n",
" 'surfing': 4252,\n",
" 'halloran': 23494,\n",
" 'unbelieveably': 49580,\n",
" \"'offed'\": 52339,\n",
" 'quadrant': 30607,\n",
" 'inhabiting': 19510,\n",
" 'nebbish': 34760,\n",
" 'forebears': 40953,\n",
" 'skirmish': 34761,\n",
" 'ocassionally': 52340,\n",
" \"'resist\": 52341,\n",
" 'impactful': 21928,\n",
" 'spicier': 52342,\n",
" 'touristy': 40954,\n",
" \"'football'\": 52343,\n",
" 'webpage': 40955,\n",
" 'exurbia': 52345,\n",
" 'jucier': 52346,\n",
" 'professors': 14901,\n",
" 'structuring': 34762,\n",
" 'jig': 30608,\n",
" 'overlord': 40956,\n",
" 'disconnect': 25261,\n",
" 'sniffle': 82201,\n",
" 'slimeball': 40957,\n",
" 'jia': 40958,\n",
" 'milked': 16828,\n",
" 'banjoes': 40959,\n",
" 'jim': 1237,\n",
" 'workforces': 52348,\n",
" 'jip': 52349,\n",
" 'rotweiller': 52350,\n",
" 'mundaneness': 34763,\n",
" \"'ninja'\": 52351,\n",
" \"dead'\": 11040,\n",
" \"cipriani's\": 40960,\n",
" 'modestly': 20608,\n",
" \"professor'\": 52352,\n",
" 'shacked': 40961,\n",
" 'bashful': 34764,\n",
" 'sorter': 23388,\n",
" 'overpowering': 16120,\n",
" 'workmanlike': 18521,\n",
" 'henpecked': 27662,\n",
" 'sorted': 18522,\n",
" \"jōb's\": 52354,\n",
" \"'always\": 52355,\n",
" \"'baptists\": 34765,\n",
" 'dreamcatchers': 52356,\n",
" \"'silence'\": 52357,\n",
" 'hickory': 21929,\n",
" 'fun\\x97yet': 52358,\n",
" 'breakumentary': 52359,\n",
" 'didn': 15496,\n",
" 'didi': 52360,\n",
" 'pealing': 52361,\n",
" 'dispite': 40962,\n",
" \"italy's\": 25262,\n",
" 'instability': 21930,\n",
" 'quarter': 6539,\n",
" 'quartet': 12608,\n",
" 'padmé': 52362,\n",
" \"'bleedmedry\": 52363,\n",
" 'pahalniuk': 52364,\n",
" 'honduras': 52365,\n",
" 'bursting': 10786,\n",
" \"pablo's\": 41465,\n",
" 'irremediably': 52367,\n",
" 'presages': 40963,\n",
" 'bowlegged': 57832,\n",
" 'dalip': 65183,\n",
" 'entering': 6260,\n",
" 'newsradio': 76172,\n",
" 'presaged': 54150,\n",
" \"giallo's\": 27663,\n",
" 'bouyant': 40964,\n",
" 'amerterish': 52368,\n",
" 'rajni': 18523,\n",
" 'leeves': 30610,\n",
" 'macauley': 34767,\n",
" 'seriously': 612,\n",
" 'sugercoma': 52369,\n",
" 'grimstead': 52370,\n",
" \"'fairy'\": 52371,\n",
" 'zenda': 30611,\n",
" \"'twins'\": 52372,\n",
" 'realisation': 17640,\n",
" 'highsmith': 27664,\n",
" 'raunchy': 7817,\n",
" 'incentives': 40965,\n",
" 'flatson': 52374,\n",
" 'snooker': 35097,\n",
" 'crazies': 16829,\n",
" 'crazier': 14902,\n",
" 'grandma': 7094,\n",
" 'napunsaktha': 52375,\n",
" 'workmanship': 30612,\n",
" 'reisner': 52376,\n",
" \"sanford's\": 61306,\n",
" '\\x91doña': 52377,\n",
" 'modest': 6108,\n",
" \"everything's\": 19153,\n",
" 'hamer': 40966,\n",
" \"couldn't'\": 52379,\n",
" 'quibble': 13001,\n",
" 'socking': 52380,\n",
" 'tingler': 21931,\n",
" 'gutman': 52381,\n",
" 'lachlan': 40967,\n",
" 'tableaus': 52382,\n",
" 'headbanger': 52383,\n",
" 'spoken': 2847,\n",
" 'cerebrally': 34768,\n",
" \"'road\": 23490,\n",
" 'tableaux': 21932,\n",
" \"proust's\": 40968,\n",
" 'periodical': 40969,\n",
" \"shoveller's\": 52385,\n",
" 'tamara': 25263,\n",
" 'affords': 17641,\n",
" 'concert': 3249,\n",
" \"yara's\": 87955,\n",
" 'someome': 52386,\n",
" 'lingering': 8424,\n",
" \"abraham's\": 41511,\n",
" 'beesley': 34769,\n",
" 'cherbourg': 34770,\n",
" 'kagan': 28624,\n",
" 'snatch': 9097,\n",
" \"miyazaki's\": 9260,\n",
" 'absorbs': 25264,\n",
" \"koltai's\": 40970,\n",
" 'tingled': 64027,\n",
" 'crossroads': 19511,\n",
" 'rehab': 16121,\n",
" 'falworth': 52389,\n",
" 'sequals': 52390,\n",
" ...}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"word_index = imdb.get_word_index()\n",
"word_index"
]
},
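{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>According to the Keras documentation, `word_index` ranks words by overall frequency, so a lower index means a more common word (for example `'has': 44`, `'had': 66` and `'much': 73` in the output above). The sketch below sorts a few entries copied from that output. It also checks which of them would survive a 10,000-word vocabulary using `in_vocabulary`, a hypothetical helper that is not part of Keras. The check assumes the data was loaded with `num_words=10000` and the default `index_from=3` offset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Entries copied from the word_index output above\n",
"sample_index = {'effects': 299, 'much': 73, 'has': 44, 'labratory': 52078, 'had': 66}\n",
"\n",
"# Lower index = more frequent word\n",
"by_frequency = sorted(sample_index, key=sample_index.get)\n",
"print(by_frequency)\n",
"\n",
"def in_vocabulary(word, word_index, num_words=10000, index_from=3):\n",
"    # Hypothetical helper: Keras adds index_from to each raw index and treats\n",
"    # any shifted index >= num_words as out-of-vocabulary\n",
"    raw = word_index.get(word)\n",
"    return raw is not None and raw + index_from < num_words\n",
"\n",
"print(in_vocabulary('much', sample_index), in_vocabulary('labratory', sample_index))"
]
},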
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color='orange'><center>4. Data preprocessing</center>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
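"<font color='orange'>According to the Keras documentation, each review is stored as a sequence of word indices, so a review can be read by mapping its indices back to words with `word_index`. For the dense network, each review is multi-hot encoded instead: it becomes a 10,000-dimensional vector with a 1 at every word index that occurs in it."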
]
},
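{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>To read a review, its indices can be mapped back to words. The helper below is a hedged sketch. It assumes the Keras defaults: `index_from=3`, with the first few indices reserved for padding, the start marker and unknown words. It is demonstrated on a tiny made-up index. In this notebook, a call such as `decode_review(data[0], word_index)` should work, provided it runs before `data` is overwritten by the encoding step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def decode_review(sequence, word_index, index_from=3):\n",
"    # Keras shifts every raw word index by index_from, so undo that shift;\n",
"    # reserved indices (padding, start marker, unknown words) become '?'\n",
"    reverse_index = {idx + index_from: word for word, idx in word_index.items()}\n",
"    return ' '.join(reverse_index.get(i, '?') for i in sequence)\n",
"\n",
"# Made-up toy index, only for illustration\n",
"toy_index = {'this': 1, 'film': 2, 'rocks': 3}\n",
"print(decode_review([1, 4, 5, 6, 2], toy_index))"
]
},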
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
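"# Multi-hot encode each review: one row per review, one column per word index\n",
"def convert(sequences):\n",
"    results = np.zeros((len(sequences), 10000))\n",
"    for i, sequence in enumerate(sequences):\n",
"        # Set every column whose index appears in this review to 1\n",
"        results[i, sequence] = 1\n",
"    return results\n",
"\n",
"data = convert(data)\n",
"# Binary sentiment labels as float32, matching the sigmoid output\n",
"targets = np.array(targets).astype(\"float32\")"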
]
},
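{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>Running the same encoding on a toy input makes its behaviour visible. The sketch restates `convert` with the vector size as a parameter (the name `multi_hot` is ours, not from a library). Word order and repeated words are lost, because each column only records whether a word index occurs (1) or not (0)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def multi_hot(sequences, dimension=10000):\n",
"    # One row per review, one column per word index\n",
"    results = np.zeros((len(sequences), dimension))\n",
"    for i, sequence in enumerate(sequences):\n",
"        # Fancy indexing sets every listed column to 1; duplicates collapse\n",
"        results[i, sequence] = 1\n",
"    return results\n",
"\n",
"toy = multi_hot([[1, 3], [0, 3, 3]], dimension=5)\n",
"print(toy)"
]
},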
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color='orange'><center>5. Train and test split</center>"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Use the first 10,000 encoded reviews as the test set\n",
"X_test = data[:10000]\n",
"y_test = targets[:10000]\n",
"\n",
"# Use all remaining reviews (everything after the first 10,000) as the training set\n",
"X_train = data[10000:]\n",
"y_train = targets[10000:]"
]
},
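{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>The slicing above keeps the first 10,000 encoded reviews as the test set and everything after them for training. Below is a sketch of a hypothetical helper, `split_first_n`, that generalises this. It adds an optional shuffle through NumPy's `default_rng`, so the held-out block is a random sample rather than whichever reviews happen to come first. The notebook itself does not shuffle."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def split_first_n(features, labels, n_test, seed=None):\n",
"    # Hypothetical helper: optionally shuffle rows, then hold out the first n_test\n",
"    if seed is not None:\n",
"        order = np.random.default_rng(seed).permutation(len(features))\n",
"        features, labels = features[order], labels[order]\n",
"    train = (features[n_test:], labels[n_test:])\n",
"    test = (features[:n_test], labels[:n_test])\n",
"    return train, test\n",
"\n",
"x = np.arange(12).reshape(6, 2)\n",
"y = np.array([0, 1, 0, 1, 1, 0], dtype='float32')\n",
"(x_tr, y_tr), (x_te, y_te) = split_first_n(x, y, n_test=2)\n",
"print(x_tr.shape, x_te.shape)"
]
},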
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color='orange'><center>6. Creating the model</center>"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"sequential\"\n",
"_________________________________________________________________\n",
" Layer (type) Output Shape Param # \n",
"=================================================================\n",
" dense (Dense) (None, 32) 320032 \n",
" \n",
" dropout (Dropout) (None, 32) 0 \n",
" \n",
" dense_1 (Dense) (None, 64) 2112 \n",
" \n",
" dropout_1 (Dropout) (None, 64) 0 \n",
" \n",
" dense_2 (Dense) (None, 64) 4160 \n",
" \n",
" dense_3 (Dense) (None, 1) 65 \n",
" \n",
"=================================================================\n",
"Total params: 326,369\n",
"Trainable params: 326,369\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-03-11 19:18:19.417778: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/webots/lib/controller:/usr/local/webots/lib/webots\n",
"2023-03-11 19:18:19.417805: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)\n",
"2023-03-11 19:18:19.417828: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (Johnny): /proc/driver/nvidia/version does not exist\n",
"2023-03-11 19:18:19.418125: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA\n",
"To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n"
]
}
],
"source": [
"model = keras.Sequential()\n",
"model.add( layers.Dense(32, activation = tf.keras.activations.relu, input_shape=(10000, )) )\n",
"model.add( layers.Dropout(0.3) )\n",
"model.add( layers.Dense(64, activation = tf.keras.activations.gelu) )\n",
"model.add( layers.Dropout(0.2) )\n",
"model.add( layers.Dense(64, activation = tf.keras.activations.selu) )\n",
"model.add( layers.Dense(1, activation = tf.keras.activations.sigmoid) )\n",
"model.summary()"
]
},
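{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>The parameter counts in the summary follow from a simple rule. A `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and `Dropout` layers have no parameters at all. A quick check of that arithmetic:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def dense_params(n_in, n_out):\n",
"    # Weight matrix (n_in x n_out) plus one bias per output unit\n",
"    return n_in * n_out + n_out\n",
"\n",
"# Input size followed by the width of each Dense layer; Dropout adds no parameters\n",
"layer_sizes = [10000, 32, 64, 64, 1]\n",
"counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]\n",
"print(counts, sum(counts))"
]
},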
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"model.compile( optimizer = \"adam\",\n",
" loss = \"binary_crossentropy\",\n",
" metrics = [\"accuracy\"] )"
]
},
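{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>`binary_crossentropy` averages `-(y*log(p) + (1-y)*log(1-p))` over the samples, where `y` is the 0/1 label and `p` is the sigmoid output. A NumPy sketch of that formula is below. The clipping constant `1e-7` is an assumption meant to mimic Keras' default epsilon, so the values should closely match Keras but may not be bit-identical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def binary_crossentropy(y_true, y_pred, eps=1e-7):\n",
"    # Clip predictions away from 0 and 1 so the logarithms stay finite\n",
"    p = np.clip(y_pred, eps, 1 - eps)\n",
"    # Mean negative log-likelihood of the true labels\n",
"    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))\n",
"\n",
"loss = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.9, 0.2]))\n",
"print(round(loss, 4))"
]
},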
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color='orange'><center>7. Training the model</center>"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/3\n",
"40/40 [==============================] - 2s 38ms/step - loss: 0.4378 - accuracy: 0.8041 - val_loss: 0.2722 - val_accuracy: 0.8907\n",
"Epoch 2/3\n",
"40/40 [==============================] - 1s 25ms/step - loss: 0.2279 - accuracy: 0.9111 - val_loss: 0.2610 - val_accuracy: 0.8964\n",
"Epoch 3/3\n",
"40/40 [==============================] - 1s 21ms/step - loss: 0.1695 - accuracy: 0.9365 - val_loss: 0.2791 - val_accuracy: 0.8932\n"
]
}
],
"source": [
"results = model.fit( X_train, y_train, epochs= 3, batch_size = 1024, validation_data = (X_test, y_test) )"
]
},
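{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>In the log, training loss falls every epoch, but validation loss is lowest after epoch 2 and rises again in epoch 3, an early sign of overfitting. The sketch below picks the best epoch from the logged values; in the notebook, the same dictionary is available as `results.history`. It also reproduces the 40 steps per epoch, assuming 40,000 training reviews in batches of 1024. Keras' `EarlyStopping` callback (for example with `monitor='val_loss'` and `restore_best_weights=True`) can automate this choice during `fit`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"# Values copied from the training log above\n",
"history = {\n",
"    'loss': [0.4378, 0.2279, 0.1695],\n",
"    'val_loss': [0.2722, 0.2610, 0.2791],\n",
"    'val_accuracy': [0.8907, 0.8964, 0.8932],\n",
"}\n",
"\n",
"# Epochs are numbered from 1 in the log\n",
"best_epoch = min(range(len(history['val_loss'])), key=history['val_loss'].__getitem__) + 1\n",
"print('Best epoch by validation loss:', best_epoch)\n",
"\n",
"# Assumed 40,000 training reviews split into batches of 1024\n",
"steps_per_epoch = math.ceil(40000 / 1024)\n",
"print('Steps per epoch:', steps_per_epoch)"
]
},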
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### <font color='orange'><center>8. Evaluating the model</center>"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The average accuracy of the model is: 89.34\n",
"The average loss of the model is: 27.08\n"
]
}
],
"source": [
"# Averages of the per-epoch validation metrics, not the metrics of the final model\n",
"print(\"The average accuracy of the model is: \", np.mean(results.history[\"val_accuracy\"]).round(4) * 100)\n",
"print(\"The average loss of the model is: \", np.mean(results.history[\"val_loss\"]).round(4) * 100)"
]
}
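,
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<font color='orange'>The figures printed above are averages of the per-epoch validation metrics, not the scores of the final model. Recomputing them from the logged values reproduces 89.34 and 27.08, while the final epoch alone reached a validation accuracy of 0.8932. `model.evaluate(X_test, y_test)` returns the loss and accuracy of the trained model directly. The loss is not a percentage, so multiplying it by 100 has no particular meaning."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Per-epoch validation metrics copied from the training log\n",
"val_accuracy = [0.8907, 0.8964, 0.8932]\n",
"val_loss = [0.2722, 0.2610, 0.2791]\n",
"\n",
"# Same averaging as the cell above\n",
"mean_acc = np.mean(val_accuracy) * 100\n",
"mean_loss = np.mean(val_loss) * 100\n",
"print(f'Mean validation accuracy: {mean_acc:.2f}')\n",
"print(f'Mean validation loss x100: {mean_loss:.2f}')\n",
"print(f'Final-epoch validation accuracy: {val_accuracy[-1] * 100:.2f}')"
]
}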
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}