Permalink
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
7159CEM-Portfolio-2024/DT.ipynb
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
371 lines (371 sloc)
8.68 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Linear Congruential Random Number Generators" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
">A **linear congruential generator** (**LCG**) is an algorithm that yields a sequence of pseudo-randomized numbers calculated with a discontinuous piecewise linear function. The method represents one of the oldest and best-known pseudorandom number generator algorithms. The theory behind them is relatively easy to understand, and they are easily implemented and fast, especially on computer hardware which can provide modular arithmetic by storage-bit truncation.\n", | |
">\n", | |
">The generator is defined by the recurrence relation:\n", | |
">$$X_{n+1} = \\left( a X_n + c \\right)\\bmod m$$\n", | |
">where $X$ is the sequence of pseudo-random values, and\n", | |
">- $m,\\, 0\\lt m$ is the \"modulus\",\n", | |
">- $a,\\,0 \\lt a \\lt m$ is the \"multiplier\",\n", | |
">- $c,\\,0 \\le c \\lt m$ is the \"increment\",\n", | |
">- $X_0,\\,0 \\le X_0 \\lt m$ is the \"seed\" or \"start value\",\n", | |
">These are integer constants that specify the generator.\n", | |
">If $c=0$, the generator is often called a \"multiplicative congruential generator\" (MCG), or *Lehmer RNG*.\n", | |
">If $c≠0$, the method is called a \"mixed congruential generator\".\n", | |
">\n", | |
">[[Wikipedia](https://en.wikipedia.org/wiki/Linear_congruential_generator)]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Tasks\n", | |
"\n", | |
"- Create an LCG with your Student ID as the modulus $m$, and suitable random values for $a, c$, and the *seed*. (See starter code below.)\n", | |
"- Use Decision Tress (DTs) from the `scikit-learn` library to assess the quality of your chosen PRNG. (If it is easy to predict the next digits then it is less random.)\n", | |
" - Select 3 hyper-parameters and study their effect.\n", | |
"\n", | |
"Explain your reasoning, and justify any choices of the hyperparameters (and/or run experiments to find the optimal ones).\n", | |
"\n", | |
"Evaluate your models, and use visualisation to show the trees and any relevant plots.\n", | |
"\n", | |
"Write a conclusion that summarises your findings, and makes recommendations." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"ExecuteTime": { | |
"end_time": "2022-10-25T11:37:52.790405Z", | |
"start_time": "2022-10-25T11:37:50.972952Z" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"from math import log\n", | |
"from random import randint\n", | |
"from matplotlib import pyplot as plt\n", | |
"from sklearn import tree\n", | |
"from sklearn.model_selection import train_test_split\n", | |
"from sklearn.model_selection import cross_val_score\n", | |
"from sklearn.metrics import r2_score\n", | |
"from sklearn.ensemble import RandomForestClassifier" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Initialisation of the LCG parameters" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Assign suitable values to the fllowing variables." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"#.#.#.#.#.#.# IMPORTANT #.#.#.#.#.#.#\n", | |
"\n", | |
"MODULUS = ............. # Set this to your Student ID\n", | |
"\n", | |
"#.#.#.#.#.#.# IMPORTANT #.#.#.#.#.#.#" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"A = 101\n", | |
"C = 13\n", | |
"SEED = 321" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Base $b$ representation of numbers" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def base_b(n, b):\n", | |
" \"\"\" Get a list representing the number n written in base 'b' \"\"\"\n", | |
" bitlength = 1+int(log(MODULUS)/log(b))\n", | |
" r = []\n", | |
" for _ in range(bitlength):\n", | |
" r.insert(0, n%b)\n", | |
" n //= b\n", | |
" return r" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2]" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"base_b(11,3) # Example: 11 in base 3 is: 2+0*3+1*3^2 --> 102 --> [0,0,...,1,0,2]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## LCG" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def lcg(seed, modulus, a, c):\n", | |
" \"\"\" Linear congruential generator: 𝑋_{𝑛+1} = (𝑎𝑋_𝑛+𝑐) mod 𝑚 \"\"\"\n", | |
" while True:\n", | |
" seed = (a * seed + c) % modulus\n", | |
" yield seed" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"generator = lcg(SEED, MODULUS, A, C)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Data generation" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[32434, 65991, 121936, 93405, 51262, 115779, 88828, 82809, 92170, 49983]" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"stream = [next(generator) for _ in range(10_000)]\n", | |
"stream[:10] # Example" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def get_features(stream, base):\n", | |
" ''' Repalce each random number from 'stream' by a vector of its base b digits '''\n", | |
" return [base_b(n, base) for n in stream]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"data = get_features(stream, base=3)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(32434, [0, 1, 1, 2, 2, 1, 1, 1, 0, 2, 1])" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"stream[0], data[0] # Example" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(9999, 9999)" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"X = data[:-1]\n", | |
"y = data[1:]\n", | |
"len(X), len(y)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"([0, 1, 1, 2, 2, 1, 1, 1, 0, 2, 1], [1, 0, 1, 0, 0, 1, 1, 2, 0, 1, 0])" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"X[0], y[0] # Example" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(7499, 2500)" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n", | |
"len(X_train), len(X_test)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# ..................." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"tags": [] | |
}, | |
"source": [ | |
"# Conclusion\n", | |
"\n", | |
"........" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.11.1" | |
}, | |
"toc": { | |
"base_numbering": 1, | |
"nav_menu": {}, | |
"number_sections": false, | |
"sideBar": true, | |
"skip_h1_title": false, | |
"title_cell": "Table of Contents", | |
"title_sidebar": "Contents", | |
"toc_cell": false, | |
"toc_position": {}, | |
"toc_section_display": true, | |
"toc_window_display": false | |
}, | |
"vscode": { | |
"interpreter": { | |
"hash": "6d1e45cadc3597bb8b6600530fbdf8c3eefe919a24ef54d9d32b318795b772e0" | |
} | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |