Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
James_BrownCoursework2/380CT final REVISED.ipynb
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
212 lines (212 sloc)
19.2 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"380CT Coursework (Revised):\n", | |
"\n", | |
"1. Introduction and Definition of the problem\n", | |
"\n", | |
"Let G = (V, E) be a directed, connected graph with x>2 vertexes and n edges. G has no assumed multiple edges and no loops.\n", | |
"\n", | |
"First posed by Sir William Rowan Hamilton in 1857 as a design for a toy (Dalgety, 2002), the Directed Hamiltonian Cycle Problem, (DHCP) is the problem of attempting to find a path through a set of nodes that only contains each node once (Gopalan, 2014). The problem has been posed in many different forms, but all have similarities in scope to the “travelling salesman” problem. \n", | |
"\n", | |
"The Travelling Salesman problem (TSP) revolves around a fictional “salesman” who wishes to go on the shortest possible route between multiple cities, without visiting any city twice. This is similar to the “7 bridges of Konigsberg” problem. This problem is based around the old city of Konigsberg in Prussia, which had 7 different bridges. The problem was created by various people who wanted to find a way of only crossing each bridge once and arriving back at where they started. This was proven to be impossible by Swiss mathematician Leonard Euler in 1741 (Wilkinson, 2018). \n", | |
"\n", | |
"The 7 bridges problem was proven to be impossible, unlike the DHCP and the travelling salesman problems. The NP-completeness of the DHCP was proven by Bertossi and Bonuccelli (Bertossi and Bonuccelli, 1986) and research conducted by Garey and Johnson indicate that any version or permutation of the TSP is classed as NP-Hard (Garey and Johnson, 1990). NP Completeness means that a problem can be computed in polynomial time for any case, whereas NP-hard means a problem is only NP-complete in theory, as there does not exist the computational power to fully calculate the nth degree result of the problem. The TSP is NP hard as the calculation of the paths becomes n!, so the rate of growth in total number of calculations grows exponentially and becomes nearly impossible.\n", | |
"\n", | |
"2. Graph Representation\n", | |
"\n", | |
"When trying to compute a problem featuring random graph instances, it is important that random graphs can be quickly obtained without any hint of bias regarding the construction. Many external libraries exist in python to do this, one of these being NetworkX.\n", | |
"\n", | |
"NetworkX is an open-source package created for python that allows for creation, control and study of networks as required. It also features a multitude of different graph generators and algorithms (NetworkX developers, 2018). Using NetworkX, (or any other similar package), graphs can be created programmatically within pre-set parameters, such as total number of nodes or probability of edge creation between any 2 nodes.\n", | |
"\n", | |
"In order to keep the graphs unbiased, a random set of nodes and edges can be created using the code to do so. The issue arises in the fact that, for the majority of the time, there is inherent bias produced as the programmer has to manually write the code to produce the graph. Truly random graphs are difficult to produce and can be unpredictable in nature.\n", | |
"\n", | |
"In the context of this research, a directed graph that is guaranteed to have a Hamiltonian cycle is required. One of the easiest ways of doing so is by ensuring that the graph is connected. A connected graph is where there are paths between all vertices and there are not any unreachable nodes (Weisstein, n.d.). In the case of directed graphs, this would require paths of both in and out degrees to be coming from each node to every other.\n", | |
"\n", | |
"Another way to ensure a Hamiltonian cycle is to force the graph generators, (via pre-built parameters), to produce graphs with hamiltonian cycles. In his research into cycle generation, Wild proposes a method for the generation of all edges contained in any given graph. He then tweaks his method to produce all hamiltonian cycles contained in the aforementioned graph (Wild, 2007). A solution to the issue of producing hamiltonian graphs could be to do the opposite. Using his algorithms, a cycle could be produced, then any other edges could be added using a generator's more standard functionality, (adding edges by probability). \n", | |
"\n", | |
"\n", | |
"3. Exhaustive Search\n", | |
"\n", | |
"The exhaustive search algorithm is a method that involves testing all possiblities in sequence, in the case of directed graphs, it involves searching through every possible path until a hamiltonian path is found . This is also seen as the worst case scenario for NP problems, as it often takes a large amount of processing (Weisstein, n.d.).\n", | |
"\n", | |
"The big O notation of the exhaustive search is O(n) if used on a flat collection (Kershaw, 2015), meaning that the time it takes to do correlates directly to the size of n. This makes it perfectly ok for small data sets, but as soon as the value of n increases by a large degree, the algorithm can become very heavy on resources. In the case of the DHCP, depth first searching can be used to create the list of candidates (candidates in this case meaning the nodes in the graph and the edges between them). \n", | |
"\n", | |
"Depth first searching is a graph traversal technique that follows the first path that it finds until it reaches a point where it cannot continue, then goes back to a point where it can and does so, until the target node is reached (Papamanthou, 2004). If it were applied to the DHCP, the search would start on one node and traverse each connected node until it could no longer go any further unless it backtracked. At this point, the search would restart and change the nodes it traverses until it either finds a path or it exhausts all of its options. The search then collates all of the candidates of the graph into a list, which is then used by the exhaustive search. I would suggest depth first instead of breadth first, as it can provide a better idea of the entirety of the graph structure, as breadth first is more often used if only part of the graph is required.\n", | |
"\n", | |
"The big O notation of the depth first search is O(n+m), as each node is only visited once, and all edges are only passed once also. Even in the worst case scenario, where the first node is the start point and the end is the very last, the complexity doesn't change as the algorithm only traverses each edge and node once.\n", | |
"\n", | |
"However, as it is the most inefficient algorithm in this report, (as it is not very useful for a large value of V), other solutions are better for the DHCP instead of the exhaustive search.\n", | |
"\n", | |
"\n", | |
"\n", | |
"Pseudocode for Exhaustive search, (algorithmist.com, 2012):\n", | |
"\n", | |
"backtrack(int sol, int depth)\n", | |
"{\n", | |
" if (issolution(sol))\n", | |
" printsolution(sol)\n", | |
" else{\n", | |
" solgenerated = generatesolution()\n", | |
" backtrack(solgenerated, depth+1)\n", | |
" }\n", | |
"}\n", | |
"\n", | |
"In plain English, if it finds the solution right away, it prints it to the screen, else it loops over itself and adds to the counter, (in this case depth). Big O for the above is O(n).\n", | |
"\n", | |
"4. Greedy Algorithm\n", | |
"\n", | |
"Given the inefficiency of the exhaustive search with high values of V and E, other heuristic approaches are required. A greedy algorithm is defined as a solution that makes the best decision based on the local domain of the node, meaning that it chooses the best solution at each node to try and create an optimised solution (Sharma, 2017). Greedy algorithms can be produced in many different ways and in this case I propose that the best method of doing so is to try and maximise the number of vertexes visited whilst not repeating any. The longest path will represent the nearest to the hamiltonian cycle (if it connects back to the starting node).\n", | |
"\n", | |
"As the hamiltonian cycle must feature all nodes in the graph once, finding the longest path of unique nodes using a greedy algorithm is a good approximation of the graph, without wholly analysing every single permutation of all edges and paths. Also, as the DHCP is based around connected graphs, there is a high chance that this approach will provide a viable answer. Connected graphs provide a set of edges that can provide a high probability that the algorithm will find a long path that is similar to the ideal. \n", | |
"\n", | |
"The only major problems facing such an approach are based around the fact that it only attempts pathfinding once. If the algorithm starts in a place that does not lead to a successful path, it will return an incorrect result as it can only cycle once so it cannot try a different path if the path concludes prematurely.\n", | |
"\n", | |
"The main benefit of the greedy algorithm over the exhaustive search is that the time complexity is greatly reduced when n is a large value. The complexity is much lower, as it only attempts the path once, and there are not any repetitions of nodes so there are not any infinite loops caused by the algorithm looking to repeat nodes and making the same decisions. \n", | |
"\n", | |
"\n", | |
"\n", | |
"Pseudocode for a common greedy algorithm problem, The coloured edges (Pemmaraju, 2017):\n", | |
" \n", | |
"Set nonblack be an empty object to host non-black vertices\n", | |
"Let ds be an empty set for hosting the dominating set\n", | |
"Let color be a length-n array, all of whose slots are initialized to white\n", | |
"for each vertex i in the graph do\n", | |
" nonblack.insert(i, L[i].length+1)\n", | |
"end for\n", | |
"(v, whiteDeg) ← nonblack.getmax()\n", | |
"while whiteDeg > 0 do\n", | |
" Save v to ds\n", | |
" if color[v] == white then\n", | |
" for each neighbor j of vertex v do\n", | |
" nonblack.decreaseValue(j, 1)\n", | |
" end for\n", | |
" end if\n", | |
" for each neighbor j of vertex v do\n", | |
" if color[j] == white then\n", | |
" for each neighbor k of vertex j do\n", | |
" nonblack.decreaseValue(k, 1)\n", | |
" end for\n", | |
" color[j] ← gray\n", | |
" end if\n", | |
" end for\n", | |
" color[v] ← black\n", | |
" (v, whiteDeg) ← nonblack.getmax()\n", | |
"end while\n", | |
"return ds\n", | |
"\n", | |
"In plain English, each unvisited node is set as white, and every time a white node is visited it decrements the nonblack counter. The loop ends when enough iterations have occurred, in this case that number is directly linked to the total number of white vertexes that remain unvisited (the (v, whiteDeg) value)\n", | |
"\n", | |
"\n", | |
"5. Meta-Heuristics\n", | |
"\n", | |
"Meta-heuristics are strategies that are designed to create global solutions to problems. The solutions above are merely heuristics; specialised solutions to individual problems, rather than theoretical blanket methods that meta-heuristics produce. In order to use a heuristic method, a meta-heuristic strategy must be established (Bentahar and England, 2018).\n", | |
"\n", | |
"In relation to the DHCP, I feel that GRASP is a better way of approaching it compared to the previously mentioned methodologies. \n", | |
"\n", | |
"GRASP is the Greedy Randomized Adaptive Search Procedure. It has a very similar approach to a standard greedy algorithm, (hence its in the name), but with one fundamental difference. GRASP uses a standard greedy algorithm, but loops until the criterion set is fulfilled and only the best solution is saved. In this case, this increases the likelihood of the hamiltonian cycle being found, as this reduces the effect for the biggest problem regarding the standard greedy approach. The multiple loops mean that if it starts on a node that would previously only lead to a dead-end, there is a chance that the algorithm will break out of the dead-end path and find a better solution.\n", | |
"\n", | |
"\n", | |
"\n", | |
"Pseudocode of a generic GRASP solution (Festa, 2002):\n", | |
"\n", | |
"Procedure GRASP (MAX_ITERATIONS, SEED)\n", | |
"Best_Solution = 0;\n", | |
"Read_Input ( );\n", | |
"for k = 1,2,…, MAX_ITERATIONS do\n", | |
" Solution = GreedyRandomizedConstruction (SEED);\n", | |
" Solution = LocalSearch (Solution);\n", | |
" if (Solution is better than Best_Solution) then\n", | |
" UpdateSolution (Solution, Best_Solution);\n", | |
" endif\n", | |
"endfor\n", | |
"return (Best_Solution);\n", | |
"end GRASP\n", | |
" \n", | |
"In English, the GRASP has a value for the “best solution”, which stores the shortest path. It only updates this if the path discovered is “better”, (could be longer or shorter depending on the criteria). The looping comes from the for loop, with a maximum iteration number to prevent it from continuing ad infinitum. The search itself is merely a greedy local search, but with different start points as provided by the GreedyRandomizedConstruction variable.\n", | |
"\n", | |
"6. Special Cases\n", | |
"\n", | |
"Regarding the DHCP, there are various special cases that can completely change the behaviour and plausibility of a hamiltonian cycle emerging from the graph.\n", | |
"\n", | |
"One such case is the Tournament graph. A tournament graph is where any given pair of vertexes are connected by one directed edge (Wagner, 2007). In his research, László Rédei posed the theory and successfully proved that all tournament graphs have hamiltonian cycles (Rédei, 1934). \n", | |
"\n", | |
"The proof is as follows; the assumption is made that any tournament on the vertexes of (n-1) have a hamiltonian path. The path is then assumed to be (v1, ..., Vn-1). For each new vertex added to the graph, a new edge is drawn. As the edge is drawn either from v to v1 or v(n-1) to v, the graph remains hamiltonian because either of the aforementioned edges completes the cycle regardless of any others (Bhide, 2012).\n", | |
"\n", | |
"\n", | |
"7. Conclusion and Reflection\n", | |
"\n", | |
"In conclusion, given the problem at hand, if n is not a large number then the exhaustive algorithm is a perfectly serviceable albeit not the most elegant solution. The most consistently effective solution (that I have found) is the GRASP meta-heuristic. It provides the best approximation of the graph's hamiltonian path, whilst also not taking an extortionate amount of time to compute it. The use of a more standard greedy method is best when the graph is fully connected and the speed of the calculation is more important than outright accuracy.\n", | |
"\n", | |
"In conducting this research, I feel that I have learned more about the complexities of networks and the importance of graphs even in our daily lives. Problems such as the TSP are faced everyday by millions who commute to work and need to figure out their optimal route, or even the tournament graphs which can be applied to sports tournament structures or even mating schedules and statistics, (Wagner, 2007).\n", | |
"\n", | |
"If I were to conduct this again, I would ensure that I could research more into the more practical uses of these algorithms and the problems that they solve. If I could use these algorithms in day to day life, (such as figuring out the most efficient way of traversing motorway junctions), it could be of great use and, in the case of driving, a good money saving tactic.\n", | |
"\n", | |
"\n", | |
"8. References\n", | |
"\n", | |
"Dalgety, J., (2002), “The Icosian Game” [online], available from <https://www.puzzlemuseum.com/month/picm02/200207icosian.htm> [23 April 2018]\n", | |
"\n", | |
"Gopalan, K., (2014), “The Hamiltonian Cycle Problem is NP-Complete”, [online] available from <https://www.cs.umd.edu/~gasarch/COURSES/452/F14/hamtalk.pdf>, [24 April 2018]\n", | |
"\n", | |
"Wilkinson, C.J., (2018), “Solving The Unsolvable – The Seven Bridges of Konigsberg: Euler’s Path to Genius” [online] available from <https://europebetweeneastandwest.wordpress.com/2018/01/28/solving-the-unsolvable-the-seven-bridges-of-konigsberg-eulers-path-to-genius/, [24 April 2018]>\n", | |
"\n", | |
"Bertossi, A.A., Bonuccelli, M.A., (1985) \"Hamiltonian circuits in interval graph generalizations\". Information Processing Letters. (23:4) pp. 195-200\n", | |
"\n", | |
"Garey, M.R., Johnson, D.S., (1990), “Computers and Intractability; a Guide to the Theory of NP-Completeness”. New York, USA: W. H. Freeman & Co.\n", | |
"\n", | |
"NetworkX developers, (2018), “Software for complex networks”. [online] available from <https://networkx.github.io>, [24 April 2018]\n", | |
"\n", | |
"Weisstein, E.W., (n.d.), ”Connected Graph”. [online] available from <http://mathworld.wolfram.com/ConnectedGraph.html>. [24 April 2018]\n", | |
"\n", | |
"Wild, M., (2007) \"Generating all cycles, chordless cycles, and Hamiltonian cycles with the principle of exclusion\". Journal of Discrete Algorithms, (6:1) pp. 93-102\n", | |
"\n", | |
"Weisstein, E.W., (n.d.) \"Exhaustive Search\". [online] available from <http://mathworld.wolfram.com/ExhaustiveSearch.html>. [25 April 2018]\n", | |
"\n", | |
"Kershaw, R., (2015) \"Which Search Algorithm - Part 1\". [online] available from <http://www.rudikershaw.com/articles/whichsearch>. [25 April 2018]\n", | |
"\n", | |
"Papamanthou, C., (2004) \"Depth First Search & Directed Acyclic Graphs\". [online] available from <https://www.csd.uoc.gr/~hy583/reviewed_notes/dfs_dags.pdf>. [25 April 2018]\n", | |
"\n", | |
"algorithmist.com, (2012) \"Exhaustive Search\". [online] available from <http://www.algorithmist.com/index.php/Exhaustive_Search> [25 April 2018]\n", | |
"\n", | |
"Sharma, A., (2017) \"Basics of Greedy Algorithms\". [online] available from <https://www.hackerearth.com/practice/algorithms/greedy/basics-of-greedy-algorithms/tutorial/>. [25th April 2018]\n", | |
"\n", | |
"Kemmaraju, S., (2017) \"Pseudocode and Analysis of the Greedy Algorithm for the Minimum Dominating Set problem\". [online] available from <http://homepage.cs.uiowa.edu/~sriram/3330/spring17/greedyMDS.pdf>. [25 April 2018]\n", | |
"\n", | |
"Bentahar, K., England, M., (2018), \"Tackling NP-Hard Problems\". [online] available from <https://cumoodle.coventry.ac.uk/pluginfile.php/2124884/mod_resource/content/4/L10.pdf>. [25th April 2018]\n", | |
"\n", | |
"Festa, P., (2002), \"\"Greedy Randomized Adaptive Search Procedures\"[online] available from <https://pdfs.semanticscholar.org/0ed5/cd60ee943801c60f42987f1eda2e106ba6be.pdf> [26 April 2018]\n", | |
"\n", | |
"Wagner, F., (2007) \"Hardness Results for Tournament Isomorphism and Automorphism\" [online] available from <https://www.uni-ulm.de/fileadmin/website_uni_ulm/iui.inst.190/Mitarbeiter/wagner/TourIsoLB030907.pdf> [26 April 2018]\n", | |
"\n", | |
"Rédei, L., (1934) \"Ein kombinatorische Satz\", Acta Litteraria Szeged, (7) pp 39-43\n", | |
"\n", | |
"Bhide, R., (2012) \"Redei's theorem\". November 18. available from <http://ravi-bhide.blogspot.co.uk/2012/11/redeis-theorem.html> [17 April 2018]\n" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.4" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |