This project asks you to develop a tool that converts data from one format to another. The tool is currently incomplete, and you have been employed to help finish it. For this project, you can use either the 4061CEM Virtual Machine or your local machine.
In cybersecurity, you will often work with data that comes in a variety of representations. For example, when creating payloads for exploits, the data may switch between bytes shown as ASCII characters and bytes shown as their binary, decimal or hexadecimal (hex) values.
Using the Linux terminal in the left-hand pane, you can view the data in different formats using the hexdump and xxd commands. Here is an example of using the xxd command:
$ xxd
This is a line of text, followed by ABC.
00000000: 5468 6973 2069 7320 6120 6c69 6e65 206f This is a line o
00000010: 6620 7465 7874 2c20 666f 6c6c 6f77 6564 f text, followed
00000020: 2062 7920 4142 432e 0a by ABC..
The output from the command has an address in the leftmost column, the bytes in hexadecimal in the middle, and the corresponding text on the right. The first line begins at address zero, whilst the second line begins at address 10 in hexadecimal. In decimal, this is position sixteen, which can be verified by counting the characters in the string 'This is a line o'. From the excerpt above, you can easily ascertain the characters \(A\), \(B\), and \(C\). The hexadecimal values for these are \(41\), \(42\), and \(43\), respectively.
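The offset and byte values described above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are invented for the example and are not part of the project.

```python
# The same text that was typed into xxd, including the trailing newline.
text = "This is a line of text, followed by ABC.\n"

# The second row of the dump starts at address 10 in hexadecimal.
second_row_offset = int("10", 16)

# Everything before that offset should be the text shown on the first row.
first_row = text[:second_row_offset]

# Hexadecimal values of the characters A, B and C.
abc_hex = [format(ord(ch), "x") for ch in "ABC"]

print(second_row_offset)  # 16
print(repr(first_row))    # 'This is a line o'
print(abc_hex)            # ['41', '42', '43']
```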
The same command (xxd) can also be used to output the string as binary using the -b flag/option alongside the command:
$ xxd -b
This is a line of text, followed by ABC.
00000000: 01010100 01101000 01101001 01110011 00100000 01101001 This i
00000006: 01110011 00100000 01100001 00100000 01101100 01101001 s a li
0000000c: 01101110 01100101 00100000 01101111 01100110 00100000 ne of
00000012: 01110100 01100101 01111000 01110100 00101100 00100000 text,
00000018: 01100110 01101111 01101100 01101100 01101111 01110111 follow
0000001e: 01100101 01100100 00100000 01100010 01111001 00100000 ed by
00000024: 01000001 01000010 01000011 00101110 00001010 ABC..
From the excerpt above, you can easily distinguish the characters \(A\), \(B\), and \(C\). This can be achieved by converting the first binary value at address 24 (\(01000001\)) to hexadecimal (\(41\)). The subsequent binary values, \(01000010\) and \(01000011\), are \(42\) and \(43\) in hexadecimal, respectively.
The hexdump command provides similar output. Without an option, it produces a hexadecimal dump, whereas the -b flag/option produces output in octal:
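As a rough illustration of that conversion, the built-in int() function can parse a binary string when given a base of 2, and format() can render the result in hexadecimal. The variable names below are made up for this sketch.

```python
# Binary values taken from the row at address 24 in the xxd -b output.
binary_values = ["01000001", "01000010", "01000011"]

# int(value, 2) parses base-2 text; format(..., "x") renders hexadecimal.
hex_values = [format(int(value, 2), "x") for value in binary_values]

# chr() turns each numeric value back into its character.
characters = "".join(chr(int(value, 2)) for value in binary_values)

print(hex_values)  # ['41', '42', '43']
print(characters)  # ABC
```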
$ hexdump -b
This is a line of text, followed by ABC.
0000000 124 150 151 163 040 151 163 040 141 040 154 151 156 145 040 157
0000010 146 040 164 145 170 164 054 040 146 157 154 154 157 167 145 144
0000020 040 142 171 040 101 102 103 056 012
0000029
From the excerpt above, you should be able to see that the characters \(A\), \(B\), and \(C\) are represented in their octal form as \(101\), \(102\) and \(103\), respectively. Converting these to hexadecimal form will provide: \(41\), \(42\) and \(43\) in their respective order.
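The octal values can be checked in the same spirit by parsing them with a base of 8. The sketch below also checks the final line of the hexdump output (0000029), which is the total number of bytes read, written in hexadecimal. The variable names are illustrative only.

```python
# Octal values for A, B and C taken from the hexdump -b output.
octal_values = ["101", "102", "103"]

# int(value, 8) parses base-8 text; format(..., "x") renders hexadecimal.
hex_values = [format(int(value, 8), "x") for value in octal_values]

# The input text, including the newline added by pressing Enter.
text = "This is a line of text, followed by ABC.\n"
total_bytes = len(text.encode("ascii"))

print(hex_values)                  # ['41', '42', '43']
print(total_bytes)                 # 41
print(format(total_bytes, "07x"))  # 0000029
```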
Inside the project source code you will find guidance in the form of comments and docstrings. You can find more information on these concepts by searching the internet or referring to the module material in week eleven.
Generally, comments assist whoever is reading your code. They describe the intention and use-case of the code. Docstrings, on the other hand, describe what each unit of the code does. They apply to classes, functions and modules, and to any data that is passed to or stored by these units.
Lines that start with the pound/hashtag symbol (\(\#\)) are the comments and, depending upon the IDE of choice, can be highlighted in green or gray (or not at all).
Lines that start with a triple quote (\("""\) or \('''\)) are the docstrings. Using triple quotes enables you to spread documentation across multiple lines. These forms of documentation typically come straight after the line that defines a function or class, or at the very top of a module.
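The difference between the two can be seen in a small example. The function shout is hypothetical and is not part of the Transcoder project; it exists only to show where each kind of documentation sits.

```python
def shout(text):
    """Return the given text in upper case.

    This triple-quoted string is the docstring: it documents what the
    function does and can span several lines.
    """
    # This line is a comment: it explains the intention of the code below.
    return text.upper()


# Unlike comments, docstrings are kept at runtime and can be inspected.
print(shout.__doc__)
print(shout("abc"))  # ABC
```

Python stores the docstring on the function itself, which is why tools such as help() are able to display it.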
Whilst you are working on this project, it will be expected that you will provide documentation and comments in the body of your implementation.
For the purpose of this project, PyTest will be used to check that the code you write works. Before you make any amendments to the project, you should run PyTest to confirm that the existing code passes its tests. From the root directory of the project, the tests can be run using the following terminal command:
$ python3 -m pytest -v ./tests/
This will run the tests in verbose (-v) mode, which lists each test by name alongside its result. The output will be similar to:
$ pytest -v ./tests/
================================================================= test session starts ==================================================================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /mnt/c/Users/me/IdeaProjects/transcoder/venv/bin/python3
cachedir: .pytest_cache
rootdir: /mnt/c/Users/me/IdeaProjects/transcoder
collected 3 items
tests/test_transcoder.py::test_asHex PASSED [ 33%]
tests/test_transcoder.py::test_asOctal PASSED [ 66%]
tests/test_transcoder.py::test_asBinary PASSED [100%]
================================================================== 3 passed in 0.34s ===================================================================
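For orientation, a PyTest test is an ordinary function whose name begins with test_ and which uses assert to state what should be true. The sketch below is hypothetical: to_hex is a stand-in invented for this example, and the real tests in tests/test_transcoder.py may be written differently.

```python
def to_hex(text):
    """Hypothetical stand-in for one of the project's transcoding functions."""
    return " ".join(format(ord(ch), "x") for ch in text)


def test_to_hex():
    # By default, PyTest collects functions whose names begin with test_
    # and reports PASSED when every assert inside holds.
    assert to_hex("ABC") == "41 42 43"
```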
To begin this project, you will need to clone the Transcoder repository, which is available at the following URL:
https://github.coventry.ac.uk/CUEH/4601CEM_Transcoder/
You should create a fork of the repository first, and then clone it. This will create a local copy which enables you to make edits.
As Python is a popular programming language, there are a lot of libraries/modules available to help you achieve certain functionality with minimal effort. However, when you write code that relies upon these libraries/modules, there may be instances where they do not exist on the target platform, or where they are incompatible with other libraries/modules on the target system.
In this instance, a virtual environment is used. Unlike a virtual machine, a virtual environment is simply a directory that contains a copy of the Python interpreter and libraries, which are localised to that project only.
For this project, you will use a virtual environment. In order to enable this, you will need to create the new environment using the following command in the root directory of the repository 4061CEM_Transcoder:
$ python3 -m venv venv
This action is only performed once, so there is no need to repeat it each time you work on the project. The only instance in which you would repeat this process is when the development machine has been changed.
To use the virtual environment, you need to activate it from within the root directory of the project using the following command:
$ . ./venv/bin/activate
Note the extra period (.) at the start. This tells the shell to read the file venv/bin/activate and apply the environment variables it sets to the current shell session. This action needs to be repeated in each new terminal session when you begin working on the project.
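If you are unsure whether the virtual environment is active, one way to check from within Python is to compare sys.prefix with sys.base_prefix, which differ when the interpreter is running from a venv. The helper name below is invented for this sketch.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # whilst sys.base_prefix points at the Python installation it was
    # created from.
    return sys.prefix != sys.base_prefix


print(in_virtual_environment())
```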
When it comes to using the particular libraries/modules required for this project, you can install them using the following command:
(venv) $ pip install -r requirements.txt
or
(venv) $ python3 -m pip install -r requirements.txt
Before the dollar symbol ($) is (venv); this denotes that you are working inside the virtual environment.
pip is the Python package installer, and it will read the contents of requirements.txt and download/install the libraries/modules that are listed within it. This action is only performed once, so there is no need to repeat it each time you work on the project. The only instances in which you would repeat this process are when the development machine has been changed, or when you have added libraries/modules to the file.
Included as part of the project is a Makefile. This file consists of rules that the make command can read and execute in the terminal. This file will not be discussed in depth, but it enables you to set up the virtual environment and install requirements with ease by using two commands:
$ make venv
This will create the virtual environment, and once activated you can install the pre-requisite libraries and modules by using:
$ make prereqs
Your first task will be to look at the code, and identify what each piece of the code does. Once you have gained some familiarity with the project, you are required to add another method of transcoding.
Currently, the project can handle transcoding for: hexadecimal, octal and binary. However, it does not transcode decimal ASCII values. Consider the following input:
>>> ABC
The output for this input would be the ASCII values:
<<< 65 66 67
For this task, you are required to create a function called as_ASCII to provide the conversion. You are then expected to call this function in the same way as you see the existing functions in transcoder.py.
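The building block for this conversion is Python's built-in ord() function, which returns the numeric code of a single character. The sketch below is only an illustration: decimal_codes is a hypothetical name, and its signature and return type are assumptions, so your as_ASCII function should follow the conventions of the existing functions in transcoder.py instead.

```python
def decimal_codes(text):
    """Return the decimal character codes of text, separated by spaces.

    Hypothetical helper for illustration; not part of the project.
    """
    # ord() gives the numeric code of one character, e.g. ord("A") is 65.
    return " ".join(str(ord(ch)) for ch in text)


print(decimal_codes("ABC"))  # 65 66 67
```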
To test whether your function behaves correctly, you can run the program. However, if you examine tests/test_transcoder.py, you will find a set of tests for the function. They are currently commented out, but if you uncomment them and re-run the tests with pytest -v ./tests/, they will check whether the function works correctly.
There is a built-in (or predefined) function in Python that can accept user input. The function is simply called input() and can be used to alert the user to type in a value. For example:
name = input('What is your name?')
print(f"Hello {name}, and welcome to 4061CEM!")
Essentially, the script above will ask the user for their name and store it in a variable called name. The print statement will then print a string which includes the name captured from the user input.
For this task, you need to adapt the transcoder application, so it accepts a user input, rather than it always using the same starting text.
The transcoder project you are working on is useful for short strings or small pieces of information. However, it is not very useful for longer strings. Looking back at the Linux commands xxd and hexdump, they used addresses and wrapping so that the output was manageable.
For this task, you will adopt a similar method of layout for the output. You may consider using the following output:
$ Transcoder v0.2
>>> Any text could go here...
Hex
0x0: 0x41 0x6e 0x79 0x20 0x74 0x65 0x78 0x74
0x1: 0x20 0x63 0x6f 0x75 0x6c 0x64 0x20 0x67
0x2: 0x6f 0x20 0x68 0x65 0x72 0x65 0x2e 0x2e
0x3: 0x2e
Octal
0x0: 0o101 0o156 0o171 0o40 0o164 0o145 0o170 0o164
0x1: 0o40 0o143 0o157 0o165 0o154 0o144 0o40 0o147
0x2: 0o157 0o40 0o150 0o145 0o162 0o145 0o56 0o56
0x3: 0o56
Binary
0x0: 0b1000001 0b1101110 0b1111001 0b100000 0b1110100 0b1100101 0b1111000 0b1110100
0x1: 0b100000 0b1100011 0b1101111 0b1110101 0b1101100 0b1100100 0b100000 0b1100111
0x2: 0b1101111 0b100000 0b1101000 0b1100101 0b1110010 0b1100101 0b101110 0b101110
0x3: 0b101110
ASCII
0x0: 65 110 121 32 116 101 120 116
0x1: 32 99 111 117 108 100 32 103
0x2: 111 32 104 101 114 101 46 46
0x3: 46
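One possible way to produce this kind of layout is to slice the list of converted values into groups of eight and prefix each group with a label. The sketch below is an assumption-laden illustration: format_rows and its width parameter are hypothetical names, and the label is the row number in hexadecimal, as in the sample output above, rather than a byte offset as used by xxd.

```python
def format_rows(values, width=8):
    """Group values into rows of `width`, each labelled with its row number.

    Hypothetical helper for illustration; not part of the project.
    """
    rows = []
    for start in range(0, len(values), width):
        # Take the next `width` values (the last row may be shorter).
        chunk = values[start:start + width]
        # Label the row with its index in hexadecimal: 0x0, 0x1, ...
        label = hex(start // width)
        rows.append(f"{label}: {' '.join(chunk)}")
    return rows


text = "Any text could go here..."
hex_values = [hex(ord(ch)) for ch in text]
for line in format_rows(hex_values):
    print(line)
```

The same helper could be reused for the octal, binary and decimal views by passing in a different list of converted values.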