4061CEM - Programming and Algorithms 1

Project: Transcoder

In this project you will develop a tool that converts data from one format to another. The tool is currently incomplete, and you have been employed to help finish it. You can work on either the 4061CEM Virtual Machine or your local machine.

In Cybersecurity, you will often have to work with data that comes in a variety of representations. For example, when creating payloads for exploits, the same bytes may be represented as ASCII characters or as binary, decimal or hexadecimal (hex) values.

Viewing Different Formats in the Linux Terminal

Using the Linux terminal in the left-hand pane, you can view data in different formats using the hexdump and xxd commands. Here is an example of using the xxd command:

$ xxd
  This is a line of text, followed by ABC.
  00000000: 5468 6973 2069 7320 6120 6c69 6e65 206f  This is a line o
  00000010: 6620 7465 7874 2c20 666f 6c6c 6f77 6564  f text, followed
  00000020: 2062 7920 4142 432e 0a                    by ABC..

The leftmost column of the output is the address (offset) of the first byte on each line. The first line begins at position zero, whilst the second line begins at position 10 in hexadecimal, which is position sixteen in decimal. You can verify this by counting the number of characters in the string 'This is a line o'. From the excerpt above, you can easily pick out the characters \(A\), \(B\), and \(C\). The hexadecimal values for these are \(41\), \(42\), and \(43\), respectively.
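The address arithmetic and character values described above can be checked in Python. This is a small illustrative sketch using only built-in functions; the variable names are chosen for this example and do not come from the project.

```python
# The text that was typed into xxd, including the trailing newline.
text = "This is a line of text, followed by ABC.\n"

# xxd prints 16 bytes per line, so the second line starts at offset 16.
second_line_offset = 16
print(hex(second_line_offset))            # 0x10
print(repr(text[:second_line_offset]))    # 'This is a line o'

# Hexadecimal values of the characters A, B and C.
abc_hex = [format(ord(ch), "x") for ch in "ABC"]
print(abc_hex)                            # ['41', '42', '43']
```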

The same command (xxd) can also be used to output the string as binary using the -b flag/option alongside the command:

$ xxd -b
  This is a line of text, followed by ABC.
  00000000: 01010100 01101000 01101001 01110011 00100000 01101001  This i
  00000006: 01110011 00100000 01100001 00100000 01101100 01101001  s a li
  0000000c: 01101110 01100101 00100000 01101111 01100110 00100000  ne of
  00000012: 01110100 01100101 01111000 01110100 00101100 00100000  text,
  00000018: 01100110 01101111 01101100 01101100 01101111 01110111  follow
  0000001e: 01100101 01100100 00100000 01100010 01111001 00100000  ed by
  00000024: 01000001 01000010 01000011 00101110 00001010           ABC..

From the excerpt above, you can again distinguish the characters \(A\), \(B\), and \(C\). The first binary value at address 24 (in hexadecimal) is \(01000001\), which converts to \(41\) in hexadecimal. The next two binary values, \(01000010\) and \(01000011\), are \(42\) and \(43\) in hexadecimal, respectively.

The hexdump command provides similar output. Without any options it produces a hexadecimal dump, whereas the -b flag/option produces output in octal:

$ hexdump -b
  This is a line of text, followed by ABC.
  0000000 124 150 151 163 040 151 163 040 141 040 154 151 156 145 040 157
  0000010 146 040 164 145 170 164 054 040 146 157 154 154 157 167 145 144
  0000020 040 142 171 040 101 102 103 056 012
  0000029

From the excerpt above, you should be able to see that the characters \(A\), \(B\), and \(C\) are represented in their octal form as \(101\), \(102\) and \(103\), respectively. Converting these to hexadecimal form will provide: \(41\), \(42\) and \(43\) in their respective order.
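The conversions between binary, octal and hexadecimal described in this section can be reproduced with Python's built-in int() and format() functions. The sketch below copies the values for \(A\), \(B\), and \(C\) from the dumps above; the variable names are illustrative only.

```python
# Binary and octal strings for A, B and C, copied from the dumps above.
binary_values = ["01000001", "01000010", "01000011"]
octal_values = ["101", "102", "103"]

# int(text, base) parses a string written in the given base.
from_binary = [int(bits, 2) for bits in binary_values]
from_octal = [int(digits, 8) for digits in octal_values]

# Both lists hold the same numbers: 65, 66 and 67.
print(from_binary == from_octal)                # True
print([format(n, "x") for n in from_binary])    # ['41', '42', '43']
print("".join(chr(n) for n in from_octal))      # ABC
```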

Comments and Docstrings

Inside the project source code you will find guidance in the form of comments and docstrings. You can find more information on these concepts by searching the internet or referring to the module material in week eleven.

Generally, comments assist whoever is reading your code by describing the intention and use-case of a piece of code. Docstrings, on the other hand, describe what each unit of the code does. They apply to modules, classes and functions, and typically also describe any data that is passed to or stored by these units.

Lines that start with the pound/hashtag symbol (\(\#\)) are comments. Depending on your IDE of choice, they may be highlighted in green or gray (or not at all).

Lines that start with a triple quote (\("""\) or \('''\)) are the docstrings. Using triple quotes enables you to spread documentation across multiple lines. A docstring is placed at the very top of a module, or as the first statement inside a function or class, straight after its definition line.
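As a brief illustration of the difference, the sketch below shows one comment and one docstring. The function shout is a hypothetical example and is not part of the Transcoder project.

```python
def shout(text):
    """Return the given text in upper case.

    This triple-quoted string is the docstring. It sits directly
    under the def line and can span several lines.
    """
    # This is a comment: it explains the step below to the reader.
    return text.upper()


# Python stores the docstring on the function as __doc__.
print(shout.__doc__.splitlines()[0])    # Return the given text in upper case.
print(shout("abc"))                     # ABC
```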

Whilst you are working on this project, you are expected to provide documentation and comments in the body of your implementation.

Testing

For the purpose of this project, PyTest will be used to ensure that the code you write works. Before you make any amendments to the project, you should run PyTest to confirm that the existing code passes its tests. From the root directory of the project, the tests can be run using the following terminal command:

$ python3 -m pytest -v ./tests/

This will run the tests in verbose (-v) mode, which lists each test and its result individually and displays output similar to:

$ pytest -v ./tests/
  ================================================================= test session starts ==================================================================
  platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /mnt/c/Users/me/IdeaProjects/transcoder/venv/bin/python3
  cachedir: .pytest_cache
  rootdir: /mnt/c/Users/me/IdeaProjects/transcoder
  collected 3 items

  tests/test_transcoder.py::test_asHex PASSED                                                                                                      [ 33%]
  tests/test_transcoder.py::test_asOctal PASSED                                                                                                    [ 66%]
  tests/test_transcoder.py::test_asBinary PASSED                                                                                                   [100%]

  ================================================================== 3 passed in 0.34s ===================================================================
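For orientation, a PyTest test is an ordinary function whose name starts with test_ and which checks results with assert statements. The sketch below shows the general shape only: to_hex_values is a hypothetical stand-in, not the project's real code, and the project's actual tests are the ones in tests/test_transcoder.py.

```python
def to_hex_values(text):
    """Hypothetical stand-in: return the hex value of each character."""
    return [hex(ord(ch)) for ch in text]


def test_to_hex_values():
    # PyTest collects functions named test_* and reports a failure
    # if any assert statement inside them is false.
    assert to_hex_values("ABC") == ["0x41", "0x42", "0x43"]
```

Saved in a file whose name starts with test_, a function like this would appear as one line in the verbose output, marked PASSED or FAILED.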

Setting up the Project

To begin this project, you will need to clone the Transcoder repository, which is available at the following URL:

https://github.coventry.ac.uk/CUEH/4601CEM_Transcoder/

You should create a fork of the repository first, and then clone it. This will create a local copy which enables you to make edits.

Setting up a Virtual Environment

As Python is a popular programming language, there are a lot of libraries/modules available to help you achieve certain functionality with minimal effort. However, when you write code that relies upon these libraries/modules, there may be instances where they do not exist on the target platform, or where they are incompatible with other libraries/modules on the target system.

In this instance, a virtual environment is used. Unlike a virtual machine, a virtual environment is simply a directory that contains a copy of the Python interpreter and libraries, localised for that project only.

For this project, you will use a virtual environment. In order to enable this, you will need to create the new environment using the following command in the root directory of the repository 4061CEM_Transcoder:

$ python3 -m venv venv

This action is only performed once, so there is no need to repeat it each time you work on the project. The only instance in which you would repeat this process is when the development machine has been changed.

To use the virtual environment, you need to activate it from within the root directory of the project using the following command:

$ . ./venv/bin/activate

Note the extra period (.) at the start. This tells the shell to run the file venv/bin/activate within the current shell session, so that the environment variables it sets apply to your current environment. This action needs to be repeated in each new terminal session in which you work on the project.
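If you are unsure whether the virtual environment is active, one way to check from Python itself is to compare sys.prefix with sys.base_prefix. This is a minimal sketch that assumes the environment was created with python3 -m venv as described above; the function name is illustrative.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print(in_virtual_environment())
print(sys.prefix)
```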

Installing the Libraries/Modules

When it comes to using the particular libraries/modules required for this project, you can install them using the following command:

(venv) $ pip install -r requirements.txt

or

(venv) $ python3 -m pip install -r requirements.txt

The (venv) prefix before the dollar symbol ($) denotes that you are working inside the virtual environment.

pip is the Python package installer; it reads the contents of requirements.txt and downloads and installs the libraries/modules listed within it. This action is only performed once, so there is no need to repeat it each time you work on the project. The only instances in which you would repeat this process are when the development machine has been changed, or when you have added libraries/modules to the file.

Automating the Environment Set-up

Included as part of the project is a Makefile. This file defines targets that the make command can read and execute in the terminal. This file will not be discussed in depth, but it enables you to set up the virtual environment and install the requirements with ease by using two commands:

$ make venv

This will create the virtual environment, and once activated you can install the pre-requisite libraries and modules by using:

$ make prereqs

Beginner Task

Your first task will be to look at the code, and identify what each piece of the code does. Once you have gained some familiarity with the project, you are required to add another method of transcoding.

Currently, the project can handle transcoding for: hexadecimal, octal and binary. However, it does not transcode decimal ASCII values. Consider the following input:

>>> ABC

The output for this input would be the ASCII values:

<<< 65 66 67

For this task, you are required to create a function called as_ASCII to provide the conversion. You are then expected to call this function in the same way as you see the existing functions in transcoder.py.
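A useful building block for this conversion is Python's built-in ord() function. The sketch below only demonstrates the idea on the example input; how you structure as_ASCII and call it from transcoder.py depends on the existing code, which is not reproduced here.

```python
# ord() maps a single character to its integer code point;
# for ASCII characters this is the decimal ASCII value.
values = [ord(ch) for ch in "ABC"]
print(values)                              # [65, 66, 67]

# Joining the values with spaces gives the output shown above.
print(" ".join(str(v) for v in values))    # 65 66 67
```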

Testing Your Function

To test whether your function behaves correctly, you can run the program. However, if you examine tests/test_transcoder.py, you will find a set of tests for the function. They are currently commented out, but if you uncomment them and re-run the tests with pytest -v ./tests/, they will check whether the function works correctly.

Intermediate Task

There is a built-in (or predefined) function in Python that can accept user input. The function is simply called input() and can be used to prompt the user to type in a value. For example:

name = input('What is your name? ')
print(f"Hello {name}, and welcome to 4061CEM!")

Essentially, the script above will ask the user for their name and store it in a variable called name. The print statement will then print a string which includes the name captured from the user input.

For this task, you need to adapt the transcoder application so that it accepts user input, rather than always using the same starting text.
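One possible shape for this, offered as a sketch rather than the expected solution, is a small helper that asks for text and falls back to a default when nothing is typed. The names read_text, reader and default, and the default value itself, are hypothetical. Passing the reader as a parameter means the helper can be tried out (and tested) without typing anything.

```python
def read_text(prompt=">>> ", reader=input, default="ABC"):
    """Ask for text and fall back to a default if nothing is typed.

    All names and the default value here are hypothetical.
    """
    entered = reader(prompt)
    # An empty reply (just pressing Enter) falls back to the default.
    return entered if entered else default


# A stand-in reader lets the function be tried without typing anything.
print(read_text(reader=lambda prompt: "Hello"))    # Hello
print(read_text(reader=lambda prompt: ""))         # ABC
```

In the application itself you would call read_text() with no arguments, so that the real input() function is used.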

Advanced Task

The transcoder project you are working on is useful for short strings or pieces of information. However, it is not very useful for longer strings. Looking back at the Linux commands xxd and hexdump, they use addresses and wrapping so that the output stays manageable.

For this task, you will adopt a similar method of layout for the output. You may consider using the following output:

$ Transcoder v0.2

    >>> Any text could go here...

    Hex
        0x0:        0x41         0x6e         0x79         0x20         0x74         0x65         0x78         0x74
        0x1:        0x20         0x63         0x6f         0x75         0x6c         0x64         0x20         0x67
        0x2:        0x6f         0x20         0x68         0x65         0x72         0x65         0x2e         0x2e
        0x3:        0x2e

    Octal
        0x0:        0o101        0o156        0o171         0o40        0o164        0o145        0o170        0o164
        0x1:         0o40        0o143        0o157        0o165        0o154        0o144         0o40        0o147
        0x2:        0o157         0o40        0o150        0o145        0o162        0o145         0o56         0o56
        0x3:         0o56

    Binary
        0x0:    0b1000001    0b1101110    0b1111001     0b100000    0b1110100    0b1100101    0b1111000    0b1110100
        0x1:     0b100000    0b1100011    0b1101111    0b1110101    0b1101100    0b1100100     0b100000    0b1100111
        0x2:    0b1101111     0b100000    0b1101000    0b1100101    0b1110010    0b1100101     0b101110     0b101110
        0x3:     0b101110

    ASCII
        0x0:           65          110          121           32          116          101          120          116
        0x1:           32           99          111          117          108          100           32          103
        0x2:          111           32          104          101          114          101           46           46
        0x3:           46
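The wrapping in the sample output can be approached by splitting the values into rows of eight and right-aligning each value in a fixed-width column. The following is a minimal sketch of that idea for the hexadecimal section only; the helper name chunk, the row size and the column width of 12 are assumptions made for illustration, and the exact spacing of the sample above is not reproduced.

```python
def chunk(values, size=8):
    """Split a list into consecutive rows of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


text = "Any text could go here..."
rows = chunk([hex(ord(ch)) for ch in text])

for row_number, row in enumerate(rows):
    # '>12' right-aligns each value in a 12-character column.
    cells = "".join(f"{value:>12}" for value in row)
    print(f"{hex(row_number)}:{cells}")
```

The same chunk helper could be reused for the octal, binary and ASCII sections by changing the conversion applied to each character.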