Documentation

Documentation is one of the most important things in programming, and also one of the most underappreciated. Well-documented code can be read without frowns or a "hold on a minute..." internal monologue. It is like the majestic soul caress of lo-fi jams while cuddling puppies: it makes the world a better place. Badly documented code is like tearing the tapestry of the mind with a bucket of angry kittens while playing experimental jazz and deathcore at the same time.

If you remember nothing else from this section then remember this...

Quote

"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live" ― John Woods

When you're learning to code, it's always better to over-document than to under-document. When you later look back at what you coded, it will make much more sense if you have comments and docstrings everywhere, and you'll pick up the thread of what you were thinking a lot faster. A piece of code is also read far more often than it is written, so good documentation lets other people understand what's going on more easily, and they may even go on to reuse your code in their future projects.

So how do we write good Python code with good documentation?

Writing good Python code in general, with the right amount and placement of white space and other good coding practices, is detailed in the PEP 8 style guide for Python code. It even has a section on comments, where you can see the style preferences for comments and how they should be used.

Docstrings

Good documentation starts with good docstrings, and PEP 257 is the definitive guide to docstring conventions. But what is a docstring?

A docstring is a string literal that appears as the first statement of a module, function, class, or method, and by convention it starts and ends with three double quotes ("""). Here is an example of well-written docstrings following PEP 257.

example-1.py from the beginning to the end   
#!/usr/bin/env python3
# example-1.py
# Small program showing well written docstrings and documentation. 

def counter(array):
    """Return the total of all the integer and float elements of the passed list."""
    total = 0
    for i in array:
        # This is a comment about testing. You should do it!!
        # Note: bool is a subclass of int, so True and False pass this check.
        if isinstance(i, (int, float)):
            total += i
    return total

def counter2(array):
    """Return the total of all the integer and float elements of the passed list.

    Any element that is not an integer or a float is discarded.

    args:
        array: list of mixed data types; every element that is not an integer or a float is discarded.

    returns:
        total: the sum of all the integers and floats in the passed list.
    """
    total = 0
    for i in array:
        # This is a comment about testing. You should do it!!
        # type() is an exact match, so bools are discarded here.
        if type(i) is int or type(i) is float:
            print(i)
            total += i
        else:
            print(i)
    return total

def array_maker():
    """Return an assorted array of data types.

    TODO:
        * add randomness to the selection.
    """
    array_of_random_data = [6, 9, 3.1, "joker", 3,
                            77, True, "King", 3.14]
    return array_of_random_data

print(counter(array_maker()))
print(counter2(array_maker())) 

Example

In the 'counter' function the docstring is very short, consisting of a single line that tells you what the function intends to accomplish. In the 'counter2' function the docstring is much larger and goes into much more detail: the argument passed to the function and its assumed type, what is returned, and a brief description of the function's use case. In truth, both functions do much the same thing and, as this is an arbitrary example, both docstrings are good. However, 'counter2' has the better docstring because it spells out that the expected input is a mixed array of data and that some of that data is lost when the function is used. This makes the code much easier to maintain later. If you were hunting for bottlenecks in the processing speed of your program, you could easily see that this function would be a prime candidate if the arrays were large. If you were hunting for where data is going missing, the docstring would point you straight here.
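One practical difference between a docstring and a comment is that the docstring stays attached to the function while the program runs. The sketch below reuses a trimmed copy of 'counter' from example-1 to show two standard ways of reading it back; the extra comment inside the function is only there for illustration.

```python
import inspect


def counter(array):
    """Return the total of all the integer and float elements of the passed list."""
    total = 0
    for i in array:
        # This comment is not part of the docstring.
        if isinstance(i, (int, float)):
            total += i
    return total


# The docstring is stored on the function object itself.
print(counter.__doc__)

# inspect.getdoc() returns the same text with any indentation cleaned up.
print(inspect.getdoc(counter))
```

Calling help(counter) in an interactive session shows the function signature followed by the same docstring.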

Caution

Although it is good practice to add lots of comments as a study aid while you are learning to code, good documentation in production code as a rule doesn't include things that can easily be worked out by reading the code. In example-1, everything in the docstrings could technically be worked out by reading the code, but they aid understanding and readability for someone who is new to the code or has to maintain it, and they aid your learning. As a rule, if a function discards data, as these do, or modifies data in some other non-trivial way, it is good practice to write a docstring for it. For example, adding two integers is something everyone can follow, so a function that does only this probably doesn't need a docstring. If a function performs a series of complex calculations and returns the result, it is probably worth a short docstring explaining why the calculation is necessary and what its use cases are. When in doubt, add documentation.
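To make the contrast concrete, here is a minimal sketch with two hypothetical functions that are not part of the examples above. 'add' is obvious from its name and body, while 'monthly_repayment' (an invented loan calculation, assumed here purely for illustration) does arithmetic that a reader could not be expected to reconstruct, so its docstring explains why the calculation exists.

```python
def add(a, b):
    return a + b


def monthly_repayment(principal, annual_rate, years):
    """Return the fixed monthly repayment for a fully amortised loan.

    Dividing the principal by the number of months ignores interest, so
    this uses the standard annuity formula instead.

    args:
        principal: amount borrowed.
        annual_rate: yearly interest rate as a fraction, e.g. 0.06 for 6%.
        years: length of the loan in years.

    returns:
        The payment due each month, unrounded.
    """
    months = years * 12
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        # Without interest the formula below would divide by zero.
        return principal / months
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)


print(add(2, 3))
print(round(monthly_repayment(100_000, 0.06, 30), 2))
```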

Comments

Comments can generally be added to a program in two distinct ways: single-line comments, which can often also be placed at the end of a line of code, and multiline or block comments. Python only supports comments written with the hash symbol (#, pronounced 'hash' not 'pound'), but depending on how you use them they can be considered block comments or inline comments. Consider this short example.

example-2.py from the beginning to the end   
#!/usr/bin/env python3
# example-2.py
# Small program showing well written comments for documentation. 

import inspect 

# This is a block comment because it starts at the beginning of a line.
# It describes the code (some or all of the code) that comes after it.
#
# If you need multiple paragraphs then make sure they are separated by
# a line containing only the comment/hash symbol, as above.

def addition_of_integers(a, b):
    total = 0
    if isinstance(a, int) and isinstance(b, int):   # This is an inline comment describing only this line. 
        total = a + b
    else: 
        print(f"You must use integers with this function: {inspect.currentframe().f_code.co_name}")  # Prints its own function name using inspect.currentframe().
    return total


addition_of_integers(4.5, 2) 

As you can see, comments are easy to write in Python; just make sure that whatever comments you put in your code are useful and don't just point out the obvious. The second inline comment in our example uses a full sentence to point out that inspect.currentframe() lets us print the function's name. This comment is useful for three reasons: you will rarely want to implement something like this in your own code, it is not obvious from the name what it does (unlike "addition_of_integers"), and even if you have never seen code like this before you now know what it does. Consider these points when writing your own code.

Comments should:
- Enhance understanding of the code below them or on the same line.
- Form complete sentences to avoid misinterpretation.
- Use explicit language to state plainly what is happening in the code.
- Use the 'TODO' tag to remind you of functionality to be added later, or to remind you to come back to something (also true in docstrings).

Comments shouldn't:
- Reiterate a point that can be gained through brief inspection, e.g. a function called 'adds_two_integers' with the comment "# adds two integers together".
- Form vague, broken sentences that can be misinterpreted, such as "# the i above".
- Use in-jokes or implicit language that implies something rather than being specific about the comment's intent, e.g. "# the meaning of life + the thing from that_function".
- Be used in the place of a good docstring.

Take these points with a pinch of salt while learning, but as you become more proficient try to adhere to them more as this is what would be expected of your code in a production environment.
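The points above can be sketched in a few lines. The function and constant names here are hypothetical and chosen only for illustration; the first comment is deliberately a poor one, and the second shows the kind of detail a reader could not get from the code at a glance.

```python
SECONDS_PER_DAY = 86_400


def days_until_expiry(expiry_timestamp, now_timestamp):
    """Return the whole number of days left before expiry_timestamp."""
    # Unhelpful: "# subtract" would only repeat what the next line says.
    remaining_seconds = expiry_timestamp - now_timestamp

    # Helpful: Floor division rounds towards negative infinity, so an item
    # that expired one second ago reports -1 rather than 0.
    # TODO: Decide whether expired items should raise an error instead.
    return remaining_seconds // SECONDS_PER_DAY


print(days_until_expiry(172_805, 0))
print(days_until_expiry(0, 1))
```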

Documentation generation

So why all these rules? Can't you just write whatever, and as long as it runs it's fine? Is documentation really worth it if I know what I coded? These are the questions I often hear when I tout the reasons for good documentation. The truth is that it's a good habit to get into if you are going to write production code, or work on a project that you want to share with the hacker community at some point. If it's just a quick reverse shell you copied and pasted, then you probably don't need even a single comment. If you get into these good habits, however, you will save yourself a huge amount of time when it comes to documentation auto-generators, as they'll be able to read and extract your docstrings with minimal effort.
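To give a feel for what a generator does under the hood, here is a minimal sketch that collects the docstring of every function in a module using only the standard library. It is not how any particular tool is implemented; 'collect_docstrings', 'SOURCE' and the 'demo' module are hypothetical stand-ins so that the example runs on its own.

```python
import inspect
import types

SOURCE = '''
def counter(array):
    """Return the total of the numeric elements of the passed list."""
    # Comments like this one are not collected.
    return sum(i for i in array if isinstance(i, (int, float)))


def array_maker():
    """Return an assorted list of data types."""
    return [6, 9, 3.1, "joker"]
'''


def collect_docstrings(module):
    """Return a dict mapping each function name in module to its docstring."""
    return {
        name: inspect.getdoc(function)
        for name, function in inspect.getmembers(module, inspect.isfunction)
    }


# Hypothetical stand-in for a real module imported from disk.
demo = types.ModuleType("demo")
exec(SOURCE, demo.__dict__)

for name, doc in collect_docstrings(demo).items():
    print(f"{name}: {doc}")
```

A function without a docstring would show up here with None, which is roughly why undocumented code produces such thin generated pages.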

Pdoc

There are a lot of automated documentation tools out there, but we'll concentrate on pdoc as it's compatible across Python versions, available through pip, and runs straight from the command line. Other popular tools are available if you want to try a different generator. Another advantage of pdoc is that it generates the HTML for your API documentation directly, ready to be served from a web server.

Note

As a hacker you may not want very well documented code if it's used in an exploit. Nicely named variables and docstrings survive in compiled Python bytecode, and comments survive too whenever the source itself is shipped, so all of them can be recovered when the exploit is disassembled from a Python executable. This obviously makes it easier for the security professional reverse engineering your exploit to understand what's going on. Consider the use case of your code when writing it and really think about how much documentation should be included. When in doubt, document everything.
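A rough way to check what survives compilation is to compile a snippet and search the marshalled code object, which is the same serialisation that .pyc files use. This is a sketch rather than a full reverse-engineering workflow, and the 'checksum' function and 'SOURCE' string are hypothetical. The optimize argument is passed explicitly so that the result does not depend on how the interpreter was started.

```python
import marshal

SOURCE = '''
def checksum(payload):
    """Return a simple additive checksum."""
    # rotate the seed before summing
    running_total = 0
    for byte in payload:
        running_total += byte
    return running_total % 256
'''

# optimize=0 keeps docstrings, which is the default behaviour of plain "python".
normal = marshal.dumps(compile(SOURCE, "<demo>", "exec", optimize=0))

# optimize=2 is the equivalent of running "python -OO", which drops docstrings.
stripped = marshal.dumps(compile(SOURCE, "<demo>", "exec", optimize=2))

print("docstring kept:", b"simple additive checksum" in normal)
print("variable name kept:", b"running_total" in normal)
print("comment kept:", b"rotate the seed" in normal)
print("docstring kept with optimize=2:", b"simple additive checksum" in stripped)
```

Comments only exist in the source text, so they can only leak if the .py file itself is bundled with the executable.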