Compilers

Introduction

Topics

  • Compilers
  • Compiler Stages
  • Linking

Compilers

Overview

  • Takes high-level, human-readable source code
  • Converts it into code ready for execution on a system

Types

  • Compile to Binary code (C / Rust)
  • Intermediate Byte Code (Java, .NET)

History

  • Grace Hopper (1952)
  • FORTRAN / John Backus (1957)

History

“Very few people are really symbol manipulators. If they are they become professional mathematicians, not data processors. It’s much easier for most people to write an English statement than it is to use symbols. So I decided data processors ought to be able to write their programs in English, and the computers would translate them into machine code.” (Grace Hopper)

Compiler Stages

GCC Style Compiler Stages

GCC: Seeing the Stages

$ gcc --save-temps -masm=intel example.c

High Level Code

  • Write Code in whatever language
  • High Level “Source Code” for the program

High Level Code


#include <stdio.h>
#include <string.h>

#define BUFFSIZE 30
#define PASSWORD "EasyPassword"

int main(void){
  char buffer[BUFFSIZE];

  printf("Enter Password> ");
  fgets(buffer, BUFFSIZE, stdin);
  //Remove the Newlines
  buffer[strcspn(buffer, "\r\n")] = 0;
  printf("\nYou put %s\n", buffer);

  if (strcmp(buffer, PASSWORD) == 0){
    printf("Win!!\n");
  }
  else
    printf("Lose\n");
     
}

Preprocessing

  • .i file
  • Create temporary file containing combined code
  • Gives us a work area and avoids modifying source

Preprocessing

  • Remove Comments
  • Include Header Files (declarations of library functions)
  • Expand Macros

Compiling

  • Generate Intermediate Code
    • Several Stages to this Process
    • Different Compilers have different type of intermediate code
  • Syntax errors are found at this point.

Compiling: Lexing

  • Convert code into a series of “Tokens”
  • Tokens have specific meaning and help understand code contents

Compiling: Lexing

  • printf("Hello World");
    • printf
    • ( and )
    • "Hello World"
    • ;

Compiling: Lexing

  • identifier: x, y, printf
  • keyword: if, while
  • separator: {, (, ;
  • operator: +, -, *
  • literal: true, 3.14, "hello"

Compiling: Lexing

  • Breaks into tokens based on a set of rules
    • White space
    • Separators
    • Regular Expressions

Compiling: Parsing

  • The boundary between Lexing and Parsing can be blurry.
  • Constructs meaning from the Tokens
  • Takes a sequence of tokens and forms a Parse Tree / Abstract Syntax Tree

Compiling: Parsing

  • Parsing Process
    • Context Free Grammar
    • Finite State Automata

Compiling: AST

https://keleshev.com/abstract-syntax-tree-an-example-in-c/

Compiling: Intermediate Code

  • Trees are converted into an intermediate code form
  • Clang: LLVM IR
  • GCC: GIMPLE and RTL internally; the .s file kept by --save-temps is assembly

Compiling: Optimisation

  • Optimisation takes place here
    • Loop Unrolling
    • Redundancy Optimisation
    • Shifting branches out of loops.
    • Rearranging loops into “do ... while” form

Compiling: Optimising

  • https://www.msreverseengineering.com/blog/2014/6/23/compiler-optimizations-for-reverse-engineers

Compiling: Linking

  • Convert intermediate code to an executable

  • Intermediate code is full of references to functions

    • Can be in our main program
    • May be shared libraries

Compiling: Linking

  • Static and Dynamic Linking
    • Static: Include library function in compiled code
    • Dynamic: Load library code at runtime.

Dynamic Linking / Loader

Why

  • Code may reference addresses that are only known at load time
    • Position Independent Executables (PIE)
    • Dynamically linked libraries

Lookup Tables: PLT

  • Procedure Linkage Table
    • Array containing details of dynamic function calls
    • References the GOT

Lookup Tables: GOT

  • Global Offset Table
    • Maps the references in the PLT to the actual memory addresses they represent
  • We will play with this next term

Dynamic Section

  • Has details of various dynamic elements of a binary
  • The first entries list the shared libraries the binary needs

Dynamic Section

$ readelf -d Password
Dynamic section at offset 0x2de0 contains 26 entries:
Tag        Type                         Name/Value
0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Dynamic Section

Labs/Asm$ ldd Password                                                   
        linux-vdso.so.1 => linux-vdso.so.1 (0x00007fff5bff4000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007ffa1fcdd000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007ffa1fefc000)

Resolving Addresses

  • We gloss over some of the magic of lazy binding here
  • Start State
    • PLT has the stubs for the GOT
    • GOT entries do not yet hold the real function addresses

Resolving Addresses

  • Examine PLT to get details of GOT entry

  • Ask GOT for address

    • If it exists, use it
    • Otherwise, resolve the address and update the GOT.

Next Week

Next Time

  • Examining Live Memory and GDB.