Introduction to Computer Vision

Dr Ian Cornelius

Hello

Hello (1)

Learning Outcomes

Understand the concept of computer vision
Understand and apply the concept and theory behind a common computer vision framework
Demonstrate knowledge on how to use a computer vision framework in a body of work

Computer Vision

Computer Vision (1)

A field to gain a level of understanding from digital media
- represents an object and their characteristics
Situated within Artificial Intelligence (AI)
Initial computer vision experimentation began in 1959
- cats were shown an array of images to correlate a response in their brains
- they responded to hard edges or lines
- computers were able to scan images to digitize them during this time
1963 saw the transformation of 2D images into 3D
1974 saw the introduction of optical character recognition (OCR)
1982 saw the introduction of algorithms to discern edges and shapes
2000 focused upon object recognition
- applications released in 2001 focussing on facial recognition

Computer vision is a subset within artificial intelligence, and it enables computers and systems to derive meaningful information from digital images, videos or other visual artefacts. You may consider AI to enable computers to think, whereas computer vision enables a computer to see and observe its surroundings. It can be used to represent a given object in the real world and present their characteristics in a manner understood by a machine.

The initial experimentation with computer vision began in 1959 with cats. They were shown a collection of images while scientists attempted to correlate the images with responses in the cats' brains. The scientists discovered that the cats responded most strongly to hard edges or lines, and this led to image processing beginning with simple shapes such as straight edges. It was during this time that the first computer image scanning technology was being developed, which enabled computers to digitize and acquire images.

This acted as a catalyst for research in the realm of computer vision. Briefly:

in 1963, computers were able to transform two-dimensional images into three-dimensional forms
the 1960s marked the beginning of the AI quest to solve human vision problems
in 1974, we were introduced to optical character recognition (OCR) which would recognise text printed in any font or typeface
in 1982, a neuroscientist was able to establish that vision works hierarchically and algorithms were introduced to detect edges, corners, curves and basic shapes
the year 2000 saw a shift in focus towards object recognition, and 2001 saw the first real-time facial recognition applications being introduced

Computer Vision (2)

Computer Vision vs. Image Processing

Computer vision is distinct from image processing
Image Processing creates a new image from pre-existing images
- i.e. simplification or enhancing of content
- often not concerned with the content of an image
Computer Vision is concerned with automating tasks
- i.e. object recognition and tracking of a said object

We have looked very briefly at what computer vision is and at its history. But how do computer vision and image processing relate? Well, firstly, they are two different things.

Image processing is concerned with the creation of new images from pre-existing images. This could be simplifying the image to only show edges or lines that exist within it. Or it could be concerned with enhancing the content of the image, i.e. adjusting its brightness and contrast.

Computer vision, on the other hand, is where a computer or machine is able to gain a high-level understanding from the input of digital images or videos. This is done with a purposeful task in mind, such as automating tasks that the human visual system can do (e.g. object recognition and tracking). It will use many techniques to do this, and image processing is only one of those techniques.
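To make the distinction concrete, here is a minimal sketch that treats an image as a small grid of brightness values (a representation covered in more detail later in this lecture). The 3x3 image, the function names and the threshold of 128 are all made up for illustration. The first function is image processing, as it returns a new image; the second is a toy stand-in for a vision task, as it returns a fact about the image.

```python
# Hypothetical 3x3 grayscale image: each number is a brightness from 0 to 255.
image = [
    [10, 200, 30],
    [220, 40, 250],
    [50, 60, 70],
]


def brighten(img, amount):
    """Image processing: build a NEW image with every pixel made brighter."""
    # Values are capped at 255, the maximum brightness.
    return [[min(255, pixel + amount) for pixel in row] for row in img]


def count_bright_pixels(img, threshold=128):
    """Toy vision-style task: return information about the image, not an image."""
    return sum(1 for row in img for pixel in row if pixel >= threshold)


brighter = brighten(image, 50)
bright_count = count_bright_pixels(image)

print(brighter)
print(bright_count)
```

A real computer vision task is far more involved than counting bright pixels, but the shape of the result is the point: image in, understanding out.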

Computer Vision (3)

Why do we need Computer Vision?

Thousands of images/videos are now publicly available
- cameras exist on smartphones and laptops
- sharing it is becoming easier
Digital world enabled to interact with physical
Indexing and searching text is easy, images are not
- knowledge on what an image contains is required
Machines need to see an image and understand its content
It allows us to understand the content of digital images
- via the use of algorithms to reproduce human vision
- thus being able to discern objects and people

In this day and age, cameras are readily available. You may have one beside you or in your pocket right now. This enables us to capture images whilst we go about completing the mundane tasks of our daily life. These images are often shared with one another, whether it be via social media or messaging applications. This means that the real world is now able to interact with the digital world.

However, a major caveat to this explosion of captured images is the amount of data that is stored within them. A digital image is essentially a matrix full of numbers (more on this later), and this makes it inherently difficult to search, unlike text. To search an image, we need to know the context of that image, i.e. what is it showing, and what are the colours inside it?

To do this, computers require a method to see the image and understand its content. Researchers over time have provided us with algorithms that are able to reproduce aspects of human vision and discern the content of a digital image. For example, we are now able to recognise people or objects in an image and/or video.

Computer Vision (4)

Human Vision

Humans can easily perceive the world, machines on the other hand cannot
- it is second nature for us to gather information from our surroundings
Objects can be perceived in less than a second
- describe the content of photos and videos after a single glance

Computer vision is able to work in much the same way as human vision does. We, as humans, are able to easily perceive the world we are within. Machines, on the other hand, cannot, at least not without the use of algorithms. Humans have a major advantage over machines, in that we have a lifetime of context to train us on telling objects apart. For example, we know the difference between a cat and a dog.

It is almost second nature for us to be able to gather information from our surroundings. We are able to distinguish objects in less than a second, and we can describe the contents of a photo after a single glance. For example, the image on the right-hand-side of the screen is that of the Engineering and Computing building. From this image we can see a blue sky, some clouds, and many, many people. A computer, on the other hand, would not see this without the use of an algorithm.

Computer Vision (5)

Computer Vision

It is more complex for computers to see an image
- instead, it processes text and images as numbers
These numbers are otherwise known as pixels

It is a complex task for computers to see an image. As such, computer vision is used to train machines to perform actions such as counting the number of people in the image on the previous slide. To do this, the computer will process an image as a series of numbers. These numbers are otherwise known as pixels, whereby each pixel will consist of a sequence of numbers.

On the right-hand-side of the screen, you can see a blown-up part of an image. Each square (or block of colour) is a pixel, and this pixel will have numbers associated with it to give it a colour. Often, these are referred to as RGB values, where RGB stands for red, green and blue, respectively.

Structure of an Image

Structure of an Image (1)

Images are made up of lots of small elements known as pixels
Each pixel corresponds to a given value
- a single-channel pixel is grayscale
- a three-channel pixel is colour
  - i.e. red, green, blue (RGB) or blue, green, red (BGR)
Each channel of a pixel is interpreted as an integer
- for grayscale, a value between 0 and 255
  - i.e. (255)
- for RGB, a value between 0 and 255, but in three components
  - i.e. (255, 201, 154)

In the previous slide, we touched upon pixels in brief, and now we shall go into a little more detail regarding them. As I have mentioned, each pixel in an image will correspond to a given value. This value is a collection of numbers; however, when we are concerned with gray-scale images, it will consist of a single value (one channel). Coloured images, on the other hand, are made up of three values (three channels). A coloured image will typically be made up of either red, green and blue (RGB) values or blue, green and red (BGR) values; more on this shortly.

Each channel found in a pixel is interpreted as an integer value, generally between the values of zero and two-hundred and fifty-five, which is the range that fits into eight bits. For a grayscale image, it is a single integer value, whereas an RGB pixel would be a set of three values: one for red, one for green and the other for blue.

For example, in a grayscale image, value 255 would represent white, whereas 0 would represent black. For a coloured image, and using the RGB colour model, a set consisting of the values 255, 201 and 154 would represent a shade of the colour orange.
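The values above can be sketched in Python as follows. Each value from 0 to 255 fits into eight bits (one byte), which is why there are 256 possible levels per value. The helper names `is_valid_channel` and `to_hex` are made up for this illustration, and the hexadecimal form is only shown as another common way of writing the same three numbers.

```python
# Grayscale pixels: a single value from 0 (black) to 255 (white).
black = 0
white = 255

# RGB pixel: three values, one each for red, green and blue.
orange_rgb = (255, 201, 154)

# Eight bits per value gives 2 ** 8 = 256 possible levels (0 to 255).
levels_per_value = 2 ** 8


def is_valid_channel(value):
    """Check that a value is an integer in the range 0 to 255."""
    return isinstance(value, int) and 0 <= value <= 255


def to_hex(rgb):
    """Write an RGB pixel as a hexadecimal colour code (illustrative helper)."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


orange_hex = to_hex(orange_rgb)
print(orange_hex)
```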

Structure of an Image (2)

Images can be defined as a two-dimensional matrix
The matrix on the screen represents a ten-by-ten image
- known as a single channel image
Each value in this image is \(0\), therefore, the resulting image is a black square

blackSquare = [
    # Row 1                         # Row 2
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    # Row 3                         # Row 4
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]

You will have heard me refer to an image as a matrix, and that would be a two-dimensional array or list. Let’s consider the black square on the screen, and imagine this to be a ten-by-ten image. This means there are ten rows and ten columns. You should know that when we multiply the width and height of an image, we get the number of pixels that exist inside it. In this instance, the image on our screen will consist of one hundred pixels. We can work this out manually by counting the number of zeroes in the multidimensional list on the screen.

Each row in the matrix will consist of ten elements, representing the integer value zero. Remember from earlier that our value zero in a single channel will represent the colour black. Therefore, the first line of our image will be ten black pixels. As the image is completely black, then each other row in our matrix will be exactly the same. However, what would happen if we changed just one of these values in our matrix to another value, such as 255? Well, in this case, that individual pixel will be white instead of black.
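As a small sketch of the idea above, the same ten-by-ten black square can be built with a list comprehension rather than typed out by hand. The variable names here are my own, not part of any library. It then counts the pixels and changes a single pixel to white.

```python
WIDTH, HEIGHT = 10, 10

# Ten rows, each holding ten zeroes: a completely black single-channel image.
black_square = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]

# Counting every element gives the same answer as width multiplied by height.
pixel_count = sum(len(row) for row in black_square)

# Copy the image row by row, then turn the top-left pixel white.
white_dot = [row[:] for row in black_square]
white_dot[0][0] = 255

print(pixel_count)
print(white_dot[0])
```

Note that the copy is made row by row; assigning `white_dot = black_square` would only create a second name for the same lists, and the original image would change as well.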

Structure of an Image (3)

The matrix on the screen represents a 10x10 image
- known as a three channel image
Each pixel in this image holds the same three values, which results in the image being a red square
- a tone of red…

redSquare = [
    # Row 1
    [   # Column 1     # Column 2     # Column 3     # Column 4     # Column 5
        [31, 31, 187], [31, 31, 187], [31, 31, 187], [31, 31, 187], [31, 31, 187],
        # Column 6     # Column 7     # Column 8     # Column 9     # Column 10
        [31, 31, 187], [31, 31, 187], [31, 31, 187], [31, 31, 187], [31, 31, 187]
    ],
    "...",
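    # Row 10
    [   # Column 1     # Column 2     # Column 3     # Column 4     # Column 5
        [31, 31, 187], [31, 31, 187], [31, 31, 187], [31, 31, 187], [31, 31, 187],
        # Column 6     # Column 7     # Column 8     # Column 9     # Column 10
        [31, 31, 187], [31, 31, 187], [31, 31, 187], [31, 31, 187], [31, 31, 187]
    ]
]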

The previous example would be considered to be a gray-scale image, as each pixel consisted of a single value. However, when it comes to a colour image, it is exactly the same as before, but instead we are dealing with three values per pixel. Each value will represent one of the colour values of red, green and blue. When it comes to this method, it can become quite messy with how things are structured.

On the screen we can see a red square. To the right of this square we have a Python list representation of it. We can see that each row of the matrix consists of sub-lists, each of which holds three values: 31, 31 and 187. These are the individual values for blue, green and red, respectively (i.e. the BGR order), to make up the red colour for that pixel, and the sub-list is repeated for each column in our row (otherwise known as the width).

The three dots (or ellipsis) in this manner represent some truncated information. If I tried to repeat this ten times in a row, and then as a column, it would be quite long, messy and confusing. However, take my word for it, and understand the list is repeated ten times in a row, and the row is then repeated ten times itself. Just like our previous example for the grayscale image.
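Rather than typing out all one hundred pixels, the full red square can be sketched with a nested list comprehension. This assumes the three values are stored in blue, green, red (BGR) order, which is what makes 187 the red component; the variable names are my own.

```python
# One red pixel, assumed to be in BGR order: blue = 31, green = 31, red = 187.
RED_PIXEL_BGR = [31, 31, 187]

# Ten rows of ten pixels; list() gives every pixel its own copy of the values.
red_square = [[list(RED_PIXEL_BGR) for _ in range(10)] for _ in range(10)]

height = len(red_square)          # number of rows
width = len(red_square[0])        # number of columns (pixels per row)
channels = len(red_square[0][0])  # number of values per pixel

# Unpack the first pixel to read its individual channels.
blue, green, red = red_square[0][0]

print(height, width, channels)
print(red)
```

Compared with the grayscale example, the only structural change is the extra level of nesting: each element of a row is now a list of three values instead of a single number.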

Color Models

Color Models (1)

A protocol for representing colours, making them easily reproducible
Popular models in computer vision are RGB, BGR and gray-scale
Other color models such as HSV may be used
- video compression and device independent storage
Colour models discussed in this lecture:
- RGB
- CMYK
- Grayscale
- HSV/HSB

This leads us nicely onto our colour models. I have already discussed three colour models in this lecture: red, green and blue (RGB), blue, green and red (BGR), and grayscale. However, what do we mean by colour model? Well, a color model is used to define a color. The model describes how the color will appear on the computer screen, and there are three popular color models in computer vision:

red, green, blue, aka RGB
blue, green, red, aka BGR
and gray-scale

Over the next few slides, we shall look at each of these colour models and how they work. We shall also look at two other colour models, known as CMYK and HSV/HSB. Do not worry about what these acronyms stand for, as these will be introduced to you later in this lecture.

Color Models (2)

RGB

Abbreviation for: Red, Green and Blue
An additive color model
Uses a collection of three intensities for each pixel
- red, green and blue
- intensities of each value are mixed in this color space

Red = 165
Green = 34
Blue = 90

rgbColor = (165, 34, 90)

You should be aware by now that the RGB colour model utilises the colours red, green and blue to make a color. The model is often referred to as an additive model, and red, green and blue are referred to as its primary colors. These primary colors can be added to produce a secondary color, such as magenta (by adding red and blue together), cyan (by adding green and blue together), or yellow (by adding red and green together).

RGB is an important color model, as it relates closely to the way a human eye can perceive colour. It is considered to be one of the basic color models for computer graphics as monitors will use RGB values to create the desired color. Unfortunately, RGB is not very efficient when dealing with real-world images.
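The additive mixing described above can be sketched in a few lines. The function `add_colours` is a hypothetical helper of my own; it simply adds two colours channel by channel and caps each result at 255.

```python
# The three primary colours at full intensity.
RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)


def add_colours(first, second):
    """Add two RGB colours channel by channel, capping each value at 255."""
    return tuple(min(255, a + b) for a, b in zip(first, second))


magenta = add_colours(RED, BLUE)
cyan = add_colours(GREEN, BLUE)
yellow = add_colours(RED, GREEN)

# Adding all three primaries together gives white.
white = add_colours(yellow, BLUE)

print(magenta, cyan, yellow, white)
```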

Color Models (2)

CMYK

Abbreviation for: Cyan, Magenta, Yellow, and Black
A subtractive color model
Calculate colors by a process of subtraction

C = 0
M = 79%
Y = 45%
K = 35%

cmykColor = (0, 0.79, 0.45, 0.35)

The CMYK model is the subtractive counterpart of the RGB model and is primarily used in commercial and home printing devices. It is an acronym for cyan, magenta, yellow and black. This model is a subtractive model, meaning that each of the colour pigments or inks is applied to a white surface to subtract some color from the white surface; this will result in the final color.

For example, for us to get cyan, we would need to subtract red from white. Each color channel in CMYK is measured from zero to one-hundred percent, and this will tell the printer the relative density of each ink that is required. This model is not often used in computer vision applications, but it was an important color model to include as it is often used in printing.
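As a rough illustration, the commonly quoted simplified conversion from RGB to CMYK can be sketched as below. This is an approximation that ignores real ink and paper behaviour, so it should not be treated as how a printer actually computes its values; the function name is my own. Applied to the RGB colour from the earlier RGB slide (165, 34, 90), it gives the same percentages shown on this slide.

```python
def rgb_to_cmyk(red, green, blue):
    """Convert 0-255 RGB values to CMYK fractions (simplified formula)."""
    r, g, b = red / 255, green / 255, blue / 255
    brightest = max(r, g, b)

    # Black (K) is whatever the brightest channel is missing from full intensity.
    k = 1 - brightest
    if brightest == 0:
        # Pure black: no coloured ink is needed at all.
        return (0.0, 0.0, 0.0, 1.0)

    # Each ink removes its opposite colour: cyan removes red, magenta removes
    # green and yellow removes blue.
    c = (brightest - r) / brightest
    m = (brightest - g) / brightest
    y = (brightest - b) / brightest
    return (c, m, y, k)


cmyk = tuple(round(value, 2) for value in rgb_to_cmyk(165, 34, 90))
print(cmyk)
```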

Color Models (3)

Grayscale

Uses a single intensity for each pixel

Red = 115
Green = 115
Blue = 115

grayscaleColor = (115, 115, 115) # or (115)

Color Models (4)

HSV/HSB

Abbreviation for: Hue Saturation Value
- or Hue Saturation Brightness
Colours and intensity are provided separately
- provides a robustness to any lighting changes that may occur

Hue = 334°
Saturation = 79%
Value/Brightness = 65%

The HSV color model was developed to provide a more intuitive model for approximating the way that we, humans, perceive and interpret color. HSV stands for hue saturation value, and the model may sometimes be referred to as the hue saturation brightness model.

The hue will define the color itself, and the values for this axis will vary from zero to three-hundred and sixty degrees. This will begin with red and run through to green, blue and other intermediary colors.

The saturation indicates the degree to which the hue will differ from a neutral grey. These values will run from zero, meaning no colour saturation, to one-hundred percent, which is the fullest saturation for the given hue at a given illumination.

The value component indicates the illumination level. This value will vary from zero to one-hundred percent, with zero being black, i.e. no light, and one-hundred being full illumination, i.e. the brightest form of the given colour (white when there is no saturation).

This colour model is sometimes used in computer vision libraries. This is because the HSV model separates the image intensity from the colour information. This is most useful in computer vision, as it can aid in the removal of shadows or any lighting changes in the image.
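The relationship between RGB and HSV can be checked with Python's standard `colorsys` module, which works with values between 0 and 1. The wrapper function below is my own and simply rescales the result to degrees and percentages. Using the RGB colour from the earlier slide (165, 34, 90) reproduces the hue, saturation and value shown on this slide.

```python
import colorsys


def rgb_to_hsv_readable(red, green, blue):
    """Convert 0-255 RGB values to (hue in degrees, saturation %, value %)."""
    # colorsys expects each channel as a fraction between 0 and 1.
    h, s, v = colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)
    return (round(h * 360), round(s * 100), round(v * 100))


hsv = rgb_to_hsv_readable(165, 34, 90)
print(hsv)
```

Be aware that libraries may scale these ranges differently; OpenCV, for instance, stores the hue of an 8-bit image in the range 0 to 179 so that it fits into a single byte.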

Computer Vision Library: OpenCV

Computer Vision Library: OpenCV (1)

OpenCV is an abbreviation for Open Source Computer Vision
- originally a research project at Intel
Library consisting of computer vision and machine learning tools
- Consists of over 2,500 algorithms
The library consists of interfaces for Python and C++
- and Java! (👍 or 👎?)
Not a requirement for computer vision
- however, it is one of the easiest, most capable and best-supported options

OpenCV Contribution - Extended Modules

A separate collection of modules that consists of ‘non-free’ algorithms
- SIFT, SURF etc.
It can be unstable as it is not well-tested

You now know about computer vision, its brief history and how an image is constructed and represented on a machine via various color models. We can now get to the interesting part and discuss a popular framework used by programmers when it comes to working with images and creating computer vision projects, and that is OpenCV.

OpenCV is an abbreviation for Open Source Computer Vision, and it was originally a research project at Intel. The framework was originally released under a BSD licence (more recent releases use the Apache 2 licence), and as such, it is free for both academic and commercial use. However, there are some algorithms that are non-free, meaning they are not for use in commercial products. For academic use, they are perfectly fine.

The library consists of over 2,500 algorithms for computer vision and machine-learning. There is an interface for the C, C++, Python and Java programming languages. It is also cross-platform compatible, working on Windows, Linux, macOS, iOS and Android. During its initial development, it was concerned with computational efficiency for real-time applications, which resulted in all the libraries being written in optimised C/C++ to take advantage of multicore processing.

OpenCV is not a requirement for computer vision; there are other frameworks and libraries that can be used. However, it is one of the easiest and one of the most capable options that are available. There is also an abundance of support through a variety of websites, such as StackOverflow and third-party blogging websites.

There is an additional library for OpenCV known as the contribution module. This has a separate collection of algorithms that are considered non-free. Due to some licensing agreements, certain algorithms are not allowed for commercial products, and as such they have been separated from the main OpenCV framework and bundled in a separate module. However, these algorithms are perfectly fine to be used in an academic use-case, i.e. research and learning.

Some algorithms in this library may also be untested and considered to be unstable, so as usual, the principle of “user-beware” is most important.

Computer Vision Library: OpenCV (2)

Applications of OpenCV

Used for a wide variety of applications, such as:
- Image and Video Processing
  - i.e. color space/model conversion, image smoothing, and transformations
- Facial Recognition
- Object Recognition

Some popular applications that utilise OpenCV are image and video processing. For example, the library can be used to change the color model of an image. You may have a colour image, but want to convert it to a grayscale image to increase the efficiency of your algorithm. A grayscale image has a lot less data to read than a color image (you should remember this from those matrices earlier…).

Facial recognition is becoming popular in terms of security (especially on mobile devices). The OpenCV framework provides an entire library to assist in the detection of faces from an image or video, going as far as discerning whether a person is happy or not. Finally, libraries are also available for object recognition. My own PhD thesis utilised OpenCV to detect and track an object moving in the scene and predict the next direction of movement.
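To show why a grayscale image holds less data, here is a pure-Python sketch of a colour-to-grayscale conversion on a tiny, made-up 2x2 BGR image. The weights used are the standard luminance weights which, to my knowledge, are the ones OpenCV documents for its own BGR-to-gray conversion (`cv2.cvtColor` with `cv2.COLOR_BGR2GRAY`); the functions below are my own illustration and not OpenCV's implementation.

```python
def bgr_to_gray(pixel):
    """Collapse one BGR pixel into a single brightness value."""
    blue, green, red = pixel
    # Standard luminance weights: green contributes most, blue the least.
    return round(0.114 * blue + 0.587 * green + 0.299 * red)


def image_to_gray(image):
    """Convert a nested list of BGR pixels into a single-channel image."""
    return [[bgr_to_gray(pixel) for pixel in row] for row in image]


# Hypothetical 2x2 colour image in BGR order.
colour_image = [
    [[31, 31, 187], [255, 255, 255]],
    [[0, 0, 0], [31, 31, 187]],
]

gray_image = image_to_gray(colour_image)

# Three values per pixel before the conversion, one value per pixel after.
values_before = sum(len(pixel) for row in colour_image for pixel in row)
values_after = sum(len(row) for row in gray_image)

print(gray_image)
print(values_before, values_after)
```

With OpenCV installed, the library performs this kind of conversion for you on whole images, and far more efficiently than nested Python lists.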

Computer Vision Library: OpenCV (3)

Installing OpenCV

Installation will depend upon your platform
- major platforms will have pre-built libraries

Windows Installation

Three methods of installation for Windows:
1. third party installer if you are using C++
2. via the Python package manager
  - i.e. python3 -m pip install opencv-python
3. compiling/building from the OpenCV sources

Linux Installation

Two methods of installation for Linux:
1. via the package manager of your distribution
  - i.e. apt install libopencv-dev python3-opencv
2. compiling/building from the OpenCV sources

When it comes to installing OpenCV, there are multiple methods. In some cases, the installation process can be horrific and deeply frustrating… or it can be easy. As a computer scientist, I always like to take the hard route and compile the sources of OpenCV. However, for yourselves, you may want to take the easier route and install the framework via a package manager of your Linux distribution or Python.

However, if you decide to go hard-mode, then guides have been written and distributed in lab activities for you to follow. Keep in mind, this is not for the faint-hearted. However, a major advantage of compiling the libraries from source is that you can customise the build to the architecture of your machine, and in some cases it can be more efficient.
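Whichever route you take, a quick way to check whether the Python bindings are available is to try importing the `cv2` module. The sketch below is only a convenience check of my own and not part of any official installation procedure; the wording of the messages is made up.

```python
# Try to import the OpenCV Python bindings and report what was found.
try:
    import cv2

    message = "OpenCV is installed, version " + cv2.__version__
except ImportError:
    message = "OpenCV (cv2) was not found; try one of the installation methods"

print(message)
```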

Computer Vision Library: OpenCV (4)

Compiling and Building from Source

OpenCV is an open-source computer vision framework
- hence the open in its name
Compilation of the latest version from source is an option for enthusiasts
Extra benefits are provided:
- i.e. CUDA and cuDNN support, as well as cross-compilation for different architectures
Feeling brave?
- follow this guide for instructions on how to compile from source

Now, if you are like me and want to get a few grey hairs, then we can go down the hard route and compile the framework from its sources. Through this method, you are able to compile the latest (and in some cases not yet released) version of the framework. You also get additional benefits such as configuring the library to make use of the CUDA and cuDNN libraries. These additional libraries can make use of the GPU to perform some calculations and thus improve the overall efficiency of your code.

If you are feeling brave, I have provided a guide which will walk you through the process of compiling OpenCV from sources on a Linux and Windows platform. These guides compile the OpenCV library for both Python and C++.

Goodbye

Goodbye (1)

Questions and Support

Questions? Post them on the Community Page on Aula
Additional Support? Visit the Module Support Page
Contact Details:
- Dr Ian Cornelius, ab6459@coventry.ac.uk