Version Control

Writing software is not easy. Very soon after "Hello World", it becomes difficult to finish a program in one sitting, and impossible to keep everything about the current project in your head at once. Working with other people adds even more complexity. Edits can conflict with each other: you can spend a long time on work that then has to be undone, or find that two developers have changed the same section of code and their changes need to be merged somehow.

The idea of version control is not new. It has evolved from a simple record of what has happened to a code-base into a variety of systems that have different ways of dealing with versions, releases, collaboration and so on.

Git is one of the most common version control systems in use today. It was developed to help the Linux kernel developers work together, but is now used in many software projects and has become an important part of modern software development. The origin of git is described here: https://git-scm.com/book/en/v2/Getting-Started-A-Short-History-of-Git

This is only a quick start, and there is much more that git can do. For more information, this article is very good. For a quick reference, this cheat-sheet (https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet) is recommended.

The Key Concepts

Here we will explore the concepts involved in using git.

The Repository

This is the package of files that you are version-controlling, together with the record of what you've done to them. The repository stores a version of the files and can tell when any have changed. In the example below, I was in a repository with no uncommitted changes and no untracked files.

Example of a repository with no changes
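As a minimal sketch of the same situation, the commands below create a brand-new repository and ask git for its status. The scratch directory comes from mktemp and is purely an assumption for illustration; any empty directory would do.

```shell
set -e
# Hypothetical scratch directory; any empty directory would do.
repo=$(mktemp -d)
cd "$repo"

git init -q     # create an empty repository in this directory
git status      # reports that there is nothing to commit

# The machine-readable form prints nothing at all when there are no changes.
changes=$(git status --porcelain)
echo "pending changes: '${changes}'"
```

An empty result from git status --porcelain is a convenient way for scripts to check that a repository is clean.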

Changes

As we write more code, edit documentation and so on, we move further from the original state of the repository. This can be seen with the git status command.

Example of a repository with changes

You can see that README.md has been edited, and code.py has been created.
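The situation above can be reproduced with a sketch like the following. The scratch directory, the placeholder identity and the file contents are all assumptions made for illustration; only the file names README.md and code.py come from the example. The first commit uses the staging and committing steps covered in the next sections.

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for commits
git config user.email "user@example.com"

# Record a first version of README.md so that later edits show up as changes.
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"

echo "More documentation." >> README.md    # edit a tracked file
echo "print('hello')" > code.py            # create a new, untracked file

# Short status: ' M' marks a modified file, '??' an untracked one.
git status --short
changes=$(git status --porcelain)
```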

We can add these changes to the repository by staging and committing them.

Staging Changes

Staging lets us select which changes we want to make part of the record. If we decide our edits were wrong, this is the point at which we can discard them and go back to the last version (or even some older version).

Here, the changes to README.md are staged:

Example of staging a change

More than one change can be staged:

Example of staging another change
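A rough sketch of staging one change and then another, using git add. The repository is rebuilt from scratch in a temporary directory so the example stands alone; the identity and file contents are placeholders.

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for commits
git config user.email "user@example.com"
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"
echo "More documentation." >> README.md    # an edit to a tracked file
echo "print('hello')" > code.py            # a new file

git add README.md      # stage the edit to README.md only
git status --short     # README.md is staged, code.py is still untracked

git add code.py        # stage the new file as well
git status --short     # both changes are now staged

staged=$(git diff --cached --name-only)    # names of all staged files
```

Staging the files one at a time is what makes it possible to commit some changes while leaving others out.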

Committing Changes

Now we can approve these changes and have them recorded by the repository. We always give a "commit message" when doing this, which should describe and justify the collected changes.

Example of committing a series of changes
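The commit step might look like the sketch below. The commit messages, identity and file contents are invented for illustration, and the repository is again created in a temporary directory so the example is self-contained.

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for commits
git config user.email "user@example.com"
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"

echo "More documentation." >> README.md
echo "print('hello')" > code.py
git add README.md code.py                  # stage both changes

# Record the staged changes with a message explaining them.
git commit -q -m "Expand README and add first script"

git log --oneline                          # one line per recorded commit
count=$(git rev-list --count HEAD)         # number of commits so far
```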

From this point, you can continue adding to the changes and developing your work. If you need to check for differences, undo changes and so on, the git cheat-sheet (https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet) is a good reference.

Working with a Remote Origin

So far, we have worked with a repository that is "local", which doesn't really show how git helps with group work.

If we have a repository on a host like https://github.coventry.ac.uk, then we can clone that repository and keep it linked, in the sense that we can push our changes back and pull new changes made by others.

Cloning

Here, you can see the web interface to the remote repository for a project called Transcoder:

Web interface to the Transcoder repository

Clicking the "clone or download" button gives us a reference for the repository we can use to clone it. In this case, it is git@github.coventry.ac.uk:CUEH/transcoder.git.

At the command prompt, we can clone this with git clone git@github.coventry.ac.uk:CUEH/transcoder.git.

Cloning a remote repository

As you can see, the content of the repository is now in the directory transcoder.
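Cloning the real Transcoder repository needs access to that host, so the sketch below uses a local directory named upstream as a stand-in for the remote. That directory, its contents and the identity are assumptions; the git clone command works the same way with a URL such as the one above.

```shell
set -e
work=$(mktemp -d)                          # hypothetical scratch directory
cd "$work"

# Stand-in for the hosted repository: a local repository with one commit.
git init -q upstream
git -C upstream config user.name "Example User"
git -C upstream config user.email "user@example.com"
echo "# Transcoder" > upstream/README.md
git -C upstream add README.md
git -C upstream commit -q -m "Add README"

# Clone it into a new directory called transcoder.
git clone -q upstream transcoder

ls transcoder                      # the files from the repository
git -C transcoder remote -v        # the clone remembers where it came from
```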

We can make changes in this directory, stage them and commit them as normal. The difference is that now, as long as we have access rights, we can push those changes back.

Pushing Changes

If the repository was cloned as described above, then pushing changes back is very simple. Below, a new file is created, staged, committed and pushed:

Pushing to a remote repository
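The same sequence can be sketched against a local stand-in for the remote. Here remote.git is a bare repository (one with no working files, which is how hosts normally store repositories), and seed exists only to give it some history; both names, the identity and new_file.py are assumptions for illustration.

```shell
set -e
work=$(mktemp -d)                          # hypothetical scratch directory
cd "$work"

# Build a stand-in remote with one commit in it.
git init -q seed
git -C seed config user.name "Example User"
git -C seed config user.email "user@example.com"
echo "# Transcoder" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add README"
git clone -q --bare seed remote.git        # bare copy acting as the remote

# Clone, then create, stage, commit and push a new file.
git clone -q remote.git transcoder
cd transcoder
git config user.name "Example User"
git config user.email "user@example.com"
echo "print('hello')" > new_file.py
git add new_file.py
git commit -q -m "Add new_file.py"
git push -q                                # send the commit back to the remote

remote_count=$(git -C "$work/remote.git" rev-list --count HEAD)
```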

Pulling Changes

The final key process in working with a repository is pulling. This fetches changes that exist in the remote repository but not in your local one and merges them into your current branch. So, if someone else has pushed changes, you can incorporate them into your own local copy.

Below, we see that although the local repository had no changes, there were changes in the remote version that were applied locally using git pull.

Pulling changes
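A small sketch of that scenario, again with a local directory called upstream standing in for the remote and a commit made there playing the part of a colleague's work. The directory names, identity and file contents are assumptions.

```shell
set -e
work=$(mktemp -d)                          # hypothetical scratch directory
cd "$work"

# Stand-in remote with one commit, and a local clone of it.
git init -q upstream
git -C upstream config user.name "Example User"
git -C upstream config user.email "user@example.com"
echo "# Demo" > upstream/README.md
git -C upstream add README.md
git -C upstream commit -q -m "Add README"
git clone -q upstream local

# Someone else adds a commit to the remote after we cloned.
echo "New section" >> upstream/README.md
git -C upstream commit -q -am "Extend README"

cd local
git status --short                 # prints nothing: no local changes
git pull -q                        # fetch the new commit and apply it locally

local_head=$(git rev-parse HEAD)
remote_head=$(git -C "$work/upstream" rev-parse HEAD)
```

After the pull, the local and remote histories end at the same commit.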

Forking

Often, we want to build on the work of others. If that work is in a publicly readable repository, or one we have access to, then we can create a fork. This is essentially making a new remote copy that we can pull from, push to and so on.

The simplest way to do this is from the web interface. In the top right corner of the screenshot below, you can see the "fork" button.

Forking a repository

Clicking on this will allow you to create a new remote repository that is identical to the original, but owned by you. You will then be able to use it as if it were a repository you had created yourself, and it will include the development history up to the point of the fork.

Branches

A useful overview: https://www.nobledesktop.com/learn/git/git-branches

A branch is a different version of your code. You can start a branch at any point, and work on your code knowing the other branch still exists as you left it. Later, you can merge branches if you want to combine your new code into the other branch, or take parts selectively.

Branches let people work on different features separately without destabilising each other during development, let multiple people work on the same code-base and merge later, and serve any number of other purposes.

Listing branches

git branch shows the branches you have locally. A branch that you didn't create locally may not appear in that list: git clone downloads it, but only sets up a local branch for the default one. You can list the remote branches with git branch -r.
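The difference between the two listings can be seen in a sketch like this one, where a local directory called upstream stands in for the remote and already contains a branch named blah. The directory names and identity are assumptions.

```shell
set -e
work=$(mktemp -d)                          # hypothetical scratch directory
cd "$work"

# Stand-in remote with a default branch plus a branch called blah.
git init -q upstream
git -C upstream config user.name "Example User"
git -C upstream config user.email "user@example.com"
echo "# Demo" > upstream/README.md
git -C upstream add README.md
git -C upstream commit -q -m "Add README"
git -C upstream branch blah

git clone -q upstream local
cd local

git branch        # local branches: only the default one
git branch -r     # remote branches: includes origin/blah

local_blah=$(git branch --list blah)
remote_blah=$(git branch -r --list 'origin/blah')
```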

Switching between branches

If your branch is already local, you can switch with git checkout blah, where blah is the name of the branch.

If the branch is not local, then you can still switch to it with git checkout --track origin/blah. Here origin is the name git gives to the remote the repository was cloned from, and the --track option means the new local branch will stay related to the remote branch rather than being a local branch that just starts off the same and has the same name¹.

You can get back to the main branch with git checkout master. The master branch is the one you use by default if you have no other branches; in newer repositories the default branch is often named main instead.
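The switching commands above can be tried in a sketch like the following. A local directory called upstream stands in for the remote, and the name of the default branch is read from the clone rather than assumed, since it may be master or main depending on how git is set up. The directory names and identity are assumptions.

```shell
set -e
work=$(mktemp -d)                          # hypothetical scratch directory
cd "$work"
git init -q upstream
git -C upstream config user.name "Example User"
git -C upstream config user.email "user@example.com"
echo "# Demo" > upstream/README.md
git -C upstream add README.md
git -C upstream commit -q -m "Add README"
git -C upstream branch blah                # a branch that exists on the remote

git clone -q upstream local
cd local
default=$(git rev-parse --abbrev-ref HEAD) # master or main, depending on setup

git checkout -q --track origin/blah        # create local blah linked to origin/blah
git checkout -q "$default"                 # back to the default branch
git checkout -q blah                       # blah is local now, so the short form works
git checkout -q "$default"

current=$(git rev-parse --abbrev-ref HEAD)
blah_upstream=$(git rev-parse --abbrev-ref 'blah@{upstream}')
```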

Creating branches

This is as simple as git checkout -b blah. The new blah branch is created and switched to, and any commits from now on will go into that branch until you change branch again.
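As a brief sketch, the commands below create blah, commit to it, and then compare the two histories. The file feature.txt, the identity and the scratch directory are invented for illustration, and the default branch name is read from the repository because it may be master or main.

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for commits
git config user.email "user@example.com"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Add README"
default=$(git rev-parse --abbrev-ref HEAD) # master or main, depending on setup

git checkout -q -b blah                    # create blah and switch to it
echo "experimental" > feature.txt
git add feature.txt
git commit -q -m "Add feature on blah"     # this commit goes into blah only

git log --oneline "$default"               # still just the first commit
git log --oneline blah                     # the first commit plus the new one

default_count=$(git rev-list --count "$default")
blah_count=$(git rev-list --count blah)
```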

Merging branches

If the two branches have not edited the same parts of the same files, you can usually merge a branch into master just by switching to master (git checkout master) and running a merge command like git merge blah. If there are conflicting edits, git marks the conflicting areas in the affected files for you to resolve; you then stage and commit the resolved files.
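A minimal sketch of the simple, conflict-free case follows. It builds a throwaway repository with a blah branch one commit ahead of the default branch and merges it back. The file feature.txt, the identity and the scratch directory are assumptions, and the default branch name is read from the repository because it may be master or main. Conflict resolution is not shown.

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for commits
git config user.email "user@example.com"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Add README"
default=$(git rev-parse --abbrev-ref HEAD) # master or main, depending on setup

# Do some work on a separate branch.
git checkout -q -b blah
echo "experimental" > feature.txt
git add feature.txt
git commit -q -m "Add feature on blah"

# Switch back and bring the work from blah into the default branch.
git checkout -q "$default"
git merge -q blah

git log --oneline                          # now includes the commit from blah
```

Because the default branch had no new commits of its own here, git only needs to move it forward to match blah.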

¹ This is not quite true, but it makes sense practically. In reality you can have diverging branches even with --track, and you can make branches push to the remote repository even if they didn't start there.