Version Control
Writing software is not easy. Very soon after "Hello World", it becomes difficult to finish a program in one sitting, and impossible to keep everything about the current project in your head at once. Working with other people adds even more complexity. Edits can conflict with each other, you can spend a long time on work that later has to be undone, or you can find that two developers have changed the same section of code and their changes need to be merged somehow.
The idea of version control is not new. It has evolved from a simple record of what has happened to a code-base into a variety of systems that have different ways of dealing with versions, releases, collaboration and so on.
Git is one of the most common version control systems in use today. It was developed to help the Linux kernel developers work together, but is now used in many software projects and has become an important part of modern software development. More on the origin of git can be found here: https://git-scm.com/book/en/v2/Getting-Started-A-Short-History-of-Git
This is only the quick-start information, and there is much more that git can do. For more information, this article is very good. For a quick reference, the git cheat-sheet (https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet) is recommended.
The Key Concepts
Here we will explore the concepts involved in using git.
The Repository
This is the collection of files that you are version-controlling, together with the record of what you've done to them. In the example below, you see that I was in a repository that had no untracked changes. The repository stores a version of the files and can tell when any of them have changed.
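As a minimal sketch of the idea, the commands below create an empty repository in a throwaway directory and ask git for its state. The temporary directory is an assumption made purely for illustration; any directory would do.

```shell
set -e
repo=$(mktemp -d)     # throwaway directory, used only for this illustration
cd "$repo"
git init -q           # turn the directory into a repository
git status            # reports that there is nothing to commit
```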
Changes
As we write more code, edit documentation and so on, we move further from the original state of the repository. This can be seen with the git status command.
You can see that README.md has been edited, and code.py has been created.
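The situation described above can be reproduced with a sketch like the following. The file contents and the "Example User" identity are placeholders, not part of the original example; the --short flag simply gives a compact form of the same status report.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Add README"              # this is the "original state"

echo "More documentation" >> README.md     # edit an existing, tracked file
echo "print('hello')" > code.py            # create a new, untracked file

git status --short
#  M README.md   (modified, not yet staged)
# ?? code.py     (untracked)
```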
We can add these changes to the repository by staging and committing them.
Staging Changes
This lets us select which changes we want to make part of the record. If we decide our edits were wrong, this is the point where we can discard them and go back to the last version (or even some older version).
Here, the changes to README.md are staged:
More than one can be added:
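A sketch of both cases, using the same placeholder files and identity as an assumption: first one file is staged, then several are named in a single git add.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Add README"
echo "More documentation" >> README.md     # an edit to a tracked file
echo "print('hello')" > code.py            # a new file

git add README.md                          # stage one change
git status --short
# M  README.md   (staged)
# ?? code.py     (still untracked)

git add README.md code.py                  # several paths can be given at once
git status --short
# M  README.md
# A  code.py
```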
Committing Changes
Now we can approve these changes and have them recorded by the repository. We always give a "commit message" when doing this, which should describe and justify the collected changes.
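Continuing the same hypothetical example, a commit might look like the sketch below. The commit messages are only illustrations of the style; git log --oneline is used to show that the record has grown.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Add README"

echo "Usage notes" >> README.md
echo "print('hello')" > code.py
git add README.md code.py                  # stage the changes
git commit -q -m "Document usage and add first script"   # record them

git log --oneline                          # two commits, newest first
git status --short                         # prints nothing: the tree is clean
```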
From this point, you can continue adding to the changes and developing your work. If you need to check for differences, undo changes and so on, the git cheat-sheet (https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet) is a good reference.
Working with a Remote Origin
So far, we have worked with a repository that is "local", which does not yet show how git helps with group work.
If we have a repository on a host like https://github.coventry.ac.uk, then we can clone that repository and keep it linked, in the sense that we can push our changes back and pull new changes made by others.
Cloning
Here, you can see the web interface to the remote repository for a project called Transcoder:
Clicking the "clone or download" button gives us a reference for the repository we can use to clone it. In this case, it is git@github.coventry.ac.uk:CUEH/transcoder.git.
At the command prompt, we can clone this with git clone git@github.coventry.ac.uk:CUEH/transcoder.git.
As you can see, the content of the repository is now in the directory transcoder.
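Cloning from the real host needs network access and an account, so the sketch below substitutes a small local repository (the hypothetical "upstream" directory) for the remote. The git clone command has the same form; only the address differs.

```shell
set -e
work=$(mktemp -d)                          # throwaway directory for the demo
cd "$work"

# Build a small repository to play the role of the remote (an assumption)
git init -q upstream
echo "# Transcoder" > upstream/README.md
git -C upstream add README.md
git -C upstream -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m "Add README"

# Same form as: git clone git@github.coventry.ac.uk:CUEH/transcoder.git
git clone -q upstream transcoder

ls transcoder                              # the repository content is here
git -C transcoder remote -v                # "origin" points back at the source
```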
We can make changes in this directory, stage them and commit them as normal. The difference is that now, as long as we have access rights, we can push those changes back.
Pushing Changes
If the repository was cloned as described above, then it is very simple to push changes back. Below, a new file is created, staged, committed and pushed:
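The sequence can be sketched as follows. A local bare repository (remote.git, a stand-in chosen for this illustration) takes the place of the hosted one, and the identity is a placeholder.

```shell
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)                          # throwaway directory for the demo
cd "$work"

# Stand-in for the hosted repository: a bare repo with one commit in it
git init -q seed
echo "# Transcoder" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add README"
git clone -q --bare seed remote.git

git clone -q remote.git transcoder
cd transcoder
echo "print('hello')" > code.py            # create a new file
git add code.py                            # stage it
git commit -q -m "Add code.py"             # commit it
git push -q                                # send the commit to origin

git -C "$work/remote.git" log --oneline    # the remote now has both commits
```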
Pulling Changes
The final key process in working with a repository is pulling. This fetches changes that exist in the remote repository but not your local one. So, if someone else had pushed changes, you could now incorporate them into your own local copy.
Below, we see that although the local repository had no changes, there were changes in the remote version that were applied locally using git pull.
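A sketch of this, assuming two hypothetical collaborators whose clones are named alice and bob, and a local bare repository standing in for the remote:

```shell
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)                          # throwaway directory for the demo
cd "$work"

git init -q seed
echo "# Transcoder" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add README"
git clone -q --bare seed remote.git        # stand-in for the hosted repository

git clone -q remote.git alice              # two independent local copies
git clone -q remote.git bob

# Someone else (alice) pushes a change
echo "print('hello')" > alice/code.py
git -C alice add code.py
git -C alice commit -q -m "Add code.py"
git -C alice push -q

git -C bob status --short                  # nothing to report locally
git -C bob pull -q                         # fetch and apply the remote changes
ls bob                                     # code.py has arrived
```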
Forking
Often, we want to build on the work of others. If that work is in a publicly readable repository, or one we have access to, then we can create a fork. This is essentially making a new remote copy that we can pull from, push to and so on.
The simplest way to do this is from the web interface. In the top right corner of the screenshot below, you can see the "fork" button.
Clicking on this will allow you to create a new remote repository that is identical to the original, but owned by you. You will then be able to use it as if it were a repository you had created yourself, and it will include the development history up to the point of the fork.
Branches
A useful overview: https://www.nobledesktop.com/learn/git/git-branches
A branch is a different version of your code. You can start a branch at any point, and work on your code knowing the other branch still exists as you left it. Later, you can merge branches if you want to combine your new code into the other branch, or take parts selectively.
Branches are used to let people work on different features separately without destabilising each other during development, to let multiple people work on the same code-base and merge later, and for any number of other reasons.
Listing branches
git branch shows what branches you have locally. When you clone, git creates a local branch only for the default branch; the other branches are still downloaded, but only as remote-tracking branches. If a branch you expect is missing from the local list, you can list the remote branches with git branch -r.
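The difference between the two listings can be seen in a sketch like this one, where a hypothetical source repository (seed) has an extra branch called blah:

```shell
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)                          # throwaway directory for the demo
cd "$work"

git init -q seed
git -C seed commit -q --allow-empty -m "Initial commit"
git -C seed branch blah                    # a second branch in the source repo

git clone -q seed copy
git -C copy branch                         # only the default branch is local
git -C copy branch -r                      # origin/blah appears in this list
```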
Switching between branches
If your branch is already local, you can switch with git checkout blah, where blah is the name of the branch.
If the branch is not local, then you can still switch to it with git checkout --track origin/blah. The origin tells git to look where the repo came from, and the --track option means it will stay related to the remote branch rather than being a local branch that starts off the same and has the same name [1].
You can get back to the main branch with git checkout master. The master branch is the one you use by default if you have no other branches (newer repositories often call it main instead).
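As a sketch, the commands below check out a remote branch with --track and then return to the default branch. Because the default branch may be called master or main depending on configuration, its name is read from the hypothetical source repository rather than assumed.

```shell
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)                          # throwaway directory for the demo
cd "$work"

git init -q seed
git -C seed commit -q --allow-empty -m "Initial commit"
git -C seed branch blah
default=$(git -C seed symbolic-ref --short HEAD)   # master or main

git clone -q seed copy
cd copy
git checkout -q --track origin/blah        # local blah, linked to origin/blah
git branch -vv                             # shows blah with [origin/blah]

git checkout -q "$default"                 # back to the default branch
git checkout -q blah                       # blah is local now, no --track needed
git checkout -q "$default"
```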
Creating branches
This is as simple as git checkout -b blah. The new blah branch will be created and any commits now will go into that branch until you change branch again.
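A minimal sketch, with a placeholder identity and file name: a commit made after git checkout -b blah lands on blah and leaves the original branch untouched.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
default=$(git symbolic-ref --short HEAD)   # master or main, depending on setup
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b blah                    # create the branch and switch to it
echo "experimental" > feature.txt
git add feature.txt
git commit -q -m "Start feature"           # this commit goes onto blah

git log --oneline blah                     # two commits
git log --oneline "$default"               # still only the initial commit
```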
Merging branches
Unless things are very different, you can often merge a branch into master just by switching to master (git checkout master) and performing a merge command like git merge blah. If there are conflicting edits, git will stop the merge and mark the conflicting regions inside the affected files. You then edit those files to resolve the conflicts, stage them and commit the result to the repository.
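The sketch below deliberately forces a conflict so that the resolution steps are visible; without conflicting edits, git merge blah would simply complete on its own. The file name, its contents and the identity are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
default=$(git symbolic-ref --short HEAD)   # master or main, depending on setup

echo "colour = red" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

git checkout -q -b blah                    # change the line on a branch
echo "colour = blue" > settings.txt
git commit -q -am "Use blue"

git checkout -q "$default"                 # change the same line on the default branch
echo "colour = green" > settings.txt
git commit -q -am "Use green"

git merge blah || true                     # the merge stops and reports a conflict
cat settings.txt                           # <<<<<<<, ======= and >>>>>>> mark the clash

echo "colour = blue" > settings.txt        # resolve by writing the wanted content
git add settings.txt                       # mark the conflict as resolved
git commit -q -m "Merge branch 'blah'"     # conclude the merge
git log --oneline --graph
```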
[1] This is not quite true, but it makes sense practically. In reality you can have diverging branches even with --track, and you can make branches push to the remote repository even if they didn't start there.