TITLE(«
The reason people have trouble wrapping their heads around git is
because they have been braindamaged by Github and Gitlab.
», __file__)
SECTION(«Version Control Systems»)
The term version control (also revision control
or source control) refers to the change management
of files. For example, some sort of change management is needed if a
team of geographically dispersed people concurrently make changes to
the files that comprise a document, fixing mistakes and adding new
content over time.
A simple form of change management is "version control by email"
where the collaborators send revised copies of the document to each
other. While this approach can work for a small group of authors, it
quickly gets messy as more people are involved. One problem arises
when more than one person changes the current version at the same
time because it is unclear how these changes should be combined,
in particular if the changes conflict with each other. For example,
a conflict arises if one person adds a reference to a portion of text
while a second person rewords the referenced text and moves it to a
different chapter. One way to work around this problem is to require
that only a single person makes changes at any given
time. This person edits the files and sends the revised version to
the next author. While this approach avoids conflicts, it is highly
inefficient. Version control systems (VCSs) are software tools
which help the collaborators to maintain different versions of a set of
files and to deal with concurrent and potentially conflicting changes.
SUBSECTION(«Centralized and Distributed Version Control Systems»)
The idea of a VCS is to track the changes that are made to a tree of
files over time. All revisions of all files are stored in a database
which is called the repository of the project. The
recorded changes are organized as commits. Besides the
file contents, each commit carries metadata about the change, like
date and time and the name and the email address of the person who
made the change. Moreover, each time a commit is created, the
author is asked to provide a commit message, a text which
is supposed to document why this particular change was made.
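The metadata of a commit can be inspected directly. The following is a minimal sketch which uses git, the VCS covered later in this document. The author identity, the file name and the commit message are made up for illustration.

```shell
#!/bin/sh
# Sketch: create one commit and print the metadata stored with it.
# The identity below is hypothetical. It is set locally so that the
# example also runs where git has not been configured yet.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name 'Example Author'
git config user.email 'author@example.org'

echo 'first draft' > chapter1.txt
git add chapter1.txt
git commit -q -m 'Add first draft of chapter 1.'

# Author name, email address, date, and the commit message.
git log -1 --format='%an <%ae>%n%ad%n%s'
```

The format string only selects which fields are printed. The data itself is part of the commit.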
Most VCSs are content agnostic in that they do not know or
care about the types of the files that are stored in the repository. In
order to visualize the difference between two versions of a file,
they have to rely on third party tools which understand the file
format. For plain text files, they usually employ the diff
algorithm and file format. The exercises of this section invite you to
take a look at the CMD(«diff(1)») command and its counterpart,
CMD(«patch(1)»). A rough understanding of the diff format is
fundamental for using any VCS.
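As a rough illustration of how the two commands fit together, the sketch below records the difference between two versions of a file and replays it on the old version. The file names and contents are made up for this example.

```shell
#!/bin/sh
# Sketch: record a change with diff, replay it with patch.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'apple\npear\n' > old.txt
printf 'apple\npear\norange\n' > new.txt

# diff exits with status 1 when the files differ, which is expected here.
diff -u old.txt new.txt > fruits.diff
cat fruits.diff

# Apply the recorded change to the old version.
patch old.txt < fruits.diff

# cmp is silent and exits with status 0 if the files are identical.
cmp old.txt new.txt && echo identical
```

The lines in the diff which start with a plus sign are additions, lines which start with a minus sign are removals.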
The basic operations offered by VCSs are to retrieve ("check out")
old versions of the tree, to list the difference ("diff") between
two revisions, and to create new revisions from modified versions
of tracked files ("check in"). Most VCSs also have the concept of
branches. Branching becomes necessary if there is no single
"master" version, but two or more different versions that have to be
maintained concurrently. For example, in a repository which contains
the source code of some software project, there might be one "stable"
branch where only bugs get fixed, while new features are developed
in another "development" branch. Another feature each VCS needs to
implement is some kind of download service which lets (authenticated)
collaborators download a copy of the repository.
VCSs can be classified as being either centralized or
distributed. A centralized VCS is characterized by taking a
client-server approach to change management. For a centralized VCS, the
basic operations outlined above all involve the server. For example, to
check in a change, the server is contacted to record the new revision
in its database. With distributed VCSs, on the other hand, there is
no central instance. Instead, all repositories are self-contained
in that they contain the full change database. Repositories are
synchronized in a peer-to-peer fashion, which has many advantages,
including speed and scalability. This is why many people consider
centralized VCSs obsolete.
SUBSECTION(«VCS History»)
Probably the earliest VCS was the Source Code Control System
(SCCS) which was originally developed at Bell Labs in 1972. Although
it was single file based and hence did not have the concept of
a repository, it was the dominant VCS for Unix in the 1970s and
1980s. SCCS was followed by newer VCSs, notably RCS (1982) and CVS
(1990), the latter of which started out as a front end to RCS. CVS
was the dominant VCS of the 1990s, at least in the open source
world. It was eventually superseded by Subversion (SVN), initially
released in 2000, which is conceptually similar to CVS. In the late
1990s distributed VCSs emerged, and they have since rendered the older
centralized VCSs like CVS and SVN
obsolete. As of 2018, there are several distributed VCSs under active
development, of which git (started 2005 by Linus Torvalds,
the creator of Linux) is the most popular by far. We won't discuss
other VCSs further.
EXERCISES()
- CMD(«v1.txt») and the new text in CMD(«v2.txt»). Then run
CMD(«diff -u v1.txt v2.txt») to produce a diff. You will notice that
the diff output contains some additional lines. Explain the meaning
of these lines.
- CMD(«diff -u v1.txt v2.txt > v1-v2.diff»). Then run
CMD(«patch v1.txt < v1-v2.diff»). Examine CMD(«v1.txt»), then run
CMD(«diff v1.txt v2.txt») to confirm that the two files are identical.
- CMD(«diff») and CMD(«patch») utilities which are unrelated to
version control.
»)
SECTION(«Basic Git Usage»)
SUBSECTION(«Getting Help»)
- CMD(«git help») shows commands.
- CMD(«git help pull») help for pull.
- CMD(«git pull -h») short overview of pull options.
SUBSECTION(«clone, init»)
- get started
- init: get a new repository
- clone: copy a repository
EXERCISES()
- read CMD(«git help init») and CMD(«git help clone»)
- create an empty repository and clone it
SUBSECTION(«add, commit»)
- add files
- commit changes
EXERCISES()
- add files to both repositories
- commit changes, write commit summary
- change files again
- commit changes again
HOMEWORK(«
- Initialize a new repository. Create an empty file
CMD(«fruits.txt»), add it to the staging area with CMD(«git
add») and commit it.
- Use CMD(«printf "apple\npear\n" >fruits.txt») to add some
fruits to the file. Add the modified file to the staging
area.
- Use CMD(«printf "orange\n" >>fruits.txt») to modify the
file again.
- CMD(«git status») will show the file twice, why?
- Which version of the file (which fruits) will be committed
by CMD(«git commit -m "new fruits arrived"»)?
- How do you get the version with oranges committed?
», «
The second CMD(«git add») command adds the "apple and pear" version
to the staging area. Appending CMD(«orange») does not change what
has been staged, so the first version is listed under "Changes to be
committed", while the just modified version is listed under "Changes
not staged for commit". A simple CMD(«git commit») will commit the
staged version. To commit the other version (with oranges) one must
add the file again and then run CMD(«git commit»).
»)
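The solution above can be replayed with a sketch like the following. The author identity is made up and only set so that the example runs in an unconfigured environment.

```shell
#!/bin/sh
# Sketch of the fruits homework: staged versus unstaged changes.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name 'Example Author'      # hypothetical identity
git config user.email 'author@example.org'

: > fruits.txt                             # create an empty file
git add fruits.txt
git commit -q -m 'add empty fruits file'

printf 'apple\npear\n' > fruits.txt
git add fruits.txt                         # stage the "apple and pear" version
printf 'orange\n' >> fruits.txt            # modify the working tree only

git status --short                         # the file has staged and unstaged changes
git show :fruits.txt                       # the staged version, without orange
git commit -q -m 'new fruits arrived'
git show HEAD:fruits.txt                   # the committed version, still without orange

git add fruits.txt                         # stage the version with oranges
git commit -q -m 'oranges arrived'
git show HEAD:fruits.txt
```

The notation CMD(«:fruits.txt») names the version of the file in the staging area, CMD(«HEAD:fruits.txt») the version in the most recent commit.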
SUBSECTION(«log»)
- view commit history
EXERCISES()
- Look at log in both repositories
SUBSECTION(«fetch, merge, pull»)
- get changes from others
- pull is fetch + merge
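The statement that pull is fetch plus merge can be sketched as follows. The repository names CMD(«first») and CMD(«second») and the author identity are made up for this example.

```shell
#!/bin/sh
# Sketch: synchronize a clone, once with fetch + merge and once with pull.
dir=$(mktemp -d)
cd "$dir" || exit 1
git init -q first
cd first || exit 1
git config user.name 'Example Author'      # hypothetical identity
git config user.email 'author@example.org'
echo hello > h
git add h
git commit -q -m initial
cd ..
git clone -q first second

# A new commit appears in the first repository.
cd first || exit 1
echo world >> h
git commit -q -am 'add world'

# Step by step: download the new commit, then integrate it.
cd ../second || exit 1
git fetch -q origin
git merge -q '@{u}'                        # @{u} is the upstream of the current branch

# Another commit in the first repository.
cd ../first || exit 1
echo people >> h
git commit -q -am 'add people'

# Both steps at once.
cd ../second || exit 1
git pull -q
git log --oneline
```

In both cases the clone only needs a fast-forward because it has no commits of its own.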
EXERCISES()
- use 'git pull' to get both repositories into the same state
- try to create an edit conflict: change both repositories and use pull
- resolve the edit conflict
SUBSECTION(«checkout, reset»)
- reset: move HEAD
- checkout: undo changes, get older version
EXERCISES()
- Use checkout to look at older versions of your project
SUBSECTION(«tags, branches»)
- tag a release
- branch to start a new experimental feature
EXERCISES()
- Create a new branch, modify files
- Use checkout to switch between master and new branch
SUBSECTION(«alias»)
- remote: manage aliases for remote repositories
EXERCISES()
- use CMD(«git remote -v») on both repositories
SECTION(«Commit Graph»)
The git version control system has been designed for
EMPH(«distributed») development where more than one person
makes changes to the source tree simultaneously and each does
so independently of the other. The history of a source tree that
evolves in this manner cannot be described by a simple linked list
of changes which could sequentially be applied to the original source
tree in order to obtain the "current version" of the tree. In fact,
there is no such thing as a "current version". Moreover, in general
two commits are not related to each other in the sense that the second
commit comes after the first, or vice versa. Instead, the relationship
between commits can only be described adequately by a structure known
in graph theory as a EMPH(«directed acyclic graph») (DAG).
Many git commands operate on the DAG that corresponds to the commits
of the repository at hand. It is therefore useful to have a rough
understanding of the basic concepts of graph theory, and of DAGs in
particular. The exercises of this section ask the reader to translate
between the abstract, mathematical notion of a graph and its concrete
realization as commits in a git repository. We cover the partial order
of a DAG, and the derived concepts of reachability and infimum. Another
exercise aims to get the reader fluent in git's way of specifying
sets of commits.
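The concepts mentioned above can be tried out on a tiny commit graph. The sketch below builds two branches which fork from a common commit, similar to the two-branches script of this document. The author identity is made up. Treating the output of CMD(«git merge-base») as the infimum is an interpretation which holds for this simple graph.

```shell
#!/bin/sh
# Sketch: a three-commit DAG with two branch heads.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name 'Example Author'      # hypothetical identity
git config user.email 'author@example.org'

echo hello > h
git add h
git commit -q -m initial
base=$(git rev-parse HEAD)                 # remember the initial commit

git checkout -q -b topic1
echo world >> h
git commit -q -am 'add world'

git checkout -q -b topic2 "$base"
echo people >> h
git commit -q -am 'add people'

git log --graph --oneline --all            # draw the DAG
git merge-base topic1 topic2               # common ancestor: the initial commit
git rev-list topic1..topic2                # reachable from topic2, but not from topic1
git rev-list topic1...topic2               # reachable from exactly one of the two
git rev-parse topic2~1                     # first parent of topic2
```

Neither branch head is reachable from the other, so the two heads are not comparable in the partial order.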
EXERCISES()
CMD(«git log --graph») to verify.
- CMD(«git show topic1..topic2»)
- CMD(«git show topic1...topic2»)
- CMD(«git log -1 topic2~2»)
- CMD(«git log -1 topic2^2»)
- CMD(«git log topic1...master»)
A^0, A^, A^1, A^^^2, B^3^, A~1, A^2, A^^, A^1^1, A^^2, B^3^2, A^^3^2,
A~2^2, A^^3^, A^^^, A^1^1^1, A~3.
», «
The suffix CMD(«^») to a revision parameter means the first parent.
CMD(«^n») means the n-th parent. The suffix CMD(«~n») means the n-th
generation ancestor, following only the first parents. See
CMD(«gitrevisions(7)»).
A^0 = A, A^ = A^1 = B, A^^^2 = H, B^3^ = I, A~1 = B, A^2 = C,
A^^ = A^1^1 = D, A^^2 = E, B^3^2 = A^^3^2 = J, A~2^2 = H, A^^3^ = I,
A^^^ = A^1^1^1 = A~3 = G
»)
SECTION(«Git Objects and Refs»)
Unlike centralized version control systems like CVS and SVN, each copy
of a git repository contains the full history of the source tree,
rather than only a few recent revisions. This speeds up operations
like CMD(«git log») or CMD(«git diff») because all operations are
local. It also makes it possible to work offline as no network
connection is needed for most operations.
The git database, which is hidden inside the CMD(«.git») subdirectory
of the repository, contains all revisions of all tracked files as well
as metadata like file names, access permissions and commit messages.
All contents are stored as EMPH(«git objects») and the database is
indexed by the SHA-1 hash value of the objects' contents. This
indexing method is called EMPH(«content-based addressing») because
the hash value of the contents of an object is used as the lookup key
for the database.
Depending on the size of the repository, the git database may contain
millions of objects, but there are only four different types of
objects: blob, tree, commit, and tag. The exercises of this section
invite the reader to look at each object type in more detail. Another
aim is to demystify the differences between heads, tags, refs and
branches, which all denote a reference to a commit object.
EXERCISES()
- Recall the properties of a XREFERENCE(«https://en.wikipedia.org/wiki/Cryptographic_hash_function», «cryptographic hash function»).
- How many objects of each type exist in the repo created by this
REFERENCE(«two_branches.bash», «script»)? Check with CMD(«git fsck -v»).
- Clone the user-info repository with
CMD(«git clone git://ilm.eb.local/user-info») and explore all files
in the CMD(«.git/refs») directory.
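The four object types and the idea of content-based addressing described in this section can be explored with a sketch like the one below. The file name, the tag name and the author identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: one object of each type, and a check of content-based addressing.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name 'Example Author'      # hypothetical identity
git config user.email 'author@example.org'

echo hello > h
git add h
git commit -q -m initial
git tag -a -m 'first release' v1           # an annotated tag creates a tag object

# Ask git for the type of each object.
git cat-file -t v1
git cat-file -t HEAD
git cat-file -t 'HEAD^{tree}'
git cat-file -t HEAD:h

# The name of the blob is derived from the file contents alone, so
# recomputing the hash of the file yields the stored object name.
git hash-object h
git rev-parse HEAD:h
```

The last two commands print the same hash value as long as the file in the working tree is unchanged.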
HOMEWORK(«
- Learn how to manually create a commit with CMD(«git hash-object»),
CMD(«git update-index»), CMD(«git write-tree»), and
CMD(«git commit-tree»).
»)
SECTION(«The Index»)
Every version control system needs some kind of EMPH(«tree object»)
which records the information about one particular state of the source
tree. A commit then corresponds to a transition from one tree object
to another and is described by an edge in the commit graph.
Git exposes one tree object in a special staging area called the
EMPH(«index»). One can think of the index as a table which contains
one row for each tracked file, which contains the information
necessary to generate a tree object.
Under normal circumstances each row of the index has three columns:
The permission bits of the file, the file name, and the hash value
of the file's contents. When resolving merge conflicts, however, it
is handy to have additional columns which contain the hash values of
the two conflicting versions of the file plus the hash value of a
common ancestor.
Many git commands operate on the index. For example the command
CMD(«git commit») (with no arguments) creates a commit from the
index. It does not even look at the working tree. Another example is
CMD(«git add foo»), which updates the hash column of CMD(«foo») in
the index to match the version of CMD(«foo») in the working tree.
From the above it should be clear that the concept of an index is
quite natural in the context of version control systems. The fact
that git exposes the index, rather than hiding it as other version
control systems do, gives the user a great deal of control over the
next commit. Being able to tweak the index as needed is a good thing
not only for conflict handling.
The exercises of this section try to convince the reader that the
index is by no means an advanced concept that is so hard to understand
that it should be hidden from the user.
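The table view of the index can be made visible with CMD(«git ls-files --stage»). The sketch below is a minimal illustration with a made-up author identity. Besides the three columns described above, git prints a stage number which is zero unless a merge conflict is being resolved.

```shell
#!/bin/sh
# Sketch: watch the index row of a file before and after "git add".
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name 'Example Author'      # hypothetical identity
git config user.email 'author@example.org'

echo hello > foo
git add foo
git commit -q -m initial

# One row per tracked file: mode, hash, stage number, file name.
before=$(git ls-files --stage foo)
echo "$before"

echo world >> foo                          # change the working tree only
unchanged=$(git ls-files --stage foo)      # the index row is still the same

git add foo                                # update the hash column of foo
after=$(git ls-files --stage foo)
echo "$after"

git diff --cached --stat                   # index versus HEAD: foo differs
git diff --stat                            # working tree versus index: no output
```

Only CMD(«git add») changes the row. Editing the file does not.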
EXERCISES()
- In any repository, add a modified tracked file and run
CMD(«git diff») and CMD(«git diff --cached»).
- Make two unrelated changes to the same file, then run CMD(«tig»),
CMD(«git gui») or CMD(«git add -i») to record only one of the changes
to the index. Run CMD(«git diff --cached») to verify before you commit.
- During a merge, the index contains references to up to three
versions of each file. Explain to which commits these three versions
correspond.
SECTION(«Reset»)
Resetting a branch means to let the branch head point to a different
commit. This so-called EMPH(«soft») reset operates only on the commit
graph, but it touches neither the index nor the working tree. By
default git performs a EMPH(«mixed») reset which additionally resets
the index to make it match the tree object of the new commit. Finally,
a EMPH(«hard») reset additionally updates the working tree accordingly.
The exercises of this section try to clarify the difference between
the three different flavors of resetting a branch.
EXERCISES()
- In the repo created with REFERENCE(«two_branches.bash», «script»),
create a new temporary branch with CMD(«git checkout -b tmp topic2»).
Reset this branch to its parent commit with
CMD(«git reset --hard HEAD^»). Repeat using the CMD(«--soft») and
CMD(«--mixed») options. Examine the index at each step.
- When given one or more paths, CMD(«git reset») has a different
meaning: It copies named entries from the given revision to the
index. In the two-branches repo, run CMD(«git reset HEAD^ h») and
investigate the working copy and the index with CMD(«git diff») and
CMD(«git diff --cached»).
SECTION(«Stashing»)
The command CMD(«git reset --hard») throws away any
uncommitted changes in the working tree and the index. It returns to
a clean state where index and working tree match the tree
of the HEAD commit. Sometimes, however, one would like to return to
a clean state without losing or committing the local changes.
For example, suppose that your working tree has several modified files because you are in the middle of something. Then you notice an unrelated flaw in one of the files. Fixing this flaw has higher priority than your current work and should be quick and easy. But you don't want to lose your local changes and you don't want to commit them either because this work is not yet complete.
In this situation CMD(«git stash») can help you out. This
command records the current state of the working directory and the
index. The modifications can be restored later, possibly on top of
a different commit.
Stashes are stored in a git repository as illustrated in the graph
to the left. H stands for the HEAD commit, I for a commit
that records the state of the index. W is a commit which includes
the changes of the working tree, relative to the HEAD
commit. It is reasonable to store W as a child of I since usually
the staged version corresponds to an earlier version of the tree.
After CMD(«git stash») the index and the working tree
are both reset to H so that CMD(«git status») reports
a clean state. CMD(«git stash pop») and CMD(«git stash apply»)
apply the changes between H and W to the current working
directory. Since the working directory might be completely different
at this point, this operation can fail. Note that neither
CMD(«git stash pop») nor CMD(«git stash apply») restores the changes
to the index recorded in the stash. For this you need to specify
the CMD(«--index») option. Consult CMD(«git-stash(1)»)
for details.
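The workflow described in this section can be sketched as follows. The file names, the contents and the author identity are made up for this example.

```shell
#!/bin/sh
# Sketch: put unfinished work aside, commit a quick fix, resume the work.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name 'Example Author'      # hypothetical identity
git config user.email 'author@example.org'

echo 'first line' > notes
git add notes
git commit -q -m initial

echo 'work in progress' >> notes           # local change which is not yet complete
git stash                                  # record it and return to a clean state
git status --short                         # no output: index and tree match HEAD

echo 'quick fix' > fix                     # the unrelated, urgent change
git add fix
git commit -q -m 'urgent fix'

git stash pop                              # re-apply the work on top of the new commit
git diff --stat                            # notes is modified again
```

Since the stashed change and the fix touch different files, popping the stash cannot conflict in this example.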
#!/bin/bash

set -e

GD=$(mktemp -d /tmp/ct-git-XXXXXX)
cd "$GD"
git init
echo cd "$GD"
SUBSECTION(«two_branches.bash»)
#!/bin/bash

set -e

GD=$(mktemp -d /tmp/ct-git-XXXXXX)
cd "$GD"
git init
echo hello > h
echo 'apples, peas' > fruits
git add h fruits
git commit -m initial
git checkout -b topic1
echo world >> h
echo apples > fruits
git commit -am 'add world, peas are no fruits'
git checkout -b topic2 master
echo people >> h
git commit -am 'add people'
echo cd "$GD"
SUBSECTION(«merge.bash»)
#!/bin/bash

set -e

GD=$(mktemp -d /tmp/ct-git-XXXXXX)
cd "$GD"
git init
echo hello > h
git add h
git commit -m initial
git checkout -b topic1
echo 'apples' > fruits
git add fruits
git commit -m fruits
echo 'pears' >> fruits
git commit -am 'more fruits'
git checkout -b topic2 master
echo 'peas' > vegetables
git add vegetables
git commit -m vegetables
git merge --no-edit topic1
echo Created merge example repository in:
echo "$PWD"
SUBSECTION(«stash.bash»)
#!/bin/bash

set -e

GD=$(mktemp -d /tmp/ct-git-XXXXXX)
f='apple-definition'
cd "$GD"
git init

echo 'The apple tree (Malus domestica) is a deciduous tree in the rose family
best known for its sweet, pomacous fruit, the apple.' > "$f"
git add "$f"
git commit -m 'initial draft of apple definition'

echo 'The apple tree (Malus domestica) is a deciduous tree in the rose family
best known for its sweet, pomacous fruit, the apple. The tree originated
in Central Asia, where its wild ancestor, Malus sieversii, is still found
today.' > "$f"
SUBSECTION(«rebase_example.bash»)
#!/bin/bash

set -e

GD=$(mktemp -d /tmp/ct-git-XXXXXX)
cd "$GD"
git init
f1='apfelwein'
f2='culture'

echo 'Apfelwein or Most are German words for cider. It is also
regionaly known as Ebbelwoi, Äppler, Stöffsche, Apfelmost Viez, and
saurer Most.
' > "$f1"
git add "$f1"
git commit -m 'Add initial definition of Ebbelwoi.'

echo '
In the Frankfurt area, berries from the service tree (Sorbus
domestica), are added to increase astringency. This specific type of
Apfelwein is called Speierling.
' >> "$f1"
git commit -am 'Add section on Speierling.'

git checkout -b 'bembel'
echo '
Apfelwein is served in a "Geripptes", a glass with a lozenge cut that
refracts light and improves grip.
' > "$f2"
git add "$f2"
git commit -m 'Initial draft of culture file.'

git checkout master
echo '
The juice or must is fermented with yeast to produce an alcoholic
beverage usually around 6% abv.
' >> "$f1"
git commit -am 'Mention that Apfelwein is an alcoholic beverage.'

git checkout 'bembel'
echo '
Most establishments will also serve Apfelwein by the Bembel (a specific
Apfelwein jug), much like how beer can be purchased by the pitcher in
many countries.
' >> "$f2"
git commit -am 'Add section on bembel to culture file.'

sed -i 's/regionaly/regionally/g' "$f1"
git commit -am 'Fix typo in apfelwein section.'

sed -i '/^Most establishments/,$d' "$f2"
echo '
Most establishments will also serve Apfelwein by the Bembel (a specific
Apfelwein jug). The paunchy bembel is made from salt-glazed stoneware
and always has a basic grey colour with blue-painted detailing.
' >> "$f2"
git commit -am 'Rewrite section on Bembel.'

sed -i 's/bembel/Bembel/g' "$f2"
git commit -am 'Always spell Bembel in upper case.'

echo "cd $GD"