2.4 Commits

Tree objects effectively capture and freeze a hierarchical set of files and directories. Put another way, a tree object is a snapshot in time of a set of blob-to-file-path mappings. This is useful when we want to capture a history: all we need do is capture tree objects representing the start and end of any operation and then somehow tell Git that the first snapshot precedes the second. We can then look at these two snapshots as a history of the files and directories captured by the tree objects.

We already have two snapshots we can use to start our history.

git ls-tree -r b7e8fa
git ls-tree -r 0139f0

I’ve used the -r option to list the tree object recursively. This has no effect on the first tree but the second tree shows blob object 83baae mapped to the file path dir1/file11.txt whereas without the -r option we would see only the tree object 337f38 mapped to directory dir1 (as above).

git ls-tree -r b7e8fa
100644 blob 83baae61804e65cc73a7201a7252750c76066a30    file1.txt
git ls-tree -r 0139f0
100644 blob b0b9fc8f6cc2f8f110306ed7f6d1ce079541b41f    another_file.txt
100644 blob 83baae61804e65cc73a7201a7252750c76066a30    dir1/file11.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a    file1.txt

The first tree (b7e8fa) contains only file1.txt. In the second we have added the file another_file.txt, the directory dir1 and, within it, the file file11.txt. The content of file1.txt also differs from that referred to in b7e8fa: we know this because it is mapped to a different blob (1f7a7a rather than 83baae).

So does Git provide a simple mechanism for showing that tree b7e8fa historically precedes tree 0139f0? Yes… and no. Although we know that we added and modified files between creating tree objects b7e8fa and 0139f0, nothing in the trees themselves reflects this history. We could just as easily claim that 0139f0 was created first and that we then modified file1.txt and removed another_file.txt and dir1 with its content; the results would be the same.
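
The ambiguity can be made concrete with a small Python sketch. It is only an illustration: the two dictionaries are copied by hand from the ls-tree listings above, and compare_snapshots is a hypothetical helper, not anything Git provides.

```python
# Path -> blob hash mappings, copied from the two `git ls-tree -r` listings.
tree_b7e8fa = {
    "file1.txt": "83baae61804e65cc73a7201a7252750c76066a30",
}
tree_0139f0 = {
    "another_file.txt": "b0b9fc8f6cc2f8f110306ed7f6d1ce079541b41f",
    "dir1/file11.txt": "83baae61804e65cc73a7201a7252750c76066a30",
    "file1.txt": "1f7a7a472abf3dd9643fd615f6da379c4acb3e3a",
}


def compare_snapshots(old, new):
    """Hypothetical helper: list what changes when going from `old` to `new`."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # A path is modified when it is in both snapshots but maps to another blob.
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return {"added": added, "removed": removed, "modified": modified}


forwards = compare_snapshots(tree_b7e8fa, tree_0139f0)
backwards = compare_snapshots(tree_0139f0, tree_b7e8fa)

print("b7e8fa -> 0139f0:", forwards)
print("0139f0 -> b7e8fa:", backwards)
```

Both directions are equally valid descriptions of the same two snapshots: what is "added" one way round is "removed" the other. Nothing in the snapshots themselves says which reading is the true one.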

To create a history we must first create a new type of object, the commit object. It will not surprise you that these objects are also stored under .git/objects.

Commit objects contain special metadata (that is, data about data; in this case, data about a tree object). To create a commit object we use the commit-tree command.

git commit-tree -m "First commit" b7e8fa

As with other commands that create new objects, the commit-tree command returns the hash of the new commit object. This is the first time that the hash you see will differ from mine. Pause for a second and consider why this might be.

Figure 2.6: Single commit

We can now inspect this object, first confirming its type and then pretty printing it.

git cat-file -t f871b5
git cat-file -p f871b5
git cat-file -t f871b5
commit
git cat-file -p f871b5
tree b7e8fac7e3e35d93d39d2fa2260868f025a9efb4
author vagrant <vagrant@debian-10.7-amd64> 1615399633 +0000
committer vagrant <vagrant@debian-10.7-amd64> 1615399633 +0000

First commit

And here we see another difference between what you see and what I see. What causes this difference? After all, we have so far started with the same setup and created the same objects in the repository. Look closely at the output of cat-file. Each of the lines starting author and committer ends with two numbers: a timestamp (in seconds since the Unix epoch) followed by a timezone offset. Since you and I created our commit objects at different times, we have different timestamps, and consequently these commit objects have different hashes.
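
To see why a timestamp alone is enough to change the hash, here is a minimal Python sketch of how a commit's hash is derived. It assumes the default SHA-1 object format, in which Git hashes a short header (the object type, the content length in bytes and a NUL byte) followed by the content. The helper names commit_body and object_id are hypothetical, and the second timestamp is invented for illustration. I have not checked the printed values against the hashes quoted in the text, so treat them as illustrative only.

```python
import hashlib


def commit_body(tree, timestamp, message):
    """Build commit text in the layout printed by `git cat-file -p`."""
    ident = f"vagrant <vagrant@debian-10.7-amd64> {timestamp} +0000"
    lines = [
        f"tree {tree}",
        f"author {ident}",
        f"committer {ident}",
        "",
        message,
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


def object_id(body):
    """SHA-1 over the header 'commit <length>' + NUL, followed by the body."""
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


tree = "b7e8fac7e3e35d93d39d2fa2260868f025a9efb4"
first = object_id(commit_body(tree, 1615399633, "First commit"))
# Same tree and message, committed one second later (made-up timestamp).
later = object_id(commit_body(tree, 1615399634, "First commit"))

print(first)
print(later)
print(first != later)
```

Because the timestamp is part of the hashed content, changing a single digit in it produces an entirely different hash, even though the tree and the message are unchanged.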

We can demonstrate this clearly by repeating the commit-tree with no changes.

git commit-tree -m "First commit" b7e8fa

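Git returns a different hash (provided at least a second has passed since the first command, because the timestamps are recorded in whole seconds). If we compare the two commit objects (remembering that your commit objects’ hashes will be different to mine!), we see they differ only in the timestamps recorded.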

diff <(git cat-file -p f871b5) <(git cat-file -p e3004b)

We now have two commit objects, but they are not very interesting as they refer to the same tree object (and hence the same ‘version 1’ of file1.txt). Let’s create some more interesting commit objects.

We previously created a tree object (0139f0) that captured the files file1.txt (‘version 2’), another_file.txt, and dir1/file11.txt. We now want to create a history in which this configuration of files and directories follows from the ‘version 1’ file1.txt.

git commit-tree -m "Second commit" -p f871b5 0139f0

In this commit-tree command we added the -p option to indicate that commit object f871b5 is the parent of the commit object we are creating for tree object 0139f0. As before, we can examine the new commit object (in my case 0715e7) with the cat-file command.

Figure 2.7: Commit with parent

git cat-file -p 0715e7
tree 0139f016af84acd889e2f707ef9eca2140e0222e
parent f871b58596491e15ee1da91eaf0a4a6c1da3e573
author vagrant <vagrant@debian-10.7-amd64> 1615399872 +0000
committer vagrant <vagrant@debian-10.7-amd64> 1615399872 +0000

Second commit

On the second line of the output we see that this commit has a parent (f871b5).
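
The layout of a commit object is simple enough to take apart by hand. The following Python sketch splits the pretty-printed text of my second commit into its fields. The function name parse_commit is hypothetical, and the sketch only handles the header lines we have met so far (tree, parent, author and committer).

```python
def parse_commit(text):
    """Split the output of `git cat-file -p <commit>` into its fields."""
    # The first blank line separates the header lines from the message.
    headers, _, message = text.partition("\n\n")
    commit = {
        "tree": None,
        "parents": [],
        "author": None,
        "committer": None,
        "message": message,
    }
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


second = """\
tree 0139f016af84acd889e2f707ef9eca2140e0222e
parent f871b58596491e15ee1da91eaf0a4a6c1da3e573
author vagrant <vagrant@debian-10.7-amd64> 1615399872 +0000
committer vagrant <vagrant@debian-10.7-amd64> 1615399872 +0000

Second commit
"""

parsed = parse_commit(second)
print(parsed["tree"])
print(parsed["parents"])
print(parsed["message"])
```

The parents are collected in a list because a commit without a parent line, such as our first commit, simply yields an empty list.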

Now let’s quickly create one more commit. First we create a new version of file1.txt, then create a new tree object, and finally a new commit.

echo 'version 3' > file1.txt
git update-index file1.txt
git write-tree
git commit-tree -m "Third commit" -p 0715e7 fd97ab

2.4.1 Progress review: blobs, trees, and commits

Figure 2.8: Three-commit history

Let us review the content of our object store [4]. We have created three tree objects using the write-tree command. These were:

  1. Version one of file1.txt on its own.
  2. Adding another_file.txt alongside version one of dir1/file11.txt and updating file1.txt to version two.
  3. Updating file1.txt to version three.

To create a version chain of these three tree objects we used three commit-tree commands. The first commit object has no parent, as it is the first entry in the chain. It contains four pieces of data:

  1. tree: the hash of the tree object to which this commit refers.
  2. author: a record of the author’s name and e-mail address (the person who wrote the changes in the tree object), along with the time the commit was authored.
  3. committer: a record of the committer (the user who actually executed commit-tree), along with the time the commit was created.
  4. A blank line, followed by the text of any comment we want to associate with the commit (in these examples, supplied by the -m option to the commit-tree command).

Author versus Committer

Why the two entries ‘author’ and ‘committer’?

The ‘author’ of a change is the individual who edited the files making up the change.

The ‘committer’ is the user who created the commit object.

In private use these two fields normally contain the same information: the same user makes the changes and creates the commit. However, suppose a user submits a change as a patch file by e-mail? That person is the author of the change but not the person who puts those changes into the Git repository. This is why there is a distinction between the ‘author’ and ‘committer’.
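
Both fields share the same layout: a name, an e-mail address in angle brackets, a timestamp in seconds since the Unix epoch, and a timezone offset. As a rough sketch (parse_ident and the regular expression are my own, not part of Git), we can decode one of these values in Python and recover a readable date.

```python
import re
from datetime import datetime, timedelta, timezone

# name <email> seconds-since-epoch +hhmm
IDENT = re.compile(
    r"^(?P<name>.*) <(?P<email>[^>]*)> (?P<seconds>\d+) (?P<offset>[+-]\d{4})$"
)


def parse_ident(value):
    """Decode the value of an author or committer line."""
    match = IDENT.match(value)
    if match is None:
        raise ValueError(f"unrecognised identity: {value!r}")
    offset = match["offset"]
    sign = -1 if offset[0] == "-" else 1
    delta = sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
    when = datetime.fromtimestamp(int(match["seconds"]), tz=timezone(delta))
    return match["name"], match["email"], when


name, email, when = parse_ident(
    "vagrant <vagrant@debian-10.7-amd64> 1615399633 +0000"
)
print(name, email, when.isoformat())
```

For the value taken from my first commit this prints the date 2021-03-10T18:07:13+00:00, the same moment that the log command reports for that commit.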

The second commit specifies the first commit object (the hash returned by the first commit-tree) as its parent. Looking at this commit object you can see one additional piece of data compared with the initial commit:

  1. parent: the hash of the parent commit object.

Finally we created a third commit object referencing the second as its parent.

The entire chain we just created can be displayed using the log command, the hash e27aaa being the hash of the last (third) commit object we just created. The --stat option shows summary statistics for each commit and the --patch option shows the changes to the files in each commit.

git log --stat --patch e27aaa
commit e27aaa8c158e6f261f4c03aaaf173a149ad61d81
Author: vagrant <vagrant@debian-10.7-amd64>
Date:   Wed Mar 10 18:13:55 2021 +0000

    Third commit
---
 file1.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/file1.txt b/file1.txt
index 1f7a7a4..7170a52 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1 +1 @@
-version 2
+version 3

commit 0715e707b906d30c9e395448ddc9e96acd89d5f7
Author: vagrant <vagrant@debian-10.7-amd64>
Date:   Wed Mar 10 18:11:12 2021 +0000

    Second commit
---
 another_file.txt | 1 +
 dir1/file11.txt  | 1 +
 file1.txt        | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/another_file.txt b/another_file.txt
new file mode 100644
index 0000000..b0b9fc8
--- /dev/null
+++ b/another_file.txt
@@ -0,0 +1 @@
+Another file
diff --git a/dir1/file11.txt b/dir1/file11.txt
new file mode 100644
index 0000000..83baae6
--- /dev/null
+++ b/dir1/file11.txt
@@ -0,0 +1 @@
+version 1
diff --git a/file1.txt b/file1.txt
index 83baae6..1f7a7a4 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1 +1 @@
-version 1
+version 2

commit f871b58596491e15ee1da91eaf0a4a6c1da3e573
Author: vagrant <vagrant@debian-10.7-amd64>
Date:   Wed Mar 10 18:07:13 2021 +0000

    First commit
---
 file1.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/file1.txt b/file1.txt
new file mode 100644
index 0000000..83baae6
--- /dev/null
+++ b/file1.txt
@@ -0,0 +1 @@
+version 1
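
The order of this output follows directly from the parent links: we start at the commit we named and follow each parent until we reach a commit that has none. The Python sketch below mimics that walk over an in-memory copy of my three commits. It is a simplification under stated assumptions: the dictionary stands in for the object store, walk_history is a hypothetical name, and each commit has at most one parent, whereas a real commit may have several.

```python
# A stand-in for the object store: commit hash -> (parent hash, message).
commits = {
    "e27aaa8c158e6f261f4c03aaaf173a149ad61d81": (
        "0715e707b906d30c9e395448ddc9e96acd89d5f7",
        "Third commit",
    ),
    "0715e707b906d30c9e395448ddc9e96acd89d5f7": (
        "f871b58596491e15ee1da91eaf0a4a6c1da3e573",
        "Second commit",
    ),
    "f871b58596491e15ee1da91eaf0a4a6c1da3e573": (None, "First commit"),
}


def walk_history(store, start):
    """Yield (hash, message) pairs from `start` back to the root commit."""
    current = start
    while current is not None:
        parent, message = store[current]
        yield current, message
        current = parent  # None once we reach the commit with no parent


history = list(walk_history(commits, "e27aaa8c158e6f261f4c03aaaf173a149ad61d81"))
for commit_hash, message in history:
    print(commit_hash[:6], message)
```

Note that the links only point backwards. Starting from the first commit we would see a single entry, because nothing in that commit object records which commits came after it.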

[4] If you want to be one of the cool kids you can point out that this structure (a tree in which each node hashes its children) is a Merkle tree.