2.4 Commits
Tree objects effectively capture and freeze a hierarchical set of files and directories. Put another way, a tree object is a snapshot in time of a set of mappings from file paths to blobs. This is useful when we want to capture a history: all we need to do is capture tree objects representing the start and end of an operation and then somehow tell Git that the first snapshot precedes the second. We can then treat the two snapshots as a history of the files and directories captured by the tree objects.
We already have two snapshots we can use to start our history.
git ls-tree -r b7e8fa
git ls-tree -r 0139f0
I’ve used the -r option to list the tree object recursively. This has no effect on the first tree but the second tree shows blob object 83baae mapped to the file path dir1/file11.txt whereas without the -r option we would see only the tree object 337f38 mapped to directory dir1 (as above).
100644 blob 83baae61804e65cc73a7201a7252750c76066a30	file1.txt
100644 blob b0b9fc8f6cc2f8f110306ed7f6d1ce079541b41f	another_file.txt
100644 blob 83baae61804e65cc73a7201a7252750c76066a30	dir1/file11.txt
100644 blob 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a	file1.txt
The first tree (b7e8fa) contains only file1.txt. In the second we have added the file another_file.txt, the directory dir1 and, within it, the file file11.txt. The content of file1.txt has also changed from that referred to in b7e8fa; we know this because it is now mapped to a different blob (1f7a7a rather than 83baae).
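To see the effect of -r for yourself, the following sketch rebuilds the second layout in a scratch repository and lists the resulting tree both ways. The temporary directory and the variable names (repo, tree, flat, deep) are assumptions made for illustration; the file names and contents follow the ones used in this chapter.

```shell
#!/bin/sh
# Sketch: compare `git ls-tree` with and without -r in a throwaway repository.
set -e
repo=$(mktemp -d)   # hypothetical scratch location
cd "$repo"
git init -q

mkdir dir1
echo 'version 2'    > file1.txt
echo 'Another file' > another_file.txt
echo 'version 1'    > dir1/file11.txt
git update-index --add file1.txt another_file.txt dir1/file11.txt
tree=$(git write-tree)

flat=$(git ls-tree "$tree")      # dir1 appears as a single tree entry
deep=$(git ls-tree -r "$tree")   # dir1 is expanded to the blobs inside it

printf 'without -r:\n%s\n\nwith -r:\n%s\n' "$flat" "$deep"
```

Without -r the listing stops at the directory entry for dir1; with -r it descends into dir1 and reports dir1/file11.txt directly.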
So does Git provide a simple mechanism for showing that tree b7e8fa is historically before tree 0139f0? Yes… and no. Although we know that we added and modified files between creating tree objects b7e8fa and 0139f0, nothing in the objects themselves reflects this history. We could just as easily claim that 0139f0 was created first and that we then modified file1.txt and removed another_file.txt and dir1 with its contents; the resulting objects would be the same.
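One way to convince yourself of this symmetry is to compare two trees in both directions with git diff-tree. The sketch below is an illustration only: it builds two snapshots like ours in a scratch repository, and the variable names (tree1, tree2, forward, backward) are hypothetical.

```shell
#!/bin/sh
# Sketch: two tree objects can be compared in either direction;
# nothing in the trees themselves says which one came first.
set -e
repo=$(mktemp -d)   # hypothetical scratch location
cd "$repo"
git init -q

# Snapshot 1: file1.txt only.
echo 'version 1' > file1.txt
git update-index --add file1.txt
tree1=$(git write-tree)

# Snapshot 2: file1.txt changed, two files added.
mkdir dir1
echo 'version 2'    > file1.txt
echo 'Another file' > another_file.txt
echo 'version 1'    > dir1/file11.txt
git update-index --add file1.txt another_file.txt dir1/file11.txt
tree2=$(git write-tree)

forward=$(git diff-tree -r --name-status "$tree1" "$tree2")
backward=$(git diff-tree -r --name-status "$tree2" "$tree1")

printf 'tree1 -> tree2:\n%s\n\ntree2 -> tree1:\n%s\n' "$forward" "$backward"
```

Read one way, two files were added and one modified; read the other way, two files were deleted and one modified. Both readings are equally valid until a commit object records the order.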
To create a history we must first create a new type of object, the commit object. It will not surprise you that these objects are also stored under .git/objects.
Commit objects contain special metadata (that is, data about data; in this case, data about a tree object). To create a commit object we use the commit-tree command.
git commit-tree -m "First commit" b7e8fa
As with other commands that create new objects, the commit-tree command returns the hash of the new object. This is the first point at which your hash will differ from mine. Pause for a second and consider why this might be.
We can now inspect this object, first confirming its type and then pretty printing it.
git cat-file -t f871b5
git cat-file -p f871b5
commit
tree b7e8fac7e3e35d93d39d2fa2260868f025a9efb4
author vagrant <vagrant@debian-10.7-amd64> 1615399633 +0000
committer vagrant <vagrant@debian-10.7-amd64> 1615399633 +0000

First commit
And here we see another difference between what you see and what I see. What causes this difference? After all, we have so far started with the same setup and created the same objects in the repository. Compare closely the output of cat-file. The lines starting author and committer each end with two numbers: a timestamp (seconds since the Unix epoch) and a time-zone offset. Since you and I created our commit objects at different times, we have different timestamps and consequently these commit objects have different hashes. (A different user name or e-mail address on those lines changes the hash in the same way.)
We can demonstrate this clearly by repeating the commit-tree with no changes.
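git commit-tree -m "First commit" b7e8fa
Git returns a different hash, provided at least one second has passed since the first commit-tree (the timestamps have one-second resolution, so two otherwise identical commits created within the same second hash identically). If we compare the two commit objects (remembering that your commit objects’ hashes will be different to mine!), we see they differ only in the timestamps recorded.
diff <(git cat-file -p f871b5) <(git cat-file -p e3004b)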
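The role of the timestamps can be checked from the other side by pinning them. The sketch below assumes a scratch repository and a made-up identity (Example, example@example.com); GIT_AUTHOR_DATE and GIT_COMMITTER_DATE are the environment variables Git reads for the two timestamps, and the variable names a, b and c are illustrative.

```shell
#!/bin/sh
# Sketch: with identical tree, message, identity and timestamps,
# commit-tree produces the same hash every time.
set -e
repo=$(mktemp -d)   # hypothetical scratch location
cd "$repo"
git init -q

# Made-up identity, supplied via the environment so no global config is needed.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

echo 'version 1' > file1.txt
git update-index --add file1.txt
tree=$(git write-tree)

# Both timestamps pinned to the same value, twice over.
a=$(GIT_AUTHOR_DATE="1615399633 +0000" GIT_COMMITTER_DATE="1615399633 +0000" \
    git commit-tree -m "First commit" "$tree")
b=$(GIT_AUTHOR_DATE="1615399633 +0000" GIT_COMMITTER_DATE="1615399633 +0000" \
    git commit-tree -m "First commit" "$tree")

# Only the committer timestamp moves on by one second.
c=$(GIT_AUTHOR_DATE="1615399633 +0000" GIT_COMMITTER_DATE="1615399634 +0000" \
    git commit-tree -m "First commit" "$tree")

printf 'a=%s\nb=%s\nc=%s\n' "$a" "$b" "$c"
```

Here a and b come out identical, while c differs even though only a single second in one field has changed.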
We now have two commit objects, but they are not very interesting as they refer to the same tree object (and hence the same ’version 1’ of file1.txt). Let’s create some more interesting commit objects.
We previously created a tree object (0139f0) that captured the files file1.txt (’version 2’), another_file.txt, and dir1/file11.txt. We now want to create a history in which this configuration of files and directories follows from the ’version 1’ file1.txt.
git commit-tree -m "Second commit" -p f871b5 0139f0
In this commit-tree we added the -p option to indicate that commit object f871b5 is the parent of the commit object we are creating for tree object 0139f0. As before, we can examine the new commit object (in my case 0715e7) with the cat-file command.
git cat-file -p 0715e7
tree 0139f016af84acd889e2f707ef9eca2140e0222e
parent f871b58596491e15ee1da91eaf0a4a6c1da3e573
author vagrant <vagrant@debian-10.7-amd64> 1615399872 +0000
committer vagrant <vagrant@debian-10.7-amd64> 1615399872 +0000

Second commit
On line 2 we see that this commit has a parent (f871b5).
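Copying hashes by hand is error-prone, so when experimenting it can help to capture them in shell variables. The following sketch, which assumes a scratch repository and a made-up identity, creates a root commit and a child and then reads the parent back in two ways; the variable names (first, second, parent_line) are illustrative.

```shell
#!/bin/sh
# Sketch: create a parent/child pair of commits and confirm the link.
set -e
repo=$(mktemp -d)   # hypothetical scratch location
cd "$repo"
git init -q

# Made-up identity, supplied via the environment.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

echo 'version 1' > file1.txt
git update-index --add file1.txt
first=$(git commit-tree -m "First commit" "$(git write-tree)")

echo 'version 2' > file1.txt
git update-index file1.txt
second=$(git commit-tree -m "Second commit" -p "$first" "$(git write-tree)")

# The raw object carries a "parent <hash>" line...
parent_line=$(git cat-file -p "$second" | grep '^parent ')
echo "$parent_line"

# ...and the ^ suffix asks Git to resolve the first parent for us.
git rev-parse "$second^"
```

Both commands should report the hash stored in first, while the root commit has no parent line at all.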
Now let’s quickly create one more commit. First we create a new version of file1.txt, then create a new tree object, and finally a new commit.
echo 'version 3' > file1.txt
git update-index file1.txt
git write-tree
git commit-tree -m "Third commit" -p 0715e7 fd97ab
2.4.1 Progress review: blobs, trees, and commits
Let us review the contents of our object store [4]. We have created three tree objects using the write-tree command. These were:
- Version one of file1.txt on its own.
- Adding another_file.txt alongside version one of dir1/file11.txt and updating file1.txt to version two.
- Updating file1.txt to version three.
To create a version chain of these three tree objects we used three commit-tree commands. The first commit object has no parent, as it is the first entry in the chain. It contains four pieces of data:
- tree: the hash of the tree object to which this commit refers.
- author: a record of the author’s name and email (the person who wrote the changes in the tree object), along with the time the change was authored.
- committer: a record of the committer (the user who actually executed the commit-tree), along with the time the commit was created.
- A blank line, followed by the text of any comment we want to associate with the commit (in these examples, supplied by the -m option to the commit-tree command).
Author versus Committer
Why the two entries ‘author’ and ‘committer’?
The ‘author’ of a change is the individual who edited the files making up the change.
The ‘committer’ is the user who created the commit object.
In private use these two fields normally contain the same information: the same user makes the changes and creates the commit. However, suppose a user submits a change as a patch file by e-mail. That person is the author of the change but not the person who puts the change into the Git repository. This is why there is a distinction between the ‘author’ and the ‘committer’.
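The two roles can be set independently through environment variables that Git reads when creating a commit. The sketch below imitates the e-mailed patch scenario in a scratch repository; the names Alice Author and Carol Committer, their addresses, and the commit message are all invented for illustration.

```shell
#!/bin/sh
# Sketch: one commit whose author and committer are different people.
set -e
repo=$(mktemp -d)   # hypothetical scratch location
cd "$repo"
git init -q

echo 'version 1' > file1.txt
git update-index --add file1.txt
tree=$(git write-tree)

# Alice wrote the change; Carol is the one running commit-tree.
commit=$(GIT_AUTHOR_NAME="Alice Author" \
         GIT_AUTHOR_EMAIL="alice@example.com" \
         GIT_COMMITTER_NAME="Carol Committer" \
         GIT_COMMITTER_EMAIL="carol@example.com" \
         git commit-tree -m "Apply emailed patch" "$tree")

# The raw object shows separate author and committer lines.
git cat-file -p "$commit"

# %an and %cn are the author and committer names in log's format language.
who=$(git log -1 --format='%an|%cn' "$commit")
echo "$who"
```

The author and committer lines of the resulting object now name different people, which is the situation the two fields exist to record.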
The second commit specifies the first commit object (the hash returned by the first commit-tree) as its parent. Looking at this commit object, you can see one additional piece of data compared with the initial commit:
- parent: the hash of the parent commit object.
Finally we created a third commit object referencing the second as its parent.
The entire chain we just created can be displayed using the log command, with e27aaa being the hash of the last (third) commit object we just created. The --stat option shows summary statistics for each commit and the --patch option shows the changes to the files in each commit.
git log --stat --patch e27aaa
commit e27aaa8c158e6f261f4c03aaaf173a149ad61d81
Author: vagrant <vagrant@debian-10.7-amd64>
Date:   Wed Mar 10 18:13:55 2021 +0000

    Third commit
---
 file1.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/file1.txt b/file1.txt
index 1f7a7a4..7170a52 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1 +1 @@
-version 2
+version 3

commit 0715e707b906d30c9e395448ddc9e96acd89d5f7
Author: vagrant <vagrant@debian-10.7-amd64>
Date:   Wed Mar 10 18:11:12 2021 +0000

    Second commit
---
 another_file.txt | 1 +
 dir1/file11.txt  | 1 +
 file1.txt        | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/another_file.txt b/another_file.txt
new file mode 100644
index 0000000..b0b9fc8
--- /dev/null
+++ b/another_file.txt
@@ -0,0 +1 @@
+Another file
diff --git a/dir1/file11.txt b/dir1/file11.txt
new file mode 100644
index 0000000..83baae6
--- /dev/null
+++ b/dir1/file11.txt
@@ -0,0 +1 @@
+version 1
diff --git a/file1.txt b/file1.txt
index 83baae6..1f7a7a4 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1 +1 @@
-version 1
+version 2

commit f871b58596491e15ee1da91eaf0a4a6c1da3e573
Author: vagrant <vagrant@debian-10.7-amd64>
Date:   Wed Mar 10 18:07:13 2021 +0000

    First commit
---
 file1.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/file1.txt b/file1.txt
new file mode 100644
index 0000000..83baae6
--- /dev/null
+++ b/file1.txt
@@ -0,0 +1 @@
+version 1
[4] If you want to be one of the cool kids you can point out that this structure (a tree in which each node hashes its children) is a Merkle tree.