2.2 Blobs

We can use some low-level Git commands to create blobs directly [1]. The git hash-object sub-command computes the hash of an object and, optionally, stores it. Let’s create an object:

echo 'version 1' > file1.txt
git hash-object file1.txt
tree .git

We created a simple text file and had git hash-object show us its hash (a 40-character string: the SHA-1 hash of the file’s content preceded by a short header), but this object is not stored in the repository yet.

tree -a
.git
├── branches
├── config
├── description
├── HEAD
├── hooks/
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

9 directories, 15 files
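
To see where the 40 characters come from, we can reproduce the hash outside Git. The following is a minimal Python sketch, assuming a repository that uses the default SHA-1 object format; the function name git_blob_hash is our own and is not part of Git. Git hashes a short header (the word blob, the content length in bytes and a NUL byte) followed by the content itself.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the ID Git would give a blob with this content (SHA-1 repositories)."""
    # Header: object type, a space, the content length in bytes, then a NUL byte.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# echo appends a newline, so file1.txt holds the 10 bytes b"version 1\n".
print(git_blob_hash(b"version 1\n"))
```

This prints 83baae61804e65cc73a7201a7252750c76066a30, the ID Git uses for this blob throughout the rest of the section.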

To have git hash-object store the file we use the -w option.

git hash-object -w file1.txt
tree .git
tree -a
.git
├── branches
├── config
├── description
├── HEAD
├── hooks
├── info
│   └── exclude
├── objects
│   ├── 83
│   │   └── baae61804e65cc73a7201a7252750c76066a30
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

10 directories, 16 files

The object is stored in the objects directory. The first two characters of the hash name a subdirectory and the remaining 38 name the file inside it (this is called ‘sharding’ and it reduces the number of files stored in any one directory).
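
The layout can be imitated in a few lines of Python. This is only a sketch under stated assumptions: write_loose_object is a hypothetical helper, it assumes SHA-1 object IDs, and it writes into a throwaway directory instead of a real .git folder. Git compresses the header and content with zlib before writing them; the real implementation also takes care of details such as temporary files and permissions, which are omitted here.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, content: bytes) -> Path:
    """Store content as a loose blob under objects_dir and return the file path."""
    store = b"blob %d\0" % len(content) + content
    object_id = hashlib.sha1(store).hexdigest()
    # First two hex characters: shard directory. Remaining 38: file name.
    path = objects_dir / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return path


with tempfile.TemporaryDirectory() as tmp:
    blob_path = write_loose_object(Path(tmp) / "objects", b"version 1\n")
    print(blob_path.parent.name, blob_path.name)
```

For the content of file1.txt this prints the shard directory 83 followed by the file name baae61804e65cc73a7201a7252750c76066a30, matching the tree output above.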

Figure 2.1: A Git ‘blob’

It is important to note that Git has no idea what this blob is; it is just some data. No record is held of the original file name. For that matter, Git doesn’t even care that this blob came from a file.

echo 'not a file' | git hash-object -w --stdin
tree .git
tree -a
.git
├── branches
├── config
├── description
├── HEAD
├── hooks
├── info
│   └── exclude
├── objects
│   ├── 7a
│   │   └── b4ff63b2ea4c2c3ff89ee972bc42988a4b8472
│   ├── 83
│   │   └── baae61804e65cc73a7201a7252750c76066a30
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

11 directories, 17 files

Here the data for the blob is fed into Git straight from stdin; no file is involved, this is just ‘raw data’.
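
That the identity of a blob depends only on its bytes can be checked with a small Python sketch. The helper blob_id and the file names are our own inventions for illustration, and SHA-1 object IDs are assumed.

```python
import hashlib
import tempfile
from pathlib import Path


def blob_id(data: bytes) -> str:
    """SHA-1 of Git's blob header plus the raw bytes (SHA-1 repositories)."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Two differently named files with identical content.
    first = Path(tmp) / "file1.txt"
    second = Path(tmp) / "another_name.txt"
    first.write_bytes(b"version 1\n")
    second.write_bytes(b"version 1\n")
    from_files = {blob_id(first.read_bytes()), blob_id(second.read_bytes())}

# The same bytes held only in memory, as if piped in on stdin.
from_memory = blob_id(b"version 1\n")

print(from_files == {from_memory})  # True: one ID, whatever the source
```

Neither the file name nor the way the bytes arrive takes part in the calculation, which is why identical content always ends up as a single blob.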

Figure 2.2: Two Git blobs

We can recall the blob from our repository using git cat-file (the name is a bit misleading and cat-object would be better because, as we shall see, we can use it to look inside various Git objects).

git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30
version 1

The -p option ‘pretty prints’ the content of the object to stdout, so if we want to create a file from this object we need to redirect the output:

git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30 > new_file.txt
cat new_file.txt
version 1
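
What git cat-file does for a loose blob can be approximated as follows. This is a hedged Python sketch, not Git’s implementation: parse_object is a hypothetical name, and the bytes on disk are simulated with zlib.compress so that the example does not need a real repository.

```python
import zlib
from typing import Tuple


def parse_object(compressed: bytes) -> Tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    raw = zlib.decompress(compressed)
    # The header ends at the first NUL byte; everything after it is content.
    header, _, body = raw.partition(b"\0")
    kind, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match content length")
    return kind.decode("ascii"), body


# Simulate the file .git/objects/83/baae... from the listings above.
on_disk = zlib.compress(b"blob 10\0version 1\n")
kind, body = parse_object(on_disk)
print(kind)            # blob
print(body.decode())   # version 1
```

Only the type and the content come back; there is no file name to recover, which is why we had to choose new_file.txt ourselves.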

Typing out those long hashes quickly becomes tiresome. Fortunately, Git allows us to use shorter forms in many instances: we need only provide enough of the start of an object’s hash to be unambiguous.

git cat-file -p 83ba

In most circumstances 6 to 8 characters are sufficient; here we can use just 4 because our repository has so few objects that 4 characters unambiguously identify each one. (We cannot go as far as reducing to just 2: Git requires at least 4 characters, and two characters would only identify the shard directory, not the object file.)
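
The rule for short forms can be sketched in Python. The function resolve_prefix and the constant MIN_PREFIX_LENGTH are our own names; Git’s real lookup works against the objects directory and pack indexes, not a plain list, so treat this purely as an illustration of the matching rule.

```python
from typing import Iterable

MIN_PREFIX_LENGTH = 4  # Git does not accept shorter abbreviations


def resolve_prefix(prefix: str, object_ids: Iterable[str]) -> str:
    """Return the single object ID that starts with prefix, or raise."""
    if len(prefix) < MIN_PREFIX_LENGTH:
        raise ValueError(f"prefix {prefix!r} is too short")
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if not matches:
        raise LookupError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"prefix {prefix!r} is ambiguous")
    return matches[0]


# The two blobs stored so far in this section.
stored = [
    "83baae61804e65cc73a7201a7252750c76066a30",
    "7ab4ff63b2ea4c2c3ff89ee972bc42988a4b8472",
]
print(resolve_prefix("83ba", stored))
```

With only two objects, 83ba already picks out a single ID; in a large repository more characters are needed before the list of matches shrinks to one.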

We can add another version of our file1.txt without any confusion (because Git does not care about the filename at this point).

echo 'version 2' > file1.txt
git hash-object -w file1.txt

Git adds the new object as a simple blob.

tree .git
git cat-file -p 1f7a
tree -a
.git
├── branches
├── config
├── description
├── HEAD
├── hooks
├── info
│   └── exclude
├── objects
│   ├── 1f
│   │   └── 7a7a472abf3dd9643fd615f6da379c4acb3e3a
│   ├── 7a
│   │   └── b4ff63b2ea4c2c3ff89ee972bc42988a4b8472
│   ├── 83
│   │   └── baae61804e65cc73a7201a7252750c76066a30
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

12 directories, 18 files
git cat-file -p 1f7a
version 2

So we can store blobs in our repository, but this is of limited use: we normally deal with directories containing files, and these tend to have human-readable names (like file1.txt).

Figure 2.3: Three Git blobs

[1] In day-to-day use we will use high-level commands to interact with our repository, but in this chapter we’re interested in learning what Git does under the hood.