2.2 blobs
We can use some low-level Git commands to create blobs directly [1]. The git hash-object sub-command creates and stores objects. Let’s create an object:
    echo 'version 1' > file1.txt
    git hash-object file1.txt
    tree .git
We created a simple text file and had git hash-object show us its hash: a 40-character string that is the SHA-1 hash of the file’s content preceded by a short header. The object is not stored in the repository yet.
    .git
    ├── branches
    ├── config
    ├── description
    ├── HEAD
    ├── hooks/
    ├── info
    │   └── exclude
    ├── objects
    │   ├── info
    │   └── pack
    └── refs
        ├── heads
        └── tags

    9 directories, 15 files
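To see where the hash comes from, it can be reproduced without Git's help. Git prepends a header of the form `blob <size>` followed by a NUL byte to the content and hashes the result. The following is a minimal sketch, assuming a `sha1sum` binary is available (GNU coreutils; on macOS `shasum` plays the same role) and that the repository uses Git's default SHA-1 object format. The throwaway directory from `mktemp` is only there to keep the experiment isolated.

```shell
# Work in a throwaway repository so nothing here touches real data.
repo=$(mktemp -d)
cd "$repo"
git init -q

# 'version 1' plus the newline added by echo is 10 bytes of content.
echo 'version 1' > file1.txt
git_hash=$(git hash-object file1.txt)

# Rebuild the digest by hand: header "blob 10", a NUL byte, then the content.
manual_hash=$(printf 'blob 10\0version 1\n' | sha1sum | cut -d ' ' -f 1)

echo "$git_hash"
echo "$manual_hash"
```

Both lines should show the same 40-character hash, which suggests the identity of a blob depends only on its bytes and their length.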
To have git hash-object store the file we use the -w option.
    git hash-object -w file1.txt
    tree .git

    .git
    ├── branches
    ├── config
    ├── description
    ├── HEAD
    ├── hooks
    ├── info
    │   └── exclude
    ├── objects
    │   ├── 83
    │   │   └── baae61804e65cc73a7201a7252750c76066a30
    │   ├── info
    │   └── pack
    └── refs
        ├── heads
        └── tags

    10 directories, 16 files
The object is stored in the objects directory. The first two characters of the hash name a subdirectory and the remaining 38 characters name the file inside it. This is called ‘sharding’, and it reduces the number of files stored in any one directory.
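The mapping from hash to file path can be worked out by hand. This is a small sketch in a throwaway repository; the variable names (`dir`, `file`, `object_path`) are purely illustrative, and it assumes the object is still stored loose rather than in a pack.

```shell
# Throwaway repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'version 1' > file1.txt
hash=$(git hash-object -w file1.txt)

# Split the 40-character hash into the shard directory and the file name.
dir=$(printf '%s' "$hash" | cut -c 1-2)
file=$(printf '%s' "$hash" | cut -c 3-)
object_path=".git/objects/$dir/$file"

echo "$object_path"
ls -l "$object_path"
```

The path printed here should match the entry that `tree .git` shows under the objects directory.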
It is important to note that Git has no idea what this blob is; it is just some data. No record is held of the original file name. For that matter, Git doesn’t even care that this blob came from a file.
    echo 'not a file' | git hash-object -w --stdin
    tree .git

    .git
    ├── branches
    ├── config
    ├── description
    ├── HEAD
    ├── hooks
    ├── info
    │   └── exclude
    ├── objects
    │   ├── 7a
    │   │   └── b4ff63b2ea4c2c3ff89ee972bc42988a4b8472
    │   ├── 83
    │   │   └── baae61804e65cc73a7201a7252750c76066a30
    │   ├── info
    │   └── pack
    └── refs
        ├── heads
        └── tags

    11 directories, 17 files
Here the data for the blob is fed into Git straight from stdin. No file is involved; this is ‘raw data’.
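One consequence is that identical content always yields the same blob, whatever its origin. The sketch below stores the same bytes from two differently named files and from stdin; `another_name.txt` is a hypothetical file name chosen for illustration, and the count assumes a freshly initialised repository with no other objects.

```shell
# Throwaway repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Same bytes, three different origins: two differently named files and stdin.
echo 'version 1' > file1.txt
echo 'version 1' > another_name.txt
from_file=$(git hash-object -w file1.txt)
from_copy=$(git hash-object -w another_name.txt)
from_stdin=$(echo 'version 1' | git hash-object -w --stdin)

echo "$from_file"
echo "$from_copy"
echo "$from_stdin"

# Count the object files: identical content is stored only once.
object_count=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "$object_count"
```

All three hashes should be identical and the object count should be 1, since the file name never enters the calculation.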
We can recall the blob from our repository using git cat-file. (The name is a bit misleading; cat-object would be better because, as we shall see, we can use it to look inside various Git objects.)
    git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30

    version 1
The -p option ‘pretty prints’ the content of the object to stdout, so if we want to create a file from this object we need to redirect it:
    git cat-file -p 83baae61804e65cc73a7201a7252750c76066a30 > new_file.txt
    cat new_file.txt

    version 1
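Besides `-p`, git cat-file can report an object's type with `-t` and its size in bytes with `-s`. These are handy for confirming that a blob holds content and nothing else. A brief sketch in a throwaway repository, with illustrative variable names:

```shell
# Throwaway repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init -q
hash=$(echo 'version 1' | git hash-object -w --stdin)

obj_type=$(git cat-file -t "$hash")   # kind of object
obj_size=$(git cat-file -s "$hash")   # content size in bytes
obj_body=$(git cat-file -p "$hash")   # the content itself

echo "$obj_type $obj_size"
echo "$obj_body"
```

The size counts the nine characters of `version 1` plus the trailing newline that echo adds.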
Typing out those long hashes quickly becomes tiresome. Fortunately, Git allows us to use shorter forms in many instances: we need only provide enough of the start of an object’s hash to identify it unambiguously.
    git cat-file -p 83ba
In most circumstances 6 to 8 characters are sufficient. Here we can use just 4 because our repository has so few objects that 4 characters reference each one unambiguously. (We cannot go as far as just 2: Git requires at least 4 characters, and two characters would only identify the shard directory, not the object file.)
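To check how an abbreviation resolves, git rev-parse can expand a prefix back to the full hash. The sketch below runs in a throwaway repository holding a single blob and also probes a two-character prefix, which is expected to be rejected. Variable names are illustrative.

```shell
# Throwaway repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init -q
full=$(echo 'version 1' | git hash-object -w --stdin)

# Take the first four characters and let Git expand them to the full hash.
short=$(printf '%s' "$full" | cut -c 1-4)
expanded=$(git rev-parse --verify "$short")

# Probe a two-character prefix; Git is expected to refuse it.
two=$(printf '%s' "$full" | cut -c 1-2)
if git cat-file -e "$two" 2>/dev/null; then
    two_chars=accepted
else
    two_chars=rejected
fi

echo "$short -> $expanded"
echo "two characters: $two_chars"
```

In a larger repository the four-character form may become ambiguous, in which case Git reports an error and a longer prefix is needed.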
We can add another version of our file1.txt without any confusion (because Git does not care about the filename at this point).
    echo 'version 2' > file1.txt
    git hash-object -w file1.txt
Git adds the new object as a simple blob.
    tree .git
    git cat-file -p 1f7a

    .git
    ├── branches
    ├── config
    ├── description
    ├── HEAD
    ├── hooks
    ├── info
    │   └── exclude
    ├── objects
    │   ├── 1f
    │   │   └── 7a7a472abf3dd9643fd615f6da379c4acb3e3a
    │   ├── 7a
    │   │   └── b4ff63b2ea4c2c3ff89ee972bc42988a4b8472
    │   ├── 83
    │   │   └── baae61804e65cc73a7201a7252750c76066a30
    │   ├── info
    │   └── pack
    └── refs
        ├── heads
        └── tags

    12 directories, 18 files

    version 2
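Overwriting file1.txt in the working directory does not disturb the blob stored earlier, so the first version stays retrievable by its hash. A short sketch of that round trip in a throwaway repository, with illustrative variable names:

```shell
# Throwaway repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'version 1' > file1.txt
v1=$(git hash-object -w file1.txt)
echo 'version 2' > file1.txt   # overwrite the working file
v2=$(git hash-object -w file1.txt)

# The working file now holds version 2, yet version 1 is still in the object store.
on_disk=$(cat file1.txt)
recovered=$(git cat-file -p "$v1")

echo "$on_disk"
echo "$recovered"
```

Nothing links either blob to the name file1.txt, though; we can recover version 1 only because we kept its hash.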
So we can store blobs in our repository, but this is of limited use: we normally deal with directories containing files, and these tend to have human-readable names (like file1.txt).
[1] In day-to-day use we will use high-level commands to interact with our repository, but in this chapter we’re interested in learning what Git does under the hood.