Anatomy of a Git pre-commit Hook

What Does a Git Hook Look Like?

There are several different kinds of hooks in git (“Git Hooks”); these hooks let you ‘hook’ into an event in git.

There are lots of events you can hook into, some happening on the client and some on your ‘server’. For this conversation, assume ‘server’ is the central repository everyone on your team pushes their git commits to (git purists will be angry about this, but it’s the best we can do in this medium).

Today, I’m going to focus on one particular hook, pre-commit, which git invokes at the start of git commit, before the commit is created.
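Before looking inside a hook, it helps to know where one lives: hooks sit in the repository’s .git/hooks directory, named after their event (here .git/hooks/pre-commit), and git only runs them if they are executable. The sketch below assumes bash and git are on your PATH; it creates a throwaway repository in a temporary directory and installs a trivial hook with a /bin/sh shebang (rather than Node, so nothing else needs to be installed). The repository and file names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: install a minimal pre-commit hook in a throwaway repository.
set -euo pipefail

repo=$(mktemp -d)                  # scratch repository, illustrative only
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

# Hooks live in .git/hooks/<event-name> and must be executable.
# Quoting 'EOF' stops the outer shell from expanding anything in the hook body.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# The shebang above tells the OS which interpreter runs this file.
echo "pre-commit: hook ran" >&2
exit 0    # zero lets the commit go ahead
EOF
chmod +x .git/hooks/pre-commit

echo "hello" > README
git add README
# Capture stderr so the hook's message is visible in $output.
output=$(git commit -q -m "First commit" 2>&1)
printf '%s\n' "$output"
```

Deleting the chmod line is a quick way to confirm that git skips hooks that aren’t executable.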

The first line of a git hook (thanks, Unix) is the shebang line, which tells the operating system what program to use to execute the script. (Shebang, or “Shhhh-bang”, refers to the #! followed by the location of the binary used to execute the script.) For a hook written in Node, it looks like this:

#!/usr/bin/env node

After the shebang line, the script itself begins. Each hook is passed different parameters depending on the hook; pre-commit is passed none at all. There are also some properties of git that make writing hooks less programmer-y and more sys-admin-y.

There is no consistent git API. As programmers, we might expect everything a hook needs to be passed in as parameters, but that’s not how git works. Git treats hooks as scripts that live on the same system as the repository, and they get their information the same way you would: by running git commands and reading files from git’s (and the OS’s) own filesystem.
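Because pre-commit receives no arguments, a hook that wants to know which files are being committed has to ask git itself. A minimal sketch, assuming bash and git are available: it builds a scratch repository, stages some changes, leaves one file untracked, and asks git diff --cached for the staged paths. The file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: a hook gets no arguments, so it queries git for the staged files.
set -euo pipefail

repo=$(mktemp -d)                  # scratch repository, illustrative only
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "initial"

echo "v2" > app.txt                # modified, then staged below
echo "new" > staged.txt            # new file, staged below
echo "draft" > notes.txt           # untracked, never staged
git add app.txt staged.txt

# --cached compares the index with HEAD; --diff-filter=ACM keeps Added,
# Copied and Modified paths (deleted files have no content left to check).
staged=$(git diff --cached --name-only --diff-filter=ACM)
printf '%s\n' "$staged"
```

Inside a real hook, only the git diff line is needed, since the hook already runs inside the repository.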

A (perhaps wild) consequence of this is that some interactions with git create files. When you write a commit message, whether with the -m option or in the editor that pops up, git writes it to a file in the .git directory called “COMMIT_EDITMSG”.

If you want a script that works with the commit message, that file is where the message lives. Mind the timing, though: pre-commit runs before git obtains the new message, so while pre-commit is running the file still holds the previous message (or doesn’t exist yet). The commit-msg hook, which runs after the message has been written, is passed the path to this file as its first argument.
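For checks that really do need the message, commit-msg is the natural fit, since git passes it the path of the message file as its only argument. The sketch below assumes bash and git, and installs a commit-msg hook in a scratch repository that rejects subject lines shorter than 10 characters; that threshold is an arbitrary example rule, not a git convention.

```shell
#!/usr/bin/env bash
# Sketch: a commit-msg hook reading the proposed message from "$1".
set -euo pipefail

repo=$(mktemp -d)                  # scratch repository, illustrative only
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# The 10-character minimum is a hypothetical policy for illustration.
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
subject=$(head -n 1 "$1")          # $1 is the path of the message file
if [ ${#subject} -lt 10 ]; then
  echo "commit-msg: subject line too short: '$subject'" >&2
  exit 1                           # non-zero aborts the commit
fi
exit 0
EOF
chmod +x .git/hooks/commit-msg

echo "hello" > README
git add README
if git commit -q -m "wip"; then short_msg=accepted; else short_msg=rejected; fi
if git commit -q -m "Add a README file"; then long_msg=accepted; else long_msg=rejected; fi
echo "short message: $short_msg, longer message: $long_msg"
```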

The next property of a pre-commit hook is that its exit status follows Unix conventions: exiting with 0 lets the commit proceed, and exiting with any non-zero status (conventionally 1) aborts it.
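To see the exit-status convention in action, here is a sketch (again assuming bash and git, in a scratch repository with illustrative file names) of a pre-commit hook built on git diff --cached --check, which reports trailing whitespace and leftover conflict markers in the staged changes and exits non-zero when it finds any. The hook turns that into exit 1, which blocks the commit.

```shell
#!/usr/bin/env bash
# Sketch: a pre-commit hook whose exit status decides whether the commit happens.
set -euo pipefail

repo=$(mktemp -d)                  # scratch repository, illustrative only
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "clean" > base.txt
git add base.txt
git commit -q -m "initial"         # made before the hook exists

cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# --check flags whitespace errors and conflict markers in the staged diff.
if ! git diff --cached --check; then
  echo "pre-commit: fix the problems above before committing" >&2
  exit 1                           # non-zero: git aborts the commit
fi
exit 0                             # zero: git goes ahead
EOF
chmod +x .git/hooks/pre-commit

printf 'trailing space \n' > bad.txt
git add bad.txt
if git commit -q -m "add bad.txt"; then first=committed; else first=blocked; fi

printf 'no trailing space\n' > bad.txt
git add bad.txt
if git commit -q -m "add bad.txt"; then second=committed; else second=blocked; fi
echo "first attempt: $first, second attempt: $second"
```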

If you want to access the files changed by the commit (for instance, to ensure the unit tests pass before allowing the code to be committed), there’s a giant problem that isn’t apparent until you run into it a few times: most pre-commit hooks iterate over the files in your working directory, even if you’ve only partially staged your changes. That means the state of your local copy does not necessarily match what is actually going to be committed.
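One way to see the mismatch: git can read the staged version of a file directly, using the :<path> syntax to name the copy held in the index. In this sketch (bash and git assumed, scratch repository, illustrative file name), the file on disk and the version that would be committed differ after a second, unstaged edit.

```shell
#!/usr/bin/env bash
# Sketch: the staged copy of a file and the working copy can differ.
set -euo pipefail

repo=$(mktemp -d)                  # scratch repository, illustrative only
cd "$repo"
git init -q

echo "staged version" > config.txt
git add config.txt                 # this content goes into the index
echo "unstaged edit" > config.txt  # edited again, but not re-added

working_copy=$(cat config.txt)         # what a naive hook would inspect
staged_copy=$(git show :config.txt)    # what would actually be committed
echo "working directory: $working_copy"
echo "index:             $staged_copy"
```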

Git does not provide a one-step way to get a standalone copy of exactly what is going to be committed. So if you want to check the files you’re about to commit, and you need to be absolutely sure they reflect what will happen on the build server, you have to do some gymnastics. From this point forward, I’ll be referring to John Wiegley’s excellent git pre-commit hook, which does exactly this. You may not need something so extreme, but it does get the job done.

1. Check out a copy of the current index into $MIRROR
git checkout-index --prefix=$MIRROR/ -af

Git’s checkout-index copies files out of the index, git’s staging area, which holds exactly the content the next commit will contain. The -a option means ‘all’ files in the index, -f forces existing files to be overwritten, and --prefix writes the files under $MIRROR/ instead of into your working directory (the trailing slash matters; without it the prefix is simply prepended to each file name). Because checkout-index reads from the index rather than from the file system, a file with unstaged edits is written to the mirror in its staged form, and untracked files are not written at all.

2. Remove files from $MIRROR that are no longer present in the index

git diff-index --cached --name-only --diff-filter=D -z HEAD | 
(cd $MIRROR && xargs -0 rm -f --)

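git diff-index --cached compares the index against HEAD (the commit your current branch points to). --diff-filter=D keeps only files whose deletion has been staged, --name-only prints just their paths, and -z separates the paths with NUL bytes so that xargs -0 copes with spaces in file names. Those paths are then deleted from $MIRROR, because checkout-index only writes files and never removes stale ones left in the mirror by an earlier run.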

3. Copy the mirror into $TMPDIR, skipping excluded files

rsync -rlpgoDO --size-only --delete \
    --exclude-from=tools/excludes $MIRROR/ $TMPDIR/

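Finally, rsync copies the mirrored files into $TMPDIR. --delete removes anything in $TMPDIR that no longer exists in the mirror, --exclude-from skips the paths listed in tools/excludes, and --size-only copies a file only when its size differs from the copy already in $TMPDIR.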
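Steps 1 and 2 can be tried in isolation. The sketch below assumes bash and git; the repository, the mirror directory and the file names are throwaway stand-ins, not taken from John’s hook. It pre-fills a mirror as if from an earlier run, stages an edit, makes a further unstaged edit, stages a deletion and creates an untracked file, then runs the two commands. Step 3 is left out so the demo doesn’t depend on rsync being installed.

```shell
#!/usr/bin/env bash
# Sketch of steps 1 and 2 against a scratch repository.
set -euo pipefail

repo=$(mktemp -d)                  # scratch repository, illustrative only
mirror=$(mktemp -d)                # stands in for $MIRROR
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > app.txt
echo "old" > old.txt
git add app.txt old.txt
git commit -q -m "initial"

# Pretend the mirror is left over from an earlier run, when old.txt existed.
git checkout-index -a -f --prefix="$mirror/"

echo "v2 (staged)" > app.txt
git add app.txt
echo "v3 (unstaged)" > app.txt     # working copy now differs from the index
git rm -q old.txt                  # staged deletion
echo "scratch" > untracked.txt     # never added

# Step 1: copy every file in the index into the mirror, overwriting old copies.
git checkout-index -a -f --prefix="$mirror/"

# Step 2: delete from the mirror every path whose deletion is staged.
git diff-index --cached --name-only --diff-filter=D -z HEAD |
  (cd "$mirror" && xargs -0 rm -f --)

ls "$mirror"
```

The mirror ends up with the staged app.txt only: the unstaged edit, the deleted file and the untracked file are all absent.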

Once you’ve done these mental gymnastics, you can finally run your unit tests against the copy in $TMPDIR as part of the pre-commit script.
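If you don’t need a persistent mirror, a simpler variant is possible: check the index out into a fresh temporary directory on every run, so the cleanup and sync steps aren’t needed at all. This is a sketch of that simplification, not John’s hook. It assumes bash, git and mktemp, and run-tests.sh is a hypothetical stand-in for your real test command. The demo stages a breaking change and then “fixes” it only in the working directory, which a check that reads the working directory would miss.

```shell
#!/usr/bin/env bash
# Sketch: a pre-commit hook that tests a fresh snapshot of the index.
set -euo pipefail

repo=$(mktemp -d)                  # scratch repository, illustrative only
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical test suite: passes only if status.txt contains the line "ok".
cat > run-tests.sh <<'EOF'
#!/bin/sh
grep -qx ok status.txt
EOF
echo "ok" > status.txt

cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
snapshot=$(mktemp -d)
trap 'rm -rf "$snapshot"' EXIT               # clean up on every exit path
git checkout-index -a -f --prefix="$snapshot/"
if ! (cd "$snapshot" && sh ./run-tests.sh); then
  echo "pre-commit: tests failed against the staged snapshot" >&2
  exit 1
fi
exit 0
EOF
chmod +x .git/hooks/pre-commit

git add run-tests.sh status.txt
if git commit -q -m "initial"; then first=committed; else first=blocked; fi

echo "broken" > status.txt
git add status.txt                 # the breaking change is staged...
echo "ok" > status.txt             # ...and only "fixed" in the working directory
if git commit -q -m "break status"; then second=committed; else second=blocked; fi
echo "first: $first, second: $second"
```

The trade-off is speed: a full checkout on every commit can be slow in a large repository, which is the kind of cost a persistent, incrementally synced mirror avoids.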

It’s important to restate that the above gymnastics are only necessary if you want to be absolutely sure the repository will match what happens on the build server.

John’s hook is available in full here, but there are a few takeaways from this:

1. Client-side hooks like pre-commit live in your local repository (in .git/hooks) and run into the same issues you would when working with the repository by hand. They’re shell (or node, or perl, or python) scripts that have access to the same information you do.
2. There are at least three states you need to be aware of at all times when dealing with a hook like pre-commit:
    1. What has already been committed (HEAD)
    2. What you’ve staged in the index but not yet committed
    3. What is in your working directory: unstaged edits and untracked files
3. Git hooks can do anything you could do from a local shell script, for example:
    1. Web API calls
    2. Invoking other programs
    3. Making changes to the files you’re committing
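The three states map onto commands you can run from any hook. A sketch, assuming bash and git, in a scratch repository with illustrative file names: git status --porcelain summarizes all three at once, where the first column compares the index with HEAD, the second compares the working directory with the index, and ?? marks untracked files.

```shell
#!/usr/bin/env bash
# Sketch: inspecting committed, staged and working-directory state.
set -euo pipefail

repo=$(mktemp -d)                  # scratch repository, illustrative only
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "initial"         # state 1: committed (HEAD)

echo "v2" > tracked.txt
git add tracked.txt                # state 2: staged in the index
echo "v3" > tracked.txt            # state 3: unstaged edit
echo "scratch" > untracked.txt     # state 3: untracked file

status=$(git status --porcelain)
printf '%s\n' "$status"

# The same distinctions, one at a time:
head_vs_index=$(git diff --cached --name-only)           # staged changes
index_vs_worktree=$(git diff --name-only)                # unstaged changes
untracked=$(git ls-files --others --exclude-standard)    # untracked files
```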

In future segments, we’ll dive into other interesting things you can do with git hooks. The important point here is that git is very powerful: hooks let you automate your processes without worrying about vendor lock-in, and they give you the kind of feedback you would otherwise only get after committing your changes, before the commit is even made.
