Anatomy of a Git pre-commit Hook

What does a Git Hook Look like?

There are several different kinds of hooks in git (“Git Hooks”); these hooks let you ‘hook’ into an event in git.

There are lots of events you can hook into, some happening on the client, some happening on your ‘server’. For this conversation, assume ‘server’ is the central repository everyone on your team pushes their git commits to (git purists will be angered by this, but it’s the best we can do in this medium).

Today, I’m going to focus on a particular git hook, pre-commit.

The first line of a git hook (thanks, Unix) is the shebang line. This line tells the operating system what program to use to execute the script. (Shebang, or “Shhhh-bang”, refers to the #! followed by the location of the binary used to execute the script):

#!/usr/bin/env node

After this, the script starts. Git passes different parameters into the script depending on the hook; pre-commit receives none at all. For the pre-commit hook, there are some interesting properties of git that make writing one less programmer-y and more sys-admin-y.

There is no consistent git API; as programmers we may expect everything to be parameters into the hook; but that’s not how it works in Git. Git treats hooks as scripts that live on the same system as git, as if those scripts are interacting with git (and the OS’s) own filesystem.

A (perhaps wild) outcome from this is that there are certain interactions in git that create files. When you add a commit message using the -m option (and even when you don’t, when you let the editor pop up), it’s actually creating a file in the .git directory called “COMMIT_EDITMSG”.

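Viewing this file shows the commit message the user just tried to use to commit the files. If your script needs to act on the commit message, this is the file to read. One caveat: pre-commit runs before git obtains the message, so at that point the file may still hold the message from the previous commit. The commit-msg hook, which receives the path to this file as its first argument, is the place to inspect the message.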
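If the commit message is what you care about, a small sketch may help. It assumes the script is installed as the commit-msg hook rather than pre-commit, because git hands commit-msg the path to the message file (normally .git/COMMIT_EDITMSG) as its first argument. The function name check_message and the rule it enforces (a non-empty subject line of at most 72 characters) are made up for illustration.

```shell
#!/bin/sh
# Sketch of a commit-msg hook. check_message and its 72-character rule
# are illustrative assumptions, not something git requires.

check_message() {
    msg_file=$1
    # Read only the first line (the subject) of the message file.
    subject=$(head -n 1 "$msg_file")

    if [ -z "$subject" ]; then
        echo "commit-msg: empty subject line" >&2
        return 1
    fi
    if [ "${#subject}" -gt 72 ]; then
        echo "commit-msg: subject longer than 72 characters" >&2
        return 1
    fi
    return 0
}

# Git calls the hook with the message file path as the first argument.
if [ "$#" -gt 0 ]; then
    check_message "$1" || exit 1
fi
```

Saved as .git/hooks/commit-msg and made executable, a script along these lines would reject a commit whose subject line is empty or too long.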

The next property of a git pre-commit hook is that the exit mechanism follows Unix conventions: exit with 0 for success, and with any non-zero status (conventionally 1) for failure. A non-zero exit aborts the commit.
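The exit-status convention can be sketched in a few lines of shell. run_checks is a hypothetical placeholder; swap in whatever check you actually need. Any non-zero status from the script makes git abort the commit.

```shell
#!/bin/sh
# Minimal pre-commit sketch: git continues with the commit if this script
# exits 0 and aborts it on any non-zero status.

# Hypothetical placeholder; replace the body with a real check
# (a linter, a test runner, ...).
run_checks() {
    true
}

pre_commit() {
    if run_checks; then
        echo "pre-commit: checks passed"
        return 0
    fi
    echo "pre-commit: checks failed, aborting commit" >&2
    return 1
}

pre_commit || exit 1
```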

If you want to access the files changed as a result of the commit (for instance, to ensure the unit tests pass before allowing the code to be committed), there’s a giant problem that isn’t apparent until you run into it a few times: most git pre-commit hooks iterate over the files in your working directory, even if you’ve only staged some of your changes. That means the files the hook checks may not match what is actually going to be committed.

Git does not hand the hook a ready-made copy of what is going to be committed; so if you want to check the files you’re about to commit, and you need to be absolutely sure the state of the repository reflects what will happen on the build server, you have to do some gymnastics. From this point forward, I’ll be referring to John Wiegley’s excellent git pre-commit hook that does this. You may not need something so extreme, but it does get the job done.
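For lighter checks, one way to look at only what is staged, without building a full mirror, is to ask git for the staged file names and read their contents straight from the index. This is a sketch of that approach, not part of John’s hook; the function names staged_files and staged_content are made up for illustration.

```shell
#!/bin/sh
# Sketch: inspect what is staged rather than what is in the working directory.
# staged_files and staged_content are hypothetical helper names.

# Names of files added, copied, or modified in the index relative to HEAD.
staged_files() {
    git diff --cached --name-only --diff-filter=ACM
}

# Contents of one file as it sits in the index
# (":path" asks git for the staged copy of that path).
staged_content() {
    git show ":$1"
}
```

Inside a hook you might loop over the output of staged_files and pipe staged_content for each file into a linter. That checks the staged version of each file even when the working copy has drifted.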

1. Check out a copy of the current index into MIRROR

git checkout-index --prefix=$MIRROR/ -af

Git’s checkout-index copies the files that are in the index (the staging area, which holds the snapshot that will become the next commit) out to the file system. The --prefix option writes them under $MIRROR/ instead of the working directory; the f option is for force (overwrite existing files) and the a option is for ‘all’. Checkout-index focuses on the files that are in git’s index, not necessarily all the files present on the file system in a git directory. For instance, a file you have changed but not staged, or changed again after staging, will have a different copy in the index than it has in the working directory.

2. Remove files from MIRROR which are no longer present in the index

git diff-index --cached --name-only --diff-filter=D -z HEAD | \
(cd $MIRROR && xargs -0 rm -f --)

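This part lists the files that exist in the HEAD revision but have been deleted in the index (that is what --diff-filter=D selects), and removes them from the mirror, so that a mirror reused from an earlier run does not keep stale files. (HEAD is a pointer to the latest commit on your current branch.)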

3. Sync only the changed files to $TMPDIR.

rsync -rlpgoDO --size-only --delete \
    --exclude-from=tools/excludes $MIRROR/ $TMPDIR/

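And finally, rsync brings $TMPDIR in sync with the mirror: it copies over the files that differ (judged by size only, because of --size-only), deletes files that are no longer in the mirror (--delete), and skips anything matching the patterns in tools/excludes.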

Once you’ve done these mental gymnastics, you can finally run unit tests against this new mirror as a part of the pre-commit script.
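Put together, the core of the idea (test what is in the index, not what is in the working directory) can be sketched as below. This is a simplified illustration rather than John’s script: it skips the rsync step and exports into a fresh temporary directory each time. The function names export_index and run_tests_on_index are made up, and the test command is whatever you pass in.

```shell
#!/bin/sh
# Simplified sketch: export the index to a scratch directory and run a
# command there. export_index and run_tests_on_index are hypothetical names.

# Copy every file in the index into the directory given as $1.
export_index() {
    git checkout-index --prefix="$1/" -a -f
}

# Export the index to a fresh temp directory, run the given command inside
# it, clean up, and return the command's exit status.
run_tests_on_index() {
    mirror=$(mktemp -d) || return 1
    export_index "$mirror" || { rm -rf "$mirror"; return 1; }
    status=0
    ( cd "$mirror" && "$@" ) || status=$?
    rm -rf "$mirror"
    return "$status"
}
```

In a hook, something like `run_tests_on_index make test || exit 1` would block the commit when the tests fail against the staged snapshot; `make test` here is only a stand-in for your own test command.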

It’s important to restate that the above gymnastics are only necessary if you want to be absolutely sure the repository will match what happens on the build server.

John’s githook is here in full, but there are a few takeaways from this:

  1. Most git hooks (though not all) live in the local repository, and have the same issues you would have interacting with the repository yourself. They’re shell (or node, or perl, or python) scripts that have access to the same information you have access to.
  2. There are at least three states you need to be aware of at all times when dealing with a git hook like pre-commit:
    1. The changes you’ve already committed
    2. The changes you’ve staged, but not committed
    3. Everything else in your working directory: changes that are neither staged nor committed, and files that aren’t tracked at all
  3. Git hooks can be used to do anything you could do from a local shell script:
    1. Web API calls
    2. Invoking other programs
    3. Making changes to the files you’re committing

In future segments, we’ll dive into other interesting things you can do with git hooks. The important point here is that git is very powerful and can be used to automate your processes without having to worry about vendor lock-in, while giving you, before the commit happens, the same immediate feedback you would otherwise only get after you’ve already committed changes.

Agile can’t save us

There are two types of people: those who have read and believe Fred Brooks’s admonishment of ‘no silver bullet’, and those who believe that an “Agile Transformation” will improve productivity in their company.

So we shouldn’t try, right? Just cast Agile aside and do what we’re doing now.

Maybe. Yea.

“agile” is a state of being, not a process. It is a feeling that drives development and delivery. In teams I would consider agile, none of them operated exactly like the others; and some actions that would improve agility in one team would harm it in another. Some companies are able to mimic that feeling by embracing agile principles, and others aren’t. The important part isn’t the specific methodology that others say will help you achieve max productivity for your team, it’s the process that aligns with your culture and your software and your customer.

Your Culture Dictates What you Can Improve

In the previous segment, I mentioned three different areas of focus for productivity improvement:

System: The software being built
Process: How the software gets built (notionally refers to the type of development methodology being followed, but in reality describes the culture around the team and the process the team actually follows)
Customer: The purpose behind the software; and who the software is being built for.

Changing any of these requires understanding your enterprise’s culture intimately.

I once worked with a large defense contractor, and the team had been tasked with improving the performance of the application. The conventional wisdom was that performance would be improved if all of the tickets in the backlog were completed. Now, you and I know this wouldn’t have guaranteed application performance improvement at all, and it certainly wouldn’t guarantee customer happiness. But it was a metric the customer could see, and it stood in as a proxy for success.

The culture was such that doing what was actually necessary to improve the application’s performance was out of scope. The application essentially needed a complete rewrite. Its assumptions were invalid, and it took 10 years for the customer to realize it. But, for historical reasons, the customer was loath to rewrite it and the defense contractor was not going to suggest that, with good reason. I foolishly advocated for a targeted rewrite of the transport and presentation layers, and was politely told ‘no’.

I learned from this exchange (and probably would have realized it sooner if I had paid closer attention) that the System itself was not in scope for change according to the culture of the organization. But, as I would find out later, I could improve delivery and customer satisfaction by changing how the software was delivered to the customer. This floored me. We couldn’t port the software to the web, but we could spend hundreds of thousands of dollars on an unproven technology by a proven vendor to deliver the software over the web (Microsoft RemoteApp), and the customer was happy about it!

The culture of the company dictated which lever I could pull.

Do you find yourself running into roadblocks when you suggest process improvements? Delivery improvements? If so, it might be that your culture doesn’t allow for that lever to be pulled. Find another lever, and try to pull that one.

What levers can you pull to improve productivity?

In his seminal work, The Mythical Man-Month, one of the many lessons Fred Brooks teaches is that there is no silver bullet. There is no one management technique or technology that can create a 10X increase in productivity over a decade.

Scrum practitioners are learning this now, as are the management teams that went all-in on Scrum. They bet their agile transformation on Scrum, and are now dealing with the fallout of its failure to deliver software faster. Waterfall teams knew this, and have long accepted their fate and priced it in. Scrum is a good software development process, as is Waterfall, as is Chaos Driven Development, as is Lean, as is any other process, in the right context. As much as we like to think of software creation as an engineering project, it is a discovery process. If your team is discovering what the software should do, or what the customer wants, or how to be successful in your management environment, they’re discovering, they’re not engineering.

What’s the difference? Risk and uncertainty. The more known a system, process, and customer are, the less uncertainty and risk there is in delivering for that customer, process, or system.

The greater the uncertainty and risk, the greater the opportunity for increasing productivity.

Perhaps ironically, the less risk and uncertainty you have in your process, system, or customer, the easier it is to sustain productivity efficiencies, as there isn’t as much risk that the work will be scrapped; but the potential percentage change will be smaller.

In order to identify the areas to improve, take the three parts (the system, the process, and the customer) and assign a risk score and an uncertainty score to each, with 10 meaning extremely risky or uncertain, and 1 meaning no risk or uncertainty.

It’s preferable to have a separate score for risk and for uncertainty. As an example, if I were to apply this to Gusto (a payroll system I use) from the perspective of the internal software team working on the payroll, it may look like this (this is hypothetical):

Customer: 3 Risk/2 Uncertainty (the customer is well known, the “Why” is well understood)
Process: 4 Risk/7 Uncertainty (development team is adopting “cloud CI/CD”)
System: 8 Risk/2 Uncertainty (the system is well known, but technical debt makes it risky to make changes)

With these numbers in hand, it’s easier to identify which parts to focus initial productivity efforts on.
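As a rough illustration, the scores above can be ranked with a few lines of shell. Adding risk and uncertainty into a single total is a simplifying assumption for this sketch (the two could just as well be weighed separately), and rank_areas is a made-up name.

```shell
#!/bin/sh
# Sketch: rank the three areas by combined score, highest first.
# Summing risk and uncertainty is an assumption made for illustration.

# Input lines: "<area> <risk> <uncertainty>"; output lines: "<area> <total>".
rank_areas() {
    awk '{ print $1, $2 + $3 }' | sort -k2,2nr
}

# The hypothetical Gusto scores from the example above.
printf '%s\n' "Customer 3 2" "Process 4 7" "System 8 2" | rank_areas
```

With the example numbers, Process (11) comes out ahead of System (10) and Customer (5), which would point the initial effort at the process.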

Grocery Shopping and Dry Cleaning

If you had told me 6 months ago I could pay $800 a year to save 250 hours per year, I would have laughed at you. Keep in mind, $800 sounds like a lot, but saving 250 hours a year is akin to 1.7 months of time saved at work. For the average salary of an average software developer, that’s almost $20K in savings. Pay $800 to save $20K? Yes, please.

What’s almost as important (and maybe more important) is that if you had told me 6 months ago it was even possible to shave off 2 months of work, I would have dismissed you outright.

Once again, as these things often do, the time-saving trade came from my spouse. She’s that magical teacher sort I spoke of in an earlier segment, and she has long spent her time eking out free minutes where possible. This time? It was groceries and dry cleaning.

Buying groceries for a family of 6 takes around 1.5-3 hours each week, once you factor in travel, Costco, and putting the groceries away. We’re on a budget, so we can’t afford to waste trips by buying more than we’re sure we’ll consume. Instacart costs $99 per year plus the cost of groceries (I think we got a first-year promotion; the word on the street is it’s $149 per year), and in return, someone else (or several someone elses) does your grocery shopping for you and delivers your groceries to your house.

Three months in, I can’t fathom what would have to be true for me to say no to renewing Instacart. It saves time, which with three kids and a full schedule is more important to me than the money.

The second saver of time for us was getting my dress shirts dry-cleaned. You probably have a $1.50 cleaners nearby, and if you’re like me you’ll spend 30 minutes ironing five shirts every week. The trade, once again, is worth it.

Why am I talking about grocery shopping and dry cleaning? We’re surrounded by those sorts of chores when process meets developing software for others, especially in enterprises. Someone, somewhere wants a report, and someone, somewhere else wants certain t’s crossed and i’s dotted.

What sort of activities do you have weekly or daily that remind you of grocery shopping and dry-cleaning?

What do you have to do as a leader of teams or a project manager where you would happily pay to have a report automated, or a process automated? How much would you pay?