Anatomy of a Git pre-commit Hook

What does a Git Hook Look like?

There are several different kinds of hooks in git (“Git Hooks”); these hooks let you ‘hook’ into an event in git.

There are lots of events you can hook into, some happening on the client, some happening on your ‘server’. For this conversation, assume ‘server’ is the central repository everyone on your team pushes their git commits to (git purists will be angered by this, but it’s the best we can do in this medium).

Today, I’m going to focus on a particular git-hook, pre-commit:

The first line of a git hook (thanks, Unix) is the shebang line. This line tells the operating system which program to use to execute the script. (Shebang, or “Shhhh-bang”, refers to the #! followed by the location of the binary used to execute the script):

#!/usr/bin/env node

After this, the script starts. Different hooks receive different parameters (pre-commit itself receives none), and there are some interesting properties of git that make writing a hook less programmer-y and more sysadmin-y.

There is no consistent git API; as programmers we may expect everything to be parameters into the hook; but that’s not how it works in Git. Git treats hooks as scripts that live on the same system as git, as if those scripts are interacting with git (and the OS’s) own filesystem.

A (perhaps wild) outcome from this is that there are certain interactions in git that create files. When you add a commit message using the -m option (and even when you don’t, when you let the editor pop up), it’s actually creating a file in the .git directory called “COMMIT_EDITMSG”.

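Viewing this file shows the most recent commit message git wrote there. One caveat: pre-commit runs before git has the new message, so at that point the file still holds the message from the previous commit. If you’re trying to base a script off of interacting with the commit message, the commit-msg hook runs once the message is ready and receives the path to this file as its first argument.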
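A script that works with the commit message can be sketched as a commit-msg hook; git invokes that hook with the path to the message file (normally .git/COMMIT_EDITMSG) as its first argument. This is only a sketch: the 72-character limit and the function name subject_is_ok are assumptions made for illustration, not anything git enforces.

```shell
#!/bin/sh
# Sketch of a commit-msg hook (save as .git/hooks/commit-msg, make it executable).
# Git passes the path of the message file, normally .git/COMMIT_EDITMSG, as $1.

# Succeeds when the first line of the given file is 1 to 72 characters long.
# The 72-character limit is an illustrative convention, not a git rule.
subject_is_ok() {
    subject=$(head -n 1 "$1")
    [ -n "$subject" ] && [ "${#subject}" -le 72 ]
}

# Only check when git actually handed us a message file.
if [ -n "${1:-}" ] && ! subject_is_ok "$1"; then
    echo "commit-msg: the first line must be 1-72 characters long" >&2
    exit 1
fi
```

Keeping the check in a function makes it easy to try against a scratch file before installing the hook.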

The next property of a git pre-commit hook is that the exit mechanism follows Unix conventions: exit(0) for success, and any non-zero status such as exit(1) for failure. A failure aborts the commit.
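Putting the shebang line and the exit convention together, a minimal pre-commit hook might look like the sketch below. It is written for /bin/sh rather than node to keep it short; the “DO NOT COMMIT” marker and the function name adds_marker are made up for illustration.

```shell
#!/bin/sh
# Minimal pre-commit sketch (save as .git/hooks/pre-commit, make it executable).

# Reads a diff on stdin; succeeds if any added line contains the marker.
# "DO NOT COMMIT" is a hypothetical marker chosen for this example.
adds_marker() {
    grep -q '^+.*DO NOT COMMIT'
}

# git diff --cached shows what is staged, i.e. what this commit would add.
if git diff --cached | adds_marker; then
    echo "pre-commit: staged changes contain a DO NOT COMMIT marker" >&2
    exit 1  # any non-zero status makes git abort the commit
fi
# Reaching the end exits with status 0, so git goes ahead with the commit.
```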

If you want to access the files changed as a result of the commit (for instance, to ensure the unit tests pass before allowing the code to be committed), there’s a giant problem that isn’t apparent until you run into it a few times: Most git pre-commit hooks iterate over the files in your working directory; even if you’ve only partially staged changes for git. That means that the state of your local copy does not match what is actually going to be committed.

Git does not provide an easy way to check what is going to be committed; so if you want to check files you’re about to commit, and you need to be absolutely sure the state of the repository reflects what will happen on the build server, you have to do some gymnastics. From this point forward, I’ll be referring to John Wiegley’s excellent git pre-commit hook that does this. You may not need something so extreme, but it does get the job done.

1. Check out a copy of the current index into MIRROR

git checkout-index --prefix=$MIRROR/ -af

Git’s checkout-index copies the files recorded in the index (the staged snapshot that is about to become the commit) out to the file system; here, --prefix sends them to $MIRROR instead of the working directory. The f option is for force (overwrite files that already exist) and the a option is for ‘all’. Checkout-index focuses on the files that are in git’s index, not necessarily all the files present on the file system in a git directory. For instance, files with changes that are not staged will have different copies in the index than they will in the working directory, and untracked files are not in the index at all.

2. Remove files from MIRROR which are no longer present in the index

git diff-index --cached --name-only --diff-filter=D -z HEAD | 
(cd $MIRROR && xargs -0 rm -f --)

This part removes any files in the mirror that have been deleted in the index relative to the HEAD revision, that is, staged deletions (HEAD is a pointer to the ‘latest’ commit on your current branch).

3. Move only the correct files to the $TMPDIR.

rsync -rlpgoDO --size-only --delete \
    --exclude-from=tools/excludes $MIRROR/ $TMPDIR/

And finally, it syncs the mirror into a directory ($TMPDIR): rsync copies over only the files that are missing or whose size differs, and --delete removes anything in $TMPDIR that is no longer in the mirror.

Once you’ve done these mental gymnastics, you can finally run unit tests against this new mirror as a part of the pre-commit script.
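For a sense of how the pieces fit, here is a simplified sketch of the same idea; it is not John’s script. It exports the index into a fresh temporary directory on every run, and because that directory starts out empty there is nothing stale to remove or sync, at the cost of a full export each time. The function name run_in_staged_snapshot and the `make test` command in the closing comment are assumptions for illustration.

```shell
#!/bin/sh
# Run a command against a snapshot of the index instead of the working directory.
# Usage: run_in_staged_snapshot <command> [args...]
run_in_staged_snapshot() {
    snapshot=$(mktemp -d) || return 1
    # Copy every file recorded in the index into the snapshot directory.
    # The trailing slash matters: --prefix is prepended to each path as-is.
    if ! git checkout-index --prefix="$snapshot/" -a -f; then
        rm -rf "$snapshot"
        return 1
    fi
    # Run the command inside the snapshot and keep its exit status.
    (cd "$snapshot" && "$@")
    rc=$?
    rm -rf "$snapshot"
    return "$rc"
}

# In a pre-commit hook this could be used as (hypothetical test command):
#   run_in_staged_snapshot make test || exit 1
```

If a file is staged and then edited again, the command sees the staged version, which is the one that will actually be committed.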

It’s important to restate that the above gymnastics are only necessary if you want to be absolutely sure the repository will match what happens on the build server.

John’s githook is here in full; but there are a few takeaways from this:

  1. Git hooks (though not all of them) live in the local repository, and have the same issues you would have interacting with the repository. They’re shell (or node, or perl, or python) scripts that have access to the same information you have access to.
  2. There are at least three states you need to be aware of at all times when dealing with a git hook like pre-commit:
    1. The changes that you’ve already committed
    2. The changes in the working directory that you’ve staged, but not committed
    3. Everything else in your working directory: changes and files that are neither staged, committed, nor tracked
  3. Git hooks can be used to do anything you could do from a local shell script:
    1. Web API calls
    2. Invoking other programs
    3. Making changes to the files you’re committing

In future segments, we’ll dive into other interesting things you can do with git hooks. The important point here is that git is very powerful and can be used to automate your processes without having to worry about vendor lock-in, while giving you immediately the same feedback you would otherwise get only after you’ve already committed changes.

Agile can’t save us

There are two types of people: those that have read and believe Fred Brooks’s admonishment of ‘no silver bullet’, and those that believe that an “Agile Transformation” will improve productivity in their company.

So we shouldn’t try, right? Just cast Agile aside and do what we’re doing now.

Maybe. Yea.

“agile” is a state of being, not a process. It is a feeling that drives development and delivery. In teams I would consider agile, none of them operated exactly like the others; and some actions that would improve agility in one team would harm it in another. Some companies are able to mimic that feeling by embracing agile principles, and others aren’t. The important part isn’t the specific methodology that others say will help you achieve max productivity for your team, it’s the process that aligns with your culture and your software and your customer.

Your Culture Dictates What you Can Improve

In the previous segment, I mentioned three different areas of focus for productivity improvement:

System: The software being built
Process: How the software gets built (notionally refers to the type of development methodology being followed, but in reality describes the culture around the team and the process the team actually follows)
Customer: The purpose behind the software; and who the software is being built for.

Changing any of these requires understanding your enterprise’s culture intimately.

I once worked with a large defense contractor, and the team had been tasked with improving the performance of the application. The conventional wisdom was that performance would be improved if all of the tickets in the backlog were completed. Now, you and I know this wouldn’t have guaranteed application performance improvement at all, and it certainly wouldn’t guarantee customer happiness. But, it’s a metric the customer could see, and stood in as a proxy for success.

The culture was such that doing what was actually necessary to improve the application’s performance was out of scope. The application essentially needed a complete rewrite. Its assumptions were invalid, and it took 10 years for the customer to realize it. But, for historical reasons, the customer was loath to rewrite it and the defense contractor was not going to suggest that, with good reason. I foolishly advocated for a targeted rewrite of the transport and presentation layers, and was politely told ‘no’.

I learned from this exchange (and probably would have realized it sooner if I had paid closer attention) that the System itself was not in scope for change according to the culture of the organization. But, as I would find out later, I could improve delivery and customer satisfaction by changing how the software was delivered to the customer. This floored me. We couldn’t port the software to the web, but we could spend hundreds of thousands of dollars on an unproven technology by a proven vendor to deliver the software over the web (Microsoft RemoteApp), and the customer was happy about it!

The culture of the company dictated which lever I could pull.

Do you find yourself running into roadblocks when you suggest process improvements? Delivery improvements? If so, it might be that your culture doesn’t allow for that lever to be pulled. Find another lever, and try to pull that one.

What levers can you pull to improve productivity?

In his seminal work, The Mythical Man-Month, one of the many lessons Fred Brooks teaches is that there is no silver bullet: there is no one management technique or technology that can create a 10X increase in productivity within a decade.

Scrum practitioners are learning this now, as are the management teams that went all-in on Scrum. They bet their agile transformation on Scrum, and are now dealing with the fallout of its failure to deliver software faster. Waterfall teams knew this, and have long accepted their fate and priced it in. Scrum is a good software development process, as is Waterfall, as is Chaos Driven Development, as is Lean, as is any other process, in the right context. As much as we like to think of software creation as an engineering project, it is a discovery process. If your team is discovering what the software should do, or what the customer wants, or how to be successful in your management environment, they’re discovering, they’re not engineering.

What’s the difference? Risk and uncertainty. The more known a system, process, and customer are, the less uncertainty and risk there is in delivering for that customer, process, or system.

The greater the uncertainty and risk, the greater the opportunity for increasing productivity.

Perhaps ironically, the less risk and uncertainty you have in your process, system, or customer, the easier it is to sustain productivity efficiencies, as there isn’t as much risk that the work will be scrapped; but the potential percentage change will be smaller.

In order to identify the areas to improve, take the three parts: The system, the process, and the customer; and assign a risk and uncertainty score to each. 10 meaning extremely risky or uncertain, and 1 meaning no risk or uncertainty.

It’s preferable to have a score for both risk and uncertainty. As an example, if I were to apply this to Gusto (a payroll system I use), from the perspective of the internal software team working on the payroll (this is hypothetical), it may look like this:

Customer: 3 Risk/2 Uncertainty (the customer is well known, the “Why” is well understood)
Process: 4 Risk/7 Uncertainty (development team is adopting “cloud CI/CD”)
System: 8 Risk/2 Uncertainty (the system is well known, but technical debt makes it risky to make changes)

With these numbers in hand, it’s easier to identify which parts to focus initial productivity efforts on.

Grocery Shopping and Dry Cleaning

If you had told me 6 months ago I could pay $800 a year to save 250 hours per year, I would have laughed at you. Keep in mind, $800 sounds like a lot, but saving 250 hours a year is akin to 1.7 months of time saved at work. For the average salary of an average software developer, that’s almost $20K in savings. Pay $800 to save $20K? Yes, please.

What’s almost as important (and maybe more important) is that if you had told me 6 months ago it was even possible to shave off 2 months of work, I would have dismissed you outright.

Once again, as these things often do, the time-saving trade came from my spouse. She’s that magical teacher sort I spoke of in an earlier segment, and she has long spent her time eking out free minutes where possible. This time? It was groceries and dry cleaning.

Buying groceries for a family of 6 takes around 1.5-3 hours each week, once you factor in travel, Costco, and putting the groceries away. We’re on a budget, so we can’t afford to waste trips by buying more than we’re sure we’ll consume. Instacart costs $99 per year plus the cost of groceries, and in return, someone else (or several someone elses) does your grocery shopping for you and delivers your groceries to your house. (I think we got a first-year promotion; the word on the street is it’s $149 per year.)

Three months in, I can’t fathom what would have to be true for me to say no to renewing Instacart. It saves time, which with three kids and a full schedule is more important to me than the money.

The second saver of time for us was getting my dress shirts dry-cleaned. You probably have a $1.50 cleaner nearby, and if you’re like me you’ll spend 30 minutes ironing five shirts, every week. The trade, once again, is worth it.

Why am I talking about grocery shopping and dry cleaning? We’re surrounded by those sorts of chores when process meets developing software for others, especially in enterprises. Someone, somewhere wants a report, and someone, somewhere else wants certain t’s crossed and i’s dotted.

What sort of activities do you have weekly or daily that remind you of grocery shopping and dry-cleaning?

What do you have to do as a leader of teams or a project manager where you would happily pay to have a report automated, or a process automated? How much would you pay?

Mise en place (Part 2)

In my last blog post, I talked about how my wife, a teacher, is able to pack so much into her day: how she’s able to reach students in 4-7 disparate lessons per day and do so in the time she has. The biggest enabler is Mise en place. In this segment, I want to talk about how we can adopt that mindset for our work: knowledge work, which is by itself sometimes unknowable.

First off, let’s look at how our days are structured:

1. Get into work (8:00-8:10)
2. Check Email, respond to any issues (8:10-8:30)
3. Open up IDE, text editor, pull code (8:30-8:35)
4. Get distracted, open up browser (8:35-9am)
5. Look at work items in JIRA or TFS (9:00am-9:15am)
6. Find right files, and start working (9:15-9:20am).

Your day probably looks more or less like this; but let’s be realistic. More often than we care to admit, we waste nearly an hour and a half at work before we do what we are judged on: what software features and outcomes we can deliver. What would this look like if we practiced Mise en place, everything in its place?

First off, we’d probably spend the previous day’s final 15-30 minutes getting ready for work the next day. It’s not often we’re productive in the final 15-30 minutes of the day, and so this is a good time to prepare for the next day.

One important outcome when preparing for the next day is ensuring that you feel productive as soon as possible. Email releases the endorphins when you’re able to respond, but is it the most productive use of your time? Probably not (unless you’re a manager type). If you know you’ll be working on a new feature tomorrow, go ahead and create the branch today. It doesn’t matter what branching methodology your team uses; creating branches in git is free and helps segment your work. (And if you’re not yet using git, use your source control’s analogue, if it has one.) Typically you’re not going to be assigned new work overnight, and so you can already pick tomorrow’s first work item today. Go ahead and move it into in-progress before you leave, and assign it to yourself. Get your workstation set up so that when you come in, you’re able to start working on it immediately.

Do you need some research to be able to start on that work? Go ahead and pull it up. Look over the work item, and make sure you can get started with it. That’s what’s important — ensuring your next 4 hours or so are covered and that you won’t have to pause work to get something to be able to be productive.

That’s all we’re going to focus on for right now: ensuring that your next day’s work is able to be started as soon as you walk into the office. Only worry about the first few hours for now; later, we’ll talk about strategies for increasing your productivity window.

Mise en place (Part 1)

My wife is an amazingly organized person professionally. She’s been teaching for 12 years now, and knows how to squeeze more out of a few minutes than I do an hour. If you can remember back to your time in a classroom (or are a parent and look at your child’s schedule), how was it that a single teacher was able to conduct 4-7 disparate lessons a day? Every day? With crafts, or manipulatives, or anything else they needed to make it happen?

It feels like magic, but it isn’t.

Watching my spouse work, I see that she breaks down her day into chunks. Every minute is accounted for, whether it’s a scheduled restroom time, lunch, snack, going to and from other classrooms, lesson times, students with questions, whatever. She puts this into a spreadsheet, and she figures out what will work, and what won’t. Every year (sometimes every semester). This takes her around a full day or so. She looks at the curriculum, and makes adjustments to her planned lessons for the year depending on what has changed in that curriculum. She then breaks her lessons down (if they aren’t already), and thematically adjusts them for the year and the kids she’s expected to get for that year (she’s fortunate to know her students before they come, so with few exceptions she knows what she’s getting).

You’d think that’d be it, right? She’s done? Not even close. Every Sunday, she spends about 30 minutes re-adjusting lessons for the week depending on events on the ground. She may spend 5-10 minutes a night re-adjusting if needed, but generally it’s not. As a rule at her school, lesson plans are due the week prior, and so Sunday is her final chance to make sure it’s done.

Once you see it in action a few times, you become a believer. Teachers have from 8am-3:15pm per day (minus lunch, “specials”, and recess) to teach students. They don’t have the 8+ hour days that we have, and they accomplish much more than we tend to per day (weighed together).

How do they do it?

“Excellent time management skills” seems like a cop-out. They’re forced to have those, sure. But the practices I see include a fundamental concept that we can practice in our own work: Mise en place. It’s a term that means “everything in its place”. In cooking, it refers to preparing all of the ingredients and their portions before you start, but it works for other stuff too.

For teachers, Mise en place includes the activities before the year starts, the activities before each week starts, and the preparation of all the items needed for activities before the day starts.

What does it include for us? Next time, we’ll go into how we can adopt that mindset, and what that means in developing software.

Why doesn’t my team get it?

One of the questions I hear often from new Delivery Managers is, why doesn’t my team ‘get it’? Why don’t they understand what’s at stake? Why don’t they understand what the client needs like I do? In fact, in any endeavor, understanding the why is the hardest part. Once you understand the why (if there is one), the How and What become much clearer.

When you’re developing a software product, it’s usually easy to connect the why to what you’re doing. I use the payroll platform Gusto, and their why is pretty easy: To make payroll easy for small businesses. That’s why they exist. If you were a developer for Gusto, you could point to that any time you needed to figure out if you should take course of action A or B. “Does this make payroll easier for small businesses?” If the answer is yes, do it, if not, do something else.

For enterprise software, ‘why’ is a muddier concept, but is still critically important in helping to paint the picture. If your team is struggling to deliver features, or to make visible progress, the first step is to be sure you have a “why”, and that you’ve communicated it, often.

If you struggle with understanding why your team does what they do, or you’re a software developer and you don’t ‘get it’, this talk by Simon Sinek will help.

Invisible Process

How do you feel when someone says the word “process”?

I’ve seen three reactions to it:

  1. Yea, it sucks, but how could we get our work done without it?
  2. Let’s check these boxes so the powers that be are happy
  3. Meh. It doesn’t affect me.

Process is at best a necessary evil to you or it’s how you get work done. It either fills you with dread or you are at best indifferent to it.

The emotional reactions towards process are inversely proportional to its impact.  That is, the right process can keep your team running smoothly where the wrong process causes everyone to cringe and complain during retrospectives.

Scrum attempted to solve this problem by narrowly defining the ‘required’ processes to a minimum and claiming that if you used other processes mixed in with scrum, you were not doing scrum!

Process has become the enemy for all right-thinking software developers, and yet it is required for medium to large teams.

But what if your process was invisible?  What if your entire process were automated?  How would you feel about process then? 

What if the necessary code review requests were automatically sent out to the right people — people with domain knowledge in the area you’re working in?  What if the test suites you’re working on helped send an email with the method signatures and their expected / actual return values? 

What if those reports that you spend a day generating for your clients to show them how much effort you put into your work (never to be used again) were automatically generated for you?

How would you feel about process then?

Why should I care about Git Hooks?

In helping software teams double their productivity, one of the tools that we all use as software developers that gets the least attention is source control. If you’re using git, there are numerous features sitting there, waiting to help you develop software with less friction. One feature that I’m particularly enamored with is git hooks. Git hooks are automation scripts that fire when certain events happen in git, allowing you to automate otherwise manual activities.

Git hooks allow you to do the following:

  • ensure branch names are consistent across individuals
  • separate whitespace commits (formatting) from substantive commits
  • update your ALM’s (JIRA, TFS, GitLab) status automatically on commits with certain properties (e.g., a commit saying “fixed bug” triggers a workflow move from ‘in progress’ to ‘in review’ for a given bug)
  • trigger checks to ensure copyright/license information is included in commits for new files

and so much more.
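To make the first item concrete, here is a small sketch of a pre-commit hook that checks the current branch name. The naming convention (feature/, bugfix/, or hotfix/ followed by a description) and the function names are assumptions; substitute whatever your team uses.

```shell
#!/bin/sh
# Sketch of a branch-name check for a pre-commit hook.

# Succeeds when the name follows a hypothetical <type>/<description> convention.
branch_name_ok() {
    case "$1" in
        feature/?*|bugfix/?*|hotfix/?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Looks up the current branch and complains if it breaks the convention.
check_current_branch() {
    # Prints nothing on a detached HEAD, in which case there is nothing to check.
    branch=$(git symbolic-ref --short -q HEAD)
    if [ -n "$branch" ] && ! branch_name_ok "$branch"; then
        echo "pre-commit: rename '$branch' to feature/..., bugfix/..., or hotfix/..." >&2
        return 1
    fi
}

# Run the check only when this file is installed as .git/hooks/pre-commit.
case "$0" in
    *pre-commit) check_current_branch || exit 1 ;;
esac
```

The pattern check lives in its own function so the convention can be tried out on sample names without touching a repository.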

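If you can write a script for it, you can automate it in git. One caveat: hooks in .git/hooks are not copied when someone clones or pulls your code, so to have them travel with the repository, keep them in a tracked directory and point git’s core.hooksPath setting at it.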