Git and V&V: How I Learned to Stop Worrying and Love the Merge
Every Engineer Gets “Bit by Git”
It’s no secret that Git falls into the category of tools that are so powerful that novices routinely cause unsolvable (or seemingly so) issues by experimenting with commands they don’t fully understand yet.
It’s also no secret that “fully understanding” any one Git command is probably a fool’s errand, and you’ll still be learning new things for the next decade.
Through the process of learning Git, you experience a lot of confusion (what the hell is a detached HEAD
? that sounds painful) and generally make a mess of your repository’s history.
You probably end up with something like this (or much worse):
Once you’ve learned enough from your mistakes to be competently branching, staging, committing, and merging, you encounter a curious and enticing new command: git rebase
.
Rebasing is /aw(esome|ful)/
I’ve got nearly a decade of professional Git experience, and even more from my high school and college days, and I still sometimes accidentally obliterate a change while rebasing.
Hopefully that change was my own and doesn’t belong to someone else who I now have to explain myself to.
The ways you can get yourself into horrible broken states with rebase
and any other history-rewriting verb are myriad, so I will not cover them here. Learn them, but avoid them when you can!
Everyone who’s used the rebase
verb knows that it’s one of Git’s most useful swiss army knives.
Reverting something on your development branch? Rebase and drop that commit.
Renaming commits en masse? Rebase and reword.
Made too many tiny commits iterating on one single over-arching change? Rebase and squash them together.
It’s easy to see why engineers love git rebase
because just using it once on the above example cleans the history up an order of magnitude.
Before a single invocation of git rebase main
After a single invocation of git rebase main
(and maybe some merge conflict resolution)
Rebasing again for commit cleanup
Running an interactive rebase, you can reword and squash those three iterations of the same feature on new_feature
:
git rebase -i main
reword working on a new branch to not screw up main this time
squash back to working on new feature
squash aw yiss
And now your history is even more readable to yourself and your coworkers.
That’s the awesome part of rebasing. You’ll discover the awful part yourself, if you haven’t already.
(Tip: try enabling git rerere
to make it a little less painful)
Hooked on Rebasing: The Gateway Drug to Untraceable Versioning Practices
Now we’ve learned our lessons about things like committing directly to main and how to rewrite the history of feature branches, and we’re addicted to “clean” git history. We’ve been lucky enough to be starting on a green-field project next, so we’re able to take these learnings and implement them from square one on a new repo, and it. is. glorious. Our perfect and beautiful Git history is a joy to behold:
But there’s a feeling tugging at the back of our reptile brain.. What if it could be cleaner? What if we could avoid the inter-weaving merge lines entirely? Wouldn’t it be so nice if we were able to work on feature branches to prevent conflicts, but have a Git history that looks as clean as pushing directly to main?
Then we discover the inviting comfort of the squash merge.
Squash Merging: But Why? .gif
At this point we’re deep into mis-conflating “clean Git history” with “clean development practices”.
We learned as novices that you shouldn’t develop directly on main for many good reasons, but now we are addicted to the sight of “simple” and “linear” history in our commit graph and want to return to something that emulates developing directly on main.
We discover in our Git hosting service (in the case of many, GitHub) that there’s a way to set the main
branch to require linear history.
Okay, no arguments there - I’ve checked the box.
We also discover in merging our PRs the little drop-down that lets us chose between merging, rebase-merging, and squash-merging.
Now that linear history is required, it doesn’t let us use the normal merge and points us toward squash-merging.
What’s squash-merging, we ask ourselves?
The documentation from the Git CLI about merge --squash
reads as follows:
Produce the working tree and index state as if a real merge happened (except for the merge information), but do not actually make a commit, move the HEAD, or record
$GIT_DIR/MERGE_HEAD
(to cause the next git commit command to create a merge commit). This allows you to create a single commit on top of the current branch whose effect is the same as merging another branch (or more in case of an octopus).
More likely, instead of reading the documentation, we just try it. And it works! It generates the beautiful graph from above that looks as if we’ve all been working directly on main and magically avoiding conflicts with each other. As the documentation says, it does the merge without actually creating the merge information, thereby making it look like our whole branch was one beautiful, atomic, pristine commit.
The beauty of “linear history” is reason enough for us to move forward with it as our project’s preferred setting for merging PRs. We adopt it and never look back!
Post-Release Traumatic Squash Disorder
We’ve been moving fast up until now.
Our greenfield project is rapidly approaching a 1.0 release because we’ve been able to break whatever we want since nobody’s using it yet.
A wonderful feeling as an engineer, but one that always comes to an end.
It’s time for 1.0, and our team is committed to making sure that released versions are stable while we continue moving forward on main
, so we branch our release when we’re ready.
… Wait.
We’ve got project rules in place that enforce that main
must have linear history.
How do we get hotfixes for versions back onto main
?
Do we squash them?
If we do, we end up with something like this:
Seems fine, I guess. It makes sense that 1.0 would be “behind main” in this way, right? But, subtly, this is where our troubles begin. Our next release, 2.0, approaches, and we create a branch.
But wait! Our biggest customer just reported a bug on 1.0. Unfortunately they cannot upgrade to 2.0 for a few weeks, due to an upcoming presentation to their stakeholders. They really need this fix applied to 1.0, and fast. We double-check and find that the bug exists on both 1.0 and 2.0, so both need a retroactive patch. Thankfully the code in that component hasn’t changed much, so we’re able to make one fix and open two PRs from it.
We go to open our PRs from important-customer-fix
to versions 1.0 and 2.0 and find something really nasty.
Trying to merge it into 2.0 has inscrutable merge conflicts - how could this happen? The actual code being changed isn’t even a part of the merge conflicts.
The culprits is the squash commit:
By squashing all the changes that were made on the 1.0
branch into one single commit, you created one commit with a unique hash on the Git commit graph that contains the same changes as other commits in the graph.
Now, merging your important bugfix from 1.0
, which has the “real changes” from before the squash, you’re encountering merge conflicts because Git doesn’t know that it’s exactly the same changes because they’re not the same hash.
If two commits have the same hash, Git knows they don’t need to be compared.
If we went back in time and had done a traditional merge instead of a rebase, we could easily merge our new hotfix:
With two PRs from one feature branch, no merge conflicts, and no history rewriting, we were able to apply a hotfix to two release trains and then to main. I won’t argue that it isn’t uglier, but I can’t argue that it isn’t more effective.
Cherry-Picking: It’s Just Rebasing With Extra Fewer Steps
Cherry-picking has exactly the same problem.
The first sentence of the documentation for git cherry-pick
is as follows:
Given one or more existing commits, apply the change each one introduces, recording a new commit for each.
I’ve worked with many engineers who are highly proficient in Git that do not understand that cherry-picking is not “copying a commit to another branch”.
Copying a commit to another branch is actually (practically) impossible, because the commit’s parent in the graph is the previous commit on whichever branch you were working.
As a result, tools like cherry-pick
copy the changes from the commit, but not the Git metadata.
This can cause the same problem described above, but is often encountered when working in a different direction. Many projects that encounter this pain by operating like this:
This is acting almost identically to merge --squash
above and often generates the same horrible merge conflicts.
I would generally recommend against this workflow whenever possible.
Cherry-picking is very useful when you accidentally fix a bug on main
and need to backport it, but if you’re deliberately fixing a bug on multiple versions, I would recommend fixing it on the oldest version and porting it forward instead of backward so that you can maintain the commit’s identity via merges.
Conclusions
Preferge the Merge
I work on a very complex project that has merged thousands of PRs.
While we have a cultural attachment to squash-merging that pre-dates me, we did successfully modify our process to use traditional merges in the contexts where they are most impactful.
As described above, that’s when synchronizing a release branch with a development branch or when applying a hotfix to multiple releases.
We still use squash-merging for putting feature branches directly into main
, and it largely works fine unless you’re trying to bisect something granularly and then you wish you had more context.
Develop forwards, not backwards
Honestly, just don’t backport things.
Unless you have a dedicated QA team who tirelessly QAs all available versions of your software (basically nobody has this), you should develop forward not backward.
In my opinion, the structure of the Git commit graph strongly implies this is the intended usage of the software, and when you start using it as intended things get a lot easier.
If you need to apply a fix to an old version, apply it there first and then apply it to newer versions with another merge.
This avoids needing to use cherry-pick
frequently, which as I explained above creates sinister duplicate nodes in your Git commit graph that become landmines that go off when you try to merge things together later.
Don’t Sacrifice Introspection for Cleanliness
As I tried to avoid making this post a step-by-step git tutorial with blocks of commands, one thing I didn’t mention is that Git has all sorts of useful metadata on the graph that can help you (and your QA) debug issues if you follow the processes I’ve described.
If you follow a squash-merge workflow that discards this metadata, when you try to verify if a fix was applied to a particular branch, you have to go look at the actual code to check, because the commits aren’t the same.
If you do a traditional merge instead, the two instances of that commit are the same exact node on the graph, and when you show it with git show
or look at it in your Git GUI like GitHub it will just have a list of every single branch and tag that contains that commit.
Now you can tell at a glance which branches and which releases contain that fix.
The core of the point I ask you to consider is this: Am I making more work for myself and my team just because I want my terminal output from Git to look cleaner?