no subject

#1:

[...] the new idea feels obvious. And more, you are so changed by the new perspective that you can’t even understand your own prior lack of understanding.

My most vivid memory of that happening to me is the day I lost my fear of git. The inability to express what I'd previously been confused about was so striking, and so sudden, that I actually bothered to introspect to see if I could work out what had happened.

I'd read several articles on the web describing git's object data model: immutable commit objects linked in a DAG, each linking to a tree object, each linking to blob objects for files or further trees for subdirectories, and branch heads as (mutable) references to a specific commit. Somehow, these articles didn't make me any less confused. And then one day I ran git fast-export to dump out the whole of a test git repository, and looked at the dump file to see if I could understand it. Then suddenly the blockage cleared from my mind and I felt I understood git. But all that was in the dump file were the same commits, blobs and trees that I'd already read about: with hindsight I could see that those articles had told me nothing but the truth. But somehow looking at it "for real" conferred an understanding that the articles hadn't.

It took me months of pondering to figure out exactly what had become clear to me that day, which somehow hadn't been clear before. But eventually I decided the thing I hadn't taken in from the articles (maybe they hadn't said it, or maybe I missed it; I never went back to check which) is: there's nothing else. If you have a collection of commits, blobs, and trees, and you know which commit each branch head points at, that's enough to construct an equivalent git repo of your own (which is exactly what git fast-import will do when it reloads the same dump file). No further types of metadata exist in the repository that I hadn't known about¹.
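To make the "there's nothing else" point concrete, here is a minimal Python sketch of that data model. It is a toy under stated assumptions, not git's real object format or hashing scheme: the names ToyRepo, clone and read_file are made up for illustration, and object ids are just a SHA-1 of a Python repr. The point it shows is that copying the objects plus the branch heads is enough to get an equivalent repository.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model only: hypothetical names, not git's on-disk format."""
    objects: dict = field(default_factory=dict)   # object id -> (kind, payload)
    branches: dict = field(default_factory=dict)  # branch name -> commit id

    def put(self, kind, payload):
        # Content-addressed storage: the id is derived from the content,
        # so identical content always gets the identical id.
        oid = hashlib.sha1(repr((kind, payload)).encode()).hexdigest()
        self.objects[oid] = (kind, payload)
        return oid

    def blob(self, data):
        return self.put("blob", data)

    def tree(self, entries):
        # entries maps a name to the id of a blob or of another tree
        return self.put("tree", tuple(sorted(entries.items())))

    def commit(self, tree_id, parents, message):
        return self.put("commit", (tree_id, tuple(parents), message))


def clone(src):
    """Rebuild a repository from nothing but objects and branch heads."""
    dst = ToyRepo()
    for kind, payload in src.objects.values():
        dst.put(kind, payload)
    dst.branches = dict(src.branches)
    return dst


def read_file(repo, branch, path):
    """Walk branch head -> commit -> tree(s) -> blob."""
    _, (node, _parents, _message) = repo.objects[repo.branches[branch]]
    for part in path.split("/"):
        _, entries = repo.objects[node]
        node = dict(entries)[part]
    return repo.objects[node][1]
```

In this toy, a clone ends up with exactly the same object ids as the original, because the ids depend only on content. That is the property that makes the two repositories interchangeable.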

Once I was sure of that, I wasn't afraid of complicated operations like git merge or git rebase, because I could work out for myself what they basically had to be doing internally: given that data model, there's only one option. Whereas when I still had a 10% worry that there was some extra metadata lying around (e.g. machine-readably marking cherry-picks of the same commit as equivalent, or explicitly marking that a file was renamed in a commit), there were more possibilities in my head as to how the thing might work.
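As an illustration of that kind of reasoning, here is a rough Python sketch of what a merge more or less has to do if commits and trees are all there is: find a common ancestor in the commit graph, then compare each side against it. Everything here is an assumption made for illustration. The function names are hypothetical, trees are flattened to a dict of path to blob id, and conflicts are only detected per whole file. Real git goes further (content-level merging within files, heuristic rename detection, handling of multiple merge bases), so this is not its actual algorithm.

```python
def ancestors(parents, start):
    """All commits reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def merge_trees(base, ours, theirs):
    """Three-way merge of flattened trees (path -> blob id).

    A missing path is treated as None, so additions and deletions
    fall out of the same comparison as modifications.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            pick = o            # both sides agree
        elif o == b:
            pick = t            # only their side changed it
        elif t == b:
            pick = o            # only our side changed it
        else:
            conflicts.append(path)  # both changed it, differently
            continue
        if pick is not None:
            merged[path] = pick
    return merged, conflicts
```

Nothing in the sketch needs any metadata beyond the parent links and the trees themselves, which is the reason the data model leaves so little room for surprises.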

"If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." – Rob Pike

¹ (of course, there are edge-case weirdnesses like the git-replace and git-notes ref namespaces, and no end of extra nonsense that can exist locally to a particular git checkout, but for the core understanding, I was interested only in the common case, and only in the data shared between repositories by pushes and pulls. Everything else I was confident I could learn later as and when it impinged on me.)
