Time sure does fly don’t it? I’m going to go over a lot of the beginner mistakes and little pitfalls and papercuts I’ve encountered so far while working on the GHC test-suite. In a later post that I’m going to write pretty soon, I’ll be going over my progress so far, the remaining things I have yet to finish for the conclusion of the HSOC, and an epilogue of where to go from there. Be sure to checkout the Reddit discussion if you haven’t already.
I had a lot of these; mostly due to my inexperience, but also GHC is a pretty old project and has a lot of small warts and niggles here and there that people just end up “learning” over time. Unfortunately it’s not really something where comprehensive documentation can easily be written down other than a general “best practices” approach (which I think I’ll write up).
git submodule update; I had quite a few issues early on with extraneous files in git status before I learned to do that.
dist/maintainer cleaninstead of ./validate will help you out quite a bit. Unfortunately, this still doesn’t clean out everything.
compiler/stage2without messing up a stage 1 or library build, which can save a lot of time depending on your use cases.
Yeah, there were so many of these that it deserves its own category. It turns out that for basic usage all you need to know is “push, pull, commit, add” but for working on a real codebase with real code and tons of things you can’t just nuke every time you do something wrong… It’s a bit trickier. Things I really needed and had to learn:
Programmers need sanity checks for everything, especially when working in a language like python. I can’t begin to mention how many times I was working on something only to have everything break for hours before I realized I was implicitly assuming something that wasn’t accurate. Not all tests are performance tests, some tests involve measuring the compiler, some don’t, etc. Sanity checks don’t just involve the code, they also involve your environment as well; I’ve accidentally ran tests on the wrong commit, used the wrong flags, accidentally wasted 2 hours building the wrong version of a program… Small things like that sneak up on you and the only way to really solve them is to approach things in a certain way.
Are you assuming that everything a performance test? Write that in a comment somewhere. Do you have a variety of scenarios where things need to hold? Write those down and test them every time. (example: I lost quite a few hours once when I forgot that the performance comparison tool couldn’t assume multiple commits of information. It had to work if you had no git notes setup yet, if you only had one commit to compare with, if you had multiple, etc). Does some condition need to hold? Check that condition rather than some other condition which implies the one you actually want. (example: I had some code break because the testsuite was originally built in a way that assumed if one particular thing existed then the program was in a certain state. What should’ve happened is the program should’ve just checked for that state explicitly.)
On a related note, it’s really important for me to write down some notes while I’m debugging things so I can keep my train of thought if I want to take a break and come back. There were quite a few times where I would go through something and figure out that X was doing Y and then I’d take a break and I’d go back and I’d be debugging some other area and I’d rediscover that same X doing Y again. Keeping some notes together whenever trying to piece something together lets you hold a lot more things together and see how they’re all related. This is even more important in a language like Python where lots of small moving parts are involved and they can all break each other.
Error handling should be robust, but… Not overly excessive. I had one issue that came up where my git notes parsing function broke if there were empty lines in the git notes. If I had been more robust with error checking code elsewhere, I likely wouldn’t have noticed that until much later and it would’ve been harder to track down. Thinking of functions as semi-pure helps a lot with figuring out where to put the error handling; I like to write my functions in a way that it always returns a “sane answer” to whomever called it. Error handling should be bottled up inside as tightly as possible and only blown out if it absolutely needs to be. In this case, the solution was to strip empty lines and then the function worked fine; if I had blown up the error elsewhere, I would’ve needed to explicitly handle it everywhere which would be a giant pain. Instead, the git notes parsing function is a “pure” function that returns either an empty list or the parsed information and life is grand.
This one is really difficult for me because I’m fairly ADHD and tend to bounce around a lot everywhere. If I see a small little issue, I’ll fix it; if I see a small little thing, I’ll poke it; before you know it, there’s 15 unrelated changes in a commit and it looks terrible. Instead, what I’ve learned to do is keep a notepad (or org document) handy and I just write down all the small shit I wanna fix. Then, whenever I’m working on code, I’ll design the commit before I actually start working on it: that is, I figure out what atomic change or improvement I’m going to make and then go do that. I’m not super strict with following this methodology yet but it’s working fantastically so far.
Another thing I have trouble with is that often, “one thing” to me is really more like 20-30 different things; you want to work as small as possible and commit as often as possible and then squish commits down later if needed. For example, “implement git notes parsing” is one item but it really should be made up of ~20 commits that are later squashed into one. I’m not quite there with the squashing and committing yet, but I’m getting better.
This one is somewhat obvious but it’s still something that bears repeating. I tend to get hyperfocus and I’ll either not work on a project or I’ll sink 35 hours in 2 days on a project. That’s not particularly healthy nor is it particularly effective; it’s fine when you’re just dinking around on things, but when efficiency and critical thinking matters, it’s no longer very sufficient. Eat regularly, sleep regularly, take breaks regularly, and work regularly; the mind thrives on consistency and learns to work well when it’s given that consistency. If I didn’t do these things, I’d find that I would have a constant feeling of “needing to work on the project” but what I would usually end up doing was just screwing around on the computer rather than getting things done in a productive manner. Closely related to all of this is optimizing your workflow for you and being able to recognize your limitations and how to get around them. One thing that’s critical for me is being able to immediately answer the question “what is the next immediate task I need to do”; if I can’t answer that, I need to sit down and figure out that answer and get a small checklist written before I start working, otherwise I’ll just wander around aimlessly for a while until I sorta get stuff done. This applies to everything in my life and it’s why making breakfast takes me an hour some days and 3 minutes other days and why I can get ready for the day in 30 minutes or 3 hours.
So these are just some of the things I’ve learned and struggled with while working on this project. I’d probably summarize this into a few key points:
As always, feel free to comment on the Reddit discussion in r/haskell.