Author: Jared Weakly

Beginner Mistakes and Oddities I Encountered

Time sure does fly, don’t it? I’m going to go over a lot of the beginner mistakes, little pitfalls, and papercuts I’ve encountered so far while working on the GHC test-suite. In a later post that I’m going to write pretty soon, I’ll be going over my progress so far, the remaining things I have yet to finish for the conclusion of the HSOC, and an epilogue of where to go from there. Be sure to check out the Reddit discussion if you haven’t already.

Project specific issues

I had a lot of these; mostly due to my inexperience, but also GHC is a pretty old project and has a lot of small warts and niggles here and there that people just end up “learning” over time. Unfortunately it’s not really something where comprehensive documentation can easily be written down other than a general “best practices” approach (which I think I’ll write up).

  • GHC has a lot of submodules. Before working you should not only pull the master but you should also run `git submodule update`; I had quite a few issues early on with extraneous files in git status before I learned to do that.
  • Git notes is really not that ergonomic at all. If you have any namespaces used in your git notes (which this project does) you’ll need `--ref=perf` for damn near every command you use. It’s a small thing, but it can sometimes take a while to realize what you missed.
  • Running the ./validate script is the only way to guarantee a clean anything. Switch a branch and something weird breaks? ./validate. Not quite sure why tests started failing after a git pull and a small innocuous code change? ./validate. The real bummer here is that ./validate wipes everything, rebuilds the entire damn compiler and all that stuff from scratch, and then runs the most extensive version of the testsuite. The exhaustive version of the testsuite alone can take nearly 2 hours on my computer if it’s not run threaded, so it’s definitely a huge time sink if you don’t have this thing optimized.
    •  Related pain-point; there’s a lot of things you can do to tweak and fix your setup so that you can build and validate quicker. Unfortunately, this is all something that you just have to kind of learn over time and there’s no real way to intelligently “auto-configure” this nicely. You’ll have to suffer super long build times for GHC and long runtimes for the testsuite or sink quite a few hours into configuring the “quicker options” for your specific usecase.
    • Update: Thanks to a few helpful Reddit comments, I’m now aware that running `make distclean` or `make maintainer-clean` instead of ./validate will help you out quite a bit. Unfortunately, this still doesn’t clean out everything.
    • To thoroughly clean out the ghc repo, you’ll need to copy mk/ (and everything else you don’t want to lose) somewhere else and then run `git clean -ffdx && git submodule foreach git clean -ffdx`.
    • Alternatively, a gentle clean can be done by deleting compiler/stage2 without messing up a stage 1 or library build, which can save a lot of time depending on your use cases.
  • GHC uses Arcanist and Phabricator for its code diffs. I’ve never seen any other project use these things, so that’s not super helpful for knowing how to use them. With arcanist (the CLI tool for phabricator), you really want to explicitly name every single commit you push to phabricator; otherwise it guesses and it’s terrible at guessing.
  • Ironically enough, the testsuite has no tests. There’s no way to know whether or not you broke something in the testsuite without running it on every single test and making sure nothing broke that wasn’t supposed to (of course, this doesn’t catch false positives…). ./validate helps, but it’s painful.
  • The testsuite was pretty confusing to understand at first. There was no high level documentation really detailing how the codebase works from the perspective of on-boarding someone to work on it. While there’s enough documentation on how to /use/ it, there’s very little in the way of why it’s designed the way it is and how things fit together. A lot of my earlier weeks were wasted just screwing around and reading the code and being super unproductive in general due to that.

Git related mistakes

Yeah, there were so many of these that it deserves its own category. It turns out that for basic usage all you need to know is “push, pull, commit, add” but for working on a real codebase with real code and tons of things you can’t just nuke every time you do something wrong… It’s a bit trickier. Things I really needed and had to learn:

  • Really basic, but it’s not really something I see mentioned a lot. With git, it’s totally okay to not add every file when you commit things; I committed some files I didn’t need to before I learned that.
  • cherry picking (and the fact that you really want to cherry pick with only one commit, so there was some squashing involved using temporary branches)
  • merging. Every time you think you know how to merge things cleanly enough, you’re wrong.
  • Related to that is rebasing. Ohhh, man, I have a section of my git log where you literally see the same commits 3-4 times because of how screwy and non-linear my history got before I cleaned it up some. Luckily for me I got much better at git hygiene after that and my issues were much less.
  • Messing around with branches was another one. Sometimes you’ll have something you’re working on and then you need to start working on something else. You /could/ just put that thing in the same branch, or you could do the SMART thing and make a new branch. But oh no, what if you have tons of ADHD and you work on both branches at the same time and need to merge them together again? What do you do? You weep silently, my child, so as not to disturb your sleeping girlfriend.

Sanity checks

Programmers need sanity checks for everything, especially when working in a language like Python. I can’t begin to mention how many times I was working on something only to have everything break for hours before I realized I was implicitly assuming something that wasn’t accurate. Not all tests are performance tests, some tests involve measuring the compiler, some don’t, etc. Sanity checks don’t just involve the code, they also involve your environment; I’ve accidentally run tests on the wrong commit, used the wrong flags, accidentally wasted 2 hours building the wrong version of a program… Small things like that sneak up on you and the only way to really solve them is to approach things in a certain way.

  • Be deliberate: Make sure that you’re doing what you want to do with the right version of what you want to do it with.
  • Hold nothing in your head. Be explicit about every assumption.

Are you assuming that everything is a performance test? Write that in a comment somewhere. Do you have a variety of scenarios where things need to hold? Write those down and test them every time. (Example: I lost quite a few hours once when I forgot that the performance comparison tool couldn’t assume multiple commits of information. It had to work if you had no git notes set up yet, if you only had one commit to compare with, if you had multiple, etc.) Does some condition need to hold? Check that condition rather than some other condition which implies the one you actually want. (Example: I had some code break because the testsuite was originally built in a way that assumed if one particular thing existed then the program was in a certain state. What should’ve happened is the program should’ve just checked for that state explicitly.)
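To make the "check the condition you actually want" point concrete, here is a minimal sketch in Python. The `TestResult` class, its fields, and the test name are all hypothetical, invented for this illustration; none of them come from the real testsuite driver.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TestResult:
    """Hypothetical record for one finished test (not the real driver's type)."""
    name: str
    is_perf_test: bool = False  # the state we actually care about
    metrics: Dict[str, float] = field(default_factory=dict)


def is_perf_test_implicit(result: TestResult) -> bool:
    # Fragile: assumes "has metrics" implies "is a performance test".
    return bool(result.metrics)


def is_perf_test_explicit(result: TestResult) -> bool:
    # Robust: checks the state itself instead of something that implies it.
    return result.is_perf_test


# A performance test that has not produced any numbers yet (say, a first run).
fresh = TestResult(name="example_perf_test", is_perf_test=True)
print(is_perf_test_implicit(fresh))
print(is_perf_test_explicit(fresh))
```

The implicit check silently misclassifies the fresh test, while the explicit one keeps working in the "no data yet" scenario, which is exactly the kind of case worth writing down and testing every time.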

On a related note, it’s really important for me to write down some notes while I’m debugging things so I can keep my train of thought if I want to take a break and come back. There were quite a few times where I would go through something and figure out that X was doing Y and then I’d take a break and I’d go back and I’d be debugging some other area and I’d rediscover that same X doing Y again. Keeping some notes together whenever trying to piece something together lets you hold a lot more things together and see how they’re all related. This is even more important in a language like Python where lots of small moving parts are involved and they can all break each other.

Error Handling

Error handling should be robust, but… Not overly excessive. I had one issue that came up where my git notes parsing function broke if there were empty lines in the git notes. If I had been more robust with error checking code elsewhere, I likely wouldn’t have noticed that until much later and it would’ve been harder to track down. Thinking of functions as semi-pure helps a lot with figuring out where to put the error handling; I like to write my functions so that they always return a “sane answer” to whoever called them. Error handling should be bottled up inside as tightly as possible and only blown out if it absolutely needs to be. In this case, the solution was to strip empty lines and then the function worked fine; if I had blown up the error elsewhere, I would’ve needed to explicitly handle it everywhere, which would be a giant pain. Instead, the git notes parsing function is a “pure” function that returns either an empty list or the parsed information and life is grand.
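A rough sketch of that idea in Python follows. It assumes each note line holds five tab-separated fields (test_env, name, way, field, val); that layout, the `PerfStat` type, and the function name are my assumptions for illustration and not necessarily what the driver does.

```python
from typing import List, NamedTuple


class PerfStat(NamedTuple):
    """One parsed measurement (hypothetical type, assumed field layout)."""
    test_env: str
    name: str
    way: str
    field: str
    val: float


def parse_perf_notes(note_text: str) -> List[PerfStat]:
    """Parse note text into records; always returns a list, possibly empty."""
    stats = []
    for line in note_text.splitlines():
        line = line.strip()
        if not line:  # skip blank lines, the case that originally broke things
            continue
        test_env, name, way, field, val = line.split("\t")
        stats.append(PerfStat(test_env, name, way, field, float(val)))
    return stats


note = "local\tT1\tnormal\tbytes allocated\t1000\n\n\nlocal\tT2\tnormal\tbytes allocated\t2000\n"
print(len(parse_perf_notes(note)))
print(parse_perf_notes(""))
```

Callers never have to special-case "no notes yet" because an empty note simply gives an empty list. A genuinely malformed line still raises, on purpose, so a real bug stays loud instead of being swallowed.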

Do only one thing at a time

This one is really difficult for me because I’m fairly ADHD and tend to bounce around a lot everywhere. If I see a small little issue, I’ll fix it; if I see a small little thing, I’ll poke it; before you know it, there’s 15 unrelated changes in a commit and it looks terrible. Instead, what I’ve learned to do is keep a notepad (or org document) handy and I just write down all the small shit I wanna fix. Then, whenever I’m working on code, I’ll design the commit before I actually start working on it: that is, I figure out what atomic change or improvement I’m going to make and then go do that. I’m not super strict with following this methodology yet but it’s working fantastically so far.

Another thing I have trouble with is that often, “one thing” to me is really more like 20-30 different things; you want to work as small as possible and commit as often as possible and then squish commits down later if needed. For example, “implement git notes parsing” is one item but it really should be made up of ~20 commits that are later squashed into one. I’m not quite there with the squashing and committing yet, but I’m getting better.

Taking care of yourself

This one is somewhat obvious but it’s still something that bears repeating. I tend to hyperfocus and I’ll either not work on a project or I’ll sink 35 hours in 2 days into a project. That’s not particularly healthy nor is it particularly effective; it’s fine when you’re just dinking around on things, but when efficiency and critical thinking matter, it’s no longer good enough. Eat regularly, sleep regularly, take breaks regularly, and work regularly; the mind thrives on consistency and learns to work well when it’s given that consistency. If I didn’t do these things, I’d find that I would have a constant feeling of “needing to work on the project” but what I would usually end up doing was just screwing around on the computer rather than getting things done in a productive manner. Closely related to all of this is optimizing your workflow for you and being able to recognize your limitations and how to get around them. One thing that’s critical for me is being able to immediately answer the question “what is the next immediate task I need to do”; if I can’t answer that, I need to sit down and figure out that answer and get a small checklist written before I start working, otherwise I’ll just wander around aimlessly for a while until I sorta get stuff done. This applies to everything in my life and it’s why making breakfast takes me an hour some days and 3 minutes other days and why I can get ready for the day in 30 minutes or 3 hours.


So these are just some of the things I’ve learned and struggled with while working on this project. I’d probably summarize this into a few key points:

  • Abstraction is the process of communicating more precisely about something. Use it whenever possible and helpful to make code more robust.
  • Be explicit about your assumptions and use implicit behavior as little as possible; ideally document that implicit behavior in a comment whenever you can.
  • Be deliberate and methodical about your sanity checks; write them down so you can verify things consistently and thoroughly.
  • Do /one/ thing at a time, learn your git hygiene, and use it.
  • Write down your thought processes somewhere whenever you’re doing something more intensive than fixing an immediate, small issue.
  • Take care of yourself, set hours, take breaks, etc.

As always, feel free to comment on the Reddit discussion in r/haskell.

Halfway There, or “I want to say I know what I’m doing, but it’s a dangerous thing to say”

Greetings and salutations! If you’re reading this, it means I haven’t died yet, which is pretty great. So, as my last blog post mentioned, I’ve been spending this summer working on a Haskell Summer of Code project: Bringing Sanity to the GHC Performance Test-Suite. My last blog post also mentioned that I would hopefully be having frequent blog updates, but let’s not talk about that right now. Far more exciting, here, is the potential for me to be able to write more frequently now that I have a proper handle on things! Be sure to check out the Reddit discussion if you haven’t already.

It turns out that it’s exceedingly hard to write interesting content when you spend the majority of your time in front of a keyboard going “wat” and “I hope this doesn’t blow something up.” However, once you start to get past that stage, stare into the abyss of insanity and not go insane, yadda yadda, life becomes more fun. In other words, I spent the first few weeks of the project being depressingly unproductive before, suddenly, a beam of light descended from the heavens and kissed me with the gift of “sorta-kinda understand stuff.” After that, things have been going increasingly swimmingly and I’m sure my mentor, Ben, appreciates my more coherent questions.

Without further ado, the current state of the union

  • There now exists functionality in the code to grab all of the performance metrics from the performance tests and collect them in a Python list, where each element is a string of the form: (test_env , name , way , field , val)
  • There now exists infrastructure in the code to support running only performance tests. It’s a little nicety that helped me understand the codebase better and should hopefully prove to be occasionally useful to people who need it.
  • Infrastructure is now set up so that, from the command line, one can use the new options: TEST_ENV="my environment", USE_GIT_NOTES=YES, ONLY_PERF_TESTS=YES.
    • Currently using git notes is opt in (you must use the command line option); eventually it will be default and perhaps not even able to be opted out of. At the conclusion of the whole project, it would be great if we didn’t even need any all.T files at all.
    • TEST_ENV currently defaults to the string “local” if not set by the user. The CI builders will build with a TEST_ENV="my special environment" flag.
  • After collecting the information, if USE_GIT_NOTES is set, it’ll automatically be loaded into git notes for that commit.
  • Similarly, data from git notes can now be loaded into the test driver.
  • Finally, I’ve started working on some tooling to start comparing performance metrics across commits.
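As a rough picture of how these pieces could fit together, here is a small Python sketch. It is not the driver's actual code: the function names are hypothetical, reading TEST_ENV from an environment-style mapping is an assumption (the real driver receives its options its own way), and the tab-separated note format is assumed too. The git command is only built as an argument list, not run; `git notes --ref=<ref> append -m <msg> <object>` is standard git.

```python
import os
from typing import List, Mapping, Tuple

# (test_env, name, way, field, val), the record shape described above
Stat = Tuple[str, str, str, str, float]


def get_test_env(environ: Mapping[str, str] = os.environ) -> str:
    """Return TEST_ENV, falling back to "local" when the user did not set it."""
    return environ.get("TEST_ENV", "local")


def format_stats(stats: List[Stat]) -> str:
    """Render records as one tab-separated line each (assumed note format)."""
    return "\n".join("\t".join(str(x) for x in stat) for stat in stats)


def notes_append_command(stats: List[Stat], commit: str = "HEAD",
                         ref: str = "perf") -> List[str]:
    """Build, but do not run, a git command appending stats to a commit's note."""
    return ["git", "notes", "--ref=" + ref, "append",
            "-m", format_stats(stats), commit]


stats = [(get_test_env({}), "T1", "normal", "bytes allocated", 1000.0)]
print(notes_append_command(stats))
```

Keeping the command construction separate from running it means the formatting can be checked without touching a real repository; passing the list to `subprocess.run` would be the obvious next step.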

This progress corresponds to, in my proposal, basically completing phase one. What remains now is still quite a bit, but the first big hurdle is done. The comparisons done right now in the test driver have basically not been modified much in any way; it was right around the point where I finished adding stuff into git notes that I realized we’re going to need a separate tool for sane performance metric comparison; so, I’ve done some preliminary work on that. Unfortunately, git notes are a bit tricky and not very ergonomic to work with so I’ll be working on making that as seamless as possible as well; ideally, nobody will ever need to touch them or even know they exist. Technically, at this point, the ticket for this project could probably be closed as “successfully completed”, but it’d be a shame to do so without developing the tooling to help make using the performance test-suite as painless as possible.

What now?

At first, I was thinking that the next logical step towards working on the tooling and the project was going to be things like: developing tooling to make sure that the right information is added to git notes and that the correct information is maintained and propagated into the future, into different branches, etc.; the auto builders would also need some additional configuration in order to handle the git notes (git notes do not work very seamlessly). But then, with Ben’s help, I fleshed out the workflows a bit and realized that none of that really needs to happen if I just rip all of the performance metric comparison out of the test-driver. After that, things fell into place a bit more and the new plan is:

  • Create a “library” to import into the test-driver. It’ll contain a few tools:
    • Boolean “pass/fail” function to compare whether a test varies beyond a certain variance % between two commits (generally HEAD and HEAD~1).
    • Helper utilities to load up git notes, store git notes, etc.
    • Other things as I think of them or the need presents itself.
  • Flesh out the current proof of concept comparison utility:
    • The table output is currently useless for actually ‘comparing.’ I need to fix that.
    • While I’m at it, I need to tag each bit of data I grab with the commit it came from so that I can organize the data from oldest commit to newest commit and then group it by test.
    • Nifty features here and there as I figure out new ways to make people’s lives easier.
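The two core pieces of that plan can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the percentage-based definition of "variance", and the row shape (commit, test name, value) are hypothetical, and commit ordering is supplied by the caller rather than read from git.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def within_tolerance(baseline: float, current: float, tolerance_pct: float) -> bool:
    """True if current deviates from baseline by at most tolerance_pct percent."""
    if baseline == 0:
        return current == 0
    deviation_pct = abs(current - baseline) / abs(baseline) * 100
    return deviation_pct <= tolerance_pct


def group_by_test(rows: Sequence[Tuple[str, str, float]],
                  commit_order: Sequence[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group (commit, test, value) rows by test, ordered oldest commit first."""
    position = {commit: i for i, commit in enumerate(commit_order)}
    grouped = defaultdict(list)
    for commit, name, value in sorted(rows, key=lambda r: position[r[0]]):
        grouped[name].append((commit, value))
    return dict(grouped)


rows = [("c2", "T1", 110.0), ("c1", "T1", 100.0), ("c1", "T2", 50.0)]
history = group_by_test(rows, commit_order=["c1", "c2"])
(_, old), (_, new) = history["T1"]
print(within_tolerance(old, new, tolerance_pct=5.0))
```

Tagging every value with its commit is what makes the grouping possible: once the data is ordered oldest to newest per test, the pass/fail check is just a comparison of neighbouring entries.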

If you have any questions, feel free to ask them on reddit where this will be cross-posted, or on #haskell (my nick is jared-w), or feel free to stalk my facebook and send super creepy messages at 3am (maybe don’t do that).

Haskell Summer of Code

Hey there! I’m Jared and I’m proud to announce that I’ll be working with the Haskell Summer of Code project; specifically, on the project “Bringing Sanity to the GHC Performance Test-suite.” It was a bit of a roller-coaster getting to this point but I’m quite excited to get cracking. Be sure to check out the Reddit discussion if you haven’t seen it yet.

For those who want the nitty-gritty details on the project and wish to read my exact proposal, I’ve put it on my GitHub and you can view it here. For those who want a less technical introduction, here’s a bit of introduction about the project itself, the languages I’ll be working in, and what I aim to do:

Quick Intro

First off, GHC is the compiler of the programming language Haskell. It’s pretty neat and does lots of cool stuff; unfortunately, one of the things it isn’t super great at right now is running performance tests. The performance tests do two things: first, they make sure that when people change the compiler, they don’t accidentally make it slower; secondly, they make sure that when people change the compiler, they don’t accidentally make the generated code less efficient. Right now, the performance test situation requires manually generating “target numbers,” which are the same for every machine and operating system; this renders them mostly useless unless you submit your patch to the build server that has a standardized testing area.

Pain… Everywhere…

This is a giant pain in the neck when you’re just trying to hack on the compiler yourself and get something implemented. Worse yet, because Windows, MacOS, and Linux typically all have slightly different characteristics, the performance number has to have a wide enough error tolerance to account for the operating system differences; this means that you can introduce, say, a 10% performance regression and have the tests pass because that error margin was needed to account for OS differences. Over time, those hidden regressions can pile up if they’re not diligently caught. So now, in order to really contribute meaningful code to the compiler, a contributor has to:

  1. Constantly push code to the build server or get an intuitive feel for how much to “mentally correct” the performance numbers when running tests on their personal computer.
  2. Either test constantly on all the operating systems themselves or push to the build server and suffer a massive slowdown in a feedback-loop.
  3. Do tons of repeat testing to figure out whether or not the performance numbers are actually improved, constant, regressed, etc., taking into account the OS differences and the very wide tolerances on numbers set in place to account for those OS differences. A pass/failure indication of a test becomes almost meaningless; only experience tells you how to proceed.
  4. Any new test or new feature that requires a test must be manually given performance numbers. The test must be run a ton of times to figure out how much error tolerance to give the number to make the test “accurate” on all of the operating systems GHC supports.
  5. And so on…

That’s an impressive amount of busy-work and nonsense that a programmer has to keep up with just to not make the compiler suck. We’re not even getting to actually improving the compiler! Clearly, this is a situation that needs to change.
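To see how a wide tolerance hides regressions, here is a tiny worked example in Python. All the numbers are made up for illustration (a target of 1000, a 10% tolerance, three successive 3% regressions), and the helper names are hypothetical.

```python
from typing import List


def passes(target: float, measured: float, tolerance_pct: float) -> bool:
    """True if measured is within tolerance_pct percent of the fixed target."""
    return abs(measured - target) <= target * tolerance_pct / 100


def apply_regressions(start: float, regression_pct: float, count: int) -> List[float]:
    """Return the measurement after each of `count` successive regressions."""
    values = []
    current = start
    for _ in range(count):
        current *= 1 + regression_pct / 100
        values.append(current)
    return values


target = 1000.0  # hypothetical hand-written "target number"
measurements = apply_regressions(target, regression_pct=3.0, count=3)

wide = [passes(target, m, tolerance_pct=10.0) for m in measurements]
tight = [passes(target, m, tolerance_pct=2.0) for m in measurements]
print(wide)
print(tight)
```

With the 10% window, all three patches pass even though the metric ends up roughly 9.3% worse than the target; with a 2% window, the very first patch would already have been flagged. Tight windows are only practical when the numbers are specific to one machine and operating system, which is what the rest of this project is about.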

Future Bliss

The goal, at the end of this, is to have a much improved situation all around.

  • Get rid of performance metric numbers entirely from the Test-suite. Instead, they’ll be added into something called a “git note” and will be able to be added/changed per operating system and over time. The goal is to always trust a passing performance test.
  • Automate the entire process of adding performance numbers to tests, managing performance numbers, and provide tools to make things as painless as possible.
  • Allow for on-the-fly numbers generated for any particular machine. That way, developers hacking around on their own computer have a useful set of tests to run without needing to push to a remote build. The goal is to have as fast, painless, and accurate of a feedback-loop as possible.

How are we gonna achieve all of this? I’m glad you asked! The first goal is going to be to introduce a new test modifier to the testsuite driver (a Python program that runs the tests). This will dump collected measurements into a git note. At this point, I won’t be messing with any of the program’s logic itself yet except for potentially starting to refactor the performance metrics out of the test-driver. After that, a few tests will be moved over to use the new test modifier to make sure that everything’s working; things will still be manual, but when everything works, all that remains is to automate the git note population and usage. Then the most basic and essential goals of the project will be done.

Insert Hero

The plucky hero (me) and his brave companion will adventure forth to fix this and bring sanity to the GHC performance test-suite! Will they succeed? Will they fail? Find out on this blog! It’ll be covering the ups and downs of the journey towards sanity.  (And as a note to the technical users: I will be committing to a git branch in the GHC repo: wip/perf-testsuite. You’ll be able to see progress there, too.)