Category: Blog

Halfway There, or “I want to say I know what I’m doing, but it’s a dangerous thing to say”

Greetings and salutations! If you’re reading this, it means I haven’t died yet, which is pretty great. So, as my last blog post mentioned, I’ve been spending this summer working on a Haskell Summer of Code project: Bringing Sanity to the GHC Performance Test-Suite. My last blog post also mentioned that I would hopefully be posting frequent updates, but let’s not talk about that right now. Far more exciting is the prospect of writing more frequently now that I have a proper handle on things!

It turns out that it’s exceedingly hard to write interesting content when you spend the majority of your time in front of a keyboard going “wat” and “I hope this doesn’t blow something up.” However, once you get past that stage, stare into the abyss of insanity without going insane, yadda yadda, life becomes more fun. In other words, I spent the first few weeks of the project being depressingly unproductive before, suddenly, a beam of light descended from the heavens and kissed me with the gift of “sorta-kinda understanding stuff.” Since then, things have been going increasingly swimmingly, and I’m sure my mentor, Ben, appreciates my more coherent questions.

Without further ado, the current state of the union:

  • There now exists functionality in the code to grab all of the performance metrics from the performance tests and collect them in a Python list, where each element is a string of the form (test_env, name, way, field, val); see the sketch after this list.
  • There now exists infrastructure in the code to support running only performance tests. It’s a little nicety that helped me understand the codebase better and should hopefully prove to be occasionally useful to people who need it.
  • Infrastructure is now set up so that, from the command line, one can use the new options TEST_ENV="my environment", USE_GIT_NOTES=YES, and ONLY_PERF_TESTS=YES.
    • Currently, using git notes is opt-in (you must pass the command-line option); eventually it will be the default and perhaps not even possible to opt out of. At the conclusion of the whole project, it would be great if we didn’t need any all.T files at all.
    • TEST_ENV currently defaults to the string "local" if not set by the user. The CI builders will build with a TEST_ENV="my special environment" flag.
  • After collecting the information, if USE_GIT_NOTES is set, the metrics are automatically loaded into git notes for that commit.
  • Similarly, data from git notes can now be loaded into the test driver.
  • Finally, I’ve started working on some tooling to start comparing performance metrics across commits.
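
To make the shape of all this concrete, here is a minimal sketch of the collect-and-store step. The function and variable names are hypothetical stand-ins for the driver’s real code, but the git invocations are the stock git notes commands:

```python
import subprocess

# Each measurement is shaped like (test_env, name, way, field, val).
# These names and helpers are illustrative, not the driver's real API.
collected_metrics = [
    ("local", "T1234", "normal", "bytes allocated", 4091921480),
    ("local", "T5678", "optasm", "max_bytes_used", 21992),
]

def metrics_to_note_text(metrics):
    """Serialize metrics into one tab-separated line per measurement."""
    return "\n".join("\t".join(str(field) for field in metric)
                     for metric in metrics)

def append_metrics_to_git_note(metrics, ref="perf"):
    """Append serialized metrics to the git note for HEAD."""
    subprocess.check_call(
        ["git", "notes", "--ref=" + ref, "append", "-m",
         metrics_to_note_text(metrics), "HEAD"])

def read_metrics_from_git_note(commit="HEAD", ref="perf"):
    """Read a commit's perf note back into a list of tuples."""
    out = subprocess.check_output(
        ["git", "notes", "--ref=" + ref, "show", commit])
    return [tuple(line.split("\t"))
            for line in out.decode("utf-8").splitlines() if line]
```

Keeping the notes under a dedicated ref such as refs/notes/perf keeps the performance data out of the default notes namespace, so it can be fetched and pushed independently of any other notes.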

In terms of my proposal, this progress amounts to completing phase one. What remains is still quite a bit, but the first big hurdle is done. The comparisons done right now in the test driver are essentially unmodified; it was right around the point where I finished adding data to git notes that I realized we’re going to need a separate tool for sane performance metric comparison, so I’ve done some preliminary work on that. Unfortunately, git notes are a bit tricky and not very ergonomic to work with, so I’ll also be working on making them as seamless as possible; ideally, nobody will ever need to touch them or even know they exist. Technically, at this point, the ticket for this project could probably be closed as “successfully completed”, but it’d be a shame to do so without developing the tooling to help make using the performance test-suite as painless as possible.

What now?

At first, I was thinking the next logical step was going to be things like developing tooling to make sure that the right information is added to git notes, and that the correct information is maintained and propagated into the future and across branches; the auto builders would also need some additional configuration in order to handle the git notes (which, as mentioned, do not work very seamlessly). But then, with Ben’s help, I fleshed out the workflows a bit and realized that none of that really needs to happen if I just rip all of the performance metric comparison out of the test driver. After that, things fell into place a bit more, and the new plan is:

  • Create a “library” to import into the test-driver. It’ll contain a few tools:
    • A boolean “pass/fail” function to check whether a test’s metrics vary beyond a certain tolerance percentage between two commits (generally HEAD and HEAD~1); see the sketch after this list.
    • Helper utilities to load up git notes, store git notes, etc.
    • Other things as I think of them or the need presents itself.
  • Flesh out the current proof of concept comparison utility:
    • The table output is currently useless for actually ‘comparing.’ I need to fix that.
    • While I’m at it, I need to tag each bit of data I grab with the commit it came from so that I can organize the data from oldest commit to newest commit and then group it by test.
    • Nifty features here and there as I figure out new ways to make people’s lives easier.
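
As a rough illustration of that pass/fail function (a standalone sketch with hypothetical names; the real version will plug into the driver’s data structures), the core check could look something like this:

```python
def metric_within_tolerance(old_value, new_value, tolerance_percent):
    """Return True if new_value is within tolerance_percent of old_value.

    old_value would typically come from the git note on HEAD~1, and
    new_value from a freshly run test on HEAD.
    """
    if old_value == 0:
        return new_value == 0
    relative_change = abs(new_value - old_value) * 100.0 / abs(old_value)
    return relative_change <= tolerance_percent

# A 10% jump in allocations fails against a 5% tolerance; 3% passes.
assert not metric_within_tolerance(1000000, 1100000, 5.0)
assert metric_within_tolerance(1000000, 1030000, 5.0)
```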

If you have any questions, feel free to ask them on reddit where this will be cross-posted, or on #haskell (my nick is jared-w), or feel free to stalk my facebook and send super creepy messages at 3am (maybe don’t do that).

Haskell Summer of Code

Hey there! I’m Jared and I’m proud to announce that I’ll be working with the Haskell Summer of Code project; specifically, on the project “Bringing Sanity to the GHC Performance Test-suite.” It was a bit of a roller-coaster getting to this point but I’m quite excited to get cracking.

For those who want the nitty-gritty details on the project and wish to read my exact proposal, I’ve put it on my GitHub and you can view it here. For those who want a less technical introduction, here’s a bit of introduction about the project itself, the languages I’ll be working in, and what I aim to do:

Quick Intro

First off, GHC is a compiler for the Haskell programming language. It’s pretty neat and does lots of cool stuff; unfortunately, one of the things it isn’t super great at right now is running performance tests. The performance tests do two things: first, they make sure that when people change the compiler, they don’t accidentally make the compiler itself slower; second, they make sure that those changes don’t accidentally make the generated code less efficient. Right now, the performance tests rely on manually generated “target numbers,” which are the same for every machine and operating system; this renders them mostly useless unless you submit your patch to the build server that has a standardized testing area.
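
For a concrete sense of what those target numbers look like, performance tests are declared in all.T files (plain Python read by the test driver). An entry reads roughly like this sketch; the helper name and numbers are illustrative rather than copied from the real testsuite:

```python
# Illustrative all.T entry (runs inside the test driver, which supplies
# test, stats_num_field, and compile): the expected metric is a
# hand-picked target number plus a tolerance percentage, shared by
# every machine and operating system.
test('T1234',
     [stats_num_field('bytes allocated', (4091921480, 10))],  # target, +/-10%
     compile,
     [''])
```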

Pain… Everywhere…

This is a giant pain in the neck when you’re just trying to hack on the compiler yourself and get something implemented. Worse yet, because Windows, macOS, and Linux typically all have slightly different performance characteristics, each target number has to have a wide enough error tolerance to absorb the operating system differences; this means that you can introduce, say, a 10% performance regression and still have the tests pass, because that error margin was needed to account for OS differences. Over time, those hidden regressions can pile up if they’re not diligently caught. So now, in order to really contribute meaningful code to the compiler, a contributor has to:

  1. Constantly push code to the build server, or develop an intuitive feel for how much to “mentally correct” the performance numbers when running tests on their personal computer.
  2. Either test constantly on all the operating systems themselves, or push to the build server and suffer a massive slowdown in their feedback loop.
  3. Do tons of repeat testing to figure out whether the performance numbers have actually improved, stayed constant, regressed, etc., taking into account the OS differences and the very wide tolerances set in place to account for them. A test’s pass/fail indication becomes almost meaningless; only experience tells you how to proceed.
  4. Manually assign performance numbers to any new test (or any new feature that requires a test). The test must be run a ton of times to figure out how much error tolerance the number needs so the test is “accurate” on every operating system GHC supports.
  5. And so on…

That’s an impressive amount of busy-work and nonsense that a programmer has to keep up with just to not make the compiler suck. We haven’t even gotten to actually improving the compiler! Clearly, this is a situation that needs to change.

Future Bliss

The goal, at the end of this, is to have a much improved situation all around.

  • Get rid of performance metric numbers in the test-suite entirely. Instead, they’ll be stored in something called a “git note” and can be added or changed per operating system and over time. The goal is to always be able to trust a passing performance test.
  • Automate the entire process of adding performance numbers to tests and managing them, and provide tools to make things as painless as possible.
  • Allow numbers to be generated on the fly for any particular machine. That way, developers hacking around on their own computer have a useful set of tests to run without needing to push to a remote builder. The goal is a feedback loop that is as fast, painless, and accurate as possible.

How are we gonna achieve all of this? I’m glad you asked! The first goal is to introduce a new test modifier to the testsuite driver (a Python program that runs the tests) which dumps collected measurements into a git note; a sketch follows below. At this point, I won’t be messing with any of the driver’s logic itself, except for potentially starting to refactor the performance metrics out of it. After that, a few tests will be moved over to the new test modifier to make sure everything’s working; things will still be manual at that stage, but once everything works, all that remains is to automate the git note population and usage. Then the most basic and essential goals of the project will be done.
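
To give a flavor of the change (the modifier name here is hypothetical; the real name and plumbing are still to be designed), a test entry like the one sketched earlier might go from a hard-coded target to simply naming the metric to collect:

```python
# Before: a hand-maintained target number with a tolerance, as in the
# earlier sketch.
test('T1234', [stats_num_field('bytes allocated', (4091921480, 10))],
     compile, [''])

# After (hypothetical): the test only names the metric to measure; the
# measured value is dumped into a git note and compared against history.
test('T1234', [collect_stats('bytes allocated')],
     compile, [''])
```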

Insert Hero

The plucky hero (me) and his brave companion will adventure forth to fix this and bring sanity to the GHC performance test-suite! Will they succeed? Will they fail? Find out on this blog! It’ll be covering the ups and downs of the journey towards sanity. (And as a note to the technical users: I will be committing to a git branch in the GHC repo: wip/perf-testsuite. You’ll be able to see progress there, too.)