Outage debriefing
posted by simon on Wednesday, January 16, 2008
I promised to write a little on our outage earlier this month. In short, what happened was that our scripts suddenly stopped working. They would crash when reading a CGI parameter. The @params var in CGI was being set to an empty hash instead of an object with a special [] method (see here under Parameters). We suspect that Dreamhost had perhaps updated something or modified some Apache configuration to cause this sudden change.
So I made a little "hello world" test page to attempt to recreate the problem. To my surprise, the test page worked fine. After a lengthy period of making the real page successively more similar to the test page, while swearing repeatedly, I found that removing require 'haml' made the problem go away.
Now Haml is a nifty little templating engine that I was using to render our html. (Actually, I'd had a conversation with Daniel on the plane just days earlier about how I wanted to stop using Haml, since its benefits were minimal and it was one more place where things could go wrong. So I was right about that.) Luckily it wasn't hard to revert to the plain old html templates that I was using before installing Haml.
In defence of Haml, we were using 1.5 and the latest version was 1.7 at the time (now 1.8). Perhaps the bug (if it is a Haml bug) is fixed in a newer release. But at this point I don't care. I still believe Dreamhost must have updated something Ruby-related for this to have happened as suddenly as it did, though their (excellent) support team were not able to verify this.
Thanks for the many messages of goodwill and support we received via email, Twitter and this blog. Surprisingly, not a single negative message arrived during the outage. Which perversely makes me want to encourage some negative feedback. Come on guys, we can take it. :)