February 2, 2022
Hello hello! This past week was a quiet one: I mentioned that I was headed to the cabin with some friends, and cabin time was unproductive — which is to say it was time well spent. This week has been productive, but it's also been start-of-the-month-productive: there have been a lot of invoices to send (via processes I've been a little too lazy to automate), tweet threads to write, and various marketing efforts to kick off, that kind of thing. (Today is perhaps emblematic of the entire week: four hours spent, which is pretty high, but two of those on operations, one on marketing, and only one on actual development efforts.)
Two 'big' things on my mind:
First, I was lucky enough to score some time with Brian Lovin before he wised up and raised his prices. If you are interested enough in Buttondown to read this ol' newsletter every week, I can't recommend his design critique enough — I was furiously nodding along through the entire thing, and have a slew of things to fix (both big and small) thanks to what he surfaced.
One of the things that he really brought to light for me, at a bit of a meta level, is how lost I've gotten in the day-to-day of working on things. I probably spend... 7%? of my time thinking about Buttondown as a gestalt, since so much is firefighting and keeping the lights on — which I mean in a positive way, since Buttondown has enjoyed such unalloyed growth, but it still means reaction rather than action. In particular, I think he pushed me in a really good direction on some of the design-system-y things that I've been thinking about, and the long-ballyhooed scaffolding work I haven't merged yet is... going to remain unmerged. It is (cheerfully, since I didn't share it with Brian) very much in line with where he was going, but I think the right thing to do is to keep it under wraps and keep iterating on it a bit longer.
Second, after bragging for a little bit that Buttondown's performance issues have been mostly laid to rest, I... got to spend the past two Wednesday mornings dealing with Redis scaling issues. I've now got three 50K+ newsletters that all schedule their emails to go out at the exact same time every week, and while I think I've escaped the flumes thus far, I got to spend some time manually triaging my dear Redis instance (a sketch of the kind of spot-check involved follows the list below). These kinds of incidents feel very, uh, schizophrenic:
- Oh my god, how am I not getting alerted for this on the hardware side?
- What happens if I purge all of the SES events? Does SNS have a backup at all?
- Can I madly rolling-restart the workers and hope for the best?
- Ooh, five of them latched on and started draining the queue.
- Ah, shit, they just added more events and now it's even worse.
- Okay, let's get rid of the checker jobs and the notification jobs. That'll stem the bleeding, right?
- Hmm, apparently not.
- Let me just throw another $50 at the Heroku instance and pray that the upgrade happens online and doesn't purge my entire queue.
- [sigh of relief]
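(For the curious, the "manual triaging" above mostly meant eyeballing queue depths by hand to see whether the restarted workers were actually draining anything. A minimal sketch of that kind of spot-check, assuming a Celery-style setup where each queue is a Redis list, using redis-py, and with a made-up REDIS_URL environment variable:)

```python
# Quick-and-dirty queue depth spot-check during an incident.
# Assumes a Celery-style broker where each queue is a Redis list;
# "celery" is the default queue name, and REDIS_URL is a stand-in.
import os
import time

import redis

r = redis.Redis.from_url(os.environ["REDIS_URL"])
QUEUES = ["celery"]  # add any named queues the workers consume from

while True:
    for queue in QUEUES:
        print(f"{time.strftime('%H:%M:%S')}  {queue}: {r.llen(queue)} pending")
    time.sleep(10)
```

Not monitoring, just a ten-second loop to watch the numbers go down (or, as on Wednesday, not).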
Everything netted out okay: I wasn't sending emails for 45 minutes, which was bad but not SLA-breaching bad. It is funny to think of how far things have progressed in terms of sensitivity: this time in 2018, I maybe would have noticed if Buttondown had gone four hours without sending emails; now I get paged if it's been five minutes without an event.
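(That paging rule is, at heart, a freshness check: page whenever the most recent send event is older than some threshold. A toy sketch, where latest_event_at is a hypothetical input that would come from wherever events get recorded:)

```python
# Toy version of the "no events in five minutes" paging rule.
# `latest_event_at` is a hypothetical input, not Buttondown's actual code.
from datetime import datetime, timedelta, timezone
from typing import Optional

THRESHOLD = timedelta(minutes=5)


def events_are_stale(latest_event_at: Optional[datetime], now: Optional[datetime] = None) -> bool:
    """True if no event has been recorded within THRESHOLD."""
    now = now or datetime.now(timezone.utc)
    return latest_event_at is None or now - latest_event_at > THRESHOLD


# Run on a schedule; page the on-call (me) whenever this returns True.
```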