May 21, 2019, 1 a.m.
I had a rough whoopsie last week:
Ugh, pushed a change that broke login. Fix is going live right now.
— Buttondown (@buttondown) May 16, 2019
This is, roughly speaking, my nightmare! Even as I grow more and more rigorous about automated testing (Buttondown's regression suite is pretty extensive, and reflects the things that I'm most concerned about: email rendering and delivery, because email is a thing that can't be reverted or undone once it's sent) there are holes and embarrassing failures.
The genre of this failure is particularly revealing in terms of where some of my sore spots are. Here's what happened, roughly:
I'm lucky in that most outages are eminently fixable: it took me about two minutes to roll back the latest commit and another fifteen or so to diagnose what the issue was. But there are still a lot of things to learn here:
(As an aside, I can't help but suspect I am still in the 'newbie mode' level of customer support, where everyone is unduly nice to me about these things. I can't pretend that this isn't at least somewhat by design, since it turns out that when you're nice and friendly to your users they respond in kind, but I am dreading the day when I have legitimately unfriendly interactions.)
Here is the least fun aspect of running Buttondown: malicious users. Before last week I had run into only two bad actors in two years, and last week marked my third.
I cannot emphasize enough how much of a bummer it is to diagnose a user as a villain. All three have been of the same genre (generic name, instantly sign up, register a card, and try spamming ten to twenty thousand emails), which makes them easy to spot, but spending my time building out automation for the worst possible use case is sort of a morale-killer. I want to be cleaning up some interfaces or making rendering faster, not adding administrative layers around large user imports or adding a site-wide denylist.
(The one thing that is 'fun' is that this is a thing you only really run into on sufficiently mature projects, and divining a strategy for anti-spam from first principles will be a worthwhile exercise!)
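To make the pattern above concrete, here is a minimal sketch of the kind of check an administrative layer around large imports might run. Everything in it is an assumption for illustration: the `Account` fields, the thresholds, and the `needs_manual_review` name are hypothetical and not Buttondown's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Account:
    created_at: datetime
    emails_sent: int  # emails sent before this import was attempted


# Hypothetical thresholds, chosen only for illustration.
NEW_ACCOUNT_WINDOW = timedelta(days=1)
LARGE_IMPORT_SIZE = 10_000


def needs_manual_review(account: Account, import_size: int, now: datetime) -> bool:
    """Return True when an import matches the 'sign up and blast' pattern."""
    is_new = now - account.created_at < NEW_ACCOUNT_WINDOW
    has_no_history = account.emails_sent == 0
    is_large = import_size >= LARGE_IMPORT_SIZE
    return is_new and has_no_history and is_large
```

A check like this only holds an import for a human to look at; it does not block anyone outright, which keeps the cost of a false positive low for a legitimate user arriving with a big list.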
Are you familiar with the bus factor? It is a useful (if morbid) concept:
The "bus factor" is the minimum number of team members that have to suddenly disappear from a project before the project stalls due to lack of knowledgeable or competent personnel.
For the past few months I have been trying to work with an eye toward a future where a lot of customer-service-esque stuff is shouldered by an assistant. Largely, this means taking common procedures and tasks that exist in two places:
And depositing them in places that are much more non-Justin friendly, like:
What's interesting is that this is a useful refactoring exercise in its own right. A piece of business logic that has to be invoked from Slack or from the command line or from the internal admin needs to be decentralized and relatively stateless; a piece of weird subscriber logic that can't be easily explained or documented without a flow chart should probably be rewritten.
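As a rough sketch of what that refactoring can look like, the core logic below takes everything it needs as arguments and returns a plain message, so thin wrappers for the command line and for Slack can both call it. The `unsubscribe` task, the in-memory `subscribers` dict, and the Slack text format are all hypothetical stand-ins, not Buttondown's real internals.

```python
import argparse
from typing import Dict, List


def unsubscribe(subscribers: Dict[str, bool], email: str) -> str:
    """Core logic: no globals, no I/O, just inputs in and a message out."""
    if email not in subscribers:
        return f"No subscriber found for {email}"
    subscribers[email] = False
    return f"Unsubscribed {email}"


def handle_cli(subscribers: Dict[str, bool], argv: List[str]) -> str:
    """Command-line entry point: parse arguments, then delegate."""
    parser = argparse.ArgumentParser(prog="unsubscribe")
    parser.add_argument("email")
    args = parser.parse_args(argv)
    return unsubscribe(subscribers, args.email)


def handle_slack(subscribers: Dict[str, bool], text: str) -> str:
    """Slack entry point: assumes text like 'unsubscribe someone@example.com'."""
    parts = text.split()
    if len(parts) != 2 or parts[0] != "unsubscribe":
        return "Usage: unsubscribe <email>"
    return unsubscribe(subscribers, parts[1])
```

Because each wrapper only translates its input into a call to `unsubscribe`, adding a third entry point (say, a button in the internal admin) means writing another small adapter rather than copying the logic.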
Last week, I had three main goals:
This is the first time in a long time that I can remember biting off exactly as much as I meant to chew, which feels amazing. I'm finding that a structure where I work on a couple small things throughout the week (low-focus work, like documenting errors, making small tweaks, or fixing bugs) and then I have a big sprint of work on the weekend works really well for balancing "brick" work and "mortar" work.
This week, I'm going to go back to the well and pick out three things: