April 5, 2021

            Pay-what-you-want is live! There are still some fast-follows that I want to add, as alluded to in the blog post, but overall I’m satisfied with things. (In the... 48 hours since public launch, I’m sitting a little shy of $5,000 committed through PWYW, which is higher than I expected!)
It is always tricky to know when to call it a day here. I could have spent this week working on fast-follows, and the fact that so many of them have been explicitly called out in questions I’ve received post-launch is a sign that maybe I should have spent an extra day documenting upsell CTAs or adding more details to the post-upsell email. All of those tasks feel granular and snackable, though: they don’t require the level of dedicated flow and focus that hunkering down and shipping PWYW did.
I decided to celebrate with a bit of dessert work: rebuilding the checker architecture. My checker architecture, as it stands, is grim: I have a cron job that runs twenty different methods simultaneously and emails or pages me if any of them have interesting failures. This falls over for the obvious reasons: some checks are flaky, an uncaught exception in one check masks any legitimate failures from the checks after it, etc. So I embarked on some work to make it better: setting up the obvious architecture and data models and shifting everything to something more decentralized.
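One way to keep a single crashing check from hiding the others is to run each check inside its own `try`/`except` and record the crash as a failure of that check. A minimal sketch, assuming each check is a plain zero-argument callable that returns (or yields) failure messages; `run_checks` and the example checks are hypothetical names, not Buttondown’s actual code:

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)

# Hypothetical shape: a check is a zero-argument callable that returns
# (or yields) human-readable failure messages; an empty result means healthy.
Check = Callable[[], Iterable[str]]


def run_checks(checks: List[Check]) -> List[Tuple[str, str]]:
    """Run every check independently, collecting (check_name, message) pairs.

    A check that raises is recorded as a failure of that check instead of
    aborting the whole run, so the checks after it still get to report.
    """
    results: List[Tuple[str, str]] = []
    for check in checks:
        name = check.__name__
        try:
            # Iterating inside the try block also catches errors that
            # generator-based checks raise lazily, mid-iteration.
            for message in check():
                results.append((name, message))
        except Exception as exc:  # deliberately broad: any crash is a failure
            logger.exception("Check %s crashed", name)
            results.append((name, f"crashed: {exc!r}"))
    return results


# Example checks, for illustration only.
def healthy_check():
    return []


def crashing_check():
    raise RuntimeError("db down")


def failing_check():
    yield "queue backlog too high"


if __name__ == "__main__":
    for name, message in run_checks([healthy_check, crashing_check, failing_check]):
        print(f"{name}: {message}")
```

Flakiness is a separate problem; this only ensures that one bad check can’t swallow the rest.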
One interesting bit I’m grappling with here: I’m explicitly building this as a separate app with the goal of being able to open-source it down the line. I don’t have a great grip on how to structure Django apps, but I do love that doing so acts as a forcing function to make you think about what your interfaces and external surfaces should be. I settled on this decorator-registry pattern that I’m quite fond of:
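# register_checker and CheckerFailure come from the checker app;
# bad_things_are_happening stands in for a real condition.
@register_checker
def no_bad_things_are_happening_checker():
  if bad_things_are_happening:
    yield CheckerFailure(text="Oh no! Bad things are happening!")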
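The post doesn’t show how `register_checker` works internally. One plausible minimal version, assuming a module-level list as the registry and a frozen dataclass for `CheckerFailure` (both assumptions; `run_all_checkers` is a hypothetical helper):

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple


@dataclass(frozen=True)
class CheckerFailure:
    """A single problem reported by a checker."""
    text: str


# Module-level registry; @register_checker appends to it at import time.
CHECKERS: List[Callable[[], Iterator[CheckerFailure]]] = []


def register_checker(func):
    """Add func to the registry and return it unchanged, so it stays callable."""
    CHECKERS.append(func)
    return func


def run_all_checkers() -> List[Tuple[str, CheckerFailure]]:
    """Call every registered checker and gather the failures it yields."""
    failures = []
    for checker in CHECKERS:
        for failure in checker():
            failures.append((checker.__name__, failure))
    return failures


# Example usage mirroring the snippet above.
bad_things_are_happening = True


@register_checker
def no_bad_things_are_happening_checker():
    if bad_things_are_happening:
        yield CheckerFailure(text="Oh no! Bad things are happening!")
```

Because the decorator returns the function untouched, checkers remain ordinary generators that can be called directly in tests.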

There are still some unsolved bits (and thankfully, they don’t need to be solved to launch; they just need to be solved to open-source):

How do you register actions to take when a checker fails? I have a very kludgy reactions.py that proxies calls to my Slack plugin and mail_admins, but that doesn’t exactly scale.
How do you solve autodiscovery without jankily recursively importing submodules? (I really don’t want to have to open-source my import_submodules implementation...)
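For the first question, one option is to mirror the checker registry with a second registry of reactions, so the Slack and email integrations register themselves instead of being hard-wired in a `reactions.py`. A sketch under that assumption; `register_reaction`, `react`, and `record_failure` are hypothetical names:

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class CheckerFailure:
    text: str


# A reaction receives the failing checker's name and the failure itself.
Reaction = Callable[[str, CheckerFailure], None]

# Hypothetical registry of reactions, mirroring the checker registry.
REACTIONS: List[Reaction] = []


def register_reaction(func: Reaction) -> Reaction:
    """Add func to the reaction registry and return it unchanged."""
    REACTIONS.append(func)
    return func


def react(checker_name: str, failure: CheckerFailure) -> None:
    """Fan a failure out to every registered reaction.

    One reaction raising (say, Slack being down) doesn't stop the others.
    """
    for reaction in REACTIONS:
        try:
            reaction(checker_name, failure)
        except Exception:
            logger.exception("Reaction %s failed", reaction.__name__)


# Demo reaction: collects messages in a list. A real app might register one
# function that posts to Slack and another that calls Django's mail_admins.
SENT: List[str] = []


@register_reaction
def record_failure(checker_name: str, failure: CheckerFailure) -> None:
    SENT.append(f"[{checker_name}] {failure.text}")
```

On autodiscovery: Django’s admin faces the same problem and uses `django.utils.module_loading.autodiscover_modules`, which imports a named submodule from every installed app; `django.contrib.admin` triggers it from `AdminConfig.ready()`. Calling `autodiscover_modules("checkers")` from the checker app’s own `ready()` might serve the same purpose, assuming each app keeps its checkers in a `checkers.py` module.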

Lastly, though, I need to drop everything and fix some scaling issues. I had two incidents in the past seven days: both are minor in the grand scheme of things (anything that is sub-ten-minute and only requires scaling up is a minor incident in my book), but it’s still bad.
I might eat my words here, but I think this is a pretty solved problem. My webhook consumption infrastructure lives in the same codebase as everything else, so I should be able to set up a second production app for Buttondown that doesn’t take ‘actual’ traffic and only consumes webhooks, without changing much of my Heroku infrastructure. (This is something I want to do long-term anyway: the frontend application, admin interface, externally facing API, and webhook consumers would ideally be independent apps sourced from the same codebase.) Much like other performance issues, I think this will either be an hour of work or a hundred hours of work, and I am praying it is the former.
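One way the “same codebase, several apps” split could work in Django is for each Heroku app to set a config var and let `settings.py` choose `ROOT_URLCONF` from it. A sketch under that assumption; the `APP_ROLE` variable, the role names, and the `project.urls_*` modules are all hypothetical:

```python
import os
from typing import Mapping

# Hypothetical module names: one URLconf per role listed in the post.
ROLE_URLCONFS = {
    "frontend": "project.urls_frontend",
    "admin": "project.urls_admin",
    "api": "project.urls_api",
    "webhooks": "project.urls_webhooks",
}


def root_urlconf_for(environ: Mapping[str, str] = os.environ) -> str:
    """Pick this process's URLconf from a hypothetical APP_ROLE config var.

    Every Heroku app deploys the same code; only the config var differs,
    so the webhook-only app never routes frontend traffic (and vice versa).
    """
    role = environ.get("APP_ROLE", "frontend")
    try:
        return ROLE_URLCONFS[role]
    except KeyError:
        raise ValueError(f"Unknown APP_ROLE: {role!r}") from None


# In settings.py this would become: ROOT_URLCONF = root_urlconf_for()
```

Failing loudly on an unknown role is a deliberate choice: a typo in a config var should stop the deploy rather than silently serve the frontend.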