Well, gang, thanks for letting me soft-launch surveys last week. The results are in, and I promise to play by the rules: this week, a technical deep dive and next week a war story.
So, a bit of a technical dive, courtesy of Matt:
have you written about using multiple ESPs before? are you doing some kind of round-robin? splitting per account? one primary and the others as fallback for downtime?
- matt swanson (@_swanson) February 20, 2023
When Buttondown launched, it was heavily oriented around a single email service provider: Mailgun. It used (and still uses) django-anymail on the backend.
(You might be tempted to ask “why Mailgun”, to which I have a boring and simple answer: it had a free tier and my day job was using it.)
Because I was using `django-anymail`, there wasn’t a lot of technical build-out work required. The most consequential decision I made in those early days was to poorly name my email events table (where click events, open events, and so on are stored) `mailgun_events`. (A name that has stuck around, because the costs and risks of renaming a table [and a Django app] so aggressively outweigh the rewards.)
And life on Mailgun was good, for the most part. I sent over a lot of volume, and their pricing was for a while the highest single cost I incurred, but there weren’t that many issues. Their support staff was responsive; the performance wasn’t amazing, but it was also not so bad that I had to actively think about moving. It was, in many ways, the classic SaaS land-and-expand relationship: they got me with their free tier and then I stuck around because the cost of switching was never quite worth it.
Some architectural changes are insisted upon by a single mediating force: maybe you have a bad incident and the obvious remediation is to rebuild a key component, maybe you have a P0 feature or customer ask for which a new architecture is required.
Most, though, are the opposite: a number of slow factors gradually pushing `roi_on_rearchitecture` from negative to neutral to positive until you’ve got a sufficiently high level of confidence that the new architecture should be built. This was the case with starting to build out support for multiple ESPs:
All of these things and more congealed into an obvious next outcome I wanted to have: equal sending parity between AWS, Postmark, and Mailgun, with the following set of goals:
The actual implementation of this was about 90% boring, thanks to `django-anymail` sanding off a lot of the edges in API differences between the three: you plug in all the API keys, store a field like `Newsletter.delivery_provider`, and you’re good to go.
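A rough sketch of what that routing could look like, assuming the field holds one of three short strings. The `PROVIDER_BACKENDS` mapping, the stand-in `Newsletter` class, and the `backend_for` helper are hypothetical names for illustration; only the dotted backend paths come from django-anymail.

```python
from dataclasses import dataclass

# Dotted paths of django-anymail's email backends for the three ESPs.
# The short keys ("mailgun", "postmark", "aws") are assumed values.
PROVIDER_BACKENDS = {
    "mailgun": "anymail.backends.mailgun.EmailBackend",
    "postmark": "anymail.backends.postmark.EmailBackend",
    "aws": "anymail.backends.amazon_ses.EmailBackend",
}


@dataclass
class Newsletter:
    # Stand-in for the real Django model; only the relevant field is shown.
    name: str
    delivery_provider: str = "mailgun"


def backend_for(newsletter: Newsletter) -> str:
    """Return the email backend path for a newsletter's chosen provider."""
    try:
        return PROVIDER_BACKENDS[newsletter.delivery_provider]
    except KeyError:
        raise ValueError(
            f"unknown delivery provider: {newsletter.delivery_provider!r}"
        )
```

In a Django project, the returned path could be handed to `django.core.mail.get_connection(backend=...)` to get a connection for that one send.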
That remaining 10%, though…
`django-anymail` does not provide a unified interface for the long tail of escape-hatch options for each ESP (“disable click tracking for just this email”, “add these tags”, “add a list-unsubscribe header”, that kind of thing). This means I have a terrible but functional `prepare_email` method that takes a `RenderedEmail` and a `delivery_provider` and sets all the relevant bits.

All of this was built out with an eye towards avoiding second-system syndrome: new functionality was slowly grafted onto the old system until it was production ready.
The final product is acceptable. I can switch off all traffic from a single ESP at the click of a button (albeit with some disruption to custom domain senders); that’s the really important part. There are lots of rough edges, but most of those rough edges come from bad object-level code (my way of verifying custom domains, for instance, is just janky for no interesting reasons besides “the code is brittle and bad”) rather than from failures of strategy.
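The "switch off an ESP" behaviour can be sketched roughly like this. The `PROVIDERS` tuple, the `choose_provider` helper, and the fallback order are hypothetical; the post does not say how rerouted traffic is actually distributed.

```python
# Assumed fallback order; the real ordering is not described in the post.
PROVIDERS = ("mailgun", "postmark", "aws")


def choose_provider(preferred: str, disabled: set) -> str:
    """Return the preferred provider unless it has been switched off."""
    if preferred not in PROVIDERS:
        raise ValueError(f"unknown delivery provider: {preferred!r}")
    if preferred not in disabled:
        return preferred
    # Preferred provider is off: fall back to the first enabled one.
    for candidate in PROVIDERS:
        if candidate not in disabled:
            return candidate
    raise RuntimeError("every provider is disabled")
```

With `disabled` stored somewhere an admin page can edit, flipping one entry reroutes every subsequent send without a deploy.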
If I were to do it all over again, I’d do the same thing that I’ve done with rewrites of previous systems: promote stateful items (domains, providers) to top-level models and separate the concerns of tracking & updating state from the concerns of operating & reacting to that state.
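A small sketch of that separation, under the assumption that a sending domain becomes its own record with an explicit status. `DomainStatus`, `SendingDomain`, `record_check`, and `from_address` are all hypothetical names, not existing code.

```python
from dataclasses import dataclass
from enum import Enum


class DomainStatus(Enum):
    PENDING = "pending"
    VERIFIED = "verified"
    FAILED = "failed"


@dataclass
class SendingDomain:
    # A top-level record for state that might otherwise live as loose fields.
    name: str
    provider: str
    status: DomainStatus = DomainStatus.PENDING


def record_check(domain: SendingDomain, dns_ok: bool) -> None:
    """Tracking state: the only function that changes the status."""
    domain.status = DomainStatus.VERIFIED if dns_ok else DomainStatus.FAILED


def from_address(domain: SendingDomain, fallback: str) -> str:
    """Reacting to state: reads the status and never modifies it."""
    if domain.status is DomainStatus.VERIFIED:
        return f"newsletter@{domain.name}"
    return fallback
```

Keeping the writer and the reader apart means the sending path can be tested against any status without running a verification check.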
Thanks, Matt, for asking! Next week: “how to roll a new S3 bucket on Christmas Eve”