March 14, 2021
Last week, I wrote:
> There is some bug bashing that needs to take place, but I think finally tackling multiple-newsletter creation in a less janky way than “log out, create a new newsletter, and paste that API key in” feels like a good candidate. (And no, of course that’s not because I received two support emails about it this morning. Why would you even ask that?)
And, for the second week in a row, a chunk of work I’d pegged as a solid week’s worth ended up taking about an hour. When I thought about it in the abstract, “newsletter creation from the perspective of an already-registered user” felt hairy and complex and prone to lots of yak-shaving. When I actually sat down and started figuring out the data modeling, it turned out to be quite simple: a newsletter needs a name and a username to be created, and that’s it. The rest can be handled by the existing settings interfaces (which, don’t get me wrong, need some TLC, but that’s an orthogonal problem); there’s no point recreating all of those form inputs just to have them in a multi-stage modal or whatever.
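To make that concrete: here’s a minimal sketch of what that data modeling boils down to, in Django terms (the model and field names are my own invention for illustration, not the actual schema):

```python
from django.db import models


class Newsletter(models.Model):
    # The only two fields a newsletter actually needs at creation time.
    name = models.CharField(max_length=100)
    username = models.SlugField(max_length=100, unique=True)

    # Everything else (description, branding, sending settings, etc.)
    # has sensible defaults and gets edited through the existing
    # settings interfaces after creation.
```

Two required fields and a uniqueness constraint; everything else is deferred to screens that already exist.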
This seems to be something of a recurring theme for me. I have a good sense of things that are Definitely Big Projects (repricing; relaunching the marketing site) and things that are Definitely Small Tasks (tweaking the administrative panel to prioritize exact matches on string search; optimizing ORM usage for BulkSendReminderView and adding a pinning test), but the messy in-between is where I get into trouble. This is symbolized (and likely exacerbated) by how I keep track of tasks: I explicitly have a “projects” list and a “tasks” list, and the stuff that falls in the middle is hard to scope and slot into my schedule.
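(For the pinning-test half of that small task, the pattern I have in mind is Django’s `assertNumQueries`; the URL name below is an assumption, not the real route:)

```python
from django.test import TestCase
from django.urls import reverse


class BulkSendReminderViewTests(TestCase):
    def test_query_count_is_pinned(self):
        # Pin the query count so an accidental N+1 regression fails
        # the suite instead of silently shipping.
        with self.assertNumQueries(3):  # the post-optimization count
            self.client.get(reverse("bulk-send-reminders"))
```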
On the “not sure quite how to scope this” side of things: my email analytics pipeline is officially reaching the point where it’s timing out for larger users (anyone in the 20K+ subscriber range, say). Calling it a “pipeline” is incorrect, and that’s arguably the problem: it’s just a REST endpoint that does a bunch of inline computations over events. This is sort of the classic scaling problem, and I’m torn between a few different directions:
- I could compute this offline (say, 24 hours after the email is sent), but then I need to model out that data and also deal with things like events arriving after the 24-hour window.
- I could try to drop down to raw SQL (instead of going through the ORM) and hope to solve the entire thing with performance improvements (sketched below, after this list).
- I could break up the database and move events to something like Pinot that is better-suited to my query patterns, but then I’m taking on a lot of architectural overhead.
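To make the raw-SQL direction slightly more concrete: the core move is collapsing the per-event-type ORM queries into a single grouped aggregate. This sketch assumes a hypothetical `events` table with `email_id` and `event_type` columns; the real schema certainly differs.

```python
from django.db import connection


def event_counts_for_email(email_id: int) -> dict[str, int]:
    # One grouped aggregate instead of N separate ORM queries
    # (one per event type) computed inline in the endpoint.
    with connection.cursor() as cursor:
        cursor.execute(
            """
            SELECT event_type, COUNT(*)
            FROM events
            WHERE email_id = %s
            GROUP BY event_type
            """,
            [email_id],
        )
        return {event_type: count for event_type, count in cursor.fetchall()}
```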
It’s hard to commit to any of these directions when the analytics/querying story still feels...not yet solved, from a design perspective. There’s a lot of filtering & querying that I think needs to happen (analytics by tag! analytics by domain! filtering by metadata!), which suggests a better abstraction layer than my current hodgepodge of views and calculations. The more I think about it, the more “filtered event stream” seems correct: given a date range and a bunch of filters, return all of the things that happened during that time (and present/aggregate them accordingly). But that’s vague, and more of a computational answer to the question than a presentational one.
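One possible shape for that abstraction, with every name here hypothetical: a declarative filter object that any analytics view can hand to a single query layer, so each view stops doing its own inline computation.

```python
from dataclasses import dataclass, field
from datetime import datetime

from django.db.models import Q, QuerySet

from emails.models import Event  # hypothetical event model


@dataclass
class EventFilter:
    start: datetime
    end: datetime
    tags: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)


def filtered_events(f: EventFilter) -> QuerySet:
    # Given a date range and a bunch of filters, return everything that
    # happened in that window; aggregation/presentation layers sit on top.
    qs = Event.objects.filter(occurred_at__range=(f.start, f.end))
    if f.tags:
        qs = qs.filter(email__tags__name__in=f.tags)
    if f.domains:
        domain_q = Q()
        for domain in f.domains:
            domain_q |= Q(subscriber__email_address__iendswith="@" + domain)
        qs = qs.filter(domain_q)
    return qs
```

The appeal is that “analytics by tag” and “analytics by domain” become different `EventFilter` instances over the same stream, rather than different views with different calculations.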
Anyway, I think this will be a charcuterie-board week of work: some bug fixes, some performance improvements, a bunch of legitimate but granular commits. And, on the margins, I will be thinking about what a better analytics engine looks like.