this post was submitted on 08 Jan 2024
Fediverse
I'd rather have fewer bots on Lemmy, but from an implementation pov I wonder whether a pub-sub interface could keep up better with fast updates. Do webhooks make a new outgoing tls connection on every event?
Pub-sub might work for some use cases, but it wouldn't work at all for mine. I host my bots on AWS Lambda so I don't pay for anything unless the code is actually running. So the webhook essentially wakes the virtual machine up and after processing is done, it goes back to sleep.
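For anyone curious what the receiving end of that could look like, here is a minimal sketch of a Lambda-style handler in Python. It assumes the function sits behind a function URL or an API Gateway proxy integration, where the HTTP body arrives as a string under "body". The payload fields "type" and "id" are hypothetical, since the thread doesn't show what the webhook actually sends.

```python
import json


def handle_webhook(event, context=None):
    """Handler in the (event, context) shape AWS Lambda uses for Python."""
    # With a function URL or API Gateway proxy integration the request body
    # is a string under "body"; treat a missing body as an empty JSON object.
    raw_body = event.get("body") or "{}"
    payload = json.loads(raw_body)

    # Hypothetical payload fields: what kind of object changed, and its id.
    kind = payload.get("type")
    object_id = payload.get("id")
    if kind is None or object_id is None:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "missing type or id"}),
        }

    # The actual bot logic would go here. Once the function returns, the
    # platform is free to freeze the execution environment again.
    return {
        "statusCode": 200,
        "body": json.dumps({"handled": kind, "id": object_id}),
    }
```

The handler keeps no state between calls, which is what makes the wake up, process, go back to sleep model workable.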
Yeah, they make a new outgoing TLS connection for every HTTPS webhook. That doesn't necessarily mean every db event, though: there's quite powerful filtering available and everyone should use it, since sending a ping for db events you don't need seems quite wasteful.
If we maintain the fantasy (and we may as well) of Lemmy someday overtaking Reddit, that can mean 100s of new posts per second that bots might want to inspect. So that's quite a lot of vm restarts as well as load on the side sending out the webhook queries. I guess this stuff will have been redesigned a few more times by then though, so it is ok. Lemmy at the moment isn't ready for such volume for many other reasons too.
Well, it stays warmed up some 15 seconds or so, but the important part is you don't pay for that uptime. And if my bots ever get to 100s of requests per second, I'm gonna have to shut them down, I'm not that rich.
It shouldn't be hard to handle 100s of requests per sec on a small vm. Where does your server side (the part listening to postgres events) run anyway?
I'm thinking of e.g. that stupid reddit bot that responds if all the words in someone else's post are in alphabetical order. That isn't the type of filter you'd normally offer in a webhook API, so the bot has to listen to the "fire hose". But its outbound traffic won't be too large.
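To make that concrete, the check such a bot runs on every post from the fire hose is tiny. This is only a sketch: the function name, the tokenisation and the three-word minimum are my own assumptions, not how the actual reddit bot works.

```python
import re


def words_in_alphabetical_order(text, min_words=3):
    """Return True if every word in text sorts at or after the previous one."""
    # Lowercase and keep only runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Very short posts are trivially ordered, so ignore them (assumed cutoff).
    if len(words) < min_words:
        return False
    # Compare each word with its successor; repeated words are allowed.
    return all(a <= b for a, b in zip(words, words[1:]))
```

Almost every post fails the check, which is why the bot reads everything but posts very little.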
From a privacy standpoint I'd also consider a firehose feed preferable to a filtered one. Like if I want to count how many posts per day mention Taylor Swift, I might not want to reveal that interest. So I have to take in an unfiltered feed and do the counting in my own client.
There is a whole CS topic called Private Information Retrieval (PIR) that revolves around this idea, fwiw. The Wikipedia article about it is ok.
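The client-side counting could be sketched roughly like this. The post shape is hypothetical (a dict with "title", "body" and an ISO 8601 UTC "published" string), and the function name is made up for illustration. The point is only that the search phrase never leaves the client.

```python
from collections import Counter


def count_mentions_per_day(posts, phrase):
    """Count posts per calendar day whose title or body mentions phrase."""
    needle = phrase.casefold()
    counts = Counter()
    for post in posts:
        # A missing or null title/body is treated as empty text.
        title = post.get("title") or ""
        body = post.get("body") or ""
        if needle in f"{title} {body}".casefold():
            # Assumes "published" looks like "2024-01-08T10:00:00Z", so the
            # first 10 characters are the UTC date.
            counts[post["published"][:10]] += 1
    return dict(counts)
```

The sender only ever sees a client taking the full feed, which is the privacy property being asked for, at the cost of bandwidth.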
It needs direct access to the db, so in my case it's on the same vm as my instance.
It theoretically could be done in the webhook filter, it's a full (but limited) language, I'd just need to add support for some functions.
Those are not really webhooks for public use but for the instance admins, so filtering by posts mentioning Taylor Swift should be more than enough. But yeah, you can just send everything to your bot if it can handle that.
Oh I see, thanks. I was imagining this being used by bot authors who don't want to run actual lemmy instances. This makes sense now, given that you want to run your bots on Lambda. I'd just run them on a VM but that's just me. Cloudflare Workers seems like another possibility.
It's not only for the Lambda use case, my main motivation was AutoMods - they are very resource intensive currently and need to run very often. Until this package, you had to traverse all new posts and all the comments in them to check whether they're newer than the last post / comment you've moderated. That's a lot of api requests every minute or two, you're essentially DDoSing yourself. With this, your AutoMod receives the information that a new comment was created and you can fetch the comment in a single (relatively inexpensive) api request, instead of a plethora of requests which are all fairly expensive.
Whether the webhooks feature is then exposed to other users is really up to each instance's admins. I'm thinking of exposing the functionality for my instance's users when I finish implementing everything I have envisioned.
Of course bot authors can add support for webhook triggering which means the admins can then use it more effectively.
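A rough sketch of the difference for an AutoMod, with the API calls passed in as plain functions so nothing here depends on a real client. All names are hypothetical, and the polling version assumes comment ids increase over time so "newer than the last one moderated" is a simple comparison.

```python
def poll_for_new_comments(list_new_posts, list_comments, last_seen_id):
    """Polling: one request for the post list plus one request per post."""
    requests_made = 1  # the call that lists new posts
    fresh = []
    for post in list_new_posts():
        requests_made += 1  # one comment listing per post
        for comment in list_comments(post["id"]):
            # Assumption: ids grow over time, so a larger id means newer.
            if comment["id"] > last_seen_id:
                fresh.append(comment)
    return fresh, requests_made


def handle_comment_event(event, fetch_comment):
    """Webhook-driven: the event names the comment, so one fetch is enough."""
    return fetch_comment(event["id"]), 1
```

With three new posts the polling pass costs four requests every cycle whether or not anything changed, while the event-driven path costs one request per actual new comment.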
I see, that is a good application. How did Reddit deal with this issue? Lemmy does lots of dumb things by comparison it seems to me. I'd be surprised if reddit required constant polling from automods.
No idea, I didn't do any api development with Reddit, it felt way too oversaturated already. But event subscriptions are a common thing for such use-cases, so my guess would be something very similar to what I have created here with this package.
Fair enough, but it sounds like the subscription feature should really be built into Lemmy's API. That is, your package is a useful workaround for a shortcoming in Lemmy. It's probably worth fixing Lemmy directly.