this post was submitted on 21 Dec 2023
Fediverse
A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc).
If you want help with moderating your own community, head over to !moderators@lemmy.world!
Rules
- Posts must be on topic.
- Be respectful of others.
- Cite the sources used for graphs and other statistics.
- Follow the general Lemmy.world rules.
Learn more at these websites: Join The Fediverse Wiki, Fediverse.info, Wikipedia Page, The Federation Info (Stats), FediDB (Stats), Sub Rehab (Reddit Migration), Search Lemmy
founded 2 years ago
I love the idea, especially from a technical standpoint!
How big is the fediverse today? How many posts are there? What kind of algorithms are you using to store the results? Do you scan sites and then their connected sites, or do you have a premade list?
More technical information, please!
The fediverse is a few thousand servers running Mastodon, Lemmy, etc. I can't say how many posts there are, but there are a lot.
On the more technical side, I plan on using a lightweight, fast search engine called Sonic (it's written in Rust). I have already used it in other projects, and it can handle billions of messages/posts. It comes at a cost, though: it doesn't have faceted search, so you can't, for example, exclude certain texts from the results. I think this is a fair trade-off. The other option would be something more mature like ElasticSearch, but that would be expensive (I'm assuming not much money will be made from this, and what little there is would come from donations).
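To make the trade-off concrete: Sonic returns the IDs of matching objects rather than the documents themselves, so any exclusion would have to happen in the application after the search. Here is a rough Python sketch of that idea, not how the project actually does it. The `exclude_terms` helper, the `fetch_text` lookup, and the sample posts are all hypothetical stand-ins.

```python
def exclude_terms(result_ids, fetch_text, excluded):
    """Drop results whose text contains any excluded term (case-insensitive).

    result_ids: IDs returned by the search engine (assumed input).
    fetch_text: hypothetical callable that loads a post's text by ID
                from whatever database stores the posts.
    excluded:   terms the user does not want to see.
    """
    lowered = [term.lower() for term in excluded]
    kept = []
    for post_id in result_ids:
        text = fetch_text(post_id).lower()
        if not any(term in text for term in lowered):
            kept.append(post_id)
    return kept


# Made-up data standing in for the external post store.
posts = {
    "post:1": "Mastodon hits a new user record",
    "post:2": "Lemmy release notes",
    "post:3": "Mastodon spam wave reported",
}

# Pretend these IDs came back from a search for "mastodon".
ids_from_search = ["post:1", "post:3"]
filtered = exclude_terms(ids_from_search, posts.__getitem__, ["spam"])
print(filtered)
```

The catch is that every result needs an extra lookup, and a page of results can shrink after filtering, which is part of why leaving the feature out is a reasonable trade-off.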
For scanning sites, there are premade lists to start with, and it will also be possible to pick up new sites that are found through other instances. So a bit of both.
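The "bit of both" approach can be sketched as a breadth-first walk that starts from the premade list and adds any instance it hears about along the way. This is only an illustration under assumptions: `discover_instances`, `fetch_peers`, and the example hostnames are hypothetical, and `fetch_peers` stands in for however the crawler asks an instance which servers it federates with.

```python
from collections import deque


def discover_instances(seeds, fetch_peers, max_instances=1000):
    """Breadth-first discovery of instances, starting from a seed list.

    seeds:         premade list of hostnames to start from.
    fetch_peers:   hypothetical callable returning the hostnames an
                   instance knows about; may raise if it is unreachable.
    max_instances: safety cap so the crawl cannot grow without bound.
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate, keep seed order
    seen = set(queue)
    visited = []
    while queue and len(visited) < max_instances:
        host = queue.popleft()
        visited.append(host)
        try:
            peers = fetch_peers(host)
        except Exception:
            continue  # unreachable or unknown instance: keep it, skip its peers
        for peer in peers:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return visited


# Made-up federation graph used in place of real network calls.
graph = {
    "mastodon.example": ["lemmy.example", "kbin.example"],
    "lemmy.example": ["mastodon.example", "pixelfed.example"],
    "kbin.example": [],
}

found = discover_instances(["mastodon.example"], lambda host: graph[host])
print(found)
```

Keeping a `seen` set means each instance is queued only once even when many servers list it as a peer, and the cap keeps a first crawl from running away.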