this post was submitted on 21 Dec 2023
270 points (97.2% liked)

Fediverse


A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc).

If you wanted to get help with moderating your own community then head over to !moderators@lemmy.world!



founded 2 years ago

Hey everyone,

This isn't an announcement, I just wanted people's thoughts on this.

I think everyone knows searching the fediverse can be better. Googling doesn't work too well, etc. So I wanted to do my part and help out.

Indexing all posts, etc., is a lot to handle, so I wanted to start small and focus on video search. I've started indexing videos from PeerTube and other video websites (even YouTube, though that could be dropped to focus only on independent sites).

I know PeerTube has its own search engine for videos, and I will be reaching out to them. Compared with theirs, I'm planning for my site to include other video sources and be easier to use.

So that brings me to feedback from you guys.

  • What do you think about indexing videos posted on the fediverse and other independent platforms?
  • Are there similar services?
  • Am I just wasting my time?
Valmond@lemmy.mindoki.com 8 points 11 months ago (1 child)

I love the idea, especially from a technical standpoint!

How big is the fediverse today? How many posts are there? What kind of algorithms are you using to store the results? Do you scan sites and then their connected sites, or do you have a premade list?

More technical information please 😊!

lautan@lemmy.ca 3 points 11 months ago

The fediverse is a few thousand servers running Mastodon, Lemmy, etc. I can't say how many posts there are, but there are a lot.

So on the more technical side, I plan to use a lightweight, fast search engine called Sonic (it's written in Rust). I've already used it in other projects and it can handle billions of messages/posts. The cost is that it doesn't have faceted search, for example if you want to exclude certain text from the results. I think that's a fair trade-off. The alternative would be something more mature like Elasticsearch, but that would be expensive to run (I'm assuming not much money will be made from this, and I'm talking about donations).
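To make the trade-off concrete, here's a toy sketch in Python. It is not Sonic's API or protocol. It's a tiny in-memory inverted index that maps words to object IDs and returns only IDs, which is roughly the model an ID-only search backend follows, plus an "exclude these words" filter done in the application as a set difference. The class name `TinyIndex`, the `query_excluding` helper and the sample video IDs are all hypothetical, for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on anything that isn't a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class TinyIndex:
    """Hypothetical toy inverted index: term -> set of object IDs.

    It never stores the original text; callers keep documents elsewhere
    and look them up by the IDs returned from queries.
    """

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def push(self, object_id: str, text: str) -> None:
        # Record that every term in `text` appears in `object_id`.
        for term in tokenize(text):
            self._postings[term].add(object_id)

    def ids_with(self, term: str) -> set[str]:
        # .get() avoids creating empty entries in the defaultdict.
        return set(self._postings.get(term, ()))

    def query(self, text: str) -> set[str]:
        """Return IDs containing every query term (AND semantics)."""
        terms = tokenize(text)
        if not terms:
            return set()
        result = self.ids_with(terms[0])
        for term in terms[1:]:
            result &= self.ids_with(term)
        return result


def query_excluding(index: TinyIndex, text: str, exclude: str) -> set[str]:
    """Emulate an exclusion filter in the application layer."""
    hits = index.query(text)
    for term in tokenize(exclude):
        hits -= index.ids_with(term)
    return hits


if __name__ == "__main__":
    idx = TinyIndex()
    idx.push("video:1", "PeerTube tutorial for beginners")
    idx.push("video:2", "PeerTube live streaming tutorial")
    idx.push("video:3", "Cooking pasta")
    print(sorted(idx.query("peertube tutorial")))  # ['video:1', 'video:2']
    print(sorted(query_excluding(idx, "peertube tutorial", "live")))  # ['video:1']
```

The point of the sketch is that exclusion can be bolted on outside the engine by filtering the returned IDs. A real deployment would need the application to keep enough per-document data to do that filtering, which is part of the cost of choosing a lighter engine.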

For scanning sites, there are premade lists to start with, and new sites discovered through other instances can be scanned too. So a bit of both.
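The "a bit of both" approach can be sketched as a breadth-first crawl: start from a premade seed list and enqueue any new hosts that each instance reports as peers. This is a minimal sketch under assumptions. `get_peers` is a hypothetical hook that a real crawler would back with an HTTP call (for example, Mastodon servers can expose a peer list at `/api/v1/instance/peers`, though some disable it). The fake peer graph below is made-up data, so the example runs without network access.

```python
from collections import deque
from typing import Callable, Iterable


def discover_instances(
    seeds: Iterable[str],
    get_peers: Callable[[str], Iterable[str]],
    max_instances: int = 1000,
) -> list[str]:
    """Breadth-first instance discovery starting from a premade seed list.

    `get_peers(host)` is a hypothetical hook returning the hosts a server
    federates with. A host that raises an error is skipped, not fatal.
    """
    seen: set[str] = set()
    order: list[str] = []
    queue: deque = deque()

    # Normalise and de-duplicate the seed list.
    for host in seeds:
        host = host.strip().lower()
        if host and host not in seen:
            seen.add(host)
            queue.append(host)

    while queue and len(order) < max_instances:
        host = queue.popleft()
        order.append(host)  # this host would be handed to the indexer
        try:
            peers = get_peers(host)
        except Exception:
            continue  # unreachable or non-cooperative host: keep crawling
        for peer in peers:
            peer = peer.strip().lower()
            if peer and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return order


# Hypothetical peer graph standing in for real HTTP responses.
FAKE_PEERS = {
    "lemmy.world": ["lemmy.ca", "mastodon.social"],
    "lemmy.ca": ["lemmy.world", "peertube.example"],
    "mastodon.social": ["lemmy.world"],
}


def fake_get_peers(host: str) -> list[str]:
    if host not in FAKE_PEERS:
        raise ConnectionError(host)
    return FAKE_PEERS[host]


if __name__ == "__main__":
    print(discover_instances(["lemmy.world"], fake_get_peers))
    # ['lemmy.world', 'lemmy.ca', 'mastodon.social', 'peertube.example']
```

Breadth-first order keeps the crawl close to the seed list first, and `max_instances` puts a cap on how far discovery can spread before anything gets indexed.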