this post was submitted on 12 Aug 2024
Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
I think this would make a good -arr application.
Ingest podcast feeds and crowdsource hashes of whole and partial sections of the downloaded audio. That should be a good start toward auto-tagging dynamically inserted ads.
For non-dynamic ads, provide an interface to manually mark their start and end, and publish those markers for others. The same interface could be used to add chapters and other metadata.
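A minimal sketch of what a published marker might look like. The `Segment` schema, the `publish()` helper and all field names are hypothetical; the one deliberate choice is keying markers by the SHA-256 of the exact file they were marked on, so they only get reused where the timestamps are known to line up.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Segment:
    """One user-marked span of an episode, in seconds (hypothetical schema)."""
    start: float
    end: float
    kind: str          # e.g. "ad" or "chapter"
    label: str = ""    # optional chapter title or sponsor name

    def __post_init__(self):
        if not (0 <= self.start < self.end):
            raise ValueError(f"invalid span {self.start}..{self.end}")


def publish(episode_guid: str, file_sha256: str, segments: list[Segment]) -> str:
    """Serialize markers for sharing, tied to the exact file they were marked on."""
    payload = {
        "episode_guid": episode_guid,
        "file_sha256": file_sha256,
        "segments": [asdict(s) for s in sorted(segments, key=lambda s: s.start)],
    }
    return json.dumps(payload, indent=2)
```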
Then you’d just point your podcast app at an RSS feed you self-host.
I propose the name Listenarr, unless it’s already taken.
Alternatively, what you're describing sounds like SponsorBlock, but for podcasts. You probably wouldn't have to rehost the actual audio files to accomplish this; you'd just need a podcast client or add-on that accepts user submissions of ad segments, and a database somewhere to host the ad-break metadata.
The biggest issue is probably that you'd be building a new podcast app or forking an existing one to do it. Also, some podcasts dynamically insert ads, so people's downloaded files could have different ad segments and timings.
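Roughly how SponsorBlock-style crowdsourcing could cope with differing files: group submissions by the hash of the file each user actually has, and only accept a span once enough users agree on it. This is a hypothetical sketch; `consensus_segments`, `min_votes` and `tolerance` are made-up names and defaults, not SponsorBlock's actual logic.

```python
from collections import defaultdict


def consensus_segments(submissions, min_votes=3, tolerance=2.0):
    """Accept ad spans that enough users agree on, per distinct audio file.

    submissions: iterable of (file_sha256, start, end) tuples, one per user submission.
    Returns {file_sha256: [(start, end), ...]} for clusters of at least `min_votes`
    submissions whose start and end match within `tolerance` seconds.
    """
    by_file = defaultdict(list)
    for file_hash, start, end in submissions:
        by_file[file_hash].append((start, end))  # files with different ads never mix

    result = {}
    for file_hash, spans in by_file.items():
        clusters = []
        for start, end in sorted(spans):
            for cluster in clusters:
                first_start, first_end = cluster[0]
                if abs(start - first_start) <= tolerance and abs(end - first_end) <= tolerance:
                    cluster.append((start, end))
                    break
            else:
                clusters.append([(start, end)])  # no match: start a new cluster

        agreed = []
        for cluster in clusters:
            if len(cluster) >= min_votes:
                # Use the (upper) median start and end as the representative span.
                starts = sorted(s for s, _ in cluster)
                ends = sorted(e for _, e in cluster)
                mid = len(cluster) // 2
                agreed.append((starts[mid], ends[mid]))
        if agreed:
            result[file_hash] = agreed
    return result
```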
I thought I explained how to handle the dynamically inserted ads, but I’ll elaborate a little here.
If your Listenarr instance is part of a broader network of other instances, they’ll each potentially receive a unique file with different ads inserted, but the ads will typically be inserted at the same cut points in the program timeline. Listenarr would calculate the hash of the entire file, but also of sub-spans of various lengths.
If the hash of the full file is the same among instances, you know everyone is getting the same file, and any time references suggested for metadata will apply to everyone.
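A rough sketch of the hashing step. One assumption worth flagging: hashing fixed-size slices wouldn't work, because an ad of a different length shifts every later slice. So this swaps in content-defined chunking with a gear rolling hash (the approach used by FastCDC-style chunkers), where cut points depend on the bytes themselves and realign after an insertion. It also assumes ads are spliced in without re-encoding the surrounding audio; if the publisher re-encodes, byte hashes won't match and you'd need acoustic fingerprinting instead. The constants are illustrative, and a pure-Python byte loop is slow on full-length episodes.

```python
import hashlib
import random

# Illustrative constants (assumptions, not tuned for real podcast audio).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # one random 64-bit value per byte value
MASK = (1 << 13) - 1   # cut when the low 13 bits are zero: ~8 KiB average beyond MIN_CHUNK
MIN_CHUNK = 2 * 1024
MAX_CHUNK = 64 * 1024
U64 = (1 << 64) - 1


def chunk_boundaries(data: bytes) -> list[int]:
    """Return the end offset of each content-defined chunk."""
    cuts = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear rolling hash: the low 13 bits of h depend only on the last 13 bytes,
        # so a cut point is decided by local content, not by its absolute offset.
        h = ((h << 1) + GEAR[byte]) & U64
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            cuts.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        cuts.append(len(data))  # final partial chunk
    return cuts


def fingerprint(data: bytes) -> tuple[str, list[str]]:
    """SHA-256 of the whole file plus one SHA-256 per chunk, in file order."""
    full = hashlib.sha256(data).hexdigest()
    chunk_hashes = []
    prev = 0
    for cut in chunk_boundaries(data):
        chunk_hashes.append(hashlib.sha256(data[prev:cut]).hexdigest())
        prev = cut
    return full, chunk_hashes


def same_file_everywhere(full_hashes: list[str]) -> bool:
    """True when every instance got byte-identical files, so shared timestamps apply to all."""
    return len(set(full_hashes)) == 1
```

Each instance would share its `full` hash and chunk list; the full-hash comparison is the cheap first check, and the chunk lists are what you'd compare when it fails.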
If the full file hash is different, Listenarr starts slicing it up and generating hashes of subsections to help identify where common and variant sections are. Common sections will usually be the actual content, variants are likely tailored ads. The broader the Listenarr network, the greater the sample size for hashes, which will help automate identification. In fact, the more granular and specific the targeting of inserted ads, the easier it will be to identify them.
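Given each instance's chunk hashes, separating common from variant sections could be as simple as counting how many instances saw each hash; the more narrowly an ad is targeted, the fewer instances share its hash. A sketch under assumptions: `classify_chunks` and the `quorum` threshold are hypothetical, and an ad served to nearly everyone (a house ad, say) would be misread as content, which is where manual tagging would still be needed.

```python
from collections import Counter


def classify_chunks(instances: dict[str, list[str]], quorum: float = 0.8) -> dict[str, list[str]]:
    """Label each instance's chunks "common" or "variant".

    instances maps an instance id to that instance's chunk hashes, in file order.
    A hash present on more than `quorum` of instances is treated as programme content;
    anything rarer is treated as a likely targeted ad.
    """
    if not instances:
        return {}
    seen_on = Counter()
    for hashes in instances.values():
        seen_on.update(set(hashes))  # count each instance at most once per hash
    n = len(instances)
    return {
        inst: ["common" if seen_on[h] / n > quorum else "variant" for h in hashes]
        for inst, hashes in instances.items()
    }
```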
Once you have the file sections sufficiently hashed, tagged, and identified, you can easily stitch together a sanitised media stream into a file any podcast app can ingest.
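Stitching could then keep only the chunks labelled common. This hypothetical `stitch()` concatenates byte ranges, which is fine for illustration, but content-defined cut points won't generally fall on MP3 frame boundaries; a real implementation would probably map the kept chunks to timestamps and cut with an audio tool such as ffmpeg rather than splice raw bytes.

```python
def stitch(data: bytes, cuts: list[int], labels: list[str]) -> bytes:
    """Concatenate the byte ranges of chunks labelled "common", in order.

    cuts: end offset of each chunk; labels: one "common"/"variant" label per chunk.
    """
    if len(cuts) != len(labels):
        raise ValueError("need exactly one label per chunk")
    out = bytearray()
    prev = 0
    for cut, label in zip(cuts, labels):
        if label == "common":
            out += data[prev:cut]  # keep programme content
        prev = cut                 # skip variant (likely ad) ranges
    return bytes(out)
```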
You could shove this function into a podcast player, but then you’d need to replicate all the existing permutations of player applications.
The beauty of the current podcast ecosystem is that it’s just RSS feeds pointing to audio files in a standard way. That lets a shim proxy sit in the middle of the transaction between the publisher and the player.
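A minimal sketch of the shim: take the publisher's feed, rewrite each `<enclosure>` URL to point at your own instance, and serve the result. The `rewrite_feed` helper, `proxy_base` and the `/episode?src=` endpoint are hypothetical. Each enclosure's `length` attribute would also need updating once ads are stripped; this sketch leaves it alone.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Keep the usual itunes: prefix on output instead of ElementTree's default ns0:.
ET.register_namespace("itunes", "http://www.itunes.com/dtds/podcast-1.0.dtd")


def rewrite_feed(rss_xml: str, proxy_base: str) -> str:
    """Point every <enclosure url="..."> at a (hypothetical) self-hosted proxy endpoint."""
    root = ET.fromstring(rss_xml)
    base = proxy_base.rstrip("/")
    for enc in root.iter("enclosure"):
        original = enc.get("url")
        if original:
            # The proxy would fetch `src`, strip identified ads, and serve the result.
            enc.set("url", f"{base}/episode?src={quote(original, safe='')}")
    return ET.tostring(root, encoding="unicode")
```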
This could also be a way to better incorporate media into the fediverse. One example is the chapters and transcripts generated could be directly referenced in Lemmy and Mastodon posts.
I've been looking into sponsorblock-ml as an alternative approach.