this post was submitted on 17 Dec 2024

ivanafterall@lemmy.world 10 points 16 hours ago* (last edited 16 hours ago)

I don't have a specific figure for you. My use case is that I'm trying to write a non-fiction book, and I've got a ton of old newspaper articles in PDF format. The Library of Congress' built-in OCR is helpful but has serious gaps: in some cases it misses large swaths of a page or generates unhelpful gibberish that requires painful cleanup. I've had similar results from every other OCR tool I've tried.

Thus far, in using Claude/ChatGPT to transcribe a few dozen articles, I've only had to fix an individual stray word a few times. It's been very close to perfect in my limited testing, somewhere in the high 90s, percentage-wise. Impressively, with old newspaper articles where words have worn away or are otherwise very hard to make out even for me, it has done a great job of inferring/recognizing them, where OCR would start generating gibberish. I haven't tried handwriting and suspect that's a different beast, but I know tools have cropped up to that end.
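
For anyone who wants to put a rough number on "high 90s" instead of eyeballing it, here's a minimal sketch of one way to do it: compare the raw transcription against your hand-corrected version word by word. The function name `word_accuracy` and the sample strings are made up for illustration, and it only uses Python's standard `difflib`; this is not what I actually ran, just an assumption about how you could measure it.

```python
from difflib import SequenceMatcher


def word_accuracy(transcribed: str, corrected: str) -> float:
    """Return the fraction of words in the corrected text that also
    appear, in order, in the raw transcription (hypothetical helper)."""
    hyp = transcribed.split()
    ref = corrected.split()
    if not ref:
        # Nothing to compare against: perfect only if both are empty.
        return 1.0 if not hyp else 0.0
    # autojunk=False keeps difflib from ignoring very frequent words
    # (e.g. "the") in long articles.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


if __name__ == "__main__":
    # Made-up example strings, not taken from a real article.
    raw = "The council met on Tuesday to discuss the new brdge"
    fixed = "The council met on Tuesday to discuss the new bridge"
    print(f"word accuracy: {word_accuracy(raw, fixed):.1%}")
```

It splits on whitespace, so punctuation stuck to a word counts as part of that word; that's crude, but probably good enough for comparing tools against each other on the same set of articles.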