this post was submitted on 18 Nov 2024

I have been trying for hours to figure out web scraping. From following a build-your-own tutorial to just trying to find prebuilt scrapers, I can't seem to make it click.

For context, I am trying to scrape books that I can't seem to find elsewhere, so I can use them and post them for others.

The scraper tutorial

Hackernoon tutorial by Ethan Jarrell

I initially tried to follow this, but I kept getting a "couldn't find module" error. Since I had never touched Python before this, I don't know how to fix it, and the help links are not exactly helpful. If someone could guide me through this tutorial, that would be great.

Selenium

Selenium Homepage

I don't really get what this is, but I think it's some sort of Python package. It tells me to install it using the pip command, but that doesn't seem to work (syntax error). I don't know how to add it manually because, again, I have little idea of what I'm doing.

Scrapy

Scrapy Homepage

This one seemed like it'd be an out-of-the-box deal, but not only does it need the pip command to install, it also has like 5 other dependencies it needs to function, which complicates it more for me.

I am not criticizing these wares, I am just asking for help. If someone could help simplify it all, or maybe even point me to an easier method, that would be amazing!


Updates

  • Figured out that I am supposed to run the pip command in the Command Prompt on my computer, not inside the Python interpreter: py -m followed by the pip request (for example, py -m pip install selenium)

  • Got the Ethan Jarrell tutorial to work and managed to add in Selenium, which made me realize that Selenium isn't really helpful with the project. rip xP

  • Spent a bunch of time trying to rework the basic scraper so it handles dynamic sites, without success

  • Online self-help doesn't go into as much depth as I would like, probably due to the legal grey area
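
On the first update: the "couldn't find module" error and the pip syntax error usually come from the same mix-up, so here is a minimal sketch of how to check which modules this Python can import and which pip command to type into Command Prompt. The helper names (is_installed, pip_install_command) and the module list are just for illustration, and I'm assuming BeautifulSoup is the parser the tutorial uses.

```python
import importlib.util
import sys

# Import name -> name to give pip. They differ for BeautifulSoup.
PACKAGES = {"selenium": "selenium", "bs4": "beautifulsoup4"}


def is_installed(module_name: str) -> bool:
    """Return True if this interpreter can import `module_name`."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package: str) -> list[str]:
    """Build the pip command for the interpreter running this script."""
    # sys.executable is the path of the current Python, so the package
    # lands in the same installation that will later import it.
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    for module_name, package in PACKAGES.items():
        if is_installed(module_name):
            print(f"{module_name}: already installed")
        else:
            # Printed for copy/paste into Command Prompt, not run here.
            command = " ".join(pip_install_command(package))
            print(f"{module_name}: missing, run -> {command}")
```

The printed command goes into Command Prompt (or a terminal), not at the >>> prompt. Typing pip install there is what gives the syntax error, because it isn't Python code.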


[–] chicken@lemmy.dbzer0.com 6 points 2 days ago (6 children)

The reason to use Selenium is if the website you want to scrape uses JavaScript in a way that prevents you from getting the content without a full browser environment. BeautifulSoup is just a parser; it can't solve that problem.
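
To make the parser-versus-browser point concrete, here is a minimal sketch. It uses Python's built-in html.parser as a stand-in for BeautifulSoup (my substitution, so it runs without installing anything), and the page markup and the TextCollector class are made up for the example.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty container, and a script
# fills it in later, inside the browser.
RAW_HTML = """
<html><body>
  <h1>Catalogue</h1>
  <div id="content"></div>
  <script>
    document.getElementById("content").textContent = "Chapter 1";
  </script>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collect visible text, skipping anything inside <script> tags."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self.in_script:
            self.chunks.append(text)


def visible_text(html: str) -> list[str]:
    """Return the text a parser can see without running any script."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(visible_text(RAW_HTML))  # prints ['Catalogue']
```

The "Chapter 1" text never shows up because nothing ever runs the script. A parser only reads the markup it is handed, so on pages like this something has to execute the JavaScript first, which is the job Selenium does by driving a real browser.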

[–] MaggotInfested@lemmy.dbzer0.com 1 points 2 days ago (2 children)

This was the original plan, but it doesn't work as well on 'dynamic' websites

[–] chicken@lemmy.dbzer0.com 1 points 2 days ago (1 children)

IIRC it can be made to work, since it does everything a browser does. I found this search result, though it has been a while since I used it myself. Another thing you might try that has worked for me is iMacros; it's a little simpler and more basic than Selenium, but should work for what you say you want to do.
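
A rough sketch of the Selenium side, assuming Selenium 4 with Firefox and geckodriver available on the machine. The function name, URL and CSS selector are placeholders, and I haven't run this against any particular site, so treat it as a starting point rather than a known-good recipe.

```python
def fetch_rendered_html(url: str, wait_css: str, timeout: float = 10.0) -> str:
    """Load `url` in Firefox and return the HTML after scripts have run.

    `wait_css` is a CSS selector for an element that only exists once
    the dynamic content has loaded (an assumption about the page).
    """
    # Imported here so the function can be defined before Selenium
    # has been installed; calling it still requires Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # needs Firefox plus geckodriver
    try:
        driver.get(url)
        # Block until the element appears, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        # page_source is the DOM as the browser currently sees it.
        return driver.page_source
    finally:
        driver.quit()  # always close the browser window


# Placeholder usage; the URL and selector are hypothetical:
# html = fetch_rendered_html("https://example.com/page", "div#content p")
```

The returned string can then be handed to a parser like BeautifulSoup in the usual way. The explicit wait is the part that tends to matter on dynamic sites, since grabbing page_source right after get() can return the page before the scripts have filled it in.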

I test with IDLE for Python and use Selenium with the driver directory (geckodriver)
