• lazynooblet
    English
    2517 days ago

    How is blocking scrapers easy?

    This instance sees 500+ IPs with differing user agents all connecting at once, yet each one stays within the rate limits because the load is spread across the bots.

    The only ways I can tell it’s a scraper are when they do something dumb, like sending “google.com” as the referrer on every request, or when I eyeball the logs and notice multiple entries from the same /12.
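For illustration, the "multiple entries from the same /12" check could be roughly automated like this. It is only a sketch: the function names, the threshold of 3, and the sample addresses are made up for the example, and it assumes the client IPv4 addresses have already been pulled out of the access log.

```python
import ipaddress
from collections import Counter


def count_by_prefix(ips, prefix_len=12):
    """Count IPv4 addresses per enclosing network of the given prefix length."""
    counts = Counter()
    for ip in ips:
        # strict=False accepts a host address and returns its enclosing network
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts


def suspicious_prefixes(ips, prefix_len=12, threshold=3):
    """Return networks containing at least `threshold` distinct client IPs.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    counts = count_by_prefix(set(ips), prefix_len)
    return {net: n for net, n in counts.items() if n >= threshold}


# Hypothetical sample: three clients in one /12, one unrelated client
sample_ips = ["10.16.0.1", "10.17.5.9", "10.31.200.4", "192.0.2.7"]
flagged = suspicious_prefixes(sample_ips)
print(flagged)
```

A hit here is only a hint, the same as eyeballing the logs: a /12 covers about a million addresses, so a large ISP or cloud provider can land many unrelated visitors in the same block.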

    • @rumba@lemmy.zip
      English
      717 days ago

      Exactly this. You can only stop scrapers that play by the rules.

      Each one of those books powering GPT had like protection on them already.