
It's somewhere between spam and data broker on the scale of people fucking up the internet.
> It's somewhere between spam and data broker on the scale of people fucking up the internet.
Yeah, if the data were public, I wouldn't care as much. I don't care if Shodan nmaps me. But these fucking fuckers, not just scammy but retarded. Like one time FSE slowed down because they were hammering a closed port, they refused to accept that I was not going to open up port 445 and they were sending so many connection attempts that the majority of the bandwidth was those fuckers.
These "people" all use the big 3 tech clouds to host their scrapers. Block their entire ASN and you avoid a lot of grief. Make exceptions if you have to but anyone who hosts their fedi server on azure, aws or gcp is not worth federating with. The few I've seen who I'm blocking this way are your typical shitlib mastodon site or g*rmans. No big loss.
> Block their entire ASN and you avoid a lot of grief.
Yeah, most of the machines I run have a script that I have to maintain; lost a bunch of machines in December, though, and had to bring them back.
The list is sloppy:
censys="162.142.125.0/24 167.248.133.0/24 167.94.138.0/24 167.94.144.0/22 192.35.168.0/23 198.108.204.216/29 199.45.154.0/23 206.168.32.0/22 74.120.14.0/24";
research_esrg_stanford_edu="171.67.70.0/23"
nagra="185.35.62.0/23"
qrator="185.94.108.0/22"
paloaltonetworks="130.41.0.0/16 134.238.0.0/16 137.83.192.0/18 139.180.240.0/20 165.1.128.0/17 165.85.0.0/16 167.94.198.0/24 168.149.240.0/21 198.135.184.0/24 198.235.24.0/24 204.87.186.0/24 205.210.31.0/24 208.127.0.0/16 66.159.192.0/19 66.232.32.0/20 74.221.128.0/20"; # https://rdap.arin.net/registry/entity/PAN-22
comsys="137.226.113.0/26" # http://researchscan.comsys.rwth-aachen.de/
shadowserver="184.105.139.67 184.105.139.68 184.105.139.69 184.105.139.70 216.218.206.66 216.218.206.67 216.218.206.68 216.218.206.69 74.82.47.2 74.82.47.3 74.82.47.4 74.82.47.5 184.105.247.194 184.105.247.195 184.105.247.196 184.105.247.197 65.49.20.66 65.49.20.67 65.49.20.68 65.49.20.69 184.105.247.238" # for i in $(seq 1 99); do n="$(printf scan-%02d.shadowserver.org $i)"; echo -n $n ' '; dig +short $n; done
my_tiny_bot="44.230.252.91 52.25.208.208 100.21.24.205"
fidget_spinner_bot="54.184.159.16 44.231.202.44 50.112.160.3"
botguy="$my_tiny_bot $fidget_spinner_bot"
constant_contact="205.207.104.0/22 208.75.120.0/22 216.21.230.0/24";
spamboxes="$constant_contact"
internet_measurement="87.236.176.0/24 193.163.125.0/24"; # https://internet-measurement.com/
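A list like this is easier to maintain if overlapping and adjacent entries get merged before they go into a firewall or a nullroute. A minimal sketch, assuming the prefixes are IPv4 and whitespace-separated as above; `collapse_prefixes` is a hypothetical helper for illustration, not the script mentioned in this post:

```python
import ipaddress


def collapse_prefixes(spec):
    """Parse whitespace-separated IPv4 addresses/CIDRs and merge overlaps.

    strict=True makes a prefix with host bits set (a typo such as
    10.0.0.1/24) raise ValueError instead of being silently accepted.
    """
    nets = [ipaddress.ip_network(tok, strict=True) for tok in spec.split()]
    return [str(net) for net in ipaddress.collapse_addresses(nets)]


if __name__ == "__main__":
    # Four of the single addresses from the shadowserver entry above.
    sample = "74.82.47.2 74.82.47.3 74.82.47.4 74.82.47.5"
    for prefix in collapse_prefixes(sample):
        print(prefix)
```

Bare addresses are treated as /32s, so runs of consecutive hosts shrink into a few larger prefixes where the alignment allows it.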
>The few I've seen who I'm blocking this way are your typical shitlib mastodon site or g*rmans.
whats the difference between the two?
Granted he was being a little bitch and trying to get gibs but still.....AWS? christ with these people.
Also yeah fuck the Germans.
> Remember that all journalist masto instance?
There have been many, but I imagine this is the one you mean: https://fedilist.com/instance/journa.host .
(There seem to be a lot: https://fedilist.com/instance?q=journalists&ip=&software=&registrations=&onion= )
On the topic of feds, all of the "NAFO" instances disappeared some time after the election: https://fedilist.com/instance/recent-changes?host=nafo.army,nafo.social,nafo.uk
> The dude was up for less than a week and was asking people to donate to cover his "$3000" a month hosting costs.
Ha, I remember that.
> Also yeah fuck the Germans.
Operation Bent Paperclip
Fuck now I'm all worked up
They like to spoof their user agents to look like an iPhone or some other benign device. But if all that user agent does is HTTP GET and never POST, then it's a scraper.
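The GET-but-never-POST heuristic can be checked against an access log. A rough sketch, assuming the nginx/Apache "combined" log format; `get_only_agents` and the `min_requests` threshold are made up for illustration. People who only read also never POST, so the threshold has to be high enough to skip ordinary lurkers:

```python
import re
from collections import defaultdict

# Assumes the "combined" log format; adjust if the server logs differently.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def get_only_agents(lines, min_requests=50):
    """Return user agents with at least min_requests hits that only ever used GET."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            counts[m["ua"]][m["method"]] += 1
    return sorted(
        ua
        for ua, methods in counts.items()
        if sum(methods.values()) >= min_requests and set(methods) == {"GET"}
    )
```

Grouping by user agent follows the rule as stated here; since the UA is spoofed anyway, grouping by client IP or prefix instead may work better in practice.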
their founder is an israeli unit 8200 alum
> their founder is an israeli unit 8200 alum
FFS, okay, yeah, if that's accurate, then definitely.
On the other hand, he appears to employ a pack of idiots.
>I was not going to open up port 445 and they were sending so many connection attempts that the majority of the bandwidth was those fuckers.
I had an angry moment over Chinese scrapers two weeks ago after I promptly nullrouted half of Huawei Cloud 2 weeks before that. They thought it would be great to switch to Alibaba US and hammer my Gitea instance with requests for every file in most repos and asking for every revision of those files. And in typical Chink when they didn't receive a response in time (obvious when you are sending ~15r/s to a small server), they just closed the connection and tried again in 30 minutes while still scraping other files.
And they have the audacity to use normal browser UAs, picked at random from a small pool, which makes them very hard to block in any easy way. Claude on the other hand completely ignores the meta tag and robots.txt, but at least they have "ClaudeBot" in the UA, making them trivially blockable in nginx. That said, Claude is also dumb in a different way. They send requests for issues with numbers in the thousands and never stop when literally all of them return a 404.
> hammer my Gitea instance with requests for every file in most repos and asking for every revision of those files.
Complete retards.
I was talking about this a while ago, like, they love git repos. People make these complex tarpits for AI but all you have to do is just run cgit somewhere.
> when they didn't receive a response in time (obvious when you are sending ~15r/s to a small server), they just closed the connection and tried again in 30 minutes while still scraping other files.
Fucking assholes.
> Claude on the other hand completely ignores the meta tag and robots.txt,
Are they one of the ones that tries the "/ai.txt" or something or do they just fucking scrape?
> They send requests for issues with numbers in the thousands and never stop when literally all of them return a 404.
Oh, I think they queue it up and then don't even notice until the queue is empty. I ended up just killing off their IPs, but because I also had to wipe the logs (media.fse ran out of space on /var) I can't check if they did.
Although it was good for a laugh. Watching Taylor Lorenz spin around going "what is federation?, where am I?" lol. Bitch read the documentation
Yeah I haven't seen much NAFO stuff anywhere all of a sudden except for a few Canadian accounts on X raging about Trump. Those seem to be people just bandwagon jumping though.
If Trump withdraws from NATO I will build churches in his honor.
> Although it was good for a laugh. Watching Taylor Lorenz spin around going "what is federation?, where am I?" was fun.
Ha, they blocked us right away onnacounta some DMs that were sent.
> Yeah I haven't seen much NAFO stuff anywhere all of a sudden
It stopped right after the election but before Trump got into office. CURIOUS

> If Trump withdraws from NATO I will build churches in his honor.

> Are they one of the ones that tries the "/ai.txt" or something or do they just fucking scrape?
Nope, they ask for robots.txt and then immediately ignore it.
18.119.253.53 - - [23/Feb/2025:02:08:20 +0000] "GET /robots.txt HTTP/2.0" 200 1833 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"
> I ended up just killing off their IPs, but because I also had to wipe the logs (media.fse ran out of space on /var) I can't check if they did.
With Claude it's at least easy. Return 403 to the UA and you are done. Which btw still does not stop their attempts at scraping. They will continue to hit the webserver even when they obviously aren't let through. From there a log monitor will do the job.
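A log monitor for that case only needs to spot clients that keep coming back even though nearly everything they get is a refusal. A minimal sketch, assuming an access log where the status code follows the quoted request line; `persistent_denied`, `min_hits` and `ratio` are placeholder names and values, not an existing tool:

```python
import re
from collections import Counter

# Client IP at the start of the line, status code right after the quoted request.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*?" (?P<status>\d{3}) ')


def persistent_denied(lines, min_hits=20, ratio=0.95, denied=("403", "404")):
    """Return IPs with at least min_hits requests, nearly all answered 403/404."""
    total, bad = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        total[m["ip"]] += 1
        if m["status"] in denied:
            bad[m["ip"]] += 1
    return sorted(
        ip for ip, n in total.items() if n >= min_hits and bad[ip] / n >= ratio
    )
```

The returned addresses can then be fed to whatever does the actual dropping. Counting 404s alongside 403s also catches the clients that walk nonexistent issue numbers.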
With the Chink scrapers, it's a bit harder than automated log monitoring. They are clever in a way, where they will not send you more than approx. 3 requests from one IP, meaning that the typical monitoring tools like fail2ban or something custom won't work as all of the ones I know of don't do subnet/ASN detection, or it will be very trigger-happy.
Thankfully they are retarded in other ways which make them stick out like a sore thumb in the logs. Currently I just look at the logs every few days unless they trigger alerts and throw the whole announced prefix into the trash. So far that has worked out great.
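The few-requests-per-IP trick falls apart once you count per subnet instead of per address. A sketch of that idea, assuming a fixed /24 as a stand-in for the real announced prefix (which needs routing data to look up) and IPv4 only; `busy_subnets` and its thresholds are hypothetical:

```python
import ipaddress
from collections import defaultdict


def busy_subnets(ips, prefix_len=24, min_distinct=10):
    """Map each subnet to its number of distinct client IPs, keeping busy ones.

    Many different addresses from one subnet, each making only a handful of
    requests, is the pattern that per-IP tools miss.
    """
    seen = defaultdict(set)
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            continue  # IPv6 would need a different prefix length
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        seen[net].add(addr)
    return {
        str(net): len(addrs)
        for net, addrs in seen.items()
        if len(addrs) >= min_distinct
    }
```

Feed it the client column from the access log; anything it reports is a candidate for a closer look, not an automatic ban, since a busy /24 can also be a mobile carrier or a university NAT.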

> With Claude it's at least easy. Return 403 to the UA and you are done.
They completely hammered fedilist, no matter what I returned.
> they will not send you more than approx. 3 requests from one IP, meaning that the typical monitoring tools like fail2ban or something custom won't work
Oh, yeah, same shit they do with ssh. Luckily you can just kill off IPs on port 22 because it doesn't matter.
When in doubt bgp.he.net is your friend. Throw one of the annoying IPs into search->click on AS number->Prefixes vX and enjoy all the nullroutable prefixes.
forbes dot com/sites/calebmelby/2013/03/27/nir-zuks-palo-alto-networks-is-blowing-up-internet-security/
huaweicloud-git-scraping.txt
alibabacloud-git-scraping.txt