Low Pass Filter

"Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com"

terryshiggy They've started including HTML-escaped entities in their goddamn UA headers. These people are idiots.

@ins0mniak There are like a million of these retarded "nmap-as-a-service" companies; it must be pretty fuckin' lucrative. (I'm pretty sure Palo Alto Networks is a fed operation, though.)

@p Yeah, bottom feeders man. A lot of those services charge ridiculous amounts of money to "security researchers" for API access.

It's somewhere between spam and data broker on the scale of people fucking up the internet.

@ins0mniak

> Its somewhere between spam and data broker on the scale of people fucking up the internet.

Yeah, if the data were public, I wouldn't care as much. I don't care if Shodan nmaps me. But these fucking fuckers, not just scammy but retarded. Like one time FSE slowed down because they were hammering a closed port, they refused to accept that I was not going to open up port 445 and they were sending so many connection attempts that the majority of the bandwidth was those fuckers.

@p @ins0mniak
These "people" all use the big 3 tech clouds to host their scrapers. Block their entire ASN and you avoid a lot of grief. Make exceptions if you have to but anyone who hosts their fedi server on azure, aws or gcp is not worth federating with. The few I've seen who I'm blocking this way are your typical shitlib mastodon site or g*rmans. No big loss.

@p @ins0mniak
open port 445 please please please
🥺
👉 👈

@dj @ins0mniak

> Block their entire ASN and you avoid a lot of grief.

Yeah, most of the machines I run have a script that I have to maintain; lost a bunch of machines in December, though, and had to bring them back.

The list is sloppy:

censys="162.142.125.0/24 167.248.133.0/24 167.94.138.0/24 167.94.144.0/22 192.35.168.0/23 198.108.204.216/29 199.45.154.0/23 206.168.32.0/22 74.120.14.0/24";
research_esrg_stanford_edu="171.67.70.0/23"
nagra="185.35.62.0/23"
qrator="185.94.108.0/22"
paloaltonetworks="130.41.0.0/16 134.238.0.0/16 137.83.192.0/18 139.180.240.0/20 165.1.128.0/17 165.85.0.0/16 167.94.198.0/24 168.149.240.0/21 198.135.184.0/24 198.235.24.0/24 204.87.186.0/24 205.210.31.0/24 208.127.0.0/16 66.159.192.0/19 66.232.32.0/20 74.221.128.0/20"; # https://rdap.arin.net/registry/entity/PAN-22
comsys="137.226.113.0/26" # http://researchscan.comsys.rwth-aachen.de/
shadowserver="184.105.139.67 184.105.139.68 184.105.139.69 184.105.139.70 216.218.206.66 216.218.206.67 216.218.206.68 216.218.206.69 74.82.47.2 74.82.47.3 74.82.47.4 74.82.47.5 184.105.247.194 184.105.247.195 184.105.247.196 184.105.247.197 65.49.20.66 65.49.20.67 65.49.20.68 65.49.20.69 184.105.247.238" # for i in $(seq 1 99); do n="$(printf scan-%02d.shadowserver.org $i)"; echo -n $n ' '; dig +short $n; done
my_tiny_bot="44.230.252.91 52.25.208.208 100.21.24.205"
fidget_spinner_bot="54.184.159.16 44.231.202.44 50.112.160.3"
botguy="$my_tiny_bot $fidget_spinner_bot"
constant_contact="205.207.104.0/22 208.75.120.0/22 216.21.230.0/24";
spamboxes="$constant_contact"
internet_measurement="87.236.176.0/24 193.163.125.0/24"; # https://internet-measurement.com/
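
A minimal sketch of how variables like the ones above might be fed to a firewall, assuming Linux with ipset and iptables installed. The poster's actual script is not shown in the thread; the function name `emit_block_rules` and the set name `scanners` are hypothetical. It only prints commands, so the output can be reviewed before anything touches the firewall.

```shell
# Hypothetical helper (not from the thread): print the ipset/iptables
# commands that would drop traffic from every address or CIDR given.
# Usage: emit_block_rules SET_NAME "LIST" ["LIST" ...]
emit_block_rules() {
    _set="$1"
    shift
    # hash:net sets accept both single addresses and CIDR prefixes.
    echo "ipset -exist create $_set hash:net"
    # Unquoted on purpose: each list is a space-separated string.
    for _net in $*; do
        echo "ipset -exist add $_set $_net"
    done
    echo "iptables -I INPUT -m set --match-set $_set src -j DROP"
}

# Example (review the output, then pipe it to a root shell if it looks right):
#   emit_block_rules scanners "$censys" "$paloaltonetworks" "$shadowserver"
```

Using one ipset with a single iptables rule keeps the rule count constant no matter how long the lists grow, which is the usual reason to prefer it over one rule per prefix.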

@dj @p @ins0mniak
>The few I've seen who I'm blocking this way are your typical shitlib mastodon site or g*rmans.
whats the difference between the two?

@dj @p I've seen that too. Remember that all journalist masto instance? Dumbass who started it was hosting on AWS. The dude was up for less than a week and was asking people to donate to cover his "$3000" a month hosting costs.

Granted he was being a little bitch and trying to get gibs but still.....AWS? christ with these people.

Also yeah fuck the Germans.

@ins0mniak @dj

> Remember that all journalist masto instance?

There have been many, but I imagine this is the one you mean: https://fedilist.com/instance/journa.host .

(There seem to be a lot: https://fedilist.com/instance?q=journalists&ip=&software=&registrations=&onion= )

On the topic of feds, all of the "NAFO" instances disappeared some time after the election: https://fedilist.com/instance/recent-changes?host=nafo.army,nafo.social,nafo.uk

> The dude was up for less than a week and was asking people to donate to cover his "$3000" a month hosting costs.

Ha, I remember that.

> Also yeah fuck the Germans.

Operation Bent Paperclip

@p Yeah I agree. Scan away if that's all you're doing. The people that do these services..bro I can't even tell you how much I hate them. It's the same type of sniveling fuckbags that get your social security number and address and then leave them up on an unsecured server to get leaked. Then they'll prance around and talk about "competitive data analytics" or some bullshit.

Fuck now I'm all worked up

@ins0mniak

> Fuck now I'm all worked up

How do you feel about running a pirate radio station?

@ins0mniak @p
They like to spoof their user agents to look like an iPhone or some other benign device. But if all that user agent does is HTTP GET and never POST, then it's a scraper.
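
The GET-only heuristic above can be sketched against an access log. This assumes the common "combined" log layout (request line in the first quoted field, user agent in the third) and reads the log on stdin; `get_only_agents` and its threshold argument are hypothetical. It is only a heuristic: ordinary readers also never POST, so the threshold matters, and a user agent containing an escaped quote would be split incorrectly.

```shell
# Hypothetical helper: list user agents that made at least MIN requests
# (default 100) and never used any method other than GET or HEAD.
# Output: request count, a tab, then the user agent string.
get_only_agents() {
    awk -F'"' -v min="${1:-100}" '
        {
            split($2, req, " ")          # $2 is the request line: METHOD PATH PROTO
            total[$6]++                  # $6 is the user agent
            if (req[1] != "GET" && req[1] != "HEAD") other[$6]++
        }
        END {
            for (ua in total)
                if (!(ua in other) && total[ua] >= min)
                    printf "%d\t%s\n", total[ua], ua
        }
    '
}

# Example:
#   get_only_agents 500 < /var/log/nginx/access.log | sort -rn | head
```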

@dj @p Oh yeah that's not really fooling anyone if they're fingerbangin your server.

It's the equivalent of throwing a trashcan down a flight of stairs.

@p @ins0mniak > (I'm pretty sure Palo Alto Networks is a fed operation, though.)

their founder is an israeli unit 8200 alum

@vonzeppelin @ins0mniak

> their founder is an israeli unit 8200 alum

FFS, okay, yeah, if that's accurate, then definitely.

On the other hand, he appears to employ a pack of idiots.

@p @dj censys was the first one I thought of. Those fucks.

"get a research licence for thousands of dollars for a faggy ass scraper"

@ins0mniak @dj

> censys was the first one I thought of. Those fucks.

Complete dipshits.

@vonzeppelin @p If it's not a fed thing it's fed adjacent.

@p @ins0mniak
>I was not going to open up port 445 and they were sending so many connection attempts that the majority of the bandwidth was those fuckers.
I had an angry moment over Chinese scrapers two weeks ago after I promptly nullrouted half of Huawei Cloud 2 weeks before that. They thought it would be great to switch to Alibaba US and hammer my Gitea instance with requests for every file in most repos and asking for every revision of those files. And in typical Chink when they didn't receive a response in time (obvious when you are sending ~15r/s to a small server), they just closed the connection and tried again in 30 minutes while still scraping other files.

And they have the audacity to use normal browser UAs from a randomized selection of a few, making them very hard to block in an easy way. Claude on the other hand completely ignores the meta tag and robots.txt, but at least they have "ClaudeBot" in the UA making them trivially blockable in nginx. That said, Claude is also retarded in a different way. They send requests for issues with numbers in the thousands and never stop when literally all of them return a 404.

@phnt @ins0mniak

> hammer my Gitea instance with requests for every file in most repos and asking for every revision of those files.

Complete retards.

I was talking about this a while ago, like, they love git repos. People make these complex tarpits for AI but all you have to do is just run cgit somewhere.

> when they didn't receive a response in time (obvious when you are sending ~15r/s to a small server), they just closed the connection and tried again in 30 minutes while still scraping other files.

Fucking assholes.

> Claude on the other hand completely ignores the meta tag and robots.txt,

Are they one of the ones that tries the "/ai.txt" or something or do they just fucking scrape?

> They send requests for issues with numbers in the thousands and never stop when literally all of them return a 404.

Oh, I think they queue it up and then don't even notice until the queue is empty. I ended up just killing off their IPs, but because I also had to wipe the logs (media.fse ran out of space on /var) I can't check if they did.

@RedTechEngineer @dj @ins0mniak TRAMPING OUT A VINTAGE WHERE THE GRAPES OF WRATH ARE STORED

@p @dj Yeah that's the one.

Although it was good for a laugh. Watching Taylor Lorenz spin around going "what is federation?, where am I?" lol. Bitch read the documentation

Yeah I haven't seen much NAFO stuff anywhere all of the sudden except for a few Canadian accounts on X raging about Trump. Those seem to be people just bandwagon jumping though.

If Trump withdraws from NATO I will build churches in his honor.

@ins0mniak @dj

> Although it was good for a laugh. Watching Taylor Lorenz spin around going "what is federation?, where am I?" was fun.

Ha, they blocked us right away onnacounta some DMs that were sent.

> Yeah I haven't seen much NAFO stuff anywhere all of the sudden

It stopped right after the election but before Trump got into office. CURIOUS alexdenton

> If Trump withdraws from NATO I will build churches in his honor.

trumpsmug

@p @vonzeppelin Most feds do

@p @dj Imagine paying thousands of dollars for something you could do for free with a little Go programming....(or whatever the hell else someone wants to use)

@ins0mniak @dj You can do it in bash!

Israeli "competency" is largely in contrast to low-trust peers.

@dsm @ins0mniak @vonzeppelin On second thought, that was obtuse rather than subtle.

@p @dj

>NAFO

Those guys are such dorks.

@ins0mniak @dj From the geniuses that came up with "Maybe we can de-radicalize them by telling them all that Pepe is gay."

@p @ins0mniak

> Are they one of the ones that tries the "/ai.txt" or something or do they just fucking scrape?

Nope, they ask for robots.txt and then immediately ignore it.

18.119.253.53 - - [23/Feb/2025:02:08:20 +0000] "GET /robots.txt HTTP/2.0" 200 1833 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"

> I ended up just killing off their IPs, but because I also had to wipe the logs (media.fse ran out of space on /var) I can't check if they did.

With Claude it's at least easy. Return 403 to the UA and you are done. Which btw still does not stop their attempts at scraping. They will continue to hit the webserver even when they obviously aren't let through. From there a log monitor will do the job.

With the Chink scrapers, it's a bit harder than automated log monitoring. They are clever in a way, where they will not send you more than approx. 3 requests from one IP, meaning that the typical monitoring tools like fail2ban or something custom won't work as all of the ones I know of don't do subnet/ASN detection, or it will be very trigger-happy.

Thankfully they are retarded in other ways which make them stick out like a sore thumb in the logs. Currently I just look at the logs every few days unless they trigger alerts and throw the whole announced prefix into the trash. So far that has worked out great.
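
Since per-IP counters miss traffic that is spread a few requests per address, one rough way to make it stand out is to count per prefix instead. The sketch below groups client IPv4 addresses by /24 and reports request count and distinct-address count; the /24 grouping and the name `requests_per_slash24` are assumptions for illustration, and the real announced prefix (which is what gets nullrouted above) still has to be looked up separately.

```shell
# Hypothetical helper: read an access log on stdin (client IP in the first
# whitespace-separated field) and print, for every /24 with at least MIN
# requests (default 50): requests, distinct client IPs, prefix.
requests_per_slash24() {
    awk -v min="${1:-50}" '
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            split($1, o, ".")
            net = o[1] "." o[2] "." o[3] ".0/24"
            hits[net]++
            if (!seen[$1]++) ips[net]++   # count each address once per run
        }
        END {
            for (n in hits)
                if (hits[n] >= min)
                    printf "%d\t%d\t%s\n", hits[n], ips[n], n
        }
    ' | sort -rn
}

# Example:
#   requests_per_slash24 200 < /var/log/nginx/access.log | head -20
```

A prefix with many requests and nearly as many distinct addresses is the pattern described above: lots of sources, each staying under the per-IP radar.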

@phnt @ins0mniak

> :02:

terrylol2

> With Claude it's at least easy. Return 403 to the UA and you are done.

They completely hammered fedilist, no matter what I returned.

> they will not send you more than approx. 3 requests from one IP, meaning that the typical monitoring tools like fail2ban or something custom won't work

Oh, yeah, same shit they do with ssh. Luckily you can just kill off IPs on port 22 because it doesn't matter.

@phnt @p @ins0mniak It'd be nice if there was a list of ips they use so we could iptables them out of our houses.

@nyanide @phnt @ins0mniak Check the NRO delegated stats dataset.
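
For reference, a rough sketch of pulling IPv4 blocks for one country code out of a delegated-stats file. It assumes the pipe-separated layout `registry|cc|type|start|value|date|status`, where `value` is an address count on IPv4 rows. `delegated_to_cidrs` is a hypothetical name, and rows whose count is not a power of two are skipped rather than split into several CIDRs. Country-level blocks are far broader than a single hosting provider, so treat the output as a starting point.

```shell
# Hypothetical helper: read a delegated-stats file on stdin and print the
# IPv4 allocations for country code CC as CIDR prefixes.
# Usage: delegated_to_cidrs CC < delegated-stats-file
delegated_to_cidrs() {
    awk -F'|' -v cc="$1" '
        $2 == cc && $3 == "ipv4" {
            n = $5        # number of addresses in the block
            len = 32
            while (n > 1 && n % 2 == 0) { n /= 2; len-- }
            # Only emit blocks that map to exactly one CIDR prefix.
            if (n == 1) print $4 "/" len
        }
    '
}
```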

@p @dj yeah man. and it's free

@nyanide @p @ins0mniak I'll send them when I'm done with other stuff (couple hours).

When in doubt bgp.he.net is your friend. Throw one of the annoying IPs into search->click on AS number->Prefixes vX and enjoy all the nullroutable prefixes.

@p @dj You know those kids at school that don't have friends but they always sit together at lunch because there's nowhere else to sit?

that's NAFO

@p @phnt If you follow some of those ssh attempts from your logs you find a lot of compromised systems, almost always in their backyard.

@ins0mniak @phnt Oh, yeah, absolutely. In fact, if you just replay the same shit they are doing back at the machines that are sending the traffic, you probably get a bot army yourself.

@p @ins0mniak don't take my word for it

forbes dot com/sites/calebmelby/2013/03/27/nir-zuks-palo-alto-networks-is-blowing-up-internet-security/

@ins0mniak @p Yeah, same with the random Mirai droppers you sometimes see.

@phnt @p scan for some low hanging fruit, use some exploitdb thing and they got themselves a scanner.

@p @phnt Yeah I mean I'm sure they just masscanned for some easy ass CVE and took over.

It's always like a supermarket or an antiques store in asia somewhere.

@p @ins0mniak Yarrrrrrrrrrrrrrrrrr!

@nyanide @ins0mniak @p Here are the ipsets that currently deal with most of the traffic. Claude, Amazon and FB are blocked based on UA in nginx.
huaweicloud-git-scraping.txt
alibabacloud-git-scraping.txt

@phnt @p @nyanide @ins0mniak if i wanted to scrape from the fediverse i'd just set up an instance and a user i use to talk to others amicably and that's it

@mischievoustomato @p @nyanide @ins0mniak These aren't for Fedi scrapers. These are IPs that kept hammering my Gitea instance until it almost died. One day I literally woke up with 20 alerts in my inbox because of these retards.