:tyrellmanic::elliotmanic: Big scraper! :ed::terrylol2:

It looks like they are trying to scrape all of fedi, including dumping following/followers lists. They apparently (more on this later) already have following/followers lists for at least a few hundred servers, as well as a few tens of gigs of timeline.

Hostnames: node$x.testsmall10.mastodonmeasure-pg0.utah.cloudlab.us

IPs: 128.110.218.165 128.110.218.181 128.110.218.172 128.110.218.182 128.110.218.180 128.110.218.197 128.110.218.164 128.110.218.169 128.110.218.185

CNAMEs for those hostnames are hp126, hp142, hp133, hp143, hp141, hp158, hp125, hp130, and hp146.utah.cloudlab.us, respectively.
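
A quick way to see how wide a block would need to be to cover the addresses listed above is to mask each one to a candidate prefix length and collect the resulting networks. This is only a sketch using Python's standard `ipaddress` module; the helper name `covering_networks` and the choice of a /24 prefix are assumptions made for illustration, not something the scraper's operators or CloudLab publish.

```python
import ipaddress

# Addresses copied from the list above.
SCRAPER_IPS = """
128.110.218.165 128.110.218.181 128.110.218.172
128.110.218.182 128.110.218.180 128.110.218.197
128.110.218.164 128.110.218.169 128.110.218.185
""".split()


def covering_networks(ips, prefixlen=24):
    """Return the distinct networks of the given prefix length that contain the addresses.

    Hypothetical helper: strict=False masks off the host bits, so each address
    maps to the network it belongs to.
    """
    return {ipaddress.ip_network(f"{ip}/{prefixlen}", strict=False) for ip in ips}


networks = covering_networks(SCRAPER_IPS)
for net in sorted(networks):
    print(net)
```

If the set contains a single network, one firewall rule for that prefix covers every listed node, at the cost of also blocking any unrelated hosts in the same range.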

Sample requests:
128.110.218.181 - [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:22+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:22+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:22+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:22+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
128.110.218.181 - [2024-11-26T10:55:22+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -
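
To put a number on the request rate in an excerpt like the one above, the lines can be grouped by client address and timestamp. The sketch below assumes the log format shown in the sample (address, bracketed ISO timestamp, quoted request line, status code); the regular expression and the `summarize` helper are illustrative and would need adjusting for a different log format.

```python
import re
from collections import Counter

# Assumed layout, taken from the sample lines:
#   <ip> - [<timestamp>] "<request line>" <status> ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) - \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def summarize(lines):
    """Count requests per (ip, second) and per HTTP status code."""
    per_second = Counter()
    statuses = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        per_second[(match["ip"], match["ts"])] += 1
        statuses[match["status"]] += 1
    return per_second, statuses


# Three lines copied from the excerpt above.
sample = [
    '128.110.218.181 - [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -',
    '128.110.218.181 - [2024-11-26T10:55:20+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -',
    '128.110.218.181 - [2024-11-26T10:55:21+00:00] "GET /api/v1/timelines/public?local=true&limit=40 HTTP/1.1" 429 169 "-" "python-requests/2.32.3" 0.000 - - - fsebugoutzone.org - -',
]

per_second, statuses = summarize(sample)
for (ip, ts), count in sorted(per_second.items()):
    print(ip, ts, count)
print(dict(statuses))
```

Feeding the full access log to `summarize` instead of `sample` gives the per-second rate for every client, which makes bursts like this one easy to spot.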
Matt Hamilton

@p thanks for the notice. I blocked the /24 because I'm lazy. I'm also too lazy to inspect what little logs I keep to know if they've been scraping NAS, but I've done my part to throw a wrench. Hopefully it hits one of them.

I have no problem with people scraping stuff for archival or whatever, but when it comes from a university like this, the odds that they're using it as part of some undergrad's "Misinformation and hate speech on the Fediverse" thing are pretty high, so fuck 'em.

@eriner

> I have no problem with people scraping stuff for archival or whatever,

If they do it without flooding my machine, I don't notice and thus don't complain. This is 7 attempts per second, with 100% of them returning 429. Unless they're paying my overages, fuck them. The easy way to get the data is to just follow the goddamn relay endpoint and get the posts delivered to your own instance instead of making it everyone else's problem.
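
For contrast with retrying immediately after every 429, here is a minimal sketch of how a client could decide how long to wait before its next request. Everything in it is an assumption for illustration: the `next_delay` helper is hypothetical, it only handles a `Retry-After` value given in whole seconds, and whether a given server sends that header at all depends on its configuration.

```python
def next_delay(status, headers, previous_delay, base=1.0, cap=300.0):
    """Return the number of seconds to wait before the next request.

    Hypothetical helper. `headers` is a plain dict here, so the lookup is
    case-sensitive; an HTTP library's header mapping may behave differently.
    """
    if status != 429:
        return base  # not rate limited: go back to the normal pace
    retry_after = headers.get("Retry-After", "").strip()
    if retry_after.isdigit():
        # The server said how long to wait; never go below the base delay.
        return max(float(retry_after), base)
    # No usable hint: double the previous delay, up to the cap.
    return min(max(previous_delay, base) * 2, cap)


delay = 1.0
for status in (429, 429, 429, 200):
    delay = next_delay(status, {}, delay)
    print(status, delay)
```

A client that sleeps for `delay` between requests slows down as soon as it is refused, instead of sending several requests per second that all fail.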

> that they're using it as part of some undergrad's "Misinformation and hate speech on the Fediverse" thing

This is an NSF cluster; odds are good that it's related to the NSF's "combating misinformation" deal. The last time that publicly bumped into fedi, it wasn't an undergrad, it was a psychology professor at MIT Sloan.

@p @c92a979036ccbbe62736de83ec9258fe2fc5608f5d51b2185bf2611210523e28

> it was a psychology professor

Yeah, I remember that and even then had presumed that it was actually an undergrad whipping boy doing all of the work while the professor stamps their name on it and runs PR interference. I assumed the same was going on here, but if it's actually the NSF then fuck 'em even more, lol.

> just follow the goddamn relay endpoint

I suspect they don't do this because they want historical records.

@eriner @c92a979036ccbbe62736de83ec9258fe2fc5608f5d51b2185bf2611210523e28

> I assumed the same was going on here

Likely.

> but if it's actually the NSF then fuck 'em even more, lol.

Well, NSF-funded or subsidized.

> I suspect they don't do this because they want historical records.

I'm not sure what you mean. They'd have better records if they did it this way: they'd know, for example, whether to account for clock skew or forged timestamps, since they'd get the data delivered when it's created.

@p @c92a979036ccbbe62736de83ec9258fe2fc5608f5d51b2185bf2611210523e28

> I'm not sure what you mean

What I mean is that they're probably interested in collecting more than just the posts published after the time at which they set up the collector/scraper; they probably want posts published prior to the instantiation of their collector.

@eriner Okay, yeah. If they set up the collector earlier, they wouldn't have that problem, either, though.