Matt Hamilton

I've been analyzing the public NPD data leak. Not *all* of it, only the most public ~277GB (uncompressed) corpus: troyhunt.com/inside-the-3-bill

Corporate media headlines said 2.9 billion, 2.7 billion, "every American", etc., which raised questions in threads about it: noauthority.social/@ned/112962

Troy Hunt estimated ~899M unique SSNs, though I'm curious if his "100M samples" were random or sequential because I've noticed formatting differences between sections of the corpus, indicating different origins.

Cont...

On to some observations about this corpus that I can make with a high degree of confidence:

* The corpus does NOT contain 2.9 or 2.7 billion people.
* "Every American" is NOT present in the corpus.
* The corpus does contain accurate information about many Americans.
* The corpus does contain accurate information about deceased Americans.

Not that anyone here would ask, but like Troy, I'm not a databroker or your personal lookup service.

These posts are purely informational.

Continued...

I'll post an approximate count of total unique SSNs in this corpus when I have it.

I'd like to do some fraud analysis to determine whether, and how many, "hot" SSNs are being fraudulently used by what appear to be multiple people. However, this will be tricky because of recycled numbers, name changes, etc., so I'll have to experiment a bit and see how feasible it is. Processing "big data" is time-consuming no matter how efficient you are.
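A minimal sketch of what a first pass at the "hot" SSN idea could look like, assuming records have already been reduced to (SSN, first name, last name) tuples. The function names, the threshold, and the sample rows are all hypothetical and are not taken from the corpus or from any tooling mentioned in this thread.

```python
from collections import defaultdict


def normalize_name(first, last):
    """Collapse case and surrounding whitespace so trivial variants match."""
    return (first.strip().upper(), last.strip().upper())


def find_hot_ssns(records, min_identities=3):
    """Return {ssn: set_of_names} for SSNs seen with many distinct names.

    `records` is an iterable of (ssn, first_name, last_name) tuples.
    `min_identities` is an arbitrary, hypothetical threshold.
    """
    names_by_ssn = defaultdict(set)
    for ssn, first, last in records:
        names_by_ssn[ssn].add(normalize_name(first, last))
    return {
        ssn: names
        for ssn, names in names_by_ssn.items()
        if len(names) >= min_identities
    }


# Made-up rows; area number 000 is never issued, so these are not real SSNs.
sample = [
    ("000-00-0001", "Jane", "Doe"),
    ("000-00-0001", "JANE", "doe "),  # same person, different formatting
    ("000-00-0002", "Alex", "Smith"),
    ("000-00-0002", "Pat", "Jones"),
    ("000-00-0002", "Sam", "Brown"),
]
print(sorted(find_hot_ssns(sample)))  # ['000-00-0002']
```

A grouping like this only surfaces candidates. Legitimate name changes and recycled numbers would show up in the same list, which is the feasibility problem described above.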

The BleepingComputer article was one of the better early general-audience writeups: bleepingcomputer.com/news/secu

> previously leaked samples also contained email addresses and phone numbers

> The data breach has led to multiple class action lawsuits against Jerico Pictures, which is believed to be doing business as National Public Data, for not adequately protecting people's data.

There is also a lawsuit:
documentcloud.org/documents/25

> Defendant Jerico Pictures, Inc. d/b/a National Public Data

If this filing is accurate, my question is: why is Jerico Pictures, Inc., a Florida business headed by its president, Salvatore Verini (JR?), doing business as National Public Data, a background check service? Jerico Pictures describes itself this way: "Located in both Los Angeles and South Florida, Jerico Pictures maintains a talented group of film and television producers with a passion for storytelling".

web.archive.org/web/2024080218

And I don't know if this is the same Salvatore Verini (JR?)

imdb.com/name/nm4701915/

salvatoreverini.com/

Interesting to note that the twitter account linked on the aforementioned website doesn't exist.

But his Facebook and Instagram accounts are still up. His Facebook bio says he's "EP Jerico Pictures".

I am somewhat perplexed by the use of both "Salvatore Verini", and "Salvatore Verini JR" in Jerico Pictures' annual filing with the State of Florida.

Krebs compiled a fair bit of info about Sal: krebsonsecurity.com/2024/08/na

> The Florida Secretary of State says Jerico Pictures is owned by Salvatore (Sal) Verini Jr., a retired deputy with the Broward County Sheriff’s office. The Secretary of State also says Mr. Verini is or was a founder of several other Florida companies, including National Criminal Data LLC, Twisted History LLC, Shadowglade LLC and Trinity Entertainment Inc., among others.

Krebs' is the best general writeup so far.

Small update. Still working on getting the DB built and data imported (on to round 9). I'm continuing to see "invalid page in block xxxxxxx of relation base /xxx/xxx" and various data checksum errors, which are presumably from transient write failures. What's weird is that this is present even after I put it on ZFS.

I'm forced to conclude that it's a "hardware" problem, so I'm going to try disabling the write cache on the NVMes.

Nothing's ever easy.

*snap*

Also, I got the rest of the "partial" data leak, but I haven't even begun to open and sort through that can of worms.

After finally tracking down and resolving my DB corruption issue, I imported everything again last night.

There may have been (read: probably was) some corruption when I'd unarchived the corpus, plus potentially some issue(s) with my pre-insertion normalizer.

Given those caveats, here are my results:

Total rows of data: 2,695,281,509

Total distinct/unique SSNs: 272,384,882

272M is a bit off from Troy's early estimate of 899M.

I'll start from scratch again later to validate my results.
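One way to cross-check a distinct count outside the database is a small script over the raw values. The sketch below assumes the SSN field has already been extracted from each row. The normalization rule (strip everything but ASCII digits, require exactly nine) is my assumption and is not the pre-insertion normalizer mentioned above.

```python
import re

_NON_DIGITS = re.compile(r"[^0-9]")


def normalize_ssn(raw):
    """Strip separators; return 9 digits, or None if the value is malformed."""
    digits = _NON_DIGITS.sub("", raw)
    return digits if len(digits) == 9 else None


def count_rows_and_unique_ssns(raw_values):
    """Return (total_rows, distinct_valid_ssns) for an iterable of strings."""
    total = 0
    unique = set()
    for raw in raw_values:
        total += 1
        ssn = normalize_ssn(raw)
        if ssn is not None:
            unique.add(ssn)
    return total, len(unique)


# Made-up values in mixed formats; area number 000 is never issued.
rows = ["000-00-0001", "000000001", "000 00 0002", "n/a"]
print(count_rows_and_unique_ssns(rows))  # (4, 2)
```

An in-memory set is only practical for samples. At hundreds of millions of distinct values, a database `COUNT(DISTINCT ...)` or a bitmap indexed by the nine-digit number would be a better fit.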

@eriner I have to leave before you've finished your thoughts, so I'll just leave this here:

>> ~899M unique SSNs

LOLWUT?

@eriner unless this includes fraudulent identities somehow... in which case there might be hope for humanity

@IceCubeSoup

What number were you expecting?
Why is that number fake?

If 7-8k ppl die each day in the US and SSNs were first issued in 1936, how many would you expect in circulation?
It's a 9 digit number, right?

Just call it 100 years, so just the numbers issued to dead ppl on an oversimplified estimate:
100 * 365 * ~7000 = (over 255 million)

It's probably not that high, because of the population growth rate, but it's a reasonable estimate. Plus lots of other ppl get SSNs, too.
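The same back-of-the-envelope arithmetic, written out as a quick check. The inputs are the rounded figures from this post (100 years, ~7000 deaths per day), and the function name is made up for illustration.

```python
def rough_ssns_issued_to_deceased(years=100, deaths_per_day=7_000, days_per_year=365):
    """Oversimplified estimate: assumes a constant death rate for the whole period."""
    return years * days_per_year * deaths_per_day


estimate = rough_ssns_issued_to_deceased()
ssn_space = 10**9  # every 9-digit combination, ignoring ranges that are never issued

print(f"{estimate:,}")       # 255,500,000
print(estimate < ssn_space)  # True
```

A constant 7000 per day overstates the early decades, when the population was smaller, so this reads as an upper-end guess for the deceased alone.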

@eriner

@eriner I can’t help but think that this “Leak” will be part of a push to institute a new “Smart” ID system since we can no longer trust SSNs.

Just waiting on that shoe to drop.

@Jagahati that, and it's just in time for an election.

@Jagahati @eriner they've been working on one since the obama administration.

@eriner

Remember, every person on earth that has to report income to the US gov has to get some kind of tax identifier. If it involves paying into SS, you get an SSN.

@eriner

My theory is that this is an op and they will use it to issue new SSNs under the guise of there not really being enough numbers and not wanting to recycle them.

This will then allow them to “accidentally” give Social Security numbers to tens of millions of illegals.