Verizon/Yahoo Blocking All Attempts to Archive.org Groups. Deletion: Dec.14 (modsandmembersblog.wordpress.com)
571 points by Diagon 5 hours ago | 153 comments





There have to be some Verizon or Yahoo employees on HN who are reading this.

Can any of you shed some light on why Verizon and Yahoo aren't cooperating with the Archive Team to archive this valuable historical content?

(If you don't feel comfortable commenting with your regular HN account, maybe you could do so with a throwaway account?)

Also, is it possible for any of you to bring this issue to the attention of upper management and help them understand how important it is to archive this?

You Verizon/Yahoo employees have much more power to make a difference here than any of us on the outside can.


Probably not very helpful/informational but:

I work for VzM, but not historically directly on Yahoo products (product teams have been merged/consolidated etc. over the past few years, but there are still strong tendencies toward the products people came from).

So I wouldn't be very clued into what's happening with Yahoo Groups internally. And I've heard nothing about this internally. At all.

As it stands, it's 2:30pm in SV, VzM is top of the HN frontpage, and not a single soul has mentioned it yet on internal Slack.

Will see if I can find out more.


I hope this doesn't sound naive, but what does the M in VzM stand for?

Media. Verizon Media is the specific division of Verizon that contains Yahoo, AOL, and VDMS (formerly EdgeCast).

Not OP, but I assume it means Verizon Media.

https://en.wikipedia.org/wiki/Verizon_Media


That could really help. Thank you!

When I was at AOL I tried to get them to open source the Q-Link server code from the 1980s. Someone actually got it on DVD for me and everything, but after the Verizon merger they fired the entire legal team that was responsible for authorizing open source releases, and it just stalled.

Open sourcing code can be tricky—there's quite a bit of review that needs to go into doing it right, as well as more work if you want the release to actually be reasonably useful. Blocking this archiving effort is on a whole other level. We're talking about saving information that was already public. All they have to do to allow this to happen is... nothing. I can't comprehend why Verizon/Yahoo would go out of their way to block these efforts.

It depends on the size of the codebase and how shitty your programmers are, but if you aren't greedy or scared of over-litigation, it isn't hard at all.

I have written great contributions to a Python API library that could benefit the community around it. The code has nothing to do with my company's core competency, and the code is used for internal orchestration, so "exposing insecure code" is an unlikely concern.

It is easier for a lawyer, especially a luddite, to say "no" than to help their employees give back to the world.


>open source the q link server code

What a lovely thought. Thanks for the effort, even though it didn't pan out. If you've got the DVD, torrent it out :)

Now I'm wondering if there's a Stratus emulator anywhere and/or the OS code. Those things were nasty... individually battery-backed hard drives were just the beginning. The slot cards looked like someone had dumped yellow patchwire spaghetti all over them.


Nah, I don’t have the DVD, and I gave up trying to get it released because it wasn’t my job.

They want to wipe their hands clean of it. They don't want a record of it.

Is this something you know from the inside, or your (probably good) guess?

Just a guess to be clear. Not an employee myself.

I would imagine there's a lot of porn, sex work, stuff Verizon is trying to wash its hands of lately with Tumblr.

EDIT: Ah yes HN - where admitting that you're just speculating wildly like everyone else gets you the downvotes -_-


Speculating wildly is OK. Speculating wildly using definitive-sounding statements, particularly in a thread directed specifically towards employees of a company, is extremely misleading.

If you had started with “Not related to Verizon in any way, but IMO...” that would have been perfectly fine, but given where and how you made the statements it looks remarkably like you were claiming inside knowledge of the situation, and then feigning innocence when called out.


weak speculation gets downvotes m8

Extensive history is about to be lost. Despite the service being broken, many organizations still use it. Examples from that post:

A police cooperative in Washington, DC, with over 17,000 members, that was using them as a network to communicate with their respective neighborhoods.

A phone company in the UK that assigns phone numbers using the groups and now will lose all those phone designations when it’s deleted.

A birding group in New Delhi with 2,000 members that has collected data and research on birds for TWO DECADES.

An adoption group in France that has been using it for years and years to communicate and share history, photos and more.

They also would have found: Numerous support groups for people who are suicidal or depressed.

Numerous medical groups for people to communicate more effectively with their doctors.

Numerous Vet groups with 24 hr care advice for sick pets.

Numerous support and help groups for the elderly.

Numerous historical groups for WW2 veterans, Vietnam veterans, etc.

Numerous science groups that have used them for years and have all their research there.

Numerous fan fiction groups or arts groups that have shared their work for years.


> A phone company in the UK that assigns phone numbers using the groups and now will lose all those phone designations when it’s deleted.

Wow, somebody invented a database that's even worse than an Excel file on a network share.

(Also, how are they going to assign new numbers when archive.org takes over? Is archive.org going to give them write access?)


“My understanding is that [the group] will still function as a mailing list, which is for all practical purposes, what people use this as,” https://www.theverge.com/2019/10/17/20919630/yahoo-groups-uk...

That's right, but our main concern is that the archives are being deleted. With no further history being recorded, its utility for some purposes is limited. I have also come across some complaints that even as a listserv it can be problematic. Posts, for example, are no longer arriving in order.

And these are the dangers of relying on a private, corporate, for-profit, law-bound organization. They're bound to abide by the law, and of course there is a cost attached to all of this.

Exploiting a free resource, as we all do these days (reddit, youtube, facebook, hackernews itself etc) is all well and good but maintaining history is expensive (content needs moderating, you are required to abide by the GDPR and DMCA, there may be disputes about content on the platform).

I mean, Google+, MySpace, Bebo and IMDb comments are now dead and gone; how useful was the data really? I'm sure some people might go to the archives, but I would imagine 95% of the data is just "rot" that has no value or substance.

History is lost all the time. We barely know what we've been up to for the last few thousand years; only now can we so extensively document our world with the precision and quality afforded to us.

But in the end, time moves on and some of that history is lost. It hurts, but who's to say any archived history will be preserved anyhow? We're still relying on our storage technology being readable years/decades/centuries from now, which is not a given.


While I agree with your first point, and tried to get groups I was associated with to move for years, nevertheless there are groups there that engaged in community driven research and have important data uploaded there. (This is my main concern, though other groups were focused on different issues - uploaded art, for example.) So I think while we need to educate people about not using centralized providers like Yahoo and Google, right now we need to focus on getting someone at Verizon/Yahoo to respond to this urgent situation.

> maintaining history is expensive (content needs moderating, you are required to abide by the GDPR and DMCA, there may be disputes about content on the platform).

Things shouldn't be like this. The price per unit of storage and bandwidth falls fast (and, except for the sites dealing with user-generated videos, faster than the amount and size of content grows). Laws shouldn't apply retroactively.

The problem really is that our means of accessing information are services. When you have a physical letter, or an e-mail saved locally, or a text message from 15 years ago, you can just read them. Nobody will know or care. Nobody will come after you trying to apply GDPR or DMCA retroactively. And since storage is near-free, you won't ever lose it until you forget about it (or at least about doing regular backups). Whereas with modern webmail, forums, link aggregators, IMs - you don't have even your own messages, and viewing a conversation that happened 15 years ago is really being provided a service today. Services are ephemeral, they're also subject to ever-changing regulations and whims of the service providers.

Bottom line, while services are necessary for transferring conversations, we really shouldn't be relying on them for access to conversations that already happened.


If you are a company, GDPR does apply to data on physical letters and local emails. A large part of the preparation for the introduction of GDPR enforcement was companies getting a handle on what they had stored in various media.

Actually, email and letters are an area where the GDPR falls short in some countries, especially Germany, since basically the constitution is above the GDPR. Depending on the letter/email, its content does not need to be acknowledged or shown (the GDPR also means you can access your data) to the person who wants their data deleted/shown/whatever.

Disclaimer: I'm a member of Archive Team who's helping coordinate the joining of Yahoo Groups in preparation for archival.

Yahoo's banning of a large number of the accounts we were using is a huge setback for us. In total we lost access to over 55,000 Yahoo Groups, and many of these will now not be archived and will be lost when Yahoo deletes everything on December 14.

Particularly disastrous was the loss of access to all of the 30,000 Fandom (fanfic / fanart / etc..) groups that were requested to be archived by members of the fandom community. We're back to square one now, and it is looking increasingly likely that we're only going to be able to re-join (and therefore archive) a small percentage of these groups before December 14.

(And now for the inevitable, shameless plug...) We could really use some help! If you've got an hour or so, we could really use people to come and complete CAPTCHAs for us. (A CAPTCHA is needed to join every group). Instructions at: https://github.com/davidferguson/yahoogroups-joiner


I tried to do this but upon clicking the purple "Join Group" button Yahoo is giving me an error saying my email address is not linked to a Yahoo account:

> Your email address is not linked to a Yahoo ID. To join this group, you need to link your email address to a Yahoo account.

When I click "link your email address", it just takes me to a page called "Personal info" which doesn't have any obvious way to link my email address.

So I'm not sure how to proceed.

EDIT: Solved it. I had initially only "verified" the account with a phone number, but you have to add an email address as well. It's now working.

For anyone who, like me, signed up for this and filled in the Google form, but then couldn't find the leaderboard URL after closing the tab, it is https://df58.host.cs.st-andrews.ac.uk/yahoogroups/leaderboar...

It seems to be working through a list in reverse alphabetical order. Watching the progress being made is quite satisfying. When I started it was on groups like "sciencefiction" and now it's moved on to "petzluverz".


While the above post is concerned with Fandom groups, my concern is with groups that started doing early community-driven biohacking-type research. There are medical test results and discussions of medical interventions. While that's my focus, I'm sure there's additional important material. We really need to save this data.

FYI: The extension offers many private groups that I can't join without approval, and that seems to disrupt the flow of the extension.

Yeah, sorry about that. The current (as of 2100 UTC) set of groups being sent out to be joined were ones submitted through our nomination form: https://tinyurl.com/savegroups

I did specify that groups requiring approval to join shouldn't be submitted, but not everyone took notice. (And then there were the several dozen Google Groups URLs that were submitted!)


It seems a weird set of groups, like lots of three-to-five-person groups roleplaying Doctor Who, Spider-Man and things like that. Is this the long tail of what hasn't been archived, or is there not even a good way to tell post/member count without loading them up through the extension?

From IRC (betamaxthetape):

It's a set of groups that have been specifically requested by the fandom community. Of course, the groups handed out depend on what's been joined, so if / once all the fandom groups are joined, we'll move onto something else.

I appreciate this isn't made clear in the instructions, but if you have a particular set of groups in mind, you don't need to use the Chrome extension. Just join the groups you want saved and (provided you've sent the account details through the form) they'll be added to the queue to be archived. I did a lot of amateur radio (ham radio in the US) groups that way.


Yeah not volunteering for that mate.

See immediately above in the thread. Instructions were perhaps not clear.


Is there any cited reason for the groups they're blocking?

Verizon's response, and the response to the response, are in the article of the OP. They claim they offer a Group Downloads Manager, but it's very broken.

btw, maybe Mechanical Turk could help with the captcha part?

I feel like there must be some protection in place against using mTurk with captcha, or it would have already been abused.

My guess would be that MTurk's turnaround for this stuff can't be fast enough to work. I know jobs I put up there for transcription, despite a generous bonus, were always delayed by hours at the very least.

Just solved a bunch of captchas, but Chrome crashed a few times during. Due to the addon?

I've been using Edge (Chromium) for past few hours, no issues yet. Plugin could be unrelated to your crashing. May help to use a standalone Chromium build for this https://chromium.woolyss.com/

I checked on IRC. One person says they've been using it for hours on Chromium without a problem. "I've been using Edge (Chromium) for past few hours, no issues. Could be unrelated, could be related. May help to use a standalone chromium build for this."

I imagine you guys already know this, but considering we’re up against the timeline, I’d use a CAPTCHA-solving service (easy to google yourself) and Luminati to distribute the IP addresses, while swallowing my ethical qualms.

I would donate my IP/bandwidth to archive.org if I could run a scraper easily.


Unfortunately it doesn't offer a qemu-compatible image or an image that would work when converted, it's a shame and shooting itself in the foot.

You should be able to trivially run the Dockerfile[0] on a standard Ubuntu image for qemu, should that be your only reason for desisting.

0: https://hub.docker.com/r/archiveteam/warrior-dockerfile/


Thanks! I'd never heard of that before; it's just like SETI@home, but for archival purposes.

What are the hardware requirements of that VM? I'm attempting to import it into the VirtualBox service on my NAS4Free home NAS, which is the only machine I keep up 24/7 at the moment, but it takes forever to import. The hardware is very limited (Atom D410 + a bit over 1GB RAM available), so I'm not sure it would succeed, but so far it just loads forever with no errors given. I'd like to run it for this project to start contributing quickly, albeit with limited hardware, before the deadline, then find better iron in the future.


I don't find it processor or memory heavy, it's mostly doing a lot of IO (network and disk).


Forgive my naivety, but why would blocking of your accounts delete the data you have already backed up? This sounds like you are doing it the wrong WAY, IMO.

Two reasons: (a) If we hit Yahoo with everything we've got, Groups would almost certainly have crashed, or at least become unbearably slow. That's not a reasonable thing to do, and would be (IMHO) grounds for Verizon banning us.

(b) We were still testing / writing the scripts to do the actual archiving. Most of the groups we did save before the banning were from test runs of the archiving script.

And sure, with hindsight, I'd do things differently. We've learned now, and are archiving each group soon after it is joined.


OK, thanks for explaining this. Just my 2 cents then: big companies make decisions like this based on the potential PR win/loss. If ignoring you keeps the PR delta at 0, while allowing to export the data exposes them to even a minimal risk (I dunno, someone's private details buried in), they will ignore, or even actively resist you.

Politically, you need to arrange it so that cooperating with you will give Verizon a small PR boost, while ignoring you will be seen negatively by the public. This thread had a good example of interesting data that is worth preserving, so I would try reaching out to news companies (NY Times and whatnot) to see if anyone wants to publish a piece. Phrasing this positively and ensuring enough people see it, would greatly increase the chances of cooperation from Verizon.


They hadn't backed up yet. They had set up accounts with Yahoo that they were then planning to use to back up those groups. Backups themselves were starting, but they had to go slowly enough not to bog down Yahoo's servers.

Have you considered using NordVPN for CAPTCHA bypass? They are a shady company, but their network of residential VPNs is impressive.

I'm genuinely curious from an ideological perspective, why archivists think all this material is worth saving?

People often compare the shutting down of sites or the banning of content (e.g. when Tumblr banned porn, or now Yahoo shutting down Groups) to the burning of the Library of Alexandria. But there is a huge difference. The LoA held knowledge collated and collected by the best thinkers of the time. The Internet is not that. The Internet is an open platform where anybody can say anything. Most comment sections are filled with all sorts of material ranging from factual to entirely fictional.

I realise it is hard to decide what is worth keeping (and therefore erring on the side of saving it all), but I'd wager that the vast majority of archived content is not useful at all. The Wayback machine is a perfect example. Lots of great stuff, but that's a drop in the bucket compared to the vast amounts of useless, or even redundant information stored.

It is a lot of resources thrown at saving, not the equivalent of the Library of Alexandria, but the public toilet block graffiti wall.

Anybody want to share what drives them to do this?


Even if we still had the Library of Alexandria, it may have shed zero light on the actual lives of citizens. Archiving content on the internet means capturing thousands of individual level perspectives and experiences. We don't know what will end up being important to historians 50 or 100 years from now. I would bet there are dozens if not hundreds of historians that would give anything for a record of their favorite time period that contains even a fraction of the amount of content today's archive efforts are storing.

It's also not horrendously expensive: we are getting better and better at storage as well as data analysis techniques, so stuff that seems useless today may be useful 50 years from now and cost less to store than it does now. The key thing, again, is that we can't benefit from hindsight.

Even graffiti can give insight into a time period, even if that insight is that that time period had an unusually high number of graffiti artists.


What about people who don't want stupid comments they made online when they were 14 permanently indexed and searchable for all of time by the Archive Team? Yes, they may have posted to Yahoo! Groups back in 1999 when they didn't know better, but now it's 2019 and you have people digging up decades-old dirt on people to try and destroy their reputations and careers.

Given that search engines have zero ethics when it comes to removing embarrassing (but not illegal) content, sometimes the loss of information is a small blessing for some.

Yes, it's their fault, but I also don't think it's fair that something a child said at 14 should haunt them their entire professional careers, either.


For example: World War Two groups where many of the members have passed away by now. There could be first-hand accounts of history that has already been lost to time.

Could?

More like definitely.


YES! It's like preserving ecological diversity. It's a store for later learning. Verizon is working in cold, hard capitalism, and you can bet your lunch that they did NOT use Yahoo Groups to hold their shared wisdom/history, and they would never let it be lost.

But many don't have the pockets for better systems, and so their earned knowledge lived on Yahoo Groups. And when you think of all the people and groups that might have needed to store their history, and what tools they might have used, what do you expect the skew of Yahoo Groups was? Certainly no Fortune 500 companies, but rather nonprofits and grassroots groups and all sorts of domains that are already getting the short end of the stick in our world :)


Step 1: We only need to archive the genuinely good content.

Step 2: It will take a long time to look through all this content and determine which parts deserve keeping.

Step 3: We will inevitably leave out something that someone else thinks is worth keeping anyway.

Step 4: Let's just archive everything.


See below. My main concern is early medical/biohacking groups that shared data, like medical tests, and engaged in extensive discussion/community-driven research. Such groups go back to at least the late 1990s.

A main concern of Archive Team (again, below) is art that was uploaded there.

I'm sure those are not the only two classes of examples. See for example the bird watching group in Delhi that has been collecting data for decades. (In the link of the OP.)


> I'm genuinely curious from an ideological perspective, why archivists think all this material is worth saving?

It's easier to just save it all and let gawd sort it out.

You never know what some future person might find interesting. For example, my father took lots and lots of pictures, but they're all set in the living room and kitchen. No pictures of the rest of the house. I'm sure the thought of photographing other rooms simply never occurred to him as being interesting.

For another example, many people are interested in where/when/why certain words first appeared, like the origin of "OK". Massive archives of text that are searchable would help with this.


Great question! I'll take an amateur swing at a decent answer:

People doing important work (esp important work that is underfunded) don't have time to write/record their own histories. But that history can be instructive, to learn what worked and what didn't, and help future travellers do it better :)

And perhaps especially important: people engaging in these under-resourced efforts are often working in domains that capitalism is... less curious about, we'll just say. Otherwise, they would likely be better documented, as the incentive would be there to preserve them.

Our ability to improve our present from better understanding our past is a supposed benefit of a digital world that accrues data -- we have records of things that in prior ages just flew by in conversation (for better or for worse). But efforts like this rob us all of that wisdom <3

And again, there is an asymmetry in who gets robbed. It is often the folks working in the commons, those doing invisible maintenance labour (nonprofits, grassroots, community), and generally just people doing work within the cracks of capitalism.


> It is a lot of resources thrown at saving, not the equivalent of the Library of Alexandria, but the public toilet block graffiti wall.

Ask an antiquarian about the value of graffiti in the ruins of Pompeii and other archaeological sites sometime. The great historians of the day wrote about their contemporary culture, while the vandals and miscreants and lowlifes and commoners contributed to that culture. Having access to both sources gives us a much more complete picture.

You don't know what's worth saving at the time you save it.


Ha, ha! Well, there's some high quality material there too, but I take your point. In the right context, like "history from below," all kinds of material can be high quality!

We don't think it's necessary to preserve everything that's ever spoken verbally. We don't lament that everyday conversation is ephemeral.

People are conflating internet discussion content with written content because it's stored as text. Whereas the more legitimate comparison is to verbal communication.


> We don't lament that everyday conversation is ephemeral.

I imagine you're not a historian. Neither am I, but I cannot imagine that there is a historian out there who hasn't lamented the ephemerality of everyday conversation (and even of apparently more durable forms of communication).


It's like the burning of the Library of Alexandria all over again.

We don't know exactly what was in the library when it burned. We assume it was all great works of intellectualism, but it could very well have been the fanfics of their time.


Except that the Library of Alexandria never actually burned! That is a very good ol' myth ;)

- https://www.firstthings.com/web-exclusives/2010/06/the-perni...

- https://www.ancientworldmagazine.com/articles/making-myth-li...

- https://history.stackexchange.com/questions/677/what-knowled...

But anyway, no one should delete human literature, be it inadvertently or through lack of effort.


From Wikipedia: "Scholars have interpreted Cassius Dio's wording to indicate that the fire did not actually destroy the entire Library itself, but rather only a warehouse located near the docks being used by the Library to house scrolls"

If anything this would make the analogy even more apt, since only part of Yahoo is being destroyed. :)

Regardless, it's mostly used as a metaphor for the destruction of knowledge at this point.


Wait, the library wasn't lost due to that fire, but its contents were slowly lost due to the passage of time and people not caring or not having access to copy them? That makes the analogy way better, but the "burning" part is sadly wrong.

Too often historical events turn out to be perfectly true, but claimed to be myths due to dizzying semantic distinctions.

Just looking at the third link, the most upvoted answer agrees that humanity suffered a significant loss of important information. And the 'myth' is just an asinine distinction regarding whether the loss was literally due to fire, or whether the information was lost due to some other cause. I think declaring it a myth in a conversation like this misses the point (it certainly isn't a distinction relevant to the original comparison made here to Yahoo Groups) and just serves to confuse people.


It's quite clear the library is no longer here. How exactly it was lost does matter, as its destruction has been used to paint various groups as anti-intellectual barbarians since ancient times. Eliminating the story as a weapon to attack others would do humanity some good.

Whoa. I guess what they say is true - say a lie often enough, and it becomes the truth.

These articles seem more concerned with detailing how important it is that it wasn't Christians. Makes sense for an organization centered around "religion and public life", I guess. Quite the angle.

Yahoo Answers is an invaluable trove of insight into an intellectual class of people that I think a lot of us regularly forget exist.

It is an absolute trove of insight:

https://www.youtube.com/watch?v=EShUeudtaFg


Before clicking that link I guessed at what it would be, and I was wrong but not far (I was expecting https://www.youtube.com/watch?v=Ll-lia-FEIY )


https://i.redd.it/yv9k5nes87rz.jpg

Not sure why this one kills me so much...


I think one of the unintended consequences of privacy legislation is that it will support burning the Library of Alexandria over and over again.

The default corporate posture will be: delete all the data! It's a liability, and figuring out what we can keep is an enormous headache.


What prevents Verizon from donating the Yahoo Groups database to the Internet Archive? What does Verizon have to gain from preventing the archival of Yahoo Groups?

Companies don't typically operate that way. All else being equal (especially when there's no $$$ in it for them) when given the choice between doing something and doing nothing, they usually choose to do nothing. It's often not malicious, but an overabundance of caution. (i.e. lawyers raising red flags about liability, 'our IP' etc... it's a real pain even from the inside getting large companies to do anything different from the status quo)

My bet would be that Verizon's network monitoring system/team sees the archive team's attempts as some sort of anomaly to be stopped. It's possible, though I wouldn't bet on it given Verizon's history re: public relations, that making noise might alter the equation and get them to allow the archive team to continue.


It's simply way too much work. Dying projects generating no revenues don't get the luxury of having tens of people assigned to work on them.

Maybe those who care (we?) could organize a campaign to get customers to commit to leaving Verizon if they let the messages be deleted without archive? That would convert it into the language they understand.

To raise the perceived threat level, many folks could support in building tooling or docs to help ppl migrate as easily and streamlined as possible, to minimize the tax on consumer time that they rely on. (E.g., help on comparable plans, cheat sheet for call centre keywords, etc.)

Maybe this is something the "Do Not Pay" team could help run with...! [1]

[1]: https://boingboing.net/2019/10/28/parking-tickets-plus-plus....


There is a campaign already. https://modsandmembersblog.wordpress.com/

Oh God, I'm that guy. I'd been following this elsewhere, so didn't actually expect I'd get new info from the link itself :/ [opens mouth, inserts foot]

I can imagine it's easier and safer (from a legal perspective) to just delete the data and therefore no longer be responsible for the content. Twitter wants to delete older Twitter accounts because they're required to by law under the GDPR.

I mean, the GDPR makes things kind of difficult in this regard, and I suspect even archives are liable if somebody takes an issue with content they are hosting.


> Twitter wants to delete older Twitter accounts because they're required to by law under the GDPR.

So, by analogy, if Twitter did allow people to download an archive of any public Twitter account's history... what would the GDPR require them to do? Wrap those archives in some sort of auto-expiring DRM?


This seems relatively cheap to fix. Spin off Yahoo Groups as a new corporation, and have that corporation subsequently donate all its assets. If the corporation somehow manages to get sued, it doesn't really matter, since it has no assets.

Or spin it off and sell it.


No publicly traded company would ever willingly put itself through the legal and tax requirements of spinning off a new company with part of its assets just to do the right, non-profitable thing with those assets.

Also, in my opinion, no privately owned company would either, unless the owner was about to die and wanted to get in good with their creator.


I’d assume the law is smarter than this; otherwise companies would continually spin off new, asset-free corporations as a sort of lightning rod for lawsuits, just to get rid of their liabilities.

This is, iiuc, how the movie and construction industries work. Spin up a minicorp for every big risky project to shield the mother ship.

When you create the special-purpose vehicle (SPV) in advance, it's very clear what part of the work done by the organization attaches to it (because the organization ensures that all its processes explicitly specify the legal compartment they're running under).

When you create an SPV after-the-fact, you have to go back and reverse-engineer a separation of liabilities from documents that don't specify whether they're work done for the organization or the SPV (because the SPV didn't exist.)

It's like a divorce. (Or, for an even more on-the-nose analogy, it's like trying to use a condom after-the-fact by extracting any bodily contamination and putting it in the condom.)


They do it regularly: leaded gasoline (Ethyl Corp), asbestos.

If Yahoo Groups has a GDPR obligation now (and it's not clear that it does), it doesn't erase that obligation by spinning up a different company and dumping all this personal data into it; that would be its own GDPR breach.

That doesn’t sound correct, given the GDPR doesn’t generally apply to archival products.

Why not? Under the GDPR, someone can show up and: (1) request correction of personal data (PII), such as a nickname. This is the data-accuracy requirement; in fact, under the GDPR Yahoo should actively check data accuracy (for instance, by sending users a reminder to review their data). (2) File a data portability request, which Yahoo must fulfil. (3) Request erasure of their data. (4) On top of that, Yahoo has to manage user consent for anything it does with that data.

For a product that brings in little or no revenue, it is better to dump everything and simply stop being associated with the data.

That's a side effect of the GDPR: it is technically and financially hard to maintain anything free on the Internet that stores users' data.


The GDPR has an actual archiving exception to the "right to be forgotten": Art. 17(3)(d) [0]. IANAL, so I won't say whether it covers this archival, but I would hope so.

0: https://gdpr-info.eu/art-17-gdpr/


Anything being archived by archive.org is pretty clearly being done in the public interest. If it was something like Equifax archiving the data to use as a factor in people's credit scores then it would be much more ambiguous.

One of Verizon's spokespeople was literally Darth Vader. "Ma Bell has you by the calls".

Large corporations are not anthropomorphic entities, regardless of their disarming branding. Rather they are amoral bureaucracies, likely administered by people who have learned to ignore their empathy to get there. Verizon won't change course to accommodate the Internet Archive or general Internet community any more than a combine would pause for a field mouse.


The "dark side" of web scraping has always been one step ahead with things like evading IP bans and solving CAPTCHAs; maybe it's time to get their assistance... as the old saying goes, "the enemy of my enemy is my friend".

There are a few groups I was a member of, like Lifters https://groups.yahoo.com/neo/groups/Lifters/info, which was an intensive technical development group in the field of propellerless, rocketless, jetless flight using only high-voltage electronics.

Also, some of the politics groups were a great time capsule of the Clinton/Bush election era.

A lot of earthquake researchers gathered in several earthquake groups as well, including Caltech seismologists and advanced amateurs, many of whom aren't around anymore.

Also, some of the info in these groups can be used to defeat patent applications, as it shows evidence of public prior concepts and prior art.

Yahoo Groups users were somewhat more technically advanced than users of modern sites like Reddit, because they were earlier adopters and the groups were somewhat harder to use.

It's a lot of good-quality content.

Also, in the early days of these groups, spam and massive controlled astroturfing account networks were pretty rare.

This is like losing 15 years of ancient Sumerian writings from a very interesting early period of the Internet.


IDK if this is any help, but Verizon is presenting at the UBS Global TMT Conference on Dec. 10th (less than 2 days away as of this writing), with CEO Hans Vestberg speaking at 12:15 EST.

https://www.verizon.com/about/investors/ubs-global-tmt-confe...

Maybe someone can pipe up at the conference.


I'm confused: why is Archive.org attempting to archive and expose to the public what are essentially private communications?

My usage of Yahoo groups in the early 2000s was mostly to communicate with my high school / college / dorm groups and the last thing I want is for embarrassing messages from 20 years ago sent to a private group to be archived.


Clarification - we're not archive.org. Archive Team and Internet Archive are completely separate.

And we're only archiving things that "any guy on the internet" can see. If someone can access the messages simply by joining a group (with no moderator approval), I'd argue it's fair game.

We're not going to be unreasonable, though. If something private slips through and we receive a takedown request from the author, we typically remove it.


Just wondering, if the author of a post or the administrator of a group makes a takedown request regarding non-private* info, would you delete it?

* And by that I mean something that is not a telephone number, an address, a real-life name, or other similar things.


Also, I am looking at https://www.archiveteam.org/index.php?title=Mastodon and some jobs there (such as berries.space) return a 404 when I try to download the data. Do you have any idea why that could be?

But the data will be going to Archive.org, correct?

Many groups are public. Many have owners that requested the group be archived.

Yahoo Groups can be configured to allow public access.

They are only archiving data on publicly accessible groups - many of which contain lots of discussions worth archiving.

> what is essentially private communications

If they have access to it, it is not private.


What was the original plan, exactly? Subscribe to as many groups as possible and then wait until the last moment to grab the data? At this point that would almost certainly have caused massive bandwidth problems and mass bans by Verizon in response, so the archival effort would have failed anyway.

The current administration put a former Verizon lawyer into the position of FCC Chairman. I would not expect Verizon to answer to anyone.

Also, it is a shame that the person in direct contact with Yahoo over this is sending angry emails in all caps. The Internet Archive deserves better.


I agree on the first point. The second is perhaps understandable if you read the whole exchange. Did you know they initially gave us 13 days before they stopped storing any more group emails (that is, new emails)? After an outcry, they increased that to 20. Many thousands of people were scrambling to find a new home. We are now reaching the end of the line (the last week) before the archives themselves are gone, and they have blocked the main concerted attempt to save some of that history. So some level of frustration is in order.

A lack of emotional control is usually understandable, but it suggests a lack of care and focus that does not befit an important effort. I learned years ago never to send an email or text, or make a call, when angry. I always thank myself the next day, when I am able to choose my words more tactfully. That email makes them look like a group of angry trolls.

What is Verizon’s motivation for taking steps to prevent it?

GDPR, CCPA, <insert other regulation>, all are possible reasons to throw their hands in the air rather than do the work / endure the possible risk.

Very possibly the timing isn’t a coincidence, given that the CCPA is about to take effect.


I don’t know why your comment was downvoted. Very legitimate reason.

Cost savings, plain and simple. Less bandwidth, fewer servers.

That only explains the decision to take them down in the first place. What can be gained by shredding all that information? Maybe they aren't shredding it. Maybe they just want no publicly available copies to exist.

It explains both. Blocking archiving will save a bunch of bandwidth as well as not having to scale up the servers for the load of dealing with the archiving.

Yes. The archiving was starting to take a lot of bandwidth. Some care had to be taken not to bog down Yahoo's servers.

Less disk space being used maybe?

This is a wake-up call to the entire world: we cannot take internet history for granted. We need affordable, decentralized means with long-term economic incentives to archive the digital world.

In a way, the digital world is far more fragile than the physical world. And the time to solve this is now.


Verizon recently blocked all of my Yahoo accounts. I've spent some time trying to find any kind of support form to get them restored, with no luck. To get support, you now need to pay. Perhaps the Archive.org accounts fell under the same ban.

We have groups whose owners can't even access them anymore. Yahoo demands a Yahoo email address even when there's a non-Yahoo email associated with the account. Y-Groups has been badly broken for some time.

Verizon has stated in support emails that they were aware of Archive Team's efforts and specifically will not be un-banning our accounts.[0] I therefore think it likely that the banning was targeted.

[0] https://modsandmembersblog.wordpress.com/2019/12/08/verizon-...


I wonder if a small scraper script that an existing member of the group could download and run under their existing, valid account, would work?

Like a Tcl 'starpack', a single-binary Go program, or something else?

Can you script a browser to do some crawling for you?
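A member-run script of the kind suggested here would mostly be a polite, rate-limited loop over message IDs using the member's own logged-in session. Below is a minimal sketch in Python under stated assumptions: `archive_slowly` and `fetch_message` are hypothetical names, and `fetch_message(msg_id)` stands in for whatever actually retrieves one message (an HTTP client or a scripted browser); the real Yahoo Groups endpoints and login flow are not described in this thread. The defaults mirror the "20 messages every 4 hours" figure mentioned in the thread, and `sleep` is injectable so the pacing can be tested without waiting.

```python
import time
from typing import Callable, Dict, Iterable, List


def archive_slowly(
    msg_ids: Iterable[int],
    fetch_message: Callable[[int], str],  # hypothetical: returns one message's raw text
    batch_size: int = 20,
    pause_seconds: float = 4 * 60 * 60,   # 4 hours between batches
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, str]:
    """Fetch messages in small batches, pausing between batches.

    A message whose fetch raises an exception is skipped, so one bad
    message does not abort the whole run; it can be retried later.
    """
    ids: List[int] = list(msg_ids)
    saved: Dict[int, str] = {}
    for start in range(0, len(ids), batch_size):
        if start > 0:
            sleep(pause_seconds)  # wait before every batch except the first
        for msg_id in ids[start:start + batch_size]:
            try:
                saved[msg_id] = fetch_message(msg_id)
            except Exception:
                continue  # leave it for a later retry pass
    return saved


if __name__ == "__main__":
    pauses = []
    result = archive_slowly(range(1, 46), lambda i: f"message {i}", sleep=pauses.append)
    print(len(result), len(pauses))  # 45 2
```

Keeping the fetching behind an injected callable separates the pacing logic from the transport (e.g. Selenium or a plain HTTP client). Pacing alone is no guarantee against CAPTCHAs or account bans.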


There's way too much data. Many group owners did not know of the shutdown (Yahoo was negligent regarding informing owners), and even if they did many group owners have little or no technical capability. That's why so many requested of the Archive Team that their groups be archived.

1000 members of a group each downloading 20 messages every 4 hours (a dribble that wouldn't be noticed) can pull down ... 20k messages in 4 hours.
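The arithmetic in that comment extends to a daily rate. Here is a small Python check that uses only the commenter's hypothetical figures (1000 members, 20 messages each per 4-hour window); the function names and the 1,000,000-message total in the example are illustrative assumptions, not figures from the thread.

```python
# The commenter's hypothetical figures.
MEMBERS = 1000
PER_MEMBER_PER_WINDOW = 20
WINDOW_HOURS = 4


def messages_per_window(members: int = MEMBERS,
                        per_member: int = PER_MEMBER_PER_WINDOW) -> int:
    """Messages the whole group pulls down in one window."""
    return members * per_member


def messages_per_day(members: int = MEMBERS,
                     per_member: int = PER_MEMBER_PER_WINDOW,
                     window_hours: int = WINDOW_HOURS) -> int:
    """Messages per 24 hours, assuming windows run back to back around the clock."""
    return messages_per_window(members, per_member) * (24 // window_hours)


def hours_to_fetch(total_messages: int,
                   members: int = MEMBERS,
                   per_member: int = PER_MEMBER_PER_WINDOW,
                   window_hours: int = WINDOW_HOURS) -> int:
    """Hours needed to fetch total_messages, rounding up to whole windows."""
    per_window = messages_per_window(members, per_member)
    windows = (total_messages + per_window - 1) // per_window  # integer ceiling
    return windows * window_hours


if __name__ == "__main__":
    print(messages_per_window())      # 20000
    print(messages_per_day())         # 120000
    print(hours_to_fetch(1_000_000))  # 200
```

Each member's share stays at only 120 messages a day (20 × 6 windows), while the group as a whole moves 120,000. This ignores the CAPTCHAs and bans that actually stopped the effort.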

That's exactly what they're doing, and then they get CAPTCHAs and accounts banned.

Like... Selenium or Puppeteer?

Yahoo is going to keep the messages but just delete the art, other uploads, and message attachments, correct? Although apparently they will also make some groups private, essentially closing off access.

Such a pity we lost gmane.org.

Lots of knowledge gets lost these days.


I find it curious that at the same time we discuss the 'right to be forgotten' laws there's also the opposite problem of preventing the internet from forgetting something.

That's why I'm not a fan of those laws --- in addition to the fact that in practice they turn into something more like "right to rewrite history".

'Right to be forgotten' laws are the result of the whole "numbers have owners" insanity, combined with the fact that the average person will mindlessly use random services to store private data.

Yes! They're both really interesting problems and there's no right answer! You're right to be curious, it's a fascinating set of issues.

I am a self-interested party, but I’m personally glad, since there’s a post in a Yahoo Group, findable through Google, that would absolutely ruin my reputation and life if discovered.

Individuals have always been able to delete their own posts. If you log in there, you can still do it (before the 14th).

Also, see betamaxthetape, above. If anything is archived, they will respond to takedown requests.


Forgot the password to the account.

I see, yes. They have been negligent in many ways. Can you contact the owner of the group or a moderator and make the request?

(We even have groups whose owners have been locked out. Who do you contact at Yahoo?)


I don’t want to draw any attention to the post. It’s survived ~15 years in silence, so I don’t want to change strategy now.

What a PR mess for Verizon.

[flagged]


Don't do this, but if you do, please wait until after the 14th.

That would be illegal and wouldn’t help regardless. They’d just shut it down earlier…

Why does everything need to be archived? Why can't the stupid things I said 20 years ago in a forum just vanish someday?

(I never posted there but you get my point)


It's not stupid. There were serious groups using that platform. While I never thought it was a good idea, they nevertheless did. My personal concern is community-driven medical/biohacking research groups that go back to at least the late 1990s.

> Why can't the stupid things I said 20 years ago in a forum just vanish someday?

Because some might want to read them or use them in some form.


NSA might have a copy just for your concern


