Ask HN: Why doesn't archive.today get shut down?
21 points by PaulHoule on Dec 7, 2022 | 39 comments
So far as I can tell you can use archive.ph to bypass paywalls on most news sites.

Scientific journal publishers have gotten crackdowns on sites like sci-hub, music companies got Napster shut down, there has been a continuous game of whac-a-mole against torrent sites like the Pirate Bay.

archive.ph never seems to be the focus of a controversy. I never hear about anybody trying to shut them down, and they don't even seem to be struggling with technical countermeasures against their paywall bypass.

Once in a while you see a crazy rant like

https://www.vice.com/en/article/ypw5mj/dear-gamergate-please-stop-stealing-our-shit

but there is no real movement against this site.

How do they get away with it?



Sometimes I wonder why people don't have the foresight to reserve posts like this for their personal thoughts. Don't ruin a good thing with speculation of its demise.


Security through obscurity never lasts.


It does, though, up to a significant point.

I and people I knew used Z-Library for years, and never wrote a peep about it in public or mentioned it to people I did not trust. A lot of people used it this way.

Finally the dumbos of TikTok made it go viral. Then the rich publishing industry went to court, and also made some calls, I guess.

Publishers and big desktop software vendors know about piracy channels, and they tolerate them up to a point. But when a channel starts to go viral, they make calls, meet people, and go through proper channels.

Collective self-vigilance at scale does work, maybe because the companies don't care at a small scale.


Why would it get shut down? Companies like Google and Cloudflare do the same thing with AMP-based re-hosting of websites without consent. Some site signs up for AMP for its domain, then any links from that site to other 3rd party sites get sucked up by Google/Cloudflare and re-hosted as an AMP site on their servers without the 3rd party site getting hits from the actual people clicking links.

Re-hosting is not a crime. I mirror sites locally and put them up on a domain subdirectory all the time. I've done it since before the web was commercial.


Caches operate on the very edge of safe harbor: they don't modify content, they operate automatically, and they remove/update to match upstream. Operating as a dumb part of network infrastructure is why they're legal, much in the way that your network switch doesn't commit copyright infringement when it forwards packets down the correct port.

It is also worth noting that not everyone agrees that AMP is legal. It has been the subject of lawsuits.

If there's a human picking content, and that content doesn't match upstream, that may be well enough to step outside of safe harbor provisions.


Wait, what? So I can run a NYTimes2ElectricBugaloo.com without permission and not get into trouble? That... can't be right.


> Re-hosting is not a crime

Depends on the jurisdiction, but copyright infringement usually is a crime.


Maybe it's difficult to justify a takedown if simultaneously they're responding to requests from search engines with full content for indexing, as part of their business model. Either the full content is available to be indexed, or it isn't - which is it?


It's also a lot harder to actually shut a site down if they care to actually not be shut down, see: KiwiFarms for example. I've never seen a more coordinated attempt to remove a site from the Internet and yet they're not only still there and appear to be quite active, but their infrastructure seems to be stronger than ever as a result.

So it's damned if you do, damned if you don't.


I think KF is a bit of a special case since the only reason it’s still afloat is that Josh Moon is a pretty competent network engineer and webmaster and has pretty much decided to sacrifice his entire professional career and tens of thousands of his own money to keep the site alive. Most controversial sites just end up throwing in the towel after they’re kicked off of GoDaddy.

To be clear, I’m not defending Josh’s character or the content on KF, just trying to illuminate something I find fascinating.


My local newspaper will issue copyright takedowns on Reddit when they see archive.org links to their articles. They are owned by McClatchy so I assume this applies to all McClatchy newspapers.


archive.org is the wayback machine

archive.today is another archive website that somewhat flies under the radar.

first rule of archive.today club is you don't talk about archive.today club.


There have been attempts to block it, that's why it has so many domain names. I think one of the earlier ones was archive.it, but it no longer goes there.


I suspect that many sites ignore it because it helps get attention. No one is seriously going to browse a website that way, so an odd article for free won't harm them and may help gain an audience.


Journos tried to bad-mouth archives and especially archive.{is,ph,today} back in 2014 and have been ever since because it shows how much you stealthily edit articles to hide the mistakes you make (lies you tell).


archive.today (and the Wayback Machine, and many others) doesn’t do anything supernatural. It can get a full article because publishers let it. And that’s the main reason no one’s shutting it down.

Try it with Elsevier and see that they won’t give an article away for free under any circumstances. There are no free copies of any of their articles on archive.today.


Go figure, archive.ph / archive.today is not working for me as of 12/8/22. Is anyone else having issues accessing the site?


To the best of my admittedly limited knowledge on the subject, they're relatively new so most news sites aren't terribly aware of them yet, and most people aren't technology-savvy enough to be aware of their existence and ability to make use of them outside circles like ours here on HN and related occupations/sites/etc.

That said, they're also not doing anything shady to get by the paywall either. The version they're presenting you is the exact same version the news sites are presenting search engines in order to grab search engine traffic. The news sites are literally participating in an intentional bait-and-switch scheme to bait people with relevant search results that are NOT paywalled, then when a human browser gets there, they throw up a paywalled version in your face via user agent detection, mandatory javascript, etc. (various means are used).

archive.ph simply mimics a search engine indexer to get an un-paywalled version, same as Google or any other search engine, in order to retrieve the cleaned up version of the article without a paywall there, and serves that content to the end user. It's not stealing content not already offered in other forms anyway, it's just removing an artificial dark pattern that's literally intended to bait and switch people in the first place. Kind of makes for a weak argument if they do bring it to court in the first place; glass houses, throwing stones and all that.
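
For anyone who hasn't seen how the user agent detection mentioned above works, here is a rough sketch from the publisher's side. It is only an illustration: the crawler token list, the function name `select_variant`, and the "full"/"paywalled" labels are all made up for this example, and real sites may use other signals as well (IP ranges, JavaScript checks, and so on).

```python
# Hypothetical sketch of user-agent-based gating on a news site.
# Names and the token list are assumptions for illustration only.

# Substrings that this imaginary site treats as "search engine crawler".
CRAWLER_TOKENS = ("googlebot", "bingbot")


def select_variant(user_agent: str, is_subscriber: bool = False) -> str:
    """Return which version of an article this imaginary site would serve."""
    ua = user_agent.lower()
    if is_subscriber:
        # Paying readers always get the article.
        return "full"
    if any(token in ua for token in CRAWLER_TOKENS):
        # Crawlers get the full text so the article can be indexed.
        return "full"
    # Everyone else gets the paywall.
    return "paywalled"


if __name__ == "__main__":
    crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    reader = "Mozilla/5.0 (Windows NT 10.0; rv:107.0) Gecko/20100101 Firefox/107.0"
    print(select_variant(crawler))
    print(select_variant(reader))
```

The point is just that the decision hangs on a header the client chooses to send, which is why the same URL can look different to an indexer and to a person.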


> they're relatively new so most news sites aren't terribly aware of them yet

It’s been around for many years. Here’s a bit of the history: https://twitter.com/archiveis

(If anyone is feeling generous, Archive.today accepts donations at https://liberapay.com/archiveis/donate . The “Donate” link on the site header also links to that URL.)


Testimonial: I've observed mere users on Facebook using archive.ph and web.archive.org to make "backups" of pages they think will be taken down. Not wide usage to be sure but definitely outside of the techie/HN type community.


Wikipedia says they have been around since 2012

https://en.wikipedia.org/wiki/Archive.today

My guess is that people in the media are pretty aware of how to bypass paywalls because they have to do it to do their jobs (it used to be you could log into most of them with "media/media"; even The New York Times admits that reporters don't get paid enough to afford subscriptions to all of the newspapers that they need to use for fact checking and investigation.)


Most "paywalls" can be bypassed by clearing cookies or using an incognito window. They're not actually strong walls, they just want to be annoying enough that some people sign up.
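
To show why a cookie-based meter is such a soft wall, here is a minimal sketch of how one might be implemented. The cookie name `articles_read`, the limit of 3, and the `handle_request` function are assumptions invented for this example, not taken from any real site.

```python
# Hypothetical sketch of a metered paywall that keeps its counter in a cookie.
from typing import Dict, Tuple

FREE_ARTICLES = 3  # assumed number of free articles per browser


def handle_request(cookies: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (variant, updated cookies) for one article request."""
    seen = int(cookies.get("articles_read", "0"))
    if seen >= FREE_ARTICLES:
        # Meter exhausted: show the paywall and leave the cookie alone.
        return "paywalled", cookies
    # Still under the limit: serve the article and bump the counter.
    updated = dict(cookies)
    updated["articles_read"] = str(seen + 1)
    return "full", updated


if __name__ == "__main__":
    jar: Dict[str, str] = {}
    for _ in range(4):
        variant, jar = handle_request(jar)
        print(variant)
    # A fresh (empty) cookie jar starts the meter over.
    print(handle_request({})[0])
```

Since the only state lives in the browser, an empty cookie jar looks like a brand-new reader to the server.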


It's hosted in Russia I think.


There are a lot of pirate content sites in Russia. I remember one circa 2002 that would let you download MP3s, Oggs and other music files by the gigabyte.


allofmp3?

I remember that. They sold the tracks for like 5 cents apiece, which was worth it to get consistent ID3 data and sane filenaming. (As opposed to Napster, Limewire, Kazaa, etc.)

But they rug-pulled me! I had about $3.00 in credits on their platform when they went dark. :)


Yep, that’s the one. It was kinda like a music download store like iTunes but it was not legit.


Bruh... delete this thread.


If a news site wants customers to pay for the content, then they should put it behind a walled garden and not paywalls which only apply to certain user agents and IPs.


Why? Do people not have the freedom to give things for free to those they choose?


I see this as more of a suggestion to providers, not a rule to limit freedoms.


I think providers would put that suggestion directly into the round bin. Internet mass-media companies need to be discoverable in a search engine.


They do but they don’t get to also bitch about workarounds.


Why wouldn't they have the right to ask others to follow the law?


> The unfortunate truth is that when you give data away… you give it away [1]

[1] https://news.ycombinator.com/item?id=32686455


That comment is in the context of a discussion about data for which a license was granted or not necessary.


And the 9th Circuit Court has determined that publicly accessible data does not require a license and the provider cannot dictate the means by which a user can access it [1]. If a provider wants to do that, they need to place that content behind a walled garden.

[1] https://cdn.ca9.uscourts.gov/datastore/opinions/2022/04/18/1...


> LinkedIn has no protected property interest in the data contributed by its users, as the users retain ownership over their profiles. And as to the publicly available profiles, the users quite evidently intend them to be accessed by others

> we cannot, on the record before us, conclude that those interests—or more specifically, LinkedIn’s interest in preventing hiQ from scraping those profiles—are significant enough to outweigh hiQ’s interest in continuing its business

These are key details of the decision you linked, and that's why that decision wouldn't apply to this situation.


Excuse me for interrupting, but what has access to do with publication? The problem that someone might have with an archive-site copying and publishing the content on the archive-site's domain isn't about the copying, it's about the publishing.

If all they did was archive but not make these archives publicly available, it would be a different case -- if it was a case at all, because nobody would know that they're doing it.




doesn't work with lots of sites.



