I fail to see how blog posts (which are themselves a perfect example of content that can be silently altered or wiped) acquire the performative power one could rely on
Yes.
It used to be expanded, but something went wrong. Fixed now
No
I have been told that “Higbee and Associates” copyright trolls [1][2] (and probably their clones) have been using Archive.Today’s snapshots as “evidence” of the “crimes” of their victims.
We are not associated with those guys and had never heard of them before. The Archive’s snapshots are neither notarized nor protected by strong anti-forging technologies, so they cannot serve as legal evidence. In short, if you received a copyright claim with the Archive’s snapshots as proof, it is a scam.
There is a second layer: other guys (for example [3]) try to extort even more money from the victims of the aforementioned scammers, offering to remove archive snapshots in order to “protect” them from further attacks.
It is a scam too, do not pay them.
3. http://archive.today/2020.02.17-183946/https://sumbit.nl/prijs.html
plurk.com? It works for me. Which page do you have a problem with?
The Archive’s ability to preserve the short-lived content of social media has turned it into a favorite instrument of troll wars (Alt-Right vs. SJW, Ukraine vs. Russia, …),
and although the Archive tries to stay neutral in those battles, it has often come under fire from technical and social attackers.
The pattern of attacks has made our infrastructure similar to that of Wikileaks, SciHub, 8ch or DailyStormer - many mirror domains, fast-flux IPs for ingress and egress, etc.
If an attack has already been used against one of the websites in this karass, the rest have to be prepared for it.
Revocation of an SSL certificate as the result of some social attack is very likely, so I would even argue for using plain http in links to the Archive.
This https://www.reddit.com/r/Twitter/comments/ce1bea/reverting_back_to_the_old_twitter_interface_for/ ?
Just enabled, let’s see.
Yes, approximately 3-6 months. They are useful for debugging and for tracking spammers.
The logs are not archived to storage; when they fill up the webserver’s disk space, they are deleted
It never worked with PDFs actually.
It used to prefix links to PDFs with `http://webcache.googleusercontent.com/search?q=cache:` so Google Cache’s poor PDF-to-HTML converter did the job.
But that approach had obvious drawbacks:
1. low rendering quality
2. many PDFs are not in Google’s cache, and this hack does not work for them
Examples can be seen here: archive.today/http://webcache.googleusercontent.com/search?q=cache:*
If that is what you want, you can always prefix links to PDFs with that magic string before submitting them to the archive, as in the sketch below.
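A minimal sketch of the prefixing trick, assuming Python purely for illustration (the archive itself does not use this code); it only builds the prefixed URL, which you would then paste into the submit form:

```python
# Sketch: wrap a PDF link in the Google Cache prefix mentioned above,
# so Google Cache's PDF-to-HTML converter renders it when archived.
GOOGLE_CACHE_PREFIX = "http://webcache.googleusercontent.com/search?q=cache:"

def wrap_pdf_url(pdf_url: str) -> str:
    """Return the Google-Cache-prefixed URL to submit instead of the raw PDF link."""
    return GOOGLE_CACHE_PREFIX + pdf_url

if __name__ == "__main__":
    # Hypothetical PDF link, for illustration only.
    print(wrap_pdf_url("http://example.com/report.pdf"))
    # -> http://webcache.googleusercontent.com/search?q=cache:http://example.com/report.pdf
```

Remember the caveats above: the rendering quality is low, and it only works if the PDF is already in Google’s cache.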
No.
I see only one reason to do it: improving performance by forcing HTTP/2. But in the case of the archive, the performance bottleneck is not the network, it is the speed of the spindle disks.
On the other hand, there are two drawbacks to forcing https:
1. it is harder for bots to support SSL (for example, Perl does not include SSL libraries by default);
2. the certificate authority is an additional point of failure which could go mad: there have been cases when the SSL certificates of controversial websites were revoked.
Yes, but it is obsolete; it has not been the case since December 2019’s big update.
The idea of passing the client’s IP in X-Forwarded-For was to let the server provide the archive with the same localized version the client had seen. It worked in 2012, but not in 2020.
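For illustration only, a minimal sketch of that idea, assuming Python with the `requests` library (not the archive’s actual code); as said above, servers can no longer be relied on to honor the header for localization:

```python
import requests

def fetch_as_client(url: str, client_ip: str) -> str:
    """Fetch a page while forwarding the original client's IP.
    X-Forwarded-For is a standard header; whether the target server
    uses it to pick a localized version is entirely up to that server."""
    headers = {"X-Forwarded-For": client_ip}
    response = requests.get(url, headers=headers, timeout=30)
    return response.text

# Hypothetical usage: ask example.com for the version a client at 203.0.113.7 would see.
html = fetch_as_client("http://example.com/", "203.0.113.7")
```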
It is easily recognizable as an interstitial page caused by too many requests. Explaining it in a message would require too many words which no one would read. An orange page with a captcha on the left does it instantly, like a hieroglyph of the universal Internet language.
No, it should work. There have been no outages in the last few days.
api.twitter.com responds with “429 Too Many Requests”. It seems I need more Twitter accounts
They never worked.
PDF support is on my TODO list, but it is not implemented yet.
For now, you can use documentcloud.org or archive.org to store PDFs
No
Webpages appear instantly in browsers, so people wonder why archiving takes dozens of seconds, sometimes 3-5 minutes.
There are many reasons:
1. The instantly loaded page might show nothing but a “loading” spinner, so there are intentional delays.
2. A webpage might load pictures lazily, only when the user scrolls down. The archiver scrolls the page here and there to load those images, even if the page has no lazy elements: it has no way to know, so it makes a pessimistic assumption.
3. A webpage might have analytics scripts which work invisibly in the background. The page looks loaded if you look at the screen, but it is still loading if you look at the network events. This makes it difficult to detect the moment when the page has finished loading. Moreover, there are pages which never stop loading at all (news feeds, stock market charts, …); a sketch covering points 2 and 3 follows this list.
4. The archiving process has more steps than just loading a page. A better comparison is loading a page and then sending it to a paper printer.
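A minimal sketch of points 2 and 3, assuming Playwright in Python (an illustrative choice, not necessarily what the archive actually runs): scroll to trigger lazy-loaded images, then wait for the network to go idle, with a hard cap for pages that never stop loading:

```python
# Sketch only: library choice (Playwright) and timeouts are illustrative assumptions.
from playwright.sync_api import sync_playwright, TimeoutError as PlaywrightTimeout

def capture(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="load", timeout=60_000)

        # Point 2: scroll down in steps so lazily loaded images get requested,
        # even if the page turns out to have no lazy elements at all.
        for _ in range(10):
            page.evaluate("window.scrollBy(0, document.documentElement.clientHeight)")
            page.wait_for_timeout(500)

        # Point 3: wait until network activity stops, but give up after a hard cap
        # for pages that never go idle (news feeds, stock tickers, ...).
        try:
            page.wait_for_load_state("networkidle", timeout=30_000)
        except PlaywrightTimeout:
            pass

        html = page.content()
        browser.close()
        return html
```

Even this simplified loop already costs tens of seconds per page before any of the other archiving steps (point 4) begin.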
No, both versions will be in the archive and linked with <-prev next-> links
Yes, the update broke many things which have to be restored