Archive.is blog

September 2021

What percentage of the 5-character codes is used now? Full capacity is (10+26+26)^5, which is above 900 million; is that the correct number?

A bit less than a half. Afterwards the codes will be 6-char, 7-char, ...
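The capacity figure in the question can be checked with a few lines of Python. This is only a sketch: it assumes the codes draw from digits plus upper- and lower-case Latin letters, as the question states, and the `capacity` helper is a name made up for illustration.

```python
import string

# Assumed alphabet: 10 digits + 26 upper-case + 26 lower-case letters = 62 symbols.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def capacity(length: int) -> int:
    """Number of distinct codes of the given length over the assumed alphabet."""
    return len(ALPHABET) ** length

print(capacity(5))  # 916132832, i.e. a bit above 900 million
print(capacity(6))  # 56800235584
```

Under that assumption, each extra character multiplies the code space by 62.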

Sep 03, 2021

0 notes

What time zone is displayed for snapshots? Thank you

UTC

Sep 02, 2021

0 notes

Is there a reason why there is so much traffic from Japan to this site?

Maybe because it has a localized version (only three are more or less good: Japanese, Korean and Polish), or just because of its popularity in certain communities.

Sep 01, 2021

0 notes

August 2021

When archiving "wayback vefsafn is" the carousel of dates appears at the beginning (in "web archive org" that doesn't happen). Example: pSP7g. Maybe it can be removed.

yes

Aug 31, 2021

0 notes

Why is access blocked in /t77dr? Can you remove the block? Thanks in advance

yes

Aug 31, 2021

0 notes

Thanks for removing the pop-up window and displaying the full content. Can you make the links within the archived page (especially "THIS REVIEW IS HELPFUL" and "Flag as inappropriate") still hoverable and clickable? See the original pages for details. ysKuG, UmDJf, 5esIa.

No, JavaScript is not stored. It's hard to allow it to run correctly on snapshots.

Aug 29, 2021

1 note

Could you fix GameSpot GameFAQ archives? For example: P6r0D and 3mj5i

yes

Aug 28, 2021

0 notes

Could you incorporate as a charity so that people can make tax deductible donations to you?

In which country?

Aug 28, 2021

0 notes

Is there any structure in place to assure the continuity of this project? I have seen you answer to a similar question that you "can leave a will". I am genuinely concerned about the loss of data archived here, were you - God forbid - to be hit by a bus or something.

I had an email provided by my mobile operator. I considered it very reliable - even if it was hacked, I had a paper contract, I could visit their office to restore.  So I used it in critical places: in contracts, as an email to register on all sorts of utility sites, as an email to restore access to various services. After 10+ years, the mobile operator decided it wasn't their business and shut down the email service.

What structure in place could assure the continuity?

I'm not even asking if there was one in that case (and if there is one in GMail), I just don't understand what could be such a structure to assure continuity of any service. What should I have demanded to avoid data loss?

Archive.Today was born precisely out of an understanding of the fragility of services from which it itself is not immune. And this is not a project to transmit information to distant descendants, it is a project to let latecomers be witnesses. Having the magic to assure continuity, I wouldn't need to invent such weak tools.

Aug 28, 2021

4 notes

What is your perspective on creating a new page for "trending" and "most popular" snapshots? Or is that something that can lead to disasters?

It can be used as a news stream (that's where I get my news), but there's a lot of garbage too. Also, trolls can promote disgusting pictures to the top.

Aug 26, 2021

1 note

If you run out of money, will Wayback Machine take over the backed up pages? It would suck if they all just disappeared. Big fan of service BTW

I can leave a will, but how can you be sure that they will be happy to receive it and that they will dispose of it in such a way that you will like it?

Aug 26, 2021

0 notes

Question - How the heck does this archive retrieve saved pages from many years ago so quickly? Is there some sort of CDN being used?

No, there is no special optimization, simply because copying large amounts (e.g. to CDN edges) would take weeks or months. But it just so happens that older archives are now running on more powerful servers, and there are fewer requests for them than for newer ones. So I'm getting complaints about the site being too slow and questions about how it works so fast at the same time :)

Aug 25, 2021

0 notes

Can the pop-up window be removed? Also can the 'Read More' buttons be clickable to expand the text content? ysKuG, UmDJf, 5esIa.

yes, fixed locally, the fix will be deployed in a few hours

Aug 25, 2021

0 notes

If you ever get another Instagram/Facebook login credential, you may want to limit the number of exit IPs used by crawlers to just one, so that it looks less suspicious. That might be the reason why the accounts created in past might have gotten banned so quickly.

It has been that way for a long time (and still is for LinkedIn, DeviantArt, VK, OK, ... and other sites which do eventually ban but are not as paranoid as FB has been in recent years).

There is a multi-exit VPN using a patched WireGuard, which controls exit IPs for many websites (it is not only for accounts: many US local media need a US IP, etc.). This could be an interesting product in itself: to avoid seeing “this website is not available in your region” and, yes, to protect accounts from being banned when you are traveling.

An analysis of recent bans of my accounts has convinced me that blocking occurs after visiting questionable pages in different languages from the same account. Apparently, FB algorithms believe that if a normal person reads fake news, they do so in only one language. Interest in such pages in different languages (from German to Marathi) can lead to the classification of the user as a data journalist or a similar undesirable visitor who “does not follow community guidelines”, as they say.

Aug 24, 2021

0 notes

If you find the time, could you please fix the body of text on 'EOZ73'? The older versions of The Washington Post seem to have this problem. Thank you kindly.

yes, it seems that WaPo has to be added to the list of slow websites which require pressing F5 after the load. There is more than one snapshot similar to this. I'll fix it in a few hours (a compilation is in progress).

Aug 24, 2021

1 note

Getting a 403 forbidden error when trying to visit the site, and Down for Everyone or Just Me says the site is down.

Meanwhile people are posting links on Reddit, Reddit is able to download preview images, etc: https://www.reddit.com/domain/archive.is/new/

Also, there are at least 2 services named “Down for Everyone or Just Me“, one in .com and another in .me, and they tell different things.

Aug 23, 2021

0 notes

Is there any word on when the site will be back up/why it went down?

it is not down

Aug 23, 2021

0 notes

For /V0IKY, may you make the crawler wait a bit longer before saving the page so the data being queried can show up? It currently shows the screen for when the information is still loading. Thanks in advance.

Yes. Actually, waiting a bit more does not help. Pressing F5 does.

Aug 22, 2021

0 notes

In 'Reviews' section of archived pages (A few examples: CKEMU, UsI2N, Q55Ti), 'Read More' buttons are unclickable. Can comments be expandable when clicked?

yes

Aug 22, 2021

0 notes

In an anonymous post yesterday, I asked, "I am interested in donating to support Archive Today but don't see any way to do that." I have not yet seen a response.

there is a "donate" button on every page, so the question seemed like a joke

Aug 20, 2021

4 notes

Could you fix the outlink redirect on tiktok? /HHlSN Thank you so much

yes

Aug 20, 2021

0 notes

Currently Archive is available at archivecaslytosk.onion/. Onion services v2 will be deprecated soon. Please generate and publish a v3 address.

Could you tell me where you found archivecaslytosk.onion?

Wasn’t archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion listed nearby?
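The two addresses in this exchange can be told apart by length alone: v2 onion names are 16 base32 characters and v3 names are 56. A minimal sketch follows; `onion_version` is a hypothetical helper, and it checks only the shape of the name, not the checksum embedded in v3 addresses.

```python
import re
from typing import Optional

# Onion names use the base32 alphabet (a-z, 2-7).
V2_RE = re.compile(r"^[a-z2-7]{16}\.onion$")
V3_RE = re.compile(r"^[a-z2-7]{56}\.onion$")

def onion_version(host: str) -> Optional[int]:
    """Return 2 or 3 for a well-formed onion host name, None otherwise."""
    host = host.strip().rstrip("/").lower()
    if V2_RE.match(host):
        return 2
    if V3_RE.match(host):
        return 3
    return None

print(onion_version("archivecaslytosk.onion/"))
print(onion_version("archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion"))
```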

Aug 20, 2021

0 notes

I am interested in donating to support Archive Today but don't see any way to do that. Also, it seems as if Archive Today is not working through the Firefox browser. Seems to work through Chrome though.

I am using Firefox

Aug 19, 2021

2 notes

I do recommend investing some time in the webmaster tools for crawling by Google, Bing, Yandex, etc. Many valuable pages have been saved by this project, but what's the use of them if people don't really know about them? An analogy would be public libraries during the pandemic: people were unable to access such valuable information until it was made available through Internet Archive Open Library. This analogy can be discredited through violation of copyright, however.

Our own index (using ElasticSearch or Vespa.ai) looks more promising/reliable.

Even in those days when their index coverage was much better than now, there were various other problems: for example, porn snapshots were ranked much higher than the rest.

Aug 19, 2021

0 notes

Will you cut it out with "Error 503 Service Unavailable"?!

Backend was restarting (after crash or upgrade).

Should I make that page more funny like Reddit’s 503: http://archive.is/VnTMO

Aug 19, 2021

2 notes

What is the trickiest site -- LJ! Large discussion not expanded again: Muiw5

fixed

Aug 19, 2021

0 notes

I'd like to also report an issue with the captcha page timing out when trying to submit / view articles. The page will partially load and then just fail, or it will give a connection timeout error without loading anything. I've tried using different DNS nameservers and tried using a friends internet connection on a different provider but the issue keeps persisting. Is there any chance you could use CloudFlare to reduce the number of bots trying to abuse the service?

No, there was a strange story there: all my domains were already banned by Cloudflare when I tried to add them to the CF control panel (I don't know why, I never used CF before), but it happened that one archive user got a job at CF and solved the problem (although he didn't reveal the reason for the ban). He worked there for about 1 year, the day after he quit CF banned the archive domains again, resulting in 10 hours of downtime. This was long before the drama with DailyStormer and 8chan, so it went unnoticed: there wasn't a thread about CF fighting indie sites back then.

Another example of why money cannot buy a quality service if there is no personal connection and mutual interest.

Aug 19, 2021

2 notes

Can you expand the comments on Invidious pages like z2VYw ? Thanks!

yes

Aug 19, 2021

0 notes

The entire bodies of text in both of these articles — 'z1MnF' and 'ssbHy' — didn't save. This seems to be an issue with Defense One, specifically. It happens on every page that's archived. Can you fix them? Thanks for everything you do!

yes, fixed. the adblocker removed too much

Aug 19, 2021

0 notes

For the past several days I've been unable to even reach the captcha page. After several minutes it loads partially, not showing the full elements of the page, and not allowing me to check the mark or to proceed. Not sure if I'm doing anything wrong. Same thing happens when I try to search for the link rather than submit link. Thank you

Please provide information about your network settings

Aug 19, 2021

0 notes

Instagram posts/profiles no longer working, redirects to /accounts/login/ even for content you don't need to login to view: Example: /R7mNt

There is no Instagram content that does not require a login.

If you can access the page without a login, it is a sort of “promo preview”: after a few pages accessed this way, they add your IP to a “promo is over” list and redirect every future request to /login.

I just do not have enough fresh IPs to abuse this mechanism.

Barinsta-style looks more promising, if it still works.

Aug 19, 2021

0 notes

Would you mind adjusting the body of text in 'sTek0' so it is centered properly? Thank you kindly.

yes

Aug 18, 2021

0 notes

Can we do anything to help financially support the site? I've noticed that it's been slower in these past few months and would like to help keep the site fast. (I've even gotten timeouts when just loading the front page, not even loading archives / submitting new ones)

I do not think so.

There will always be crawlers, submitters of SEO doorways, etc., ready to consume all the offered resources, no matter how large they are. That is, there will always be some pessimized group (by IP, by DNS, by some other features) which will face slowdowns and excessive captchas, and ordinary users will fall into it.

Accounts do not look like a good solution either: they just shift the same checks to the time of account creation. Just try to create a Google/Facebook/Instagram account using a VPN or Tor IP: there will be tons of captchas, timeouts, and vague “server errors”. Difficult account registration would prevent people from quickly saving hot content; they would simply give up at this stage.

Aug 18, 2021

0 notes

Images attached in a forum post at HiAvM don't get archived. Could you look into it?

yes, fixed

Aug 18, 2021

0 notes

What is the possibility of creating a sitemap to allow better crawling of saved pages for display on major search engines?

It is there, although in a non-standard location submitted via Webmaster Tools. The problem is that since 2018 there have been almost no visits from search engine bots (one exception is bingbot, which crawls like crazy but does not add pages to the index anyway, so I have to block it when it is too active).

Aug 18, 2021

0 notes

Recently I noticed a "captcha while reading": when I read a long page (a book, a long LJ discussion, etc.), the archive used to keep my reading position when I reloaded the page (closed the browser and reopened it, etc.). But since the recent introduction of a captcha request on merely reopening a previously opened page, my reading position is not retained. This is quite uncomfortable. Please do something about it if possible.

Captcha should not be there to read the page, only to submit.

The only exception is countering overly active crawlers which create significant load. That means captcha-to-read is almost permanently shown to Amazon, Azure, GoogleCloud, DigitalOcean, Vultr, Linode, LeaseWeb, WorldStream, etc. IPs, which are home to many crawlers and VPN exits.
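A rule like "captcha-to-read only for hosting-provider IPs" could be sketched as a CIDR membership test. This is not the site's actual logic: the function name is hypothetical, and the two networks are reserved documentation ranges (RFC 5737) standing in for real provider ranges.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation networks), NOT real provider ranges.
HOSTING_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("198.51.100.0/24", "203.0.113.0/24")
]

def needs_captcha_to_read(client_ip: str) -> bool:
    """True if the client address falls inside any listed hosting range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in HOSTING_RANGES)

print(needs_captcha_to_read("203.0.113.7"))  # inside a listed range
print(needs_captcha_to_read("192.0.2.1"))    # outside the listed ranges
```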

Aug 18, 2021

0 notes

Can you please remove the cookie box on WLoSO. Thank you

yes

Aug 17, 2021

0 notes

The twitter page /1eLZW did not archive correctly. I think it might be an issue with the old format request added to the header. Could you look into it?

yes, it is available only in new design; fixed

Aug 17, 2021

0 notes

Can you please remove the cookie box on roBGz. Thank you

yes

Aug 17, 2021

0 notes

Can you delete the donation popup in 'yp1vL' at the top of the page? Thanks in advance. Not sure why it archived it for this specific page. It's never archived that popup before.

yes

Aug 17, 2021

0 notes

Is there any free or paid way to archive 1000 links with the help of coding like Python?

Aug 17, 2021

0 notes

The footer on qG3Ly is, for some reason, near the top, covering part of the page. Can this be moved down?

I see no problem there

Aug 17, 2021

0 notes

Why is archive is so slow? It's been incredibly slow for about a month now in the UK. It seems more like a Cloudflare issue as even the bot check screen (Attention required/One more step) takes about a minute to load.

The choice to use Cloudflare is yours, not mine. You could opt-out by changing DNS from 1.1.1.1 to something else.

Aug 17, 2021

0 notes

Hello, have you thought about selling your project?

No.

In case of a money shortage, a more aggressive (Wikipedia-style) asking for donations might help.

Aug 17, 2021

0 notes

Is there a way to alter the archiving on this website to wait/include comments? I3spw

yes, fixed.

Aug 17, 2021

0 notes

For long time I noted that some branches in some expanded LJ discussions are present multiple times. This was not fixed with last fix ("expand"). Is it possible to investigate? See 2-times-branch "August 11 2021, 05:51:28 UTC", 3-times-branch "August 11 2021, 05:55:20 UTC", 4-times-branch "August 11 2021, 05:57:58 UTC" in 4mvnC

yes, there are some invisible "Expand" buttons which have been clicked too

Aug 17, 2021

0 notes

Is it possible that this project will become a permanent service like Google?

Is it a joke about 240 services https://killedbygoogle.com/ ?

Aug 17, 2021

0 notes

Are Google and Imgur links forbidden from being archived?

No, which links?

Aug 17, 2021

0 notes

Again not expanded several branches of LJ discussion (also note "More" buttons there): dmJKp

yes, the fixed version is not yet deployed to production. I rearchived that page manually

Aug 15, 2021

0 notes

Is it possible to expand all comments in substantial LJ discussions? Example (see below there): QR3I1

yes. There was a bug misclicking on some EXPAND buttons

Aug 13, 2021

1 note

You said that before you die of old age you would implement a download zip of your whole site. That's fine but links to archived pages will still be broken if you die if you don't have someone to follow in your footsteps to maintain the site because the site will go offline or somebody will buy your expired domain name using it for another purpose. Do you have plans for someone to take over your site? I have thousands of archived pages, don't want that work to go to waste.

I do not think there are many people willing to maintain such a project, which is also unprofitable. All 4½ projects out there (IA, Archive.today, Megalodon.jp, half-suspended WebCite, and paid Pinboard.in) look like they each run on the energy and money of a single person and will likely be greatly changed or shut down by the heirs.

I could only advise to save everything locally to sync your documents with your own lifespan. Do not rely on clouds.

Aug 13, 2021

13 notes

Recently, no matter which domain and browser i am using to retrieve or archive webpages, archiveis takes AGES to load. What's going on there?

DNS?

Aug 13, 2021

0 notes

Is your code open source? I’d like to use and modify it for a different purpose. I’m assuming you used headless Chrome but I’d be curious to know what else you used.

There is chrome, but not headless, just patched to add more functions to the remote debug protocol.

Aug 12, 2021

0 notes

Hello! Apparently ArchiveToday is having problems archiving Facebook pages. For example, here: archive ph Zdscy - instead of page's content the message about "Accepting Cookies" is displayed, in French. Also, ArchiveToday gets redirected to FB's login page during archiving when trying to archive full (non-mobile) pages. Hopefully, something can be done about it and ArchiveToday will be able to save FB pages again.

`https://blog.archive.today/post/658859868034318336/bonjour-sauvegardez-vous-des-comptes-instagram`

Aug 12, 2021

1 note

Because you have multiple domains, can you tell me which TLDs are the safest in terms of takedowns (DMCA etc)?

.today, because it has no content, only redirect to another domain

Aug 12, 2021

0 notes

You should use hCaptcha instead of reCAPTCHA. It's free, just as effective and way less annoying for the user.

No, it is much more annoying.

Aug 12, 2021

1 note

For the past couple of days, when I try to save pages, the Attention Required! page for captcha is either blank or the captcha header is loaded but not the rest of the page or captcha. I have to reload the page several times before it either goes back to the save page where you enter the url or captcha prompt works. This is annoying.

Please supply such reports with the URL of the page you are accessing, your IP, and the IP of the server you connect to.

Aug 12, 2021

0 notes

is it possible to save a site continuously with a periodic schedule?

No

Aug 12, 2021

6 notes

Hi, do you know what font is used in plain text archives (images), such as qnqmT/image ? Thanks!

Looks like some programmer’s font leaked from the workstation: JetBrains Mono or PragmataPro Mono

Aug 12, 2021

0 notes

Is the website down? It's very slow. downforeveryoneorjustme is showing that it's down as well.

No, it works.

Aug 11, 2021

0 notes

If possible, would you mind fixing 'RQPbP'? The archiving process didn't save any of the actual text from the article. It's basically just a blank page. Thanks in advance (if you can help).

yes

Aug 11, 2021

0 notes

Can you please fix /3qBql, thank you

yes

Aug 11, 2021

0 notes

Will archivecaslytosk.onion be upgraded to Onion version 3.0 site??

archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion

Aug 11, 2021

0 notes

Could you fix iqoly please?

No. I (as a human) see infinite loop “checking your browser“ too. Try archive.org or Google cache.

Aug 10, 2021

1 note

Expand all on vPjiq?

yes

Aug 09, 2021

0 notes

Image omitted at "npYKs" , and re-archiving is not working (just shows npYKs again and again) . Could you help this?

https://archive.is/MZ8mK

Aug 09, 2021

0 notes

Cannot archive anything. Says "Server Error" Anyone else having this problem?

Just out-of-memory.

I will fix it in a few minutes.

Aug 09, 2021

0 notes

Why is it so tediously slow? I have to use a VPN every time I have to snapshot a page. Today I got an error message saying "this device is paused".

Can you provide more info about your network ?

Aug 08, 2021

1 note

Hello, do you archive Instagram accounts? If so, for how long have you been doing it? How many Instagram accounts are in your databases? This is for a research project in a second-year Master's in Photography. Thank you for your help, regards, Julie T.

I found a 10-year-old Facebook account (which is valid for Instagram too). It seems that Facebook blocks old accounts only for a few hours on atypical usage, not forever as it does with recently registered accounts.

So far it works, but sometimes it results in a snapshot with “Bloqueo temporal. Parece que has usado de forma indebida esta función por ir demasiado rápido. Se te ha bloqueado temporalmente y no puedes usarla.” (“Temporarily blocked. It looks like you misused this feature by going too fast. You have been temporarily blocked and cannot use it.”) instead of the content.

I have no fallback accounts to handle this.

If someone can donate old FB accounts, it would be helpful (I know there is a black market, but I am not ready to pay to keep FB/IG archiving active).

Aug 07, 2021

0 notes

I got this page while archiving LiveJournal: czS2f

fixed

Aug 07, 2021

0 notes

I've noticed that the captcha renders empty when the archive URL has a percent notation in it, e.g., archive/7wmKm#42% when a captcha is required doesn't have anything in place of the captcha.

yes :(

Technically, reCAPTCHA is correct to fail on this: http://archive.today/7wmKm#42% is not a valid URL; following the standard it should be http://archive.today/7wmKm#42%25. But #42% was there before reCAPTCHA.

It seems that I have to copy recaptcha.js from google servers and fix it locally.
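The escaping rule mentioned above can be reproduced with Python's standard library: a literal percent sign in a URL must be written as `%25`. The sketch below is only an illustration of that rule, and `escape_fragment` is a made-up helper name.

```python
from urllib.parse import quote, unquote

def escape_fragment(fragment: str) -> str:
    """Percent-encode a raw fragment so that a literal '%' becomes '%25'."""
    return quote(fragment, safe="")

raw = "42%"
url = "http://archive.today/7wmKm#" + escape_fragment(raw)
print(url)                              # http://archive.today/7wmKm#42%25
print(unquote(url.split("#", 1)[1]))    # 42%
```

Note that `quote` does not detect input that is already escaped, so it should be applied to the raw fragment only once.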

Aug 07, 2021

0 notes

Could you open all the "Learn More" sections in gJKLQ?

yes

Aug 07, 2021

0 notes

On kTcp6 you are redirected to an adblock as of today. Can you please commit this, but keep this block once. Thanks

Fixed. We updated Chromium recently (85 -> 92), this caused changes in interaction with some sites

Aug 05, 2021

0 notes

Can you please remove the cookie box on mCmko. Thank you

yes

Aug 03, 2021

0 notes

Show all depot files on SteamDB like on D93hv?

Fixed for existing snapshots. If you are going to save more pages like those, it won’t work for them until tomorrow’s browser update (I have it locally but not on the servers yet).

Aug 03, 2021

0 notes

Could you please remove the cookie box on /rZEkc and /YbFR4 ...? Thanks!

yes

Aug 03, 2021

0 notes

Can you please make the individual pages visible on N3rhD (#slide1, #slide2 ...). Thank you

yes

Aug 03, 2021

0 notes

Ability to skip captcha for "donors/subscribers"?

Maybe in the future. Currently, there are no accounts so no info on which visitor is a donor

Aug 03, 2021

0 notes

Will the code ever become open source? I'd like to run my own instance if possible.

Unlikely. It has too many hardcoded things specific to my installation, from the type of hardware (like “that server is so old that it requires kernel-4.4 with a specific patch”) to the use of a quite exotic operating system.

There is plenty of open-source software in this area: https://github.com/iipc/awesome-web-archiving

Aug 03, 2021

3 notes

Could you close the math input popup and the cookie popup for dhySF?

yes

Aug 02, 2021

0 notes

Can you remove the pop-up of /0ZhZY ? Thanks

yes (it will be deployed in ~10 minutes)

Aug 02, 2021

0 notes

Hi, can you remove the Convey popup ad from rJEO1 ?

yes

Aug 01, 2021

0 notes

Could I set up a site to automatically monitor Twitter accounts with over 10,000 followers or so & automatically back up new Tweets from those accounts? My website will use your site to make a backup of a Tweet as normal on your site, then my site will have a page with a link to all that person's archived Tweets. I want to do this to catch out politicians who say bad things and then try to delete their Tweets to erase the evidence. Is that only possible with API access & does your site have API access?

Twitter will likely ban us. We often receive “429 Too many requests” answers from Twitter.

AFAIK, some library (Library of Congress?) does save all tweets in realtime, so you might use them.

Aug 01, 2021

5 notes

Thank you for your project! It's amazing and very useful! It appears to me that archived webpages do not adapt to different screen resolutions (for example, mobile phones). Any plans in that direction?

I guess not.

1. It is difficult to do reliably, as websites with mobile versions implement them differently. Sometimes it is the same page with separate CSS, but often it is different pages on different domains (like `m.facebook.com`).

2. Mobile versions strip some content. One might post a link to an archived page from a desktop citing some content, but a mobile reader won’t see it. Currently, everybody sees the same thing, at the price that mobile users with small screens have to zoom in.

Aug 01, 2021

1 note

July 2021

I have noticed that Google-hosted sites do not archive, especially Blogspot. In some areas they archive normally. Maybe it is due to where their nearest server is.

I do not know if it covers your case (if not, please provide more details), but some pages on Google websites (groups.google.*, docs.google.*, ...) require an account, and Google forcefully logs users out if they are using an old (vulnerable) Chromium. I am going to upgrade Chromium in 1-2 days.

Jul 30, 2021

0 notes

Could you press "Not Now" on LHK9n?

yes

Jul 30, 2021

0 notes

Have you considered allowing small mp4s and webms (<3MB), basically gifs, to be archived?

Videos are allowed on some websites (Twitter, Imgur, Wired, MIT Technology Review, ...).

There is another obstacle besides the size: Chromium often returns broken data when video content is requested via DevTool Protocol (using Network.getResponseBody function). Probably they have already fixed the bug, I need to check.
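For context, `Network.getResponseBody` is a Chrome DevTools Protocol method that takes a `requestId` and returns a `body` string plus a `base64Encoded` flag; binary payloads such as video come back base64-encoded. The sketch below only builds the request message and decodes a result; the helper names are invented here, and sending the message over the debugging WebSocket is left out.

```python
import base64
import json

def build_get_response_body(message_id: int, request_id: str) -> str:
    """Serialize a Network.getResponseBody command for the DevTools Protocol."""
    return json.dumps({
        "id": message_id,
        "method": "Network.getResponseBody",
        "params": {"requestId": request_id},
    })

def decode_body(result: dict) -> bytes:
    """Turn the command's result object into raw bytes."""
    body = result["body"]
    if result.get("base64Encoded"):
        return base64.b64decode(body)
    return body.encode("utf-8")

# Example with a fabricated result object (not real browser output).
sample = {"body": base64.b64encode(b"\x00\x01video").decode("ascii"), "base64Encoded": True}
print(build_get_response_body(1, "hypothetical-request-id"))
print(decode_body(sample))
```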

Jul 29, 2021

1 note

Hi, my name is Conor from Infolinks, and I would like to purchase advertising on your site. Are you available for a call this week to discuss it? We represent dozens of national advertisers such as Red Bull, Barnes & Noble and Kia. I look forward to hearing from you and setting up a time to chat.

Do you have an email? Or just a signup form? Telephone conversations in English are not my strong point.

Jul 27, 2021

0 notes

As current patrons of this service, should we as users either: 1. promote this website to others to help preserve important websites, 2. Keep silent on our knowledge of this database/service, 3. Stop using the service entirely?

I’d say “no” to all three.

1-2. the website has been working for 9 years already and is quite well known (random people I meet offline know), it is hardly possible and hardly necessary to change anything much here.

3. why?

Jul 27, 2021

2 notes

Not respecting people's privacy, copyright laws, or the veracity of content on your website... Please tell us more about how this archive isn't being well managed and is doomed to die at any moment!

Of course, it is doomed to die at any moment (you should not have any illusions, as well as about the "veracity of content" on the Internet). The only idea is to hold back a little something that is doomed to die a little earlier. I hope that it is obvious after all the deplatforming dramas of the last months (disappearance of @realDonaldTrump, etc)

Jul 27, 2021

2 notes

will you remove a website archived from your site, under a formal DMCA notice?

Usually, no.

1. DMCA applies only to US companies. And it is not something they must obey, it is about providing safe-harbor to Internet companies if they follow. Not being in the US, we do not receive this privilege.

2. It is prone to bogus removals (there are studies on this topic https://www.google.com/search?q=ChillingEffect+bogus+dmca+notices). For example, Twitter has removed our logo from https://twitter.com/archiveis following a formal DMCA notice. Tumblr (where blog.archive.today is) received the same letter and ignored it.

3. Indeed, relations with ISPs/registrars are slightly aggravated by ignoring DMCA notifications but voluntarily following them will not change anything. I disclosed a few days ago that even bare domains without any content were attacked and hijacked. A letter like “there are million stolen bitcoins on that server“ or simply something hysterical is more effective than formal ones. And much more often.

We are close to Telegram in this respect: illegal content is removed by requests of authorities (or when we can predict the position of authorities) but “I do not like it and want to shut down“ wrapped in a form that mimics a lawyer's letter - no. That undermines the idea of a webpage archive. If we follow this path, the first step will be to remove what we do not like ourselves.

Jul 26, 2021

8 notes

In the time of proxies the submitter ip is of valuable information only together with the exact time when the connection was made (so they can trace the connection to a specific ip). -- On the other hand the exact time of the page archived is an information of value. -- The question is how the link between ip-address and the connection moment can be scrambled.

If you expect such an orchestrated (me, proxy operator and your home ISP) action against you just for submitting a page to the archive, use Tor. Or proxy via Tor via proxy.

The scrambling task looks far-fetched. Sort of things that students do for grants, like removing racial prejudice from neural networks.

Jul 26, 2021

0 notes

Well, to be fair, it would also be a good idea to have someone take over archive is, if you do die of old age. I don't have a problem with one big zip file, but what if people want the site to continue? Maybe there could be people who love your projects and want to continue archive is. I know sites shut down eventually or get removed by ISPs, but Maybe you can start a ISP?

I heard a story about Minecraft guys who were frustrated by the unreliability of their domain registrar under a complaint storm originating from their competitors, so they established their own domain registrar ($50k setup fee) to serve their own 3 domains. It failed within a year.

The problem is not that ISP/hosting/domain guys are motherfuckers, the problem is that it is a highly competitive business saturated with fear, uncertainty, and doubt. They have to play the game “we are cute, it is our clients who are evil, we have already canceled them” or be canceled themselves (the fate of clients is unenviable as well as the “senior”’s). When you are the only customer of your own registrar you are an anomaly and more than vulnerable.

That is why I prefer national (two-letter) domains, at least a national registry has no angst of being canceled (a good illustration is archive.li’s drama: as soon as the Swiss national registry ceased working with end-users and passed them on to resellers, the reseller “switchplus ag” quickly canceled “archive.li” on the first anonymous complaint worrying about its newfound status of big reseller), and the only FUD to act here are personal ones of particular employees.

Jul 26, 2021

0 notes

What's the point of keeping tracking of submitter ip again? There is tech to evade giving out real ip, such as proxy, vpn, onion routing, etc.

Mostly to combat SEO spam, by clustering new snapshots by submitters. Spammers typically use the same VPN company, so patterns in submitter IPs complement patterns in text on snapshots. Also, stored IPs are essential to tighten CAPTCHA rules against an active spammer.

Of course, you can save a page anonymously. That is exactly what the website is for: to save a page as quickly as possible, without creating accounts, confirming email/phone, or binding credit cards. But you cannot save 10000 pages with “buy v1agra”. There are no user accounts, so they are simulated by various heuristics.
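The clustering idea described above could be sketched as a simple grouping of new snapshots by submitter IP. Everything here is an assumption for illustration: the record fields (`ip`, `code`), the threshold, and the function names are invented, and the real heuristics are not described in this post.

```python
from collections import defaultdict

def cluster_by_submitter(snapshots):
    """Group snapshot codes by the IP that submitted them."""
    clusters = defaultdict(list)
    for snap in snapshots:
        clusters[snap["ip"]].append(snap["code"])
    return dict(clusters)

def suspicious_submitters(clusters, limit=3):
    """IPs with more than `limit` submissions (threshold is a made-up value)."""
    return sorted(ip for ip, codes in clusters.items() if len(codes) > limit)

# Fabricated sample records using documentation IP addresses.
sample = [{"ip": "203.0.113.5", "code": f"s{i}"} for i in range(5)]
sample.append({"ip": "198.51.100.9", "code": "a1"})
clusters = cluster_by_submitter(sample)
print(suspicious_submitters(clusters))
```

In practice such a grouping would be combined with text patterns in the snapshots, as the answer notes, since many submitters can share one VPN exit.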

Jul 26, 2021

0 notes

In the near future, once the full archive is finally available for download, will each snapshot include the submitter's IP address?

Well, it is a good point against exposing/uploading the archive database in its internal form :)

Jul 26, 2021

0 notes

Since the site began which would you say has outpaced the other: the growth in storage space needed or the growth in space you can buy per dollar? Do you see the site becoming cheaper or more expensive to run as time goes on?

No, it has not grown much in recent months. It is not about storage: after migrating from PhantomJS to Chromium the page-saving process got slow and heavy, so CAPTCHA was introduced, which ruled out mass submitters who use bots and scripts. Storage prices were never the bottleneck. In the first years it was disk I/O and bandwidth for distributing content (mainly because of bingbot and googlebot); now it is CPU for the browsers.

There are still unimplemented ideas to be leaner in storage, such as erasure codes instead of data duplication, etc.
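To illustrate why erasure codes can be leaner than duplication, here is the simplest possible case: a single XOR parity block over equal-sized data blocks. It is a toy sketch, not a description of any planned implementation; production systems typically use codes such as Reed-Solomon that survive more than one loss.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(blocks):
    """Parity block: XOR of all data blocks (all assumed equal length)."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor_blocks(parity, block)
    return parity

def recover(surviving, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    missing = parity
    for block in surviving:
        missing = xor_blocks(missing, block)
    return missing

blocks = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(blocks)
print(recover([blocks[0], blocks[2]], parity))  # b'efgh'
```

With three data blocks and one parity block the overhead is about 1.33x the original size, compared with 2x for a full duplicate, while still tolerating the loss of any one block.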

Jul 26, 2021

0 notes

I deleted my personal Instagram a few days ago and I noticed that the old URL was archived around the same time on this site. The archived page comes up as "Page not found" so it doesn't show any personal info but I'm curious. Do you automatically archive deleted instagram accounts or was somebody searching my name and archiving it?

Somebody was searching your name and archiving it.

Jul 25, 2021

1 note

Could you create a throwaway Tumblr account to save login-restricted blogs?

I tried. But I do not understand how one is supposed to view login-restricted blogs. The restricted content is there but in a very narrow column.

Jul 25, 2021

0 notes

Following up on your project blog response, what are the other thread unrollers that save Twitter tweets?

https://www.google.com/search?q=twitter+thread+unrollers

Jul 25, 2021

0 notes