Archive.is blog

Blog of http://archive.is/ project
  • If the bad guys were to unexpectedly shut down one of your two data centers, how long would it take before the second data center was back online?

    Anonymous

    They are both online, normally balancing the load. When one is off for maintenance, the website runs slowly.

    • 1 day ago
  • Consider how successful reddit's old "give gold to support our server uptime" scheme was at fundraising (granted, in their case it turned into a giant scam). I think the "fundraising progress meter" they used to publicly report on their daily money goals was very effective. It gave a community feel to giving gold, almost as if it were a collective responsibility to keep reddit ad-free by donating to the common good. Have you considered having a daily donation progress bar as well?

    Anonymous

    “Daily” is overoptimistic; on most days there are zero :)

    • 2 days ago
  • /8rpYA and similar pages from the same parent domain are an interesting case where the archived page is actually an iframe of another news page. What do you think is the best way to archive this? Would it be a good idea to log it as a redirect and just follow the page to the iframed page, thus removing the frame? Or would the archive still work, and be more accurate, if the iframe were left in place?

    Anonymous

    I removed CSS which limited the height of the iframe (only for this site). Is it better now?

    • 2 days ago
  • About: /post/632648485201739776/ - Thanks again!! Could you apply this rule to all new URLs on this portal or do you only fix specific archives?

    Anonymous

    It will be applied to all new URLs after the next deployment (later today or tomorrow).

    • 3 days ago
  • On Idealista (a leading apartment search website in Spain), can you fix "Leer comentario completo" (read full comment) and "fotos siguientes" (see next photos)? Thanks! /VeJYf

    Anonymous

    fixed

    • 4 days ago
  • is it 1999? asking because you block access to browsers & I haven't seen this retarded shit in decades. you block browsers that are identical to ones you've listed as supported

    Anonymous

    If you mean Brave, I agree that adding ads to the pages and replacing referral links is very 1999-ish. It was called ActiveX malware back then.

    • 4 days ago
  • Every URL I archive and have archived is blocked by copyright. Why? If I archive with a VPN it doesn't get blocked, but if I don't use a VPN or another IP address it gets blocked. Did I request too many URLs? It says: "In response to a request we received from 'US Digital Millennium Copyright Act' the page is not currently available. If you need it for research, investigation or other purposes, please, inquiry via email, or Search this page in Google Cache Search this page in Archive.org Search t"

    Anonymous

    There could be a bug. What website are you trying to archive?

    • 4 days ago
  • Can you remove the blocking/login panel that appears on Facebook pages when you are not logged in? It shows up on /4EG79 but not /lHNEb and seems to appear when a person scrolls down the page. Thanks!

    Anonymous

    4EG79 is saved from Archive.org, not from Facebook. It is dangerous to click on buttons such as “Not now”, “Hide popup”, etc. in Archive.org snapshots; they likely won’t work as intended. By contrast, lHNEb is saved from Facebook, and “Not now” has been clicked.

    • 4 days ago
  • How much space is left on the archive.is servers?

    Anonymous

    Not too much. I plan to switch from data duplication to erasure coding to use space more efficiently.

    • 6 days ago
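
To illustrate the space saving mentioned in the answer above: with plain duplication, the raw storage cost is the number of copies, while with erasure coding it is (data shards + parity shards) / data shards. The sketch below compares the two. The 10+4 shard layout and the function names are assumptions chosen for illustration; the parameters actually planned for the archive are not stated.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of data when keeping full copies."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return float(copies)


def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per byte of data for a (k data, m parity) code."""
    if data_shards < 1 or parity_shards < 0:
        raise ValueError("need data_shards >= 1 and parity_shards >= 0")
    return (data_shards + parity_shards) / data_shards


def savings(copies: int, data_shards: int, parity_shards: int) -> float:
    """Fraction of raw space freed by switching from copies to shards."""
    before = replication_overhead(copies)
    after = erasure_overhead(data_shards, parity_shards)
    return 1 - after / before


# Hypothetical layout: 10 data shards + 4 parity shards.
print(f"2x copies -> 10+4: {savings(2, 10, 4):.0%} less raw space")
print(f"3x copies -> 10+4: {savings(3, 10, 4):.0%} less raw space")
```

With a Reed-Solomon style code, a 10+4 layout would survive the loss of any 4 shards while storing 1.4 bytes per byte of data, versus 2 or 3 bytes with full copies.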
  • Sometimes it can be important to capture in the archive the original URL that the archived page was redirected from. I noticed that you have this feature, thank you. Sometimes the redirect can pass through several URLs before landing on the page that needs to be archived. Do you capture the intermediate redirects? And if so, how many URLs of the redirect chain do you record? Is it all of them?

    Anonymous

    Yes, the new archiver (in operation since Dec 2019) records a bit more than the old one: all URLs of intermediate redirects, all URLs of images and scripts, HTTP headers, IP addresses of the servers, etc. I had the idea to visualize it, probably in a form like the “Network” tab of a browser’s DevTools, and to use that info to improve the adblocker.

    • 6 days ago
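
As a rough illustration of what "recording all URLs of intermediate redirects" could look like, the sketch below walks a redirect table and keeps every hop. It is not the archiver's code: the in-memory `redirects` table stands in for real HTTP responses, and the function name, the hop limit and the example URLs are all hypothetical.

```python
from typing import Dict, List


def redirect_chain(start_url: str,
                   redirects: Dict[str, str],
                   max_hops: int = 20) -> List[str]:
    """Return every URL visited, from the submitted one to the final one."""
    chain = [start_url]
    seen = {start_url}
    while chain[-1] in redirects:
        if len(chain) - 1 >= max_hops:
            raise RuntimeError("too many redirects")
        next_url = redirects[chain[-1]]
        if next_url in seen:
            raise RuntimeError("redirect loop at " + next_url)
        chain.append(next_url)
        seen.add(next_url)
    return chain


# Hypothetical redirect table: a short link that hops twice.
hops = {
    "http://short.example/a": "http://example.com/b",
    "http://example.com/b": "https://example.com/final",
}
chain = redirect_chain("http://short.example/a", hops)
print(" -> ".join(chain))
```

Keeping the whole list, rather than only the first and last URL, is what makes it possible to later look a snapshot up by any URL in the chain.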
  • When a link in an archived page is clicked, it is checked to see if it has also been archived. If so, the archived page loads; if not, the real URL loads. But what if there are three archived versions of that out-link page: one with a timestamp one day before the originating page, one with a timestamp one week after, and one from the latest archive? How do you determine which version to link to?

    Anonymous

    The one with the timestamp closest to the snapshot you are currently on.

    There are also <-prior and next-> buttons to navigate in time in case of multiple versions.

    • 1 week ago
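
The "closest timestamp" rule from the answer above can be sketched as follows, using the three versions from the question. The function name, the dates, and the tie-breaking rule (prefer the earlier version when two are equally close) are assumptions for illustration; the site's actual tie handling is not described.

```python
from datetime import datetime, timedelta
from typing import Sequence


def closest_snapshot(current: datetime,
                     candidates: Sequence[datetime]) -> datetime:
    """Pick the archived version whose timestamp is nearest to `current`."""
    if not candidates:
        raise ValueError("no archived versions of the linked page")
    # Sort key: distance in time first, then the timestamp itself so that
    # ties resolve to the earlier version (an assumption, see above).
    return min(candidates, key=lambda ts: (abs(ts - current), ts))


# Hypothetical timestamps matching the scenario in the question.
page = datetime(2020, 10, 20, 12, 0)
versions = [
    page - timedelta(days=1),   # one day before the originating page
    page + timedelta(days=7),   # one week after
    page + timedelta(days=30),  # the latest archive
]
chosen = closest_snapshot(page, versions)
print(chosen.isoformat())
```

Here the version saved one day earlier is selected, since it is nearer in time than the one saved a week later.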
  • I read in your FAQ that you keep the images at 2x duplication and textual information at 3x. With many websites using the same JavaScript libraries, how do you deal with storing commonly referenced libraries, say, jQuery? Do you use pointers to save on space?

    Anonymous

    JavaScript libraries are not stored, they are executed at the time of capturing and the result of the execution is archived.

    Commonly referenced blobs like background images and fonts are deduplicated, yes.

    • 1 week ago
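
One common way to deduplicate shared blobs such as fonts and background images is content addressing: each blob is stored under a hash of its bytes, so identical files from different pages map to the same entry. The sketch below shows the idea only; the `BlobStore` class, the use of SHA-256, and the sample byte strings are assumptions, not a description of how the archive actually stores data.

```python
import hashlib
from typing import Dict


class BlobStore:
    """In-memory content-addressed store: identical bytes are kept once."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store `data` if it is new and return its content hash."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Fetch a blob by the hash returned from put()."""
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
font = b"fake font bytes"
ref_a = store.put(font)                           # first page using the font
ref_b = store.put(font)                           # second page, same font
ref_c = store.put(b"different background image")
print(len(store), ref_a == ref_b)
```

Each snapshot then only needs to keep the short hash as a reference, which plays the role of the "pointer" asked about in the question.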
  • Does it archive entire social media accounts, like a person's Twitter account, or just specific posts?

    Anonymous

    just specific posts

    • 1 week ago
  • Are Wayback Machine links no longer allowed to be backed up in your archive? The archive process seems to keep rejecting them.

    Anonymous

    There is an issue with Wayback Machine snapshots which have only just been saved to the Wayback Machine.

    There seems to be some sort of eventually consistent storage, so if you have just saved a link to the Wayback Machine and immediately send the WM link to a friend (or feed it into Archive.Today), they might see an empty page on WM. In 10-30 minutes the WM page is visible to everyone.

    • 1 week ago
  • Can Archive Today have a long screenshot of the whole webpage, like the Internet Archive does?

    Anonymous

    No, it would double the costs.

    • 1 week ago
  • The new Twitter keeps showing up in new archive saves now. Is there any way to revert to the old Twitter for new archives, or did Twitter just permanently kill off their old site design?

    Anonymous

    Yes, but old Twitter (or what is left of it) does not show tweets which are marked as “sensitive content”. Apparently this is because it is now tailored only for Googlebot, not for humans.

    • 1 week ago
  • is neo-nazi material permitted?

    Anonymous

    I think so, yes, although I am not sure about the future.

    So far, the materials which attract the most govt (or quasi-govt) takedown requests are:

    * child porn (from NCMEC, OCLCTIC, ECO.DE, JUGENDSCHUTZ.NET, IHBARWEB, CYBERTIP.CA, MELDPUNT, PAPS.JP, IWF.ORG.UK, HOTLINE.IE, …)

    * ISIS propaganda (from CTIRU and EUROPOL)

    * Cookbooks for drugs and explosives (mainly from ROSKOMNADZOR)

    • 2 weeks ago
  • Sites archived via google as a proxy (using the I'm feeling lucky link) are hit with a redirect interstitial page. /IGtuE

    Anonymous

    Fixed

    • 2 weeks ago
  • Why do I still have to solve a captcha on your onion site? And why does Tor Browser say my connection to your onion site is insecure and your certificate is not trusted because it's self-signed, once I try to archive or search any page?

    Anonymous

    There is no easy way to obtain a browser-trusted certificate for a .onion domain. As far as I know, there are only 2 .onion sites with a valid certificate: Facebook and New York Times.

    Anyway, it is merely for show and has nothing to do with “secure”: when you visit .onion websites, traffic is unencrypted only between the browser and the Tor service running on the same computer, so plain http is OK.

    • 2 weeks ago
  • is it possible to save all pages of a blog?

    Anonymous

    No

    • 2 weeks ago