Archive.is blog

Blog of http://archive.is/ project
  • Why do some web-pages take longer to archive than others?
    Anonymous

    Loading time depends on many things: the number of elements on the page, the speed of the web server, and network conditions.

    Also, some web pages load images and scripts from more than one server, and the slowest server determines the total loading time of the page, even if that server is needed only for something insignificant such as social buttons or an ad banner.
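
The "slowest server wins" effect can be illustrated with a deliberately simplified model. The sketch below assumes every server is contacted in parallel and that the page is finished only when all of them have answered; the host names and timings are made up for illustration and are not measurements from Archive.is.

```python
# Hypothetical per-server response times in seconds (illustrative numbers only).
server_times = {
    "example.com": 0.4,                 # main HTML and images
    "cdn.example.net": 0.3,             # scripts and stylesheets
    "social-buttons.example.org": 6.0,  # a slow third-party widget
}


def page_load_time(times):
    """Under the parallel-fetch assumption, the page is done when the slowest server is."""
    return max(times.values())


total = page_load_time(server_times)
slowest = max(server_times, key=server_times.get)
print(f"total: {total}s, held up by {slowest}")
```

In this model the two fast servers contribute nothing to the total; only the slow widget host matters. Real browsers also serialize some requests, so actual timings are usually worse than this lower bound.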

    • 13 hours ago
  • Is there any way that I can save video, from a site such as YouTube, along with the page when it is archived?
    Anonymous

    No

    • 1 week ago
  • Once I archive a page, how long do I have to wait before I can archive the page again, if the content has changed?
    Anonymous

    5 minutes.

    All submissions within a 5-minute window redirect to the previous snapshot instead of taking a new one.
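
The rule can be sketched as a small function. This is only an illustration of the behaviour described in the answer, not the service's actual code: the function name, the in-memory dictionary, and the treatment of a submission at exactly 5 minutes (here it takes a new snapshot) are all assumptions.

```python
from datetime import datetime, timedelta

RESUBMIT_WINDOW = timedelta(minutes=5)  # the window stated in the answer


def handle_submission(url, now, last_snapshots):
    """Decide whether a submission redirects or produces a new snapshot.

    last_snapshots is a hypothetical store mapping URL -> time of its latest snapshot.
    Returns ("redirect", time_of_previous) or ("new", time_of_new).
    """
    previous = last_snapshots.get(url)
    if previous is not None and now - previous < RESUBMIT_WINDOW:
        return ("redirect", previous)
    last_snapshots[url] = now
    return ("new", now)


store = {}
start = datetime(2013, 1, 1, 12, 0, 0)
print(handle_submission("http://example.com/", start, store))
print(handle_submission("http://example.com/", start + timedelta(minutes=3), store))
print(handle_submission("http://example.com/", start + timedelta(minutes=6), store))
```

The second call falls inside the window and is redirected to the first snapshot; the third falls outside it and produces a new one.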

    • 1 week ago
  • Is this a non-profit organization? If so, how can one donate money to it?
    Anonymous

    No.

    You can donate to a similar project, WebCite, here: https://fundrazr.com/campaigns/aQMp7

    • 2 weeks ago
  • Is this database mirrored in order to help maintain long-term stability and accessibility?
    Anonymous

    Yes

    • 2 weeks ago
  • Given that you're not happy with the source as it currently stands, would you be able to provide an api-like service? Ideally I'd love to be able to shoot off a URL, and get a ZIP package back. I considered just submitting URLs and grabbing the link at the end, but that seems rather abusive to your server.
    Anonymous

    You may want to use one of these open-source tools:
    http://code.google.com/p/chrome-scrapbook/
    or https://chrome.google.com/webstore/detail/singlefile/mpiodijhokgodhhofbcjdecpffjipkle?hl=en
    or https://chrome.google.com/webstore/detail/pagearchiver/ihkkeoeinpbomhnpkmmkpggkaefincbn?hl=en

    Also, Microsoft Internet Explorer can save pages in .mht format, and it can be easily automated with any scripting language (much more easily than any other browser).

    About the API…
    That looks like providing a private service, so I must ask: are you ready to pay for such a service (something like $1 per 1000 shots)?

    • 1 month ago
  • Can you give an option to download webpages in .7z, they are much more efficient than zip.
    Anonymous

    Each archive contains only one file that compresses well (the HTML); the rest are PNG and JPG images, which are already compressed and which the archiver leaves untouched. So the choice of archive format would not significantly affect the resulting size.

    Also, many newer unpackers (7z, rar, …) can unpack zip files, but zip-only tools cannot unpack the newer formats.
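
The compressibility point can be checked with Python's standard library. The sketch below uses made-up stand-ins rather than a real snapshot: repetitive markup plays the role of the HTML file, and random bytes play the role of already-compressed images, since compressed data is statistically close to random. `zlib` implements Deflate, the method normally used in zip files, and `lzma` implements the LZMA family used by 7z.

```python
import lzma
import os
import zlib

# Stand-in data (assumptions for illustration, not a real archive).
html = b"<div class='post'><p>Hello, archive!</p></div>\n" * 2000
image_like = os.urandom(len(html))


def ratio(data, compress):
    """Compressed size divided by original size (smaller means better compression)."""
    return len(compress(data)) / len(data)


for name, data in (("html", html), ("image-like", image_like)):
    deflate = ratio(data, zlib.compress)
    xz = ratio(data, lzma.compress)
    print(f"{name:10s} deflate={deflate:.3f} lzma={xz:.3f}")
```

Both compressors shrink the markup to a small fraction of its size and leave the image-like data at roughly its original size, so switching formats only changes the size of the one part that was already small.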

    • 1 month ago
  • I noticed that you support a download link for pages that have been archived. Would it be possible to support downloading a page in WARC format?
    Anonymous

    It is possible, but I am afraid it would not add the value you expect from WARC.

    Archive.is' snapshots are not the result of a crawl but snapshots of the internal browser state.

    So there is almost no metadata, and even the original URLs of images are not stored. Moreover, some of the images were not downloaded at all but were produced by rendering complex WebKit-specific CSS rules, so that the snapshot is simpler and less dependent on the user's browser.

    • 2 months ago
  • Can newspaper articles from behind a paywall be archived?
    Anonymous

    Only those that either have “happy hours” of free access, or offer registration-free access to all articles but limit the number of articles that can be viewed per day or per month.

    Those that always show “enter your credit card” instead of the article: definitely not.

    • 2 months ago
  • Hi, the following hashbang URL is not working: archive.is/RcaO0. Won't the captured page automatically go to the relevant section? Thanks.
    Anonymous

    This is a bug, thank you for reporting! Not all original hashbangs are preserved :(

    If you want to share a link pointing to a specific part of a long page, you can specify the percentage in the hashbang, e.g. http://archive.is/RcaO0#84.4%
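
Such links can be built by hand, or with a small helper like the one below. The function name and the range check are assumptions made for this example; only the `#<number>%` form comes from the answer above.

```python
def link_to_position(snapshot_url, percent):
    """Return snapshot_url with a percentage fragment, e.g. 84.4 -> '#84.4%'."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    base = snapshot_url.split("#", 1)[0]  # drop any existing fragment
    return f"{base}#{percent:g}%"


print(link_to_position("http://archive.is/RcaO0", 84.4))
```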

    • 2 months ago
© 2012–2013 Archive.is blog