<!DOCTYPE html>
<html lang="en" style="background-color:#FFFAE1"><head><meta charset="utf-8"/><title>FAQ</title><meta name="robots" content="noindex,nocache,noarchive"/></head><body><h1 id = "FAQ">FAQ</h1>
<h2 id = "Which_parts_of_web_page_are_saved_">Which parts of a web page are saved?</h2>
<ol>
<li>The textual content of the web page.</li>
<li>Images.</li>
<li>The content of frames.</li>
<li>Content and images loaded or generated by JavaScript on Web 2.0 sites.</li>
<li>A screenshot of 1024x768 pixels.</li>
</ol>
<h2 id = "Which_parts_of_web_page_are_not_saved_">Which parts of a web page are not saved?</h2>
<ol>
<li>Flash and content loaded by Flash.</li>
<li>Video and audio. Archiving youtube.com only makes sense if you want to preserve the video's title and comments; the video itself will not be saved.</li>
<li>PDF files.</li>
<li>RSS feeds and other XML pages are not saved reliably: most of them are either not saved at all or saved as a blank page.</li>
</ol>
<h2 id = "How_long_does_it_take_to_make_a_snapshot__">How long does it take to make a snapshot?</h2>
<p>About as long as it takes to load the page in your browser.
However, saving pages with heavy scripts or lots of ads may take up to a few minutes.
There is a 5-minute timeout: if a page has not fully loaded within 5 minutes, the save is considered failed.
This is rare, but it happens.</p>
<h2 id = "It_there_limit_on_the_page_size__">Is there a limit on the page size?</h2>
<p>The stored page, including all its images, must be smaller than 50 MB.</p>
<h2 id = "What_software_do_you_run_and_how_data_is_stored__">What software do you run and how is data stored?</h2>
<p>The archive runs Apache Hadoop and Apache Accumulo.
All data is stored on HDFS: textual content is stored in 3 copies on servers in different datacenters, and images are stored in 2 copies.
All datacenters are in Europe.</p>
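<p>To make the replication figures concrete, here is a back-of-the-envelope sketch in Python. It reads "duplicated 3 times" and "2 times" as three and two stored copies in total, which is an assumption about the wording, and it ignores HDFS block and metadata overhead. The function name is hypothetical.</p>

```python
# Back-of-the-envelope raw storage for one archived page.
# Assumption: text is kept in 3 copies and images in 2 copies in total.
TEXT_COPIES = 3
IMAGE_COPIES = 2


def raw_footprint_mb(text_mb: float, images_mb: float) -> float:
    """Return the disk space (MB) one page occupies across all its replicas."""
    return text_mb * TEXT_COPIES + images_mb * IMAGE_COPIES


# A hypothetical page with 2 MB of text and 10 MB of images:
print(raw_footprint_mb(2, 10))  # 26
```

<p>The images dominate the total even with fewer copies, which is presumably why they get the lower replication factor.</p>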
<h2 id = "How_long_the_page_will_be_stored__">How long will the page be stored?</h2>
<p>Virtually forever.
We have a lot of free space, and although the archive grows over time, storage and bandwidth keep getting cheaper.</p>
<h2 id = "Do_you_delete_my_stored_page_s___">Do you delete my stored page(s)?</h2>
<p>Pages that violate our hosting provider's rules (cracks, porn, etc.) may be deleted.
Completely empty pages (or pages containing nothing but text like “502 Server Timeout”) may also be deleted.</p>
<h2 id = "How_is_the_archive_funded_">How is the archive funded?</h2>
<p>It is privately funded; there are no complex finances behind it.
Depending on which risks you take into account, it may look more or less reliable than startup-style funding or a university project.</p>
<h2 id = "Will_advertising_appear_on_the_archive_one_day__">Will advertising appear on the archive one day?</h2>
<p>I cannot promise that it will not.
At the current growth rate, I am able to keep the archive free of ads.
I can, however, promise that it will have no ads at least until the end of 2014.</p>
<h2 id = "How_to_refer_to_the_saved_page__">How to refer to the saved page?</h2>
<p>Each page has a short URL http://archive.is/XXXXX, where XXXXX is the unique identifier of the page.
A page can also be referred to with URLs like these:</p>
<ul>
<li><a href="http://archive.is/2013/http://www.google.de/">http://archive.is/<strong>2013</strong>/http://www.google.de/</a> - the newest snapshot in 2013.</li>
<li><a href="http://archive.is/201301/http://www.google.de/">http://archive.is/<strong>201301</strong>/http://www.google.de/</a> - the newest snapshot in January 2013.</li>
<li><a href="http://archive.is/20130101/http://www.google.de/">http://archive.is/<strong>20130101</strong>/http://www.google.de/</a> - the newest snapshot on 1 January 2013.</li>
</ul>
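<p>The year, month and day forms behave like a prefix match on the capture time followed by "pick the newest". The sketch below illustrates that rule over a hypothetical in-memory list of capture times; it is not the archive's actual lookup code, and the function name is made up for illustration.</p>

```python
from __future__ import annotations

from datetime import datetime


def newest_in_period(snapshots: list[datetime], prefix: str) -> datetime | None:
    """Return the newest snapshot whose compact timestamp starts with `prefix`.

    `prefix` is a digit string such as "2013", "201301" or "20130101",
    matching the year, month and day URL forms listed above.
    """
    matching = [s for s in snapshots if s.strftime("%Y%m%d%H%M%S").startswith(prefix)]
    # None when no snapshot falls inside the requested period.
    return max(matching, default=None)


# Hypothetical capture times of one URL:
snapshots = [datetime(2012, 12, 31, 23, 0), datetime(2013, 1, 1, 3, 13, 55), datetime(2013, 6, 1, 12, 0)]
print(newest_in_period(snapshots, "2013"))      # 2013-06-01 12:00:00
print(newest_in_period(snapshots, "20130101"))  # 2013-01-01 03:13:55
```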
<p>The date can be extended further with hours, minutes and seconds:</p>
<ul>
<li><a href="http://archive.is/2013010103/http://www.google.de/">http://archive.is/<strong>2013010103</strong>/http://www.google.de/</a></li>
<li><a href="http://archive.is/201301010313/http://www.google.de/">http://archive.is/<strong>201301010313</strong>/http://www.google.de/</a></li>
<li><a href="http://archive.is/20130101031355/http://www.google.de/">http://archive.is/<strong>20130101031355</strong>/http://www.google.de/</a></li>
</ul>
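<p>All six precision levels are prefixes of one 14-digit YYYYMMDDhhmmss stamp. A minimal sketch, with hypothetical names, of building such a URL from a Python datetime by truncating that stamp:</p>

```python
from datetime import datetime

# Number of leading digits of YYYYMMDDhhmmss kept at each precision level.
PRECISION_DIGITS = {"year": 4, "month": 6, "day": 8, "hour": 10, "minute": 12, "second": 14}


def dated_url(target: str, when: datetime, precision: str = "second") -> str:
    """Build an archive.is URL whose date prefix is truncated to `precision`."""
    stamp = when.strftime("%Y%m%d%H%M%S")[: PRECISION_DIGITS[precision]]
    return f"http://archive.is/{stamp}/{target}"


print(dated_url("http://www.google.de/", datetime(2013, 1, 1, 3, 13, 55), "minute"))
# http://archive.is/201301010313/http://www.google.de/
```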
<p>The year, month, day, hours, minutes and seconds can be separated with dots, dashes or colons for readability:</p>
<ul>
<li><a href="http://archive.is/2013-04-17/http://blog.bo.lt/">http://archive.is/<strong>2013-04-17</strong>/http://blog.bo.lt/</a></li>
<li><a href="http://archive.is/2013.04.17-12:08:20/http://blog.bo.lt/">http://archive.is/<strong>2013.04.17-12:08:20</strong>/http://blog.bo.lt/</a></li>
</ul>
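<p>Because the separators only aid readability, a separated timestamp should reduce to the same digits as the compact form. A minimal sketch of that normalization (hypothetical helper; it assumes, as the examples suggest, that the archive treats both spellings identically):</p>

```python
import re


def compact_timestamp(readable: str) -> str:
    """Strip the dots, dashes and colons allowed as separators, leaving digits only."""
    compact = re.sub(r"[.\-:]", "", readable)
    if not compact.isdigit():
        raise ValueError(f"unexpected characters in timestamp: {readable!r}")
    return compact


print(compact_timestamp("2013.04.17-12:08:20"))  # 20130417120820
print(compact_timestamp("2013-04-17"))           # 20130417
```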
<p>It is also possible to refer to all snapshots of a given URL:</p>
<ul>
<li><a href="http://archive.is/http://www.google.de/">http://archive.is/http://www.google.de/</a></li>
</ul>
<p>All saved pages from a domain:</p>
<ul>
<li><a href="http://archive.is/www.google.de">http://archive.is/www.google.de</a></li>
</ul>
<p>All saved pages from all subdomains of a domain:</p>
<ul>
<li><a href="http://archive.is/*.google.de">http://archive.is/*.google.de</a></li>
</ul>
<h2 id = "Is_there_a_way_to_link_to_the_most_recent_archive_of_an_article_by_including_the_URL_in_an_archive__is_link_">Is there a way to link to the most recent archive of an article by including the URL in an archive.is link?</h2>
<p>Yes.</p>
<p><a href="http://archive.is/newest/http://reddit.com/">http://archive.is/newest/http://reddit.com/</a> links to the newest snapshot.
There is also <a href="http://archive.is/oldest/http://reddit.com/">http://archive.is/oldest/http://reddit.com/</a> for the oldest one.</p>
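<p>Building these links programmatically is plain string concatenation. A minimal sketch with a hypothetical helper name:</p>

```python
def edge_snapshot_url(target: str, which: str = "newest") -> str:
    """Link that resolves to the newest (default) or oldest snapshot of `target`."""
    if which not in ("newest", "oldest"):
        raise ValueError("which must be 'newest' or 'oldest'")
    return f"http://archive.is/{which}/{target}"


print(edge_snapshot_url("http://reddit.com/"))            # http://archive.is/newest/http://reddit.com/
print(edge_snapshot_url("http://reddit.com/", "oldest"))  # http://archive.is/oldest/http://reddit.com/
```

<p>Since the link names a rule rather than a fixed snapshot, a "newest" link embedded in an article should keep pointing at the latest capture as new snapshots are made.</p>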
<h2 id = "How_to_refer_to_exact_part_of_a_long_page__">How to refer to an exact part of a long page?</h2>
<p>There are two options:</p>
<ul>
<li><p>Add a hash fragment with the scroll position as a number between 0 (top of the page) and 100 (bottom), followed by a percent sign. For example <a href="http://archive.is/FWVL#40%25">http://archive.is/FWVL#40%</a></p></li>
<li><p>Select some text on the page and get a URL whose hash fragment refers to the selection. For example <a href="http://archive.is/FWVL#selection-1493.0-1493.53">http://archive.is/FWVL#selection-1493.0-1493.53</a></p></li>
</ul>
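<p>Both fragment formats can be produced or parsed with a few lines of Python. This is a sketch with hypothetical helper names. One detail: a bare "%" in a URL is not valid inside an HTML href attribute, so the sketch also shows the percent-encoded form "#40%25"; that the archive's page script accepts the encoded form the same way is an assumption.</p>

```python
from __future__ import annotations

import re
from urllib.parse import quote

# The fragment formats come from the examples above; the helpers are hypothetical.
SELECTION = re.compile(r"#selection-(\d+)\.(\d+)-(\d+)\.(\d+)")


def scroll_fragment(percent: int) -> str:
    """Fragment like "#40%": scroll to `percent` of the page (0 = top, 100 = bottom)."""
    if not 0 <= percent <= 100:
        raise ValueError("scroll position must be between 0 and 100")
    return f"#{percent}%"


def href_safe(fragment: str) -> str:
    """Percent-encode a bare "%" (to "%25") so the fragment is valid in an HTML href."""
    return "#" + quote(fragment.lstrip("#"), safe="")


def parse_selection(fragment: str) -> tuple[int, ...]:
    """Split "#selection-1493.0-1493.53" into its four numbers.

    The FAQ shows only the format; reading the numbers as start/end positions
    (e.g. element index and character offset) is an assumption.
    """
    match = SELECTION.fullmatch(fragment)
    if match is None:
        raise ValueError(f"not a selection fragment: {fragment!r}")
    return tuple(int(group) for group in match.groups())


print(href_safe(scroll_fragment(40)))                # #40%25
print(parse_selection("#selection-1493.0-1493.53"))  # (1493, 0, 1493, 53)
```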
<h2 id = "Does_it_support_any_API__">Does it support any API?</h2>
<p>archive.is supports the Memento API. More information can be found <a href="http://mementoweb.org/depot/native/archiveis/">here</a>.</p>
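<p>Memento (RFC 7089) lets a client ask a TimeGate for the capture closest to a given time by sending an Accept-Datetime header. The sketch below only builds such a request with Python's standard library and does not send it. The TimeGate URL pattern http://archive.is/timegate/ is an assumption; check the mementoweb.org page linked above for the authoritative endpoints.</p>

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumption: TimeGate endpoint pattern; confirm it on the mementoweb.org page.
TIMEGATE = "http://archive.is/timegate/"


def timegate_request(target: str, when: datetime) -> Request:
    """Build (but do not send) a Memento TimeGate request for `target` near `when`.

    RFC 7089 puts the desired time in the Accept-Datetime header as an HTTP date (GMT).
    """
    accept = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE + target, headers={"Accept-Datetime": accept})


req = timegate_request("http://www.google.de/", datetime(2013, 1, 1, 3, 13, 55, tzinfo=timezone.utc))
print(req.full_url)
# urllib stores header names capitalized, hence "Accept-datetime" here.
print(req.get_header("Accept-datetime"))  # Tue, 01 Jan 2013 03:13:55 GMT
# Sending it with urllib.request.urlopen(req) would follow the TimeGate's redirect, if any.
```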
<h2 id = "Can_I_have_an_account_to_manage_my_bookmarks__">Can I have an account to manage my bookmarks?</h2>
<p>No.
But you can keep bookmarks to archived pages in an existing bookmark manager, such as <a href="https://delicious.com/">Delicious</a>, <a href="http://www.google.com/bookmarks">Google Bookmarks</a>, …</p>
<h2 id = "Why_does_archive_is_not_obey_robots_txt_">Why does archive.is not obey robots.txt?</h2>
<p>Because it is not a free-roaming crawler: it saves only the single page requested, acting as a direct agent of the human user.
Such services do not obey robots.txt (e.g. <a href="https://support.google.com/webmasters/answer/178852#robots">Google Feedfetcher</a>, screenshot- or PDF-making services, isup.me, …).</p>
<h2 id = "Is_IPv6_supported__">Is IPv6 supported?</h2>
<p>Yes.</p>
<ul>
<li><a href="http://archive.is/%5B2A00:1450:400C:C00::69%5D">http://archive.is/[2A00:1450:400C:C00::69]</a></li>
<li><a href="http://archive.is/ipv6.google.com">http://archive.is/ipv6.google.com</a></li>
</ul>
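<p>Written literally, the square brackets around an IPv6 address are not allowed in a URL path, so an HTML validator rejects an href containing "[2A00:1450:400C:C00::69]" as is. A minimal sketch (hypothetical helper) that validates the address and percent-encodes the brackets as %5B and %5D; it assumes the archive decodes them back to the bracketed form.</p>

```python
import ipaddress
from urllib.parse import quote


def ipv6_archive_href(address: str) -> str:
    """Build an HTML-valid href for archiving a site given as an IPv6 literal."""
    ip = ipaddress.IPv6Address(address)  # raises ValueError for an invalid address
    # Keep ":" as is; "[" and "]" become %5B and %5D.
    return "http://archive.is/" + quote(f"[{ip.compressed}]", safe=":")


print(ipv6_archive_href("2A00:1450:400C:C00::69"))
# http://archive.is/%5B2a00:1450:400c:c00::69%5D
```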
<h2 id = "Are_domains_with_national_characters_supported__">Are domains with national characters supported?</h2>
<p>Yes.</p>
<ul>
<li><a href="http://archive.is/www.maroñas.com.uy">http://archive.is/www.maroñas.com.uy</a></li>
<li><a href="http://archive.is/*.测试">http://archive.is/*.测试</a></li>
</ul>
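<p>Outside the browser's address bar, such hostnames travel in an ASCII-compatible (Punycode) form. Python's built-in idna codec, which implements IDNA 2003, shows the mapping. How the archive itself stores or displays these names is not stated here, so treat this only as an illustration; the helper name is hypothetical.</p>

```python
def ascii_hostname(host: str) -> str:
    """Convert a hostname with national characters to its ASCII (Punycode) form."""
    return host.encode("idna").decode("ascii")


for host in ("www.maroñas.com.uy", "测试"):
    print(host, "->", ascii_hostname(host))
```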
<h2 id = "Do_you_preserve_archivers__privacy__E_g__not_disclose_the_source_IP_address_">Do you preserve archivers' privacy? E.g. not disclose the source IP address?</h2>
<p>Yes.</p>
<p>But keep in mind that when you archive a page, your IP address is sent to the website you archive, as though you were using a proxy (in the X-Forwarded-For header). This lets websites (e.g. shops or weather-forecast sites) target your region rather than mine.</p>
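<p>On the receiving side, a website typically reads the first address in X-Forwarded-For as the original client. A minimal sketch of that parsing, using documentation-range example addresses; the helper name is hypothetical.</p>

```python
def original_client_ip(x_forwarded_for: str) -> str:
    """Return the left-most address in an X-Forwarded-For header.

    Each proxy appends the address it received the request from, so the first
    entry is where the chain started. Any client can forge this header, so a
    site should only trust it on requests arriving from proxies it recognizes.
    """
    return x_forwarded_for.split(",")[0].strip()


print(original_client_ip("203.0.113.7"))                # 203.0.113.7
print(original_client_ip("203.0.113.7, 198.51.100.2"))  # 203.0.113.7
```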
<h2 id = "I_found_incorrect_inaccurate_obsolete_informartion__Can_I_request_it_to_be_altered_or_deleted_">I found incorrect/inaccurate/obsolete information. Can I request it to be altered or deleted?</h2>
<p>The archive is neither a news agency nor an authoritative source of reference information.
It merely certifies that, at a given point in time, a page existed on the web.
The page might well contain a fairy tale, and although “One day Little Red Riding Hood goes to visit her granny” is a false statement, that is no reason to burn the books.
Note that weather forecasts on archived pages are outdated as well.</p>
<h2 id = "My_question_is_not_here_">My question is not here!</h2>
<p>More questions and answers: <a href="http://blog.archive.is/archive">http://blog.archive.is/archive</a></p></body></html>