
To archive a webpage in the Internet Archive's Wayback Machine, I usually do:

wget --spider 'https://web.archive.org/save/https://example.com'

Is there a similar method that I can use to archive web pages to archive.today?



I've analyzed the request that is sent when saving a page manually (Firefox's developer tools have a handy 'Copy as cURL' function for this; see the bottom of the post for the actual request). It includes a lot of fluff (user agent, cookies, origin, etc.) that can be omitted, and percent-encoding the slashes in the URL isn't necessary either. Simply executing

curl -v 'https://archive.vn/submit/' \
  --data-raw 'url=https://webapps.stackexchange.com/users/218839/flux'

is already sufficient to archive your profile page. Initially, the response was some HTML containing a 'work in progress' link (https://archive.vn/wip/dk2xB), which you can use to monitor the progress and/or as the final link:

<html><body><script>setInterval(function(){document.location.replace("https://archive.vn/wip/dk2xB")},1000)</script><div>
      <img width="48" height="48" style="vertical-align:middle" src="https://archive.vn/loading.gif"/>
      <span style="vertical-align:middle;font-size:48px;padding-left:5px">Loading</span>
      <hr/>
    </div></body></html>
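
If you want to pull the work-in-progress link out of that response in a script, a minimal sketch could look like the following. It assumes the response body keeps the shape shown above (a `/wip/` URL on the archive.vn host) and that your `grep` supports `-o`; the function name `extract_wip_url` is made up for illustration.

```shell
# Hypothetical helper: read the HTML response on stdin and print the first
# work-in-progress URL found in it (assumes the response format shown above).
extract_wip_url() {
  grep -o 'https://archive\.vn/wip/[A-Za-z0-9]*' | head -n 1
}

# Usage (performs a real submission, so it is left commented out):
# curl -s 'https://archive.vn/submit/' --data-raw 'url=https://example.com' | extract_wip_url
```

If the service changes its markup or host name, the pattern would need adjusting.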

Trying it again a couple of hours later, I no longer get HTML as the response but an HTTP 302 (Found) with the final URL in the Location header: https://archive.vn/dk2xB.
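
Since the response can apparently take either form, a script might check for a `Location` header first and fall back to the work-in-progress link in the body. The sketch below assumes only the two response shapes observed here; `archive_today_result` is a hypothetical name, and it expects a full response (headers followed by body, as printed by `curl -s -i`) on stdin.

```shell
# Hypothetical helper: read a full HTTP response (headers + body) on stdin
# and print the archive URL it points to.
archive_today_result() {
  response=$(cat)
  # Case 1: a redirect with the final URL in the Location header.
  # Only the header section (up to the first blank line) is inspected,
  # and the header name is matched case-insensitively.
  location=$(printf '%s\n' "$response" | tr -d '\r' |
    awk '/^$/ { exit } tolower($1) == "location:" { print $2; exit }')
  if [ -n "$location" ]; then
    printf '%s\n' "$location"
    return 0
  fi
  # Case 2: an HTML page that redirects to a work-in-progress URL.
  printf '%s\n' "$response" |
    grep -o 'https://archive\.vn/wip/[A-Za-z0-9]*' | head -n 1
}

# Usage (performs a real submission, so it is left commented out):
# curl -s -i 'https://archive.vn/submit/' --data-raw 'url=https://example.com' | archive_today_result
```

Parsing the headers by hand keeps the two cases in one place; nothing here has been checked against responses other than the two described in this post.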

This is what the archived page looks like:

[Screenshot of the archived page]


The original cURL request is

curl 'https://archive.vn/submit/' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:81.0) Gecko/20100101 Firefox/81.0' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' \
  -H 'Accept-Language: en-US,en;q=0.5' \
  --compressed \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Origin: https://archive.vn' \
  -H 'Connection: keep-alive' \
  -H 'Referer: https://archive.vn/' \
  -H 'Cookie: _ga=GA1.2.661111166.1603535444' \
  -H 'Upgrade-Insecure-Requests: 1' \
  -H 'TE: Trailers' \
  --data-raw 'submitid=1Z%2FjKja%2BtkGo%2BmykS2%2BrMYgTje4YZV9xk8OIlwY4NT2mLExajP7ZRmnTbJku2aMX&url=https%3A%2F%2Fwebapps.stackexchange.com%2Fquestions%2F148066%2Fhow-do-i-archive-a-webpage-to-archive-today-using-wget-or-curl'

