View Post
Poster: | Cronokkio | Date: | Sep 20, 2019 2:16pm |
Forum: | forums | Subject: | Question about archived images (direct links vs. non-direct) |
Recently, I have discovered a common denominator with the ones that weren't downloading. On these particular webpages, the images were missing "im_" from the archived URL, which should appear after the timestamp in the address. It seems that when this "im_" is missing, the link does not point directly to the image, but rather to a webpage with a script that prevents this Firefox add-on from finding the image.
For example, this image with the "im_" in the URL will work with the add-on:
https://web.archive.org/web/20001027200916im_/http://spin.com.mx/~rcamacho/diego/supergoku.jpg
...But this one, without the "im_" does not work with the add-on:
https://web.archive.org/web/20000620154746/http://www.geocities.com/Tokyo/9051/Gogeta.jpg
The first one with the "im_" in the URL is a direct link with no viewable page source, while the second one, without the "im_" is not a true direct link and has viewable page source that prevents this Firefox add-on from finding the image.
I doubt there is a way this can be fixed. So I am more or less just curious.
My question is, why do some webpages have true, direct links to the archived images, with the "im_" after the timestamp, while others do not? It appears to be used at random. Why is this?
I can manually add this "im_" to any image timestamp that doesn't have one and the direct image link will then work. Of course that is a lot of manual work and I'd be better off right-clicking and saving images manually. So why are some archived webpages treated differently in this regard?
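A minimal sketch of automating that manual step, assuming the URL shape seen in the examples above (a 14-digit timestamp, an optional two-letter modifier such as "im_", then the original URL). The function name `add_im` and the regular expression are illustrative assumptions, not anything documented by the Wayback Machine:

```python
import re

# Assumed shape: https://web.archive.org/web/<14-digit timestamp>[modifier]/<original URL>
_WAYBACK = re.compile(r"^(https?://web\.archive\.org/web/\d{14})([a-z]{2}_)?/(.+)$")


def add_im(url: str) -> str:
    """Insert "im_" after the timestamp if the URL has no modifier yet."""
    m = _WAYBACK.match(url)
    if m is None:
        raise ValueError(f"not a recognised Wayback URL: {url}")
    prefix, modifier, original = m.groups()
    if modifier:
        # Already carries a modifier (im_, if_, ...); leave it untouched.
        return url
    return f"{prefix}im_/{original}"


print(add_im(
    "https://web.archive.org/web/20000620154746/"
    "http://www.geocities.com/Tokyo/9051/Gogeta.jpg"
))
```

Run over a list of image links collected from a page, this would turn each non-direct link into the "im_" form before handing it to a downloader.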
Thanks for any insight into this.
Reply
Poster: | himmenting1972 | Date: | Sep 23, 2019 7:06pm |
Forum: | forums | Subject: | Re: Question about archived images (direct links vs. non-direct) |
Both forms exist. If you remove "im_", you get the normal page.
see: https://web.archive.org/web/20001027200916/http://spin.com.mx/~rcamacho/diego/supergoku.jpg
If you add the 'im_', you get just the image.
Go to the link above, right-click the image, copy the image address, and paste it into your browser. That is how it works.
When it comes to a PDF file, the modifier is 'if_'.
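As a rough illustration of the two forms, here is a sketch that splits an archived URL into its timestamp, modifier, and original address, then rebuilds the plain page URL without the modifier. The names `WaybackURL`, `parse_wayback`, and `without_modifier` are made up for this example, and the pattern assumes the 14-digit timestamps seen in this thread:

```python
import re
from typing import NamedTuple


class WaybackURL(NamedTuple):
    timestamp: str
    modifier: str  # "" when the URL has no modifier
    original: str


# Assumed shape: https://web.archive.org/web/<timestamp>[modifier]/<original URL>
_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})([a-z]{2}_)?/(.+)$")


def parse_wayback(url: str) -> WaybackURL:
    """Split an archived URL into timestamp, modifier, and original URL."""
    m = _PATTERN.match(url)
    if m is None:
        raise ValueError(f"not a recognised Wayback URL: {url}")
    timestamp, modifier, original = m.groups()
    return WaybackURL(timestamp, modifier or "", original)


def without_modifier(url: str) -> str:
    """Rebuild the URL with any modifier (im_, if_, ...) removed."""
    parts = parse_wayback(url)
    return f"https://web.archive.org/web/{parts.timestamp}/{parts.original}"


direct = ("https://web.archive.org/web/20001027200916im_/"
          "http://spin.com.mx/~rcamacho/diego/supergoku.jpg")
print(parse_wayback(direct).modifier)
print(without_modifier(direct))
```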
Reply
Poster: | varenhizzle | Date: | Jun 3, 2023 10:42am |
Forum: | forums | Subject: | Re: Question about archived images (direct links vs. non-direct) |
I'm following up on the "im_" vs "if_" difference. I know this is an old post but it came up as the top result for me in Google so figured I'd chime in.
I can't find this documented anywhere but I can tell you about some of the differences.
- Appending "im_" seems to result in you getting the raw direct resource at an archived URL. No Wayback Machine chrome and all page content is in its original form.
- Appending "if_" is quite similar. For HTML pages you still don't get any Wayback Machine chrome, but resource URLs are re-written to point to the correct Wayback Machine resource.
Some examples:
- On this 'im_' url you get the raw archive of Google's 2006 homepage. No content is changed: https://web.archive.org/web/20061231111852im_/http://www.google.com/
- On this 'if_' url, the page appears the same but some Wayback Machine JS is inserted and URLs are re-written to point to the correct Wayback resource: https://web.archive.org/web/20061231111852if_/http://www.google.com/
In my example above, with the 'im_' URL the <img src="..." alt="..." /> logo points to the raw "/intl/en_ALL/images/logo.gif" resource. That shouldn't work on the web.archive.org domain, but the Wayback Machine apparently sees that the request is a 404, looks at the referrer, and redirects the img request to the correct archived URL.
In the 'if_' URL above, the <img src="..." alt="..." /> logo is re-written to point to "/web/20061231111852im_/http://www.google.com/intl/en_ALL/images/logo.gif" so it's served correctly on the first request.
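To make the re-writing concrete, here is a small sketch that reproduces the re-written path from the example above. It only imitates the observed result and is not the Wayback Machine's actual implementation; the function `rewrite_resource` and its parameters are hypothetical:

```python
from urllib.parse import urljoin


def rewrite_resource(timestamp: str, page_url: str, src: str,
                     modifier: str = "im_") -> str:
    """Resolve a resource reference against the archived page's original
    URL, then prefix it with the Wayback path for the same timestamp."""
    absolute = urljoin(page_url, src)  # handles relative and absolute src values
    return f"/web/{timestamp}{modifier}/{absolute}"


print(rewrite_resource(
    "20061231111852",
    "http://www.google.com/",
    "/intl/en_ALL/images/logo.gif",
))
# prints /web/20061231111852im_/http://www.google.com/intl/en_ALL/images/logo.gif
```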
For a PDF, which has no resource URLs to be re-written, im_ and if_ are exactly the same.
tl;dr: if you want just the raw archive, use 'im_'. If you want the raw webpage, without Wayback chrome, but still want external resources like JS/CSS/images to work consistently, use 'if_'. (Even with 'im_', site-relative URLs usually work thanks to the referrer redirect, assuming no name collision with real Wayback resources.)
This post was modified by varenhizzle on 2023-06-03 17:42:15
Reply
Poster: | TL7 | Date: | Sep 22, 2019 2:25pm |
Forum: | forums | Subject: | Example please. |
Disclaimer: I am not Internet Archive staff, just a user. But that does not mean I can't help you.
Please first link some example pages with the im_ (I have also encountered “if_”, seemingly no difference) and some without.
This post was modified by TL7 on 2019-09-22 21:25:07