
Poster: Cronokkio Date: Sep 20, 2019 2:16pm

Forum: forums Subject: Question about archived images (direct links vs. non-direct)

For the longest time I was puzzled why a particular Firefox add-on, which downloads all images in open tabs, was not able to download images from certain archived webpages.

Recently, I discovered a common denominator among the ones that weren't downloading. On these particular webpages, the image URLs were missing the "im_" that should appear right after the timestamp in the archived address. It seems that when this "im_" is missing, the URL is not actually a direct link to the image, but rather a webpage with a script that prevents this Firefox add-on from finding the image.

For example, this image with the "im_" in the URL will work with the add-on:
https://web.archive.org/web/20001027200916im_/http://spin.com.mx/~rcamacho/diego/supergoku.jpg

...But this one, without the "im_" does not work with the add-on:
https://web.archive.org/web/20000620154746/http://www.geocities.com/Tokyo/9051/Gogeta.jpg

The first one with the "im_" in the URL is a direct link with no viewable page source, while the second one, without the "im_" is not a true direct link and has viewable page source that prevents this Firefox add-on from finding the image.

I doubt there is a way this can be fixed. So I am more or less just curious.

My question is, why do some webpages have true, direct links to the archived images, using the "im_" after the timestamp, while others do not? It appears to be used at random. Why is this?

I can manually add this "im_" to any image timestamp that doesn't have one and the direct image link will then work. Of course, that is a lot of manual work, and I'd be better off right-clicking and saving the images by hand. So why are some archived webpages treated differently in this regard?
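The manual edit described above could be automated. Here is a minimal sketch in Python, assuming the archived URL has the form `https://web.archive.org/web/<timestamp>/<original URL>` as in the two examples; the function name `add_modifier` and the regular expression are illustrative and not part of any Wayback Machine tooling:

```python
import re

# Matches the timestamp segment of a Wayback Machine URL, plus an optional
# two-letter modifier such as "im_" or "if_" directly after the digits.
# Assumption: the timestamp is a run of 1 to 14 digits following "/web/".
_WAYBACK_RE = re.compile(r"(/web/\d{1,14})([a-z]{2}_)?/")


def add_modifier(url: str, modifier: str = "im_") -> str:
    """Return url with the modifier placed right after the timestamp.

    An existing modifier is replaced. URLs that do not look like
    Wayback Machine captures are returned unchanged.
    """
    return _WAYBACK_RE.sub(lambda m: f"{m.group(1)}{modifier}/", url, count=1)


print(add_modifier(
    "https://web.archive.org/web/20000620154746/"
    "http://www.geocities.com/Tokyo/9051/Gogeta.jpg"
))
```

Run over a list of tab URLs, this would turn every non-direct image link into the "im_" form before handing it to a downloader.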

Thanks for any insight into this.


Poster: himmenting1972 Date: Sep 23, 2019 7:06pm

Forum: forums Subject: Re: Question about archived images (direct links vs. non-direct)

The answer is quite obvious.

Both forms exist. If you remove the "im_" you get the normal page.

See: https://web.archive.org/web/20001027200916/http://spin.com.mx/~rcamacho/diego/supergoku.jpg

If you add the 'im_' you get just the image.

Go to the link above, right-click the image, copy the image address, and paste it into your browser. That is how it works.

For a PDF file, the modifier is 'if_'.


Poster: varenhizzle Date: Jun 3, 2023 10:42am

Forum: forums Subject: Re: Question about archived images (direct links vs. non-direct)

> Please first link some example pages with the im_ (I have also encountered “if_”, seemingly no difference) and some without.

I'm following up on the "im_" vs "if_" difference. I know this is an old post but it came up as the top result for me in Google so figured I'd chime in.

I can't find this documented anywhere but I can tell you about some of the differences.
- Appending "im_" to the timestamp seems to give you the raw, direct resource at an archived URL. There is no Wayback Machine chrome and all page content is in its original form.
- Appending "if_" is quite similar. For HTML pages you still don't get any Wayback Machine chrome, but resource URLs are rewritten to point to the correct Wayback Machine resource.

Some examples:
- On this 'im_' url you get the raw archive of Google's 2006 homepage. No content is changed: https://web.archive.org/web/20061231111852im_/http://www.google.com/
- On this 'if_' url, the page appears the same but some Wayback Machine JS is inserted and URLs are re-written to point to the correct Wayback resource: https://web.archive.org/web/20061231111852if_/http://www.google.com/

In my example above, with the 'im_' URL the <img src="..." alt="..." /> logo points to the raw "/intl/en_ALL/images/logo.gif" resource. That of course shouldn't work on the web.archive.org domain, but amazingly the Wayback Machine sees that the request is a 404, looks at the referrer, and redirects the img request to the correct archived URL.

In the 'if_' URL above, the <img src="..." alt="..." /> logo is re-written to point to "/web/20061231111852im_/http://www.google.com/intl/en_ALL/images/logo.gif" so it's served correctly on the first request.
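To make that rewriting concrete, here is a rough Python sketch of the mapping: a resource path from the page is resolved against the original page URL, then prefixed with the capture timestamp and 'im_'. This only illustrates the shape of the resulting URL. It is not the Wayback Machine's actual code, and the function name `archived_resource_url` is made up for the example.

```python
from urllib.parse import urljoin


def archived_resource_url(timestamp: str, page_url: str, resource: str,
                          modifier: str = "im_") -> str:
    """Build the site-relative archive path for a resource used by a page."""
    # Resolve the (possibly relative) resource against the original page URL.
    absolute = urljoin(page_url, resource)
    # Prefix with the capture timestamp and modifier, as seen in the 'if_' page.
    return f"/web/{timestamp}{modifier}/{absolute}"


print(archived_resource_url(
    "20061231111852", "http://www.google.com/", "/intl/en_ALL/images/logo.gif"
))
```

For the Google example this produces the same "/web/20061231111852im_/..." path that appears in the 'if_' page's img tag.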

For a PDF, which has no resource URLs to be rewritten, im_ and if_ are exactly the same.

tl;dr: If you want just the raw archive, use 'im_'. If you want the raw webpage without Wayback Machine chrome, but still want external resources like JS/CSS/images to work consistently, use 'if_'. (Even with 'im_', site-relative URLs usually work, assuming no name collision with real Wayback Machine resources.)
This post was modified by varenhizzle on 2023-06-03 17:42:15


Poster: TL7 Date: Sep 22, 2019 2:25pm

Forum: forums Subject: Example please.

Hello.
Disclaimer: I am not Internet Archive staff, just a user. But that does not mean I can't help you.

Please first link some example pages with the im_ (I have also encountered “if_”, seemingly no difference) and some without.

This post was modified by TL7 on 2019-09-22 21:25:07
