I have a local HTML file referencing image and style data in various places in the local file system. I’d like to get a list of all referenced files, or alternatively a command that copies the HTML file and all referenced files to one clean location (with or without rewriting the links in the HTML file), so that I can make a self-contained ZIP file of the HTML page.

It seems that wget provides good support for downloading an HTML file including all its prerequisites (images, styles) using the --page-requisites flag. Unfortunately, it does not support file:// URLs.
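
For reference, the invocation would look roughly like the following if wget accepted local URLs (the paths are only examples):

    # This is what one would like to run; wget rejects the file:// scheme, so it fails:
    wget --page-requisites --convert-links --directory-prefix=/tmp/selfcontained \
         file:///home/user/page.html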

What are my options here?

Wget supports no protocols other than HTTP, HTTPS, and FTP. So you have three options: 1. Expose your files over FTP and download them that way. 2. Modify the wget source to add file:// support. 3. Switch to another tool; httrack.com might work for you, though I'm not sure. :) – user1759572 Aug 19 '13 at 22:11
    
Using a different tool is fine, as long as it is a free command-line tool for Linux, preferably packaged in Debian. – Joachim Breitner Aug 19 '13 at 22:41
    
httrack might work, although it seems to insist on putting the full path to the original file into the destination path. An alternative would be wget combined with a temporary web server such as python -m SimpleHTTPServer, but that is shaky: one has to guess a free port and remember to kill the web server afterwards. – Joachim Breitner Sep 6 '13 at 9:30
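
For what it's worth, a sketch of that workaround (the port number, the paths, and the one-second startup wait are assumptions, and nothing here handles an already-occupied port):

    # Serve the directory over HTTP, let wget fetch the page plus its
    # prerequisites, then stop the server again.
    cd /path/to/html/dir
    python -m SimpleHTTPServer 8000 &   # Python 2; use python3 -m http.server 8000 on newer systems
    SERVER_PID=$!
    sleep 1                             # crude wait for the server to come up
    wget --page-requisites --convert-links --no-host-directories \
         --directory-prefix=/tmp/selfcontained \
         http://localhost:8000/page.html
    kill "$SERVER_PID"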

Why not set up a local Apache server and serve the page off localhost?

You can use EasyPHP, MAMP, or a similar package to set up a local Apache server easily.
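
On Debian, a plain apache2 install would also do; a minimal sketch (the /var/www/html docroot and the directory names are assumptions):

    sudo apt-get install apache2
    sudo cp -r ~/mypage /var/www/html/      # copy the HTML file together with its assets
    wget --page-requisites --convert-links \
         http://localhost/mypage/page.html  # wget can now fetch everything over HTTP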

That would be total overkill for my use case. For now I have resorted to an XSLT script that extracts the referenced images, but of course that is not complete (it misses e.g. stylesheets and images referenced from within styles). – Joachim Breitner Oct 17 '13 at 14:34
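
A rough illustration of that approach (not the actual script; the choice of elements and attributes to extract is an assumption, and xsltproc's --html mode is used to cope with real-world HTML):

    # Write a small XSLT that prints the URLs referenced by common elements,
    # then run it against the page.
    cat > extract-refs.xsl <<'EOF'
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:template match="/">
        <!-- img/script sources and stylesheet links; URLs inside CSS are not covered -->
        <xsl:for-each select="//img/@src | //script/@src | //link[@rel='stylesheet']/@href">
          <xsl:value-of select="."/>
          <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
    EOF
    xsltproc --html extract-refs.xsl page.html

This only lists the references; copying the files and rewriting the links in the HTML would still have to be done separately.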
