I want to extract the URLs from the anchor tags of an HTML file. This needs to be done in Bash using sed/awk. No Perl, please.
What is the easiest way to do this?
You could also do something like this (provided you have lynx installed):
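Here is a minimal sketch of the lynx approach. It assumes a lynx build that supports the -dump, -listonly and -nonumbers options. The wrapper name lynx_links is hypothetical, and the trailing sed is an added step that keeps only lines that look like URLs, because some lynx versions print a heading above the link list. lynx parses the HTML itself and typically reports links as absolute URLs resolved against the file's location.

```shell
# lynx_links: hypothetical wrapper name, not part of lynx.
# -dump renders to stdout, -listonly keeps only the link list,
# -nonumbers drops the "1." style numbering in front of each link.
# The sed filter (an addition) keeps only lines of the form "scheme:rest",
# discarding any heading such as "References" and leading indentation.
lynx_links() {
    lynx -dump -listonly -nonumbers "$1" |
        sed -n 's/^[[:space:]]*\([A-Za-z][A-Za-z0-9+.-]*:[^[:space:]]*\)$/\1/p'
}
```

Usage: lynx_links page.html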
You asked for it. This is a crude tool, so all the usual warnings about parsing HTML with regular expressions apply.
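A crude sed one-liner along these lines, as a sketch (the wrapper name href_sed is hypothetical). It prints the double-quoted href value from each line that has one. Because the leading .* is greedy, only the last link on a line survives, which is exactly the kind of limitation the warning above is about.

```shell
# href_sed: hypothetical helper name. Reads files given as arguments, or stdin.
# -n suppresses normal output; the p flag prints only lines where the
# substitution succeeded, i.e. lines that contain href="...".
# Greedy .* means only the LAST href on each line is reported.
href_sed() {
    sed -n 's/.*href="\([^"]*\).*/\1/p' "$@"
}
```

Usage: href_sed index.html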
An example, since you didn't provide any sample:
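The original sample did not survive, so here is a portable awk sketch with a hypothetical sample of its own. The helper name href_awk is an assumption. Unlike a single sed substitution, it prints every href="..." on a line: after each hit from match(), it rescans the remainder of the line. Like the sed version, it only recognizes lowercase href with double quotes.

```shell
# href_awk: hypothetical helper name. POSIX awk only, no gawk extensions.
# match() sets RSTART (start) and RLENGTH (length) of the leftmost match.
href_awk() {
    awk '{
        line = $0
        while (match(line, /href="[^"]*"/)) {
            # Skip the 6 chars of href=" and drop the closing quote.
            print substr(line, RSTART + 6, RLENGTH - 7)
            # Continue scanning after this match to catch further links.
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}
```

With a hypothetical index.html containing <a href="http://example.com/one">One</a> and <a href="two.html">Two</a> on the same line, href_awk index.html prints both targets, one per line.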
You can do it quite easily with the following regex, which is quite good at finding URLs. I took it from John Gruber's article on how to find URLs in text. It lets you find all URLs in a file f.html as follows:
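Gruber's regex itself is not reproduced here. As a much simpler stand-in, and an assumption rather than his pattern, grep -oE can pull http and https URLs out of f.html. The helper name urls_in is hypothetical. Note that this finds URLs anywhere in the text, not only inside anchor tags, and that grep -o is a GNU/BSD extension rather than POSIX.

```shell
# urls_in: hypothetical helper name; simplified pattern, NOT Gruber's regex.
# Matches http:// or https:// followed by characters that cannot end a URL
# inside HTML markup: no quotes, angle brackets or whitespace.
# -o prints each match on its own line, so several URLs per line all appear.
urls_in() {
    grep -oE 'https?://[^"'\''<>[:space:]]+' "$@"
}
```

Usage: urls_in f.html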
With Xidel, an HTML/XML data extraction tool, this can be done directly, either as-is or with conversion to absolute URLs.
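A sketch of the basic query, assuming xidel is on the PATH. The wrapper name xidel_links is hypothetical. //a/@href is a standard XPath expression that selects the href attribute of every anchor, so the HTML is actually parsed rather than matched with a regex. The absolute-URL variant is not reconstructed here.

```shell
# xidel_links: hypothetical wrapper name.
# -e evaluates the given expression against the parsed document;
# //a/@href selects every <a> element's href attribute value.
# Xidel's progress messages go to stderr, so stdout holds only the links.
xidel_links() {
    xidel "$1" -e '//a/@href'
}
```

Usage: xidel_links page.html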
I made a few changes to Greg Bacon's solution; they fix two problems in it.
I am assuming you want to extract URLs from some HTML text rather than parse the HTML (as one of the comments suggests). Believe it or not, someone has already done this. Off-topic: the sed website has a lot of good information and many interesting (and crazy) sed scripts. You can even play Sokoban in sed!
You can try:
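One common shape for this, offered as a sketch rather than the original suggestion: grep -o isolates each href="..." attribute on its own line, then sed trims the attribute name and the quotes. href_grep is a hypothetical name, and grep -o is a GNU/BSD extension rather than POSIX.

```shell
# href_grep: hypothetical helper name.
# grep -o emits every href="..." match separately, even several per line;
# sed then removes the leading href=" and the trailing quote.
href_grep() {
    grep -o 'href="[^"]*"' "$@" | sed 's/^href="//; s/"$//'
}
```

Usage: href_grep index.html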