
I have the following sed command: sed 's/\('\''\).*div><div>/,/'

which takes everything between a single quote ' and div><div> and replaces it with a comma ,. This works almost exactly how I want it to. However, some lines contain two occurrences of div><div>, and my command treats the second one as its stopping point, whereas I want it to stop at the first.

To provide more clarity, here is the line in the file that I am trying to extract data from:

'>Person A</a></div><div>Teaching A</div></div></td><td width='50%'><div style='height: 50px; margin-bottom: 6px;'><div style='font-weight:bold'>Unknown or external</div><div>Teaching B<

I am trying to replace everything up to Teaching A, so my output should look like ,Teaching A. However, the output I am getting is ,Teaching B.

How could I manipulate my sed command to pick up on the first instance of div><div> instead of the last?


@AdminBee: I was also going to suggest non-greedy matching
@Dr Little: What was your solution? Please tell us.

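If I understand correctly, this should also work: sed 's/\('\''\).*<.a><.div><div>/,/'. The . in <.a> and <.div> stands in for the /, so the pattern anchors on the </a></div><div> that comes right before Teaching A, which appears only once in the sample line.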
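On the non-greedy point: the regular expressions that sed uses have no non-greedy quantifier, so .* always runs to the last possible match. One workaround is to replace the first div><div> with a marker character that cannot otherwise appear in the line, then match up to that marker. The following is only a sketch, assuming GNU sed (it relies on \n in the replacement and in a bracket expression) and assuming the first single quote comes before the first div><div>, as it does in the sample line. The variable names line and result are just for illustration.

```shell
# Sample line copied from the question.
line="'>Person A</a></div><div>Teaching A</div></div></td><td width='50%'><div style='height: 50px; margin-bottom: 6px;'><div style='font-weight:bold'>Unknown or external</div><div>Teaching B<"

# Step 1: turn only the FIRST div><div> into a newline marker
#         (s without the g flag replaces the first match only).
# Step 2: replace everything from the first single quote up to and
#         including that marker with a comma; [^\n]* cannot run past
#         the marker, so greedy matching is harmless here.
result=$(printf '%s\n' "$line" | sed "s/div><div>/\n/; s/'[^\n]*\n/,/")

printf '%s\n' "$result"
```

The printed line then starts with ,Teaching A and keeps the rest of the original line after it. A line with no div><div> at all is left unchanged, since neither substitution matches.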

Please understand that parsing HTML files with regex is not recommended. For example, I once parsed tens of thousands of HTML files with vim and regex for a time-sensitive task, and I regret doing it that way. Why? Because the task would probably have been completed much faster if I had used an actual XML/HTML parser to extract the lines and data.
