sed and Perl regexp replaces once, with multiple replacements flag

Question

I have the string:

lopy,lopy1,sym,lopy,lopy1,sym

I want the line to be:

lopy,lopy1,sym,lady,lady1,sym

Which means that all "lopy" after the string sym should be replaced. So I ran:

echo "lopy,lopy1,sym,lopy,lopy1,sym" | sed -r 's/(.*sym.*?)lopy/\1lad/g'

I get:

lopy,lopy1,sym,lopy,lad1,sym

Using Perl is not really better:

echo "lopy,lopy1,sym,lopy,lopy1,sym" | perl -pe 's/(.*sym.+?)lopy/${1}lad/g'

yields

lopy,lopy1,sym,lad,lopy1,sym

Not all "lopy" are replaced. What am I doing wrong?

Wiktor Stribiżew · Accepted Answer · 2021-10-04 07:28:45Z

The (.*sym.*?)lopy and (.*sym.+?)lopy patterns are almost the same: .+? matches one or more chars other than line break chars, as few as possible, while .*? matches zero or more such chars. Mind that sed does not support lazy quantifiers, so *? is the same as * in sed. However, the main problem with the regexps you used is that each match must consume sym, then any text after it, and then lopy. Adding g therefore only asks for more occurrences of the whole sym....lopy sequence, and there is only one such occurrence in your string.

You want to replace all lopy after sym, so you can use

perl -pe 's/(?:\G(?!^)|sym).*?\Klopy/lad/g'

See the regex demo. Details:

(?:\G(?!^)|sym) - sym or end of the previous match (\G(?!^))
.*? - any zero or more chars other than line break chars, as few as possible
\K - match reset operator that discards all text matched so far
lopy - a lopy string.

See the online demo:

#!/bin/bash
echo "lopy,lopy1,sym,lopy,lopy1,sym" | perl -pe 's/(?:\G(?!^)|sym).*?\Klopy/lad/g'
# => lopy,lopy1,sym,lad,lad1,sym

If the values are always comma separated, you may replace .*? with ,: (?:\G(?!^)|sym),\Klopy (see this regex demo).

RavinderSingh13 · Accepted Answer · 2021-10-04 20:11:52Z

Since the OP mentioned sed, I am adding an awk program here, which could be a better choice than sed. With the shown samples, please try the following awk program.

echo "lopy,lopy1,sym,lopy,lopy1,sym" | 
awk -F',sym,' '
{
  first=$1
  $1=""
  sub(/^[[:space:]]+/,"")
  gsub(/lop/,"lad")
  $0=first FS $0
}
1
'

Explanation: Adding detailed explanation for above.

echo "lopy,lopy1,sym,lopy,lopy1,sym" |  ##Print the sample line and pipe it to awk as input.
awk -F',sym,' '                         ##Use ,sym, as the field separator.
{
  first=$1                              ##Save $1 (the text before the first ,sym,) in first.
  $1=""                                 ##Empty $1; awk rebuilds $0 with a leading OFS (a space).
  sub(/^[[:space:]]+/,"")               ##Remove that leading space from the line.
  gsub(/lop/,"lad")                     ##Globally replace lop with lad in the rest of the line.
  $0=first FS $0                        ##Put first and the separator back in front of the edited rest.
}
1                                       ##Print the edited/non-edited line.
'
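The program above relies on every line containing ,sym,. On a line without it NF is 1, so it appears the line would come back with ,sym, appended. A sketch of the same idea using index() and substr(), which leaves such lines untouched; replace_after_sym is a hypothetical helper name, not part of the answer:

```shell
#!/bin/bash
# replace_after_sym is a hypothetical helper; it sketches the same idea
# with index() and substr() instead of a custom field separator.
replace_after_sym() {
  awk '
  {
    i = index($0, ",sym,")            # 1-based position of the first ,sym, (0 if absent)
    if (i > 0) {
      head = substr($0, 1, i + 4)     # everything up to and including ,sym,
      tail = substr($0, i + 5)        # everything after it
      gsub(/lop/, "lad", tail)        # edit only the tail
      $0 = head tail
    }
  }
  1                                   # print every line, edited or not
  '
}

echo "lopy,lopy1,sym,lopy,lopy1,sym" | replace_after_sym
# => lopy,lopy1,sym,lady,lady1,sym
echo "lopy,lopy1" | replace_after_sym
# => lopy,lopy1
```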

zdim · Accepted Answer · 2021-10-06 23:09:44Z

The problem is that the lopys to replace come after sym, and a pattern like sym.*?lopy consumes that sym, so a global replacement looks for yet more of the whole sym+lopy-after-sym sequence (not just for all lopys after that one sym).†

To replace all lopys (after the first sym, followed by another sym) we can capture the substring between the syms and run code in the replacement side, in which a regex replaces all lopys:

echo "lopy,lopy1,sym,lopy,lopy1,sym" | 
    perl -pe's{ sym,\K (.+?) (?=sym) }{ $1 =~ s/lop/lad/gr }ex'

To isolate the substring between the syms I use \K after the first sym, which drops everything matched before it from the match, and a positive lookahead for the sym after the substring, which doesn't consume anything. The /e modifier makes the replacement side be evaluated as code. In the replacement side's regex we need /r because $1 is read-only, and we want the substitution to return the modified string anyway. See perlretut.

† To match every b in abbbb we can't say /ab/g, nor /(a)b/g, nor /a(b)/g, because each of those looks for repetitions of the whole ab in the string (and finds only the ab at the beginning).

tripleee · Accepted Answer · 2021-10-04 07:38:49Z

sed does not support non-greedy wildcards at all. But your Perl script also fails for other reasons; you are saying "match all occurrences of this" but then you specify a regex which can only match once.

A common simple solution is to split the string, and then replace only after the match:

echo "lopy,lopy1,sym,lopy,lopy1,sym" |
perl -pe 'if (@x = /^(.*?sym,)(.*)/) { $x[1] =~ s/lop/lad/g; s/.*/$x[0]$x[1]/ }'

If you want to be fancy, you can use a lookbehind to only replace the lop occurrences after the first sym.

echo "lopy,lopy1,sym,lopy,lopy1,sym" |
perl -pe 's/(?<=sym.{0,200})lop/lad/g'

The variable-length lookbehind generates a warning and is only supported in Perl 5.30+ (you can turn the warning off with no warnings qw(experimental::vlb);).

anubhava · Accepted Answer · 2021-10-04 21:21:08Z

Since you have shown an attempted sed command and used the sed tag, here is a sed loop based solution:

sed -E -e ':a' -e 's~(sym,.*)lopy~\1lady~g; ta' file

lopy,lopy1,sym,lady,lady1,sym

Explanation:

:a sets a label a before the s command that matches the sym,.* pattern
ta jumps back to label a if the s command made a substitution

The loop stops when the s command has nothing left to match, i.e. when no lopy substring remains after the sym, prefix.
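The same loop can be tried on the sample line from standard input instead of a file. A sketch, assuming a sed that accepts -E (GNU and BSD sed both do); replace_after_sym_sed is a hypothetical wrapper name:

```shell
#!/bin/bash
# replace_after_sym_sed is a hypothetical wrapper around the sed loop above,
# reading from standard input instead of a file.
replace_after_sym_sed() {
  sed -E -e ':a' -e 's~(sym,.*)lopy~\1lady~g; ta'
}

echo "lopy,lopy1,sym,lopy,lopy1,sym" | replace_after_sym_sed
# => lopy,lopy1,sym,lady,lady1,sym
```

Each pass replaces the last remaining lopy after sym, because .* is greedy. The loop terminates only because the replacement lady does not itself contain lopy.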
