
The following prints ac | a | bbb | c:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    # use re 'debug';
    
    my $str = 'aacbbbcac';
    
    if ($str =~ m/((a+)?(b+)?(c))*/) {
       print "$1 | $2 | $3 | $4\n";
    }

It seems like failed matches do not reset the captured group variables. What am I missing?


it seems like failed matches don't reset the captured group variables

There are no failed matches here: your regex matches the string fine. What does happen is that some inner groups fail to match in some repetitions of the outer group. On each repetition, a group that matches overwrites its previous value, while a group that does not match in the current repetition keeps its value from an earlier repetition.

Let's see how the regex match proceeds:

  • First, (a+)?(b+)?(c) matches aac. Since (b+)? is optional and there is no b at this position, it does not match. At this stage, the capture groups contain the following:

    • $1 contains entire match - aac
    • $2 contains (a+)? part - aa
    • $3 contains (b+)? part - nothing yet (undef in Perl).
    • $4 contains (c) part - c
  • There is still some string left to match - bbbcac. Proceeding further, (a+)?(b+)?(c) matches bbbc. Since (a+)? is optional and there is no a at this position, it does not match.

    • $1 contains entire match - bbbc. Overwrites the previous value in $1
    • $2 doesn't match. So, it will contain text previously matched - aa
    • $3 this time matches. It contains - bbb
    • $4 matches c
  • Again, (a+)?(b+)?(c) will go on to match the last part - ac.

    • $1 contains entire match - ac.
    • $2 matches a this time. Overwrites the previous value in $2. It now contains - a
    • $3 doesn't match this time, as there is no b in this part. It keeps its value from the previous repetition - bbb
    • $4 matches c. Overwrites the value from previous match. It now contains - c.

Now, there is nothing left in the string to match. The final values of the capture groups are:

  • $1 - ac
  • $2 - a
  • $3 - bbb
  • $4 - c.
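
To see each repetition on its own, one option is to drop the outer * and match one repetition at a time with \G and /g. This is only a sketch, and it assumes you are free to restructure the pattern this way. Each loop pass is a fresh match, so a group that did not take part is undef instead of carrying over an older value. The @iterations array is a name made up for this illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $str = 'aacbbbcac';

# One row per repetition: [ $1, $2, $3, $4 ], with 'undef' shown as text.
my @iterations;

# \G anchors each attempt at the position where the previous match ended,
# so this walks the string one (a+)?(b+)?(c) repetition at a time.
while ($str =~ m/\G((a+)?(b+)?(c))/gc) {
    push @iterations, [ map { $_ // 'undef' } ($1, $2, $3, $4) ];
}

print join(' | ', @$_), "\n" for @iterations;
# prints:
# aac | aa | undef | c
# bbbc | undef | bbb | c
# ac | a | undef | c
```

Compare the last row with the output of the original pattern: there $3 still holds bbb from the second repetition, while here it is undef.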

As odd as it seems, this is the "expected" behavior. Here's a quote from the perlre docs:

NOTE: Failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
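
A small sketch of what that note means in practice. The strings and variable names here are made up for illustration. Both matches sit in the same block, because capture variables are scoped to the enclosing block.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $first_ok   = 'key=42' =~ /(\d+)/;       # succeeds and sets $1
my $after_ok   = $1;                        # '42'

my $second_ok  = 'no digits' =~ /(\d+)/;    # fails, $1 is left alone
my $after_fail = $1;                        # still '42'

print "after success: $after_ok\n";
print "second match ", ($second_ok ? "succeeded" : "failed"),
      ", \$1 is still: $after_fail\n";

# Safer habit: only read $1 when the match itself returned true.
my $safe = ('no digits' =~ /(\d+)/) ? $1 : undef;
print "guarded value: ", (defined $safe ? $safe : 'undef'), "\n";
```

Note that this is about a whole match failing. The question's case is different: the overall match succeeds, and the stale values come from earlier repetitions of the quantified group.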


For parenthesized groups such as /(\d+)/, this documentation says to use \1, \2, ... or \g{1}, \g{2}. Using $1 or $2 in the regex part of a substitution will cause an error like: scalar found in pattern

    # Example to turn a css href to local css.
    # Transforms <link href="http://..." into <link href="css/..."

    # ... inside a loop ...

    my $localcss = $_; # one line from the file
    $localcss =~ s/href.+\/([^\/]+\.css")/href="css\/\1/g ;
  • Apart from the fact that this is not an answer: instead of what?
    – Toto
    Jan 20 '17 at 16:38
  • All uses of $1, $2 above this crash with the Perl scalars. Thanks for the poke. Jan 20 '17 at 17:40
  • Where have you seen that? \1 is for a backreference inside the pattern; $1 is the variable that contains the value of group 1, and it is used in the replacement part or outside the regex. They are two distinct concepts.
    – Toto
    Jan 20 '17 at 18:13
  • I came to this page because I needed a backreference \1 inside a substitution. Other people will appreciate it. Thanks for your attention. Jan 20 '17 at 20:33
  • The linked docs clearly say NOT to use \1 in your case; it only still works sometimes: "That's because in PerlThink, the righthand side of an s/// is a double-quoted string. \1 in the usual double-quoted string means a control-A.[...]You can't disambiguate that by saying \{1}000, whereas you can fix it with ${1}000." metacpan.org/pod/release/RJBS/perl-5.18.1/pod/… Nov 27 '18 at 18:43
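
For anyone landing here for the substitution case, here is a sketch of the same CSS rewrite with $1 in the replacement, as the comments suggest. The input line is a made-up example, and s{...}{...} is used only to avoid escaping the slashes.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical input line, modelled on the example in the answer.
my $line = '<link href="http://example.com/styles/main.css" rel="stylesheet">';

# In the replacement part, the captured text is read through $1.
(my $localcss = $line) =~ s{href.+/([^/]+\.css")}{href="css/$1}g;
print "$localcss\n";   # <link href="css/main.css" rel="stylesheet">

# ${1} keeps the group number apart from digits that follow it.
(my $padded = 'width: 5') =~ s/(\d+)/${1}000/;
print "$padded\n";     # width: 5000
```

Inside the pattern itself, \1 (or \g{1}) is still the right way to refer back to a group, for example /(\w)\1/ to find a doubled character.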
