On Wed, Mar 10, 2010 at 4:06 PM, M. G. Devour <mdevour@eskimo.com> wrote:
Dear Jim,
So there's apparently an important difference between:
boundary="*\/.+
and...
boundary="*\/[^"]+
Yes, I believe I explained this on the original thread several days ago.
According to the above page, procmail makes two passes over the string, the first to determine the stingiest match to the *entire* expression; then the second to get the greediest match to the right half of the expression starting with the first character left after the first pass.
Re-read what you just wrote there, particularly the emphasis on "entire".
If the string we're grepping is this:
boundary="abcdefgh"
What is the shortest possible (stingy) match to our two regexes?
You're asking the wrong question. The correct question is "What is the shortest possible match that also results in the entire expression matching?"
boundary="*\/.+ # should match: boundary= # because "* can be null
boundary="*\/[^"]+ # should also match: boundary=
Wrong. Are you familiar with Perl? The equivalent expression is this:
boundary="*?[^"]+
In other words, you need to consider what would be matched with the \/ operator removed from the expression, and then insert the break between matched portions as far to the left as possible.
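To make the Perl analogy concrete, here is a rough sketch in Python, whose re module uses the same *? lazy-quantifier syntax as Perl. It only approximates procmail's behaviour and is not procmail's own engine. The capture group stands in for whatever \/ would hand to $MATCH, and the helper name right_half is made up for this illustration.

```python
import re
from typing import Optional

HEADER = 'boundary="abcdefgh"'

def right_half(pattern: str, text: str = HEADER) -> Optional[str]:
    """Return what the capture group grabs (a stand-in for procmail's $MATCH)."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

# boundary="*\/.+     -> lazy "* on the left, greedy .+ on the right
dot_plus = right_half(r'boundary="*?(.+)')

# boundary="*\/[^"]+  -> lazy "* on the left, greedy [^"]+ on the right
not_quote_plus = right_half(r'boundary="*?([^"]+)')

print(repr(dot_plus))        # keeps both double quotes
print(repr(not_quote_plus))  # no double quotes
```

With .+ on the right, the lazy "* can get away with matching nothing, so the quotes end up in the captured part. With [^"]+ on the right, it cannot.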
In both cases, the remaining unmatched part of the string is:
"abcdefgh"
No, there is no "remaining unmatched portion". Emphasis on "entire", above.
And the right halves of our two regexes (greedy) evaluate as:
.+ # matches: "abcdefgh"
[^"]+ # matches: abcdefgh
This last bit is what really took me some time to realize... It's the operation of the + that forces it to match at least one non-" character... meaning that it will, in fact, **skip over** double-quote characters in the remaining string until it finds at least one non-" character, and then match the rest of the string until it hits another double quote! <sigh>
IS THAT RIGHT??? Please!?
No. The + forces [^"] to match at least one non-double-quote, which (when matching the entire string on the first pass) forces "* to consume the double-quote as part of the left portion.
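Where the break between the two portions lands can be checked with a small sketch, again using Python's lazy "*? as an approximation of procmail's stingy left half (an assumption carried over from the Perl analogy, not a description of procmail's internals). The function split_at_break is a hypothetical helper that captures the left and right portions separately.

```python
import re
from typing import Optional, Tuple

HEADER = 'boundary="abcdefgh"'

def split_at_break(left: str, right: str,
                   text: str = HEADER) -> Optional[Tuple[str, str]]:
    """Return (left portion, right portion) of the overall match."""
    m = re.search(f'({left})({right})', text)
    return (m.group(1), m.group(2)) if m else None

# Left half is as stingy as the *entire* expression allows.
dot_split = split_at_break(r'boundary="*?', r'.+')
class_split = split_at_break(r'boundary="*?', r'[^"]+')

print(dot_split)    # left half stops before the opening quote
print(class_split)  # left half is forced to swallow the opening quote
```

Nothing is skipped over in either case. The left half simply grows by one character when that is the only way for the whole expression to match.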
And, going back to my very first attempts at this regex, I tried the ? modifier instead of * and it works too!
Same explanation.
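The ? variant can be sketched the same way, with Python's lazy "?? standing in for a stingy "? on the left of \/. As before, this is an approximation under the Perl analogy, and the helper name boundary_value and the unquoted sample header are made up for illustration.

```python
import re
from typing import Optional

def boundary_value(text: str) -> Optional[str]:
    """Approximate boundary="?\\/[^"]+ with a lazy optional quote on the left."""
    m = re.search(r'boundary="??([^"]+)', text)
    return m.group(1) if m else None

quoted = boundary_value('boundary="abcdefgh"')
unquoted = boundary_value('boundary=abcdefgh')  # hypothetical unquoted header

print(repr(quoted))
print(repr(unquoted))
```

In the quoted case the optional quote is consumed because [^"]+ cannot start on a double quote. In the unquoted case it matches nothing, and the result is the same.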