Description of problem:
Regular expressions that use '-' in an alphabetic range match characters outside the given range. For example, [a-z] matches as if it were [A-Ya-z].

Version-Release number of selected component (if applicable):
pcre-7.8.3.fc12.i686 (yes, this is on Fedora 13)

How reproducible:

    cat << MEL >foo
    AA
    aa
    BB
    bb
    CC
    cc
    XX
    xx
    YY
    yy
    ZZ
    zz
    MEL
    echo ----- grep UPPER -----
    grep '[A-Z]' foo
    echo ----- grep lower -----
    grep '[a-z]' foo

Steps to Reproduce:
1. grep with a regular expression alpha range using '-' to match only upper case lines, i.e. '[A-Z]'
2. grep with a regular expression alpha range using '-' to match only lower case lines, i.e. '[a-z]'

Actual results:

    ----- grep [A-Z] -----
    AA
    BB
    bb
    CC
    cc
    XX
    xx
    YY
    yy
    ZZ
    zz
    ----- grep [a-z] -----
    AA
    aa
    BB
    bb
    CC
    cc
    XX
    xx
    YY
    yy
    zz

Expected results:

    ----- grep [A-Z] -----
    AA
    BB
    CC
    XX
    YY
    ZZ
    ----- grep [a-z] -----
    aa
    bb
    cc
    xx
    yy
    zz

Additional info:
'[BC]' works correctly, but '[B-C]' behaves as if it were '[B-Cc]'.
[A-Z] acts as if it were [A-Zb-z].
[a-z] acts as if it were [A-Ya-z].

This behaves identically to the gawk bug: https://bugzilla.redhat.com/show_bug.cgi?id=683519
Thank you for the report, however:

(1) You call grep. grep does not use libpcre unless asked to with the `-P' option. Otherwise POSIX basic (the default) or extended regular expression matching is performed by a regex(3) call into the standard library:

    $ printf 'A\na\nB\nb\nC\nc\n' | grep '[a-z]'
    a
    B
    b
    C
    c
    $ printf 'A\na\nB\nb\nC\nc\n' | grep -P '[a-z]'
    a
    b
    c

(2) If you interpret the expression as PCRE (grep -P, pcregrep), you get the expected behaviour:

    $ printf 'A\na\nB\nb\nC\nc\n' | pcregrep '[a-z]'
    a
    b
    c

(3) The results of your grep command depend on the locale. `[a-z]' does not mean all lower case letters. It means the characters whose ordinal number in the locale's collating sequence lies between the ordinals of `a' and `z'. In the C locale the two are equivalent:

    $ printf 'A\na\nB\nb\nC\nc\n' | LANG=C grep '[a-z]'
    a
    b
    c

But they do not have to be in any other locale, e.g.:

    $ printf 'A\na\nB\nb\nC\nc\n' | LANG=cs_CZ.UTF-8 grep '[a-z]'
    a
    B
    b
    C
    c

Use `[[:lower:]]' to express lower case letters regardless of the current locale:

    $ printf 'A\na\nB\nb\nC\nc\n' | LANG=cs_CZ.UTF-8 grep '[[:lower:]]'
    a
    b
    c

As you can see, the result depends on the collation of the locale, and the behaviour of the range operator `-' is undefined outside the C locale.

regex(7):

    If two characters in the list are separated by '-', this is
    shorthand for the full range of characters between those two
    (inclusive) in the collating sequence, for example, "[0-9]" in
    ASCII matches any decimal digit. [...] Ranges are very
    collating-sequence-dependent, and portable programs should avoid
    relying on them.

See POSIX / Single UNIX Specification for more details.

There is also a heavily commented bug about this issue in this bug tracking system, which I cannot find right now.

Reassigning to the grep component and closing as not a bug.
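
The comparison in (3) can be wrapped into a small script for checking a given machine. This is only a sketch: the variable names (`input`, `range_c`, `class_c`, `range_cur`) are made up for illustration, and it sets `LC_ALL` rather than `LANG` on the assumption that `LC_ALL` may already be set in the environment, in which case it would override `LANG`. What the last line prints depends on the locale data installed on the system, so no output is claimed for it.

```shell
#!/bin/sh
# Sample input: upper and lower case letters, one per line.
input=$(printf 'A\na\nB\nb\nC\nc\n')

# Range expression with the locale forced to C: the range follows ASCII order.
range_c=$(printf '%s\n' "$input" | LC_ALL=C grep '[a-z]' | paste -s -d ' ' -)

# POSIX character class: selects lower case letters by class, not by range.
class_c=$(printf '%s\n' "$input" | LC_ALL=C grep '[[:lower:]]' | paste -s -d ' ' -)

# Same range expression under whatever locale the caller has configured.
# The result here is locale-dependent and may include upper case letters.
range_cur=$(printf '%s\n' "$input" | grep '[a-z]' | paste -s -d ' ' -)

echo "range, C locale:       $range_c"
echo "class, C locale:       $class_c"
echo "range, current locale: $range_cur"
```

If the first two lines agree but the third differs, the difference comes from the collating sequence of the current locale, not from grep or libpcre.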