From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)

Description of problem:
When I use awk/gawk, it handles case sensitivity badly: the installed gawk always behaves in a case-insensitive way. I also found this problem on earlier Red Hat versions (it is also reproducible with Red Hat 7).

Version-Release number of selected component (if applicable):
gawk-3.1.3-10.1

How reproducible:
Always

Steps to Reproduce:
1. Create the following files:
---bug.awk:
/^[a-z]/ { print "lowercase ",$0; }
/^[A-Z]/ { print "UPPERCASE ",$0; }
---bug.dat:
abc
XYZ
2. Execute "gawk <bug.dat -f bug.awk"
3. Execute "gawk -v IGNORECASE=1 <bug.dat -f bug.awk"
4. Execute "gawk -v IGNORECASE=0 <bug.dat -f bug.awk"

Actual Results:
2:> awk <bug.dat -f bug.awk
lowercase abc
lowercase XYZ
UPPERCASE XYZ
3:> awk -v IGNORECASE=1 < bug.dat -f bug.awk
lowercase abc
UPPERCASE abc
lowercase XYZ
UPPERCASE XYZ
4:> awk -v IGNORECASE=0 < bug.dat -f bug.awk
lowercase abc
UPPERCASE abc
lowercase XYZ
UPPERCASE XYZ

You can see two bugs:
Command 2: awk thinks that XYZ matches /^[a-z]/, which is wrong!
Command 4: switching off IGNORECASE does not work!

Expected Results:
The following output of gawk version 3.0.3 is, in my eyes, the CORRECT output:
2:> awk < bug.dat -f bug.awk
lowercase abc
UPPERCASE XYZ
3:> awk -v IGNORECASE=1 < bug.dat -f bug.awk
lowercase abc
UPPERCASE abc
lowercase XYZ
UPPERCASE XYZ
4:> awk -v IGNORECASE=0 < bug.dat -f bug.awk
lowercase abc
UPPERCASE XYZ

Additional info:
To reproduce the problem I use the locale de_DE.ISO8859-1, but the problem also exists in other locales. On SUSE 9.3 the bug with command 2 does not occur, but the bug with command 4 does!
gawk -v IGNORECASE=0 '/[[:upper:]]/ { print $0 }'
                        ^^^^^^^^^

[a-z] defines a range, not a class of characters. It does not work correctly in an i18n environment, because the locale's collation sequence is defined as [aAbBcC..zZ], so [A-Z] = [AbBcC..zZ]. You have to use [[:upper:]] or [[:lower:]]. (The old gawk 3.0.3 does not support locales.)
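As a minimal sketch of this suggestion, the test script from the report can be rewritten with POSIX character classes instead of ranges. It assumes an awk that supports bracket classes such as [[:lower:]] (gawk does); the function name classify_by_class is made up for illustration.

```shell
# Hypothetical helper: classify each input line by the case of its first
# character, using POSIX character classes rather than ranges like [a-z].
classify_by_class() {
    awk '
        /^[[:lower:]]/ { print "lowercase", $0 }
        /^[[:upper:]]/ { print "UPPERCASE", $0 }
    '
}

# Same two lines as bug.dat in the report.
printf 'abc\nXYZ\n' | classify_by_class
```

With the bug.dat input this should classify abc as lowercase and XYZ as UPPERCASE, independent of the locale's collation order. Note that the classes themselves are still locale-dependent: in a German locale [[:lower:]] can also match characters such as ä.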
Exactly, [a-z] defines a range of chars! By mixing uppercase and lowercase in this range and ignoring the IGNORECASE variable, awk scripts won't work! Also, in an i18n environment where the locale sequence is [aAbBcC..zZ], [a-z] means [aAbBcC..z] without Z, and [A-Z] means [AbBcC..zZ] without a! And that is simply a design bug!

And [:upper:] is locale-dependent. I want a, b, c, d, ... and not e.g. abc...zäöü in the German locale.

So the only working solution is to write [abcdefghijklmnopqrstuvwxyz] instead of the shortcut [a-z]. With this change, the useful - in [] expressions becomes useless and dangerous (though some compatibility issues surely exist)!
I don't think so. Well, ask upstream (bug-gawk) for the change.
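For reference, the ASCII-only workaround described by the reporter could look roughly like the sketch below. The function name classify_ascii is hypothetical, and the LC_ALL=C variant at the end is an assumption that was not discussed in this thread: in the C locale, ranges such as [a-z] follow character-code order.

```shell
# Hypothetical helper: ASCII-only classification with explicitly enumerated
# sets, so the result does not depend on the locale's collation order.
classify_ascii() {
    awk '
        /^[abcdefghijklmnopqrstuvwxyz]/ { print "lowercase", $0 }
        /^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/ { print "UPPERCASE", $0 }
    '
}

# Same two lines as bug.dat in the report.
printf 'abc\nXYZ\n' | classify_ascii

# Alternative (an assumption, not from the thread): force the C locale for
# this one command so that [a-z] and [A-Z] are plain ASCII ranges.
printf 'abc\nXYZ\n' | LC_ALL=C awk '
    /^[a-z]/ { print "lowercase", $0 }
    /^[A-Z]/ { print "UPPERCASE", $0 }
'
```

The enumerated sets are verbose but do not rely on any locale setting; the LC_ALL=C form keeps the short range syntax at the cost of disabling locale-aware behavior for that invocation.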