Bug 222270 - gawk always ignore case
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: gawk
Version: 4.3
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Karel Zak
QA Contact: Brock Organ
Keywords: Reopened
Reported: 2007-01-11 05:00 EST by Horst Ulrich Stolz
Modified: 2007-11-16 20:14 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-08-14 09:38:08 EDT

Description Horst Ulrich Stolz 2007-01-11 05:00:48 EST

Description of problem:
When I use awk/gawk, it handles case sensitivity incorrectly: the installed
gawk always behaves as if matching were case insensitive. I also found this
problem on earlier Red Hat versions (also reproducible with Red Hat 7).


Version-Release number of selected component (if applicable):
gawk-3.1.3-10.1

How reproducible:
Always


Steps to Reproduce:
1. Create the following files:
---bug.awk:
/^[a-z]/ {
    print "lowercase ",$0;
}
/^[A-Z]/ {
    print "UPPERCASE ",$0;
}
---bug.dat:
abc
XYZ

2. Execute "gawk <bug.dat -f bug.awk"
3. Execute "gawk -v IGNORECASE=1 <bug.dat -f bug.awk"
4. Execute "gawk -v IGNORECASE=0 <bug.dat -f bug.awk"

Actual Results:
2:> awk <bug.dat -f bug.awk
lowercase  abc
lowercase  XYZ
UPPERCASE  XYZ
3:> awk -v IGNORECASE=1 < bug.dat -f bug.awk
lowercase  abc
UPPERCASE  abc
lowercase  XYZ
UPPERCASE  XYZ
4:> awk -v IGNORECASE=0 < bug.dat -f bug.awk
lowercase  abc
UPPERCASE  abc
lowercase  XYZ
UPPERCASE  XYZ

You can see two bugs:
Command 2: awk thinks that XYZ matches /^[a-z]/, which is wrong!
Command 4: switching off IGNORECASE doesn't work!

Expected Results:
The following output of gawk version 3.0.3 is, in my view, the CORRECT output:
2:> awk < bug.dat -f bug.awk
lowercase  abc
UPPERCASE  XYZ
3:> awk -v IGNORECASE=1 < bug.dat -f bug.awk
lowercase  abc
UPPERCASE  abc
lowercase  XYZ
UPPERCASE  XYZ
4:> awk -v IGNORECASE=0 < bug.dat -f bug.awk
lowercase  abc
UPPERCASE  XYZ


Additional info:
To reproduce the problem I used the locale de_DE.ISO8859-1, but the problem also exists in other locales. On SUSE 9.3 the bug with command 2 does not occur, but the bug with command 4 does!
Comment 1 Karel Zak 2007-02-12 06:04:50 EST
gawk -v IGNORECASE=0 '/[[:upper:]]/ { print $0 }'
                        ^^^^^^^^^
[a-z] defines a range, not a class of characters. It doesn't work as expected
in an i18n environment, because the locale's collating sequence is defined as
[aAbBcC..zZ], so [A-Z] = [AbBcC..zZ].
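
The range arithmetic above can be sketched without touching any real locale.
This is only a simulation: range_members is a hypothetical helper, and the
string seq is a hand-built stand-in for the collating sequence [aAbBcC..zZ],
not something gawk exposes.

```shell
# range_members LO HI: print the characters between LO and HI (inclusive)
# in a simulated collating sequence aAbBcC...zZ.
range_members() {
    awk -v lo="$1" -v hi="$2" 'BEGIN {
        lower = "abcdefghijklmnopqrstuvwxyz"
        upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        for (i = 1; i <= 26; i++) {
            # interleave the two alphabets: a A b B c C ... z Z
            seq = seq substr(lower, i, 1) substr(upper, i, 1)
        }
        start = index(seq, lo)
        print substr(seq, start, index(seq, hi) - start + 1)
    }'
}

range_members a z   # every letter except Z
range_members A Z   # every letter except a
```

Under this assumed ordering, X falls inside a..z, which is consistent with
"lowercase  XYZ" appearing in the output of command 2.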

You have to use [[:upper:]] or [[:lower:]]. (The old gawk 3.0.3 doesn't support
locales.)
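
A minimal sketch of bug.awk rewritten with the character classes suggested
above. The function name classify is made up for illustration, and the
sketch assumes an awk that implements POSIX character classes (gawk does;
some older awk variants may not).

```shell
# classify: label each input line by the case of its first character,
# using POSIX character classes instead of the ranges [a-z] and [A-Z].
classify() {
    awk '
    /^[[:lower:]]/ { print "lowercase", $0 }
    /^[[:upper:]]/ { print "UPPERCASE", $0 }
    '
}

printf 'abc\nXYZ\n' | classify
```

With the two-line bug.dat input, each line should match exactly one rule.
Which non-ASCII letters count as upper or lower still depends on the locale.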
Comment 2 Horst Ulrich Stolz 2007-08-14 05:48:26 EDT
Exactly: [a-z] defines a range of characters! Mixing uppercase and lowercase in
this range while ignoring the IGNORECASE variable breaks awk scripts!

Also, in an i18n environment where the locale sequence is [aAbBcC..zZ], [a-z]
means [aAbBcC..z] without Z, and [A-Z] means [AbBcC..zZ] without a! That is
simply a design bug!

And :upper: is locale dependent. I want a, b, c, d, and not, e.g., abc...zäöü in
a German locale. So the only working solution is to write
[abcdefghijklmnopqrstuvwxyz] instead of the shortcut [a-z]. With this change,
the useful - in [] expressions (sure, some compatibility issues exist) becomes
useless and dangerous!
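
The enumeration workaround described above might look like the sketch below.
The function name classify_ascii is hypothetical; the bracket expressions
list the 26 ASCII letters explicitly, so no range and no locale-dependent
class is involved.

```shell
# classify_ascii: label lines by their first character, listing the ASCII
# letters one by one rather than relying on [a-z], [A-Z] or [[:lower:]].
classify_ascii() {
    awk '
    /^[abcdefghijklmnopqrstuvwxyz]/ { print "lowercase", $0 }
    /^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/ { print "UPPERCASE", $0 }
    '
}

printf 'abc\nXYZ\n' | classify_ascii
```

A line starting with ä, ö or ü matches neither rule here, which is the
behaviour asked for in this comment. An approach not raised in this thread is
to run awk with LC_ALL=C, where ranges follow byte order; whether that suits a
given script is a separate question.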
Comment 3 Karel Zak 2007-08-14 09:38:08 EDT
I don't think so. Well, ask upstream (bug-gawk@gnu.org) for the change.
