Bug 222270 - gawk always ignore case
Summary: gawk always ignore case
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: gawk
Version: 4.3
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Karel Zak
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-01-11 10:00 UTC by Horst Ulrich Stolz
Modified: 2007-11-17 01:14 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-08-14 13:38:08 UTC
Target Upstream Version:
Embargoed:



Description Horst Ulrich Stolz 2007-01-11 10:00:48 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)

Description of problem:
When I use awk/gawk, it handles case sensitivity incorrectly: the installed
gawk always behaves as if matching were case insensitive. I found this problem on earlier Red Hat versions as well (also reproducible with Red Hat 7).


Version-Release number of selected component (if applicable):
gawk-3.1.3-10.1

How reproducible:
Always


Steps to Reproduce:
1. Create the following files:
---bug.awk:
/^[a-z]/ {
    print "lowercase ",$0;
}
/^[A-Z]/ {
    print "UPPERCASE ",$0;
}
---bug.dat:
abc
XYZ

2. Execute "gawk <bug.dat -f bug.awk"
3. Execute "gawk -v IGNORECASE=1 <bug.dat -f bug.awk"
4. Execute "gawk -v IGNORECASE=0 <bug.dat -f bug.awk"

Actual Results:
2:> awk <bug.dat -f bug.awk
lowercase  abc
lowercase  XYZ
UPPERCASE  XYZ
3:> awk -v IGNORECASE=1 < bug.dat -f bug.awk
lowercase  abc
UPPERCASE  abc
lowercase  XYZ
UPPERCASE  XYZ
4:> awk -v IGNORECASE=0 < bug.dat -f bug.awk
lowercase  abc
UPPERCASE  abc
lowercase  XYZ
UPPERCASE  XYZ

You can see two bugs:
Command 2: awk thinks that XYZ matches /^[a-z]/ => that's wrong!
Command 4: switching off IGNORECASE doesn't work!!!!

Expected Results:
The following output from gawk 3.0.3 is, in my view, the CORRECT output:
2:> awk < bug.dat -f bug.awk
lowercase  abc
UPPERCASE  XYZ
3:> awk -v IGNORECASE=1 < bug.dat -f bug.awk
lowercase  abc
UPPERCASE  abc
lowercase  XYZ
UPPERCASE  XYZ
4:> awk -v IGNORECASE=0 < bug.dat -f bug.awk
lowercase  abc
UPPERCASE  XYZ


Additional info:
To reproduce the problem I use the locale de_DE.ISO8859-1, but the problem also exists in other locales. On SUSE 9.3 the bug from command 2 does not occur, but the bug from command 4 does!

Comment 1 Karel Zak 2007-02-12 11:04:50 UTC
gawk -v IGNORECASE=0 '/[[:upper:]]/ { print $0 }'
                       ^^^^^^^^^^^
[a-z] defines a range, not a class of characters. It does not work correctly in
an i18n environment, because the locale collating sequence is defined as
[aAbBcC..zZ], so [A-Z] = [AbBcC..zZ].
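
To make the effect of such a collating sequence concrete, here is a minimal sketch in plain awk. It does not query any real locale: the interleaved sequence aAbB..zZ is built by hand as an assumption, and the helper name collation_range is made up for this illustration. It simply lists everything that lies between two endpoints in that sequence.

```shell
# Hypothetical helper: print the letters between $1 and $2 in an
# assumed interleaved collating sequence aAbBcC..zZ.
collation_range() {
    LC_ALL=C awk -v lo="$1" -v hi="$2" '
    BEGIN {
        lower = "abcdefghijklmnopqrstuvwxyz"
        seq = ""
        # Build the assumed sequence: each lowercase letter followed
        # by its uppercase counterpart.
        for (i = 1; i <= 26; i++) {
            c = substr(lower, i, 1)
            seq = seq c toupper(c)
        }
        # A range covers everything from the first endpoint to the
        # second endpoint, inclusive, in sequence order.
        first = index(seq, lo)
        last = index(seq, hi)
        print substr(seq, first, last - first + 1)
    }'
}

collation_range a z   # every letter except Z
collation_range A Z   # every letter except a
```

Under this assumed order, XYZ falls inside [a-z] while abc falls outside [A-Z], which is consistent with the output reported for command 2.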

You have to use [[:upper:]] or [[:lower:]]. (The old gawk 3.0.3 doesn't support
locales.)
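
Applied to the two patterns in the reporter's bug.awk, the suggestion could look like the sketch below. The wrapper name classify_case is only for illustration, and awk is assumed to be gawk or another awk that supports POSIX bracket classes.

```shell
# Hypothetical wrapper around the two rules from bug.awk, rewritten
# to use character classes instead of ranges.
classify_case() {
    awk '
        # Line starts with a lowercase letter of the current locale.
        /^[[:lower:]]/ { print "lowercase", $0 }
        # Line starts with an uppercase letter of the current locale.
        /^[[:upper:]]/ { print "UPPERCASE", $0 }
    '
}

# Same data as bug.dat.
printf 'abc\nXYZ\n' | classify_case
```

With the bug.dat input this should print one line per record: "lowercase abc" and "UPPERCASE XYZ".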


Comment 2 Horst Ulrich Stolz 2007-08-14 09:48:26 UTC
Exactly: [a-z] defines a range of chars! By mixing uppercase and lowercase in
this range and ignoring the IGNORECASE variable, awk scripts won't work!

Also, in an i18n environment where the locale sequence is [aAbBcC..zZ], [a-z]
means [aAbBcC..z] without Z, and [A-Z] means [AbBcC..zZ] without a!!!!!! And
that's simply a design bug!

And :upper: is locale dependent. I want a, b, c, d, and so on, and not e.g.
abc...zäöü as in the German locale. So the only working solution is to write
[abcdefghijklmnopqrstuvwxyz] instead of the shortcut [a-z]. With this change,
the useful - in [] expressions (sure, some compatibility issues exist) becomes
useless and dangerous!!!!!!!!!!!!!!!!!!
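
If the goal is for [a-z] to mean only the 26 ASCII lowercase letters, one option that avoids spelling out every letter is to run the script in the C locale, where ranges follow byte order. A minimal sketch, with ascii_case as a made-up wrapper name:

```shell
# Hypothetical wrapper: force the C locale for this one awk run so
# that [a-z] and [A-Z] are plain ASCII ranges.
ascii_case() {
    LC_ALL=C awk '
        /^[a-z]/ { print "lowercase", $0 }
        /^[A-Z]/ { print "UPPERCASE", $0 }
    '
}

# Same data as bug.dat.
printf 'abc\nXYZ\n' | ascii_case
```

LC_ALL=C also changes how non-ASCII input is treated for that run, so this is a trade-off rather than a general fix.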

Comment 3 Karel Zak 2007-08-14 13:38:08 UTC
I don't think so. Well, ask upstream (bug-gawk) for the change.

