Bug 137832

Summary: awk aborts under certain locales
Product: Red Hat Enterprise Linux 4
Reporter: Paul Clements <paul.clements>
Component: gawk
Assignee: Karel Zak <kzak>
Status: CLOSED RAWHIDE
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium
Version: 4.0
CC: kzak
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-11-17 17:06:20 UTC
Bug Blocks: 135876    
Attachments:
Description Flags
fix? none

Description Paul Clements 2004-11-01 21:39:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040116

Description of problem:
With LANG=en_US, awk gets a fatal internal error.

Version-Release number of selected component (if applicable):
gawk-3.1.3-9

How reproducible:
Always

Steps to Reproduce:
# rpm -q redhat-release
redhat-release-3.94AS-1

# rpm -q gawk
gawk-3.1.3-9

# LANG=en_US awk -F "[[:alnum:]]" ' { print } ' < /etc/fstab
awk: fatal error: internal error
Aborted

# LANG=en_US.UTF-8 awk -F "[[:alnum:]]" ' { print } ' < /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /dev/shm                tmpfs   defaults        0 0
none                    /proc                   proc    defaults        0 0
none                    /sys                    sysfs   defaults        0 0
/dev/sda2               swap                    swap    defaults        0 0


Actual Results: With LANG=en_US, awk aborts.
With LANG=en_US.UTF-8, awk succeeds.

Expected Results: awk should succeed in both cases.

Additional info:

Comment 1 Karel Zak 2004-11-02 17:32:10 UTC
The problem is in the function build_charclass() in regcomp.c.

The macro BUILD_CHARCLASS_LOOP() defined there translates characters
through casetable[] (defined in eval.c), but this table can contain
negative numbers too. Passing a negative number to bitset_set(), which
is called inside BUILD_CHARCLASS_LOOP(), crashes awk.

The attached patch resolves this problem, but I am unsure whether using
unsigned char is the best way, because casetable[] may expect negative
chars. It might be better to rewrite bitset_set(sbcset...), but I
think a lot of things depend on it.

All tests pass with the patch, but the tests run with LANG=C only...
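
To illustrate the failure mode described above, here is a minimal sketch and not the gawk code: all demo_* names are hypothetical, and the 256-bit set is only assumed to resemble the single-byte set that bitset_set() operates on. It shows how an entry read from a signed char table becomes a negative index, and how converting it to unsigned char first yields a valid index in the range 0..255.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical stand-in for a single-byte character set: one bit per byte value. */
#define DEMO_SBC_MAX   256
#define DEMO_WORD_BITS (sizeof(unsigned int) * CHAR_BIT)
typedef unsigned int demo_bitset[DEMO_SBC_MAX / (sizeof(unsigned int) * CHAR_BIT)];

/* Set bit idx. Returns 1 on success and 0 if idx is out of range.
   This sketch checks the range; code that does not check would touch
   memory outside the array when given a negative index. */
int demo_bitset_set(demo_bitset set, int idx)
{
    if (idx < 0 || idx >= DEMO_SBC_MAX)
        return 0;
    set[idx / DEMO_WORD_BITS] |= 1u << (idx % DEMO_WORD_BITS);
    return 1;
}

/* Index as produced by a table whose entries are signed char. */
int demo_index_signed(signed char entry)
{
    return entry;                    /* e.g. -23 stays -23 */
}

/* Index after converting the same entry to unsigned char. */
int demo_index_unsigned(signed char entry)
{
    return (unsigned char) entry;    /* e.g. -23 becomes 233 (0xE9) */
}

/* Small self-check showing both paths for one high-bit character. */
void demo_selfcheck(void)
{
    demo_bitset set;
    memset(set, 0, sizeof set);
    assert(demo_bitset_set(set, demo_index_signed((signed char) -23)) == 0);
    assert(demo_bitset_set(set, demo_index_unsigned((signed char) -23)) == 1);
}
```

In a single-byte locale such as en_US (ISO-8859-1), [[:alnum:]] includes characters above 0x7F, which would explain why the negative entries are reached there but not under LANG=C.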

Comment 2 Karel Zak 2004-11-02 17:35:10 UTC
Created attachment 106074 [details]
fix?

Comment 3 Karel Zak 2004-11-02 17:44:04 UTC
Note: instead of the strange casetable[] in eval.c, maybe build the
translation table (in re.c: make_regexp()) the same way "sed" does:

       for (i = 0; i < sizeof(translate) / sizeof(char); i++)
          translate[i] = tolower (i);

       new_regex->translate = translate;

To me this looks better than the static definition in "awk"...
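
A self-contained sketch of that idea follows. It assumes a table of unsigned char with one entry per byte value; demo_translate and demo_build_translate are hypothetical names, not identifiers from gawk or sed. Because the entries are unsigned, every value read from the table is a valid non-negative index.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical translation table: one entry per possible byte value. */
unsigned char demo_translate[UCHAR_MAX + 1];

/* Fill the table at run time using the current locale's tolower(). */
void demo_build_translate(void)
{
    size_t i;

    for (i = 0; i < sizeof(demo_translate); i++)
        demo_translate[i] = (unsigned char) tolower((int) i);
}

/* Small self-check for ASCII letters, which fold the same way in any locale. */
void demo_translate_selfcheck(void)
{
    demo_build_translate();
    assert(demo_translate['A'] == 'a');
    assert(demo_translate['z'] == 'z');
}
```

Building the table at run time also lets it follow the active locale, provided setlocale() has been called before the table is filled.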

Comment 5 Karel Zak 2004-11-04 16:42:13 UTC
Florian, thanks for the link. The problem is definitely the
unsigned/signed mismatch of casetable (RE_TRANSLATE_TYPE).

Fixed in the devel tree (FC4).

Comment 6 Karel Zak 2004-11-04 18:02:19 UTC
Fixed in RHEL-4-HEAD too.