Bug 137832 - awk aborts under certain locales
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: gawk
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Karel Zak
QA Contact: Brock Organ
Depends On:
Blocks: 135876
Reported: 2004-11-01 16:39 EST by Paul Clements
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2004-11-17 12:06:20 EST

Attachments
fix? (387 bytes, patch)
2004-11-02 12:35 EST, Karel Zak

Description Paul Clements 2004-11-01 16:39:03 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040116

Description of problem:
With LANG=en_US, awk gets a fatal internal error.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
# rpm -q redhat-release

# rpm -q gawk

# LANG=en_US awk -F "[[:alnum:]]" ' { print } ' < /etc/fstab
awk: fatal error: internal error

# LANG=en_US.UTF-8 awk -F "[[:alnum:]]" ' { print } ' < /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /dev/shm                tmpfs   defaults        0 0
none                    /proc                   proc    defaults        0 0
none                    /sys                    sysfs   defaults        0 0
/dev/sda2               swap                    swap    defaults        0 0

Actual Results: with LANG=en_US, awk aborts;
with LANG=en_US.UTF-8, awk succeeds

Expected Results:  awk should succeed in both cases

Additional info:
Comment 1 Karel Zak 2004-11-02 12:32:10 EST
The problem is in the function build_charclass() in regcomp.c.

The macro BUILD_CHARCLASS_LOOP() translates characters through
casetable[] (defined in eval.c), but this table can contain negative
numbers too. If a negative number is passed to bitset_set(), which is
called inside BUILD_CHARCLASS_LOOP(), awk crashes.

The contributed patch resolves this problem, but I am unsure whether
using unsigned char is the best way, because casetable[] may expect
negative chars. A better way might be to rewrite bitset_set(sbcset...),
but I think a lot of things depend on it.

All tests pass with the patch, but the tests run with LANG=C only...
Comment 2 Karel Zak 2004-11-02 12:35:10 EST
Created attachment 106074
Comment 3 Karel Zak 2004-11-02 12:44:04 EST
Note: instead of the strange casetable[] in eval.c, maybe build the
translation table (in re.c: make_regexp()) the same way sed does:

       for (i = 0; i < sizeof(translate) / sizeof(char); i++)
          translate[i] = tolower (i);

       new_regex->translate = translate;

To me this looks better than the static definition in awk...
Comment 5 Karel Zak 2004-11-04 11:42:13 EST
Florian, thanks for the link. The problem is definitely with the
unsigned/signed casetable (RE_TRANSLATE_TYPE).

Fixed in the devel tree (FC4).
Comment 6 Karel Zak 2004-11-04 13:02:19 EST
Fixed in RHEL-4-HEAD too.
