Bug 137832 - awk aborts under certain locales
Summary: awk aborts under certain locales
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: gawk   
Version: 4.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Karel Zak
QA Contact: Brock Organ
Depends On:
Blocks: 135876
Reported: 2004-11-01 21:39 UTC by Paul Clements
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2004-11-17 17:06:20 UTC

Attachments
fix? (387 bytes, patch)
2004-11-02 17:35 UTC, Karel Zak

Description Paul Clements 2004-11-01 21:39:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040116

Description of problem:
With LANG=en_US, awk gets a fatal internal error.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
# rpm -q redhat-release

# rpm -q gawk

# LANG=en_US awk -F "[[:alnum:]]" ' { print } ' < /etc/fstab
awk: fatal error: internal error

# LANG=en_US.UTF-8 awk -F "[[:alnum:]]" ' { print } ' < /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /dev/shm                tmpfs   defaults        0 0
none                    /proc                   proc    defaults        0 0
none                    /sys                    sysfs   defaults        0 0
/dev/sda2               swap                    swap    defaults        0 0

Actual Results:  with LANG=en_US, awk aborts
with LANG=en_US.UTF-8, awk succeeds

Expected Results:  awk should succeed in both cases

Additional info:

Comment 1 Karel Zak 2004-11-02 17:32:10 UTC
The problem is in the function build_charclass() in regcomp.c.

The BUILD_CHARCLASS_LOOP() macro defined there translates characters
through casetable[] (defined in eval.c), but this table can contain
negative numbers too. If a negative number is passed to the
bitset_set() call inside BUILD_CHARCLASS_LOOP(), awk crashes.

The contributed patch resolves this problem, but I am unsure whether
using unsigned char is the best way, because casetable[] may expect
negative chars. A better way might be to rewrite
bitset_set(sbcset...), but I think a lot of things depend on it.

All tests pass with the patch, but the tests run with LANG=C only...
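
To illustrate the failure mode described above, here is a minimal sketch of why a negative table entry is dangerous as a bit index and what a cast to unsigned char changes. It is not the gawk code: sketch_bitset, sketch_bitset_set(), sketch_bitset_contains(), index_from_signed() and index_from_unsigned() are hypothetical names, and the stand-in setter rejects out-of-range bits instead of writing outside the array as an unchecked setter would.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS  256  /* one bit per single-byte character */
#define WORD_BITS    ((int)(sizeof(unsigned int) * CHAR_BIT))
#define SKETCH_WORDS ((SKETCH_BITS + WORD_BITS - 1) / WORD_BITS)

/* Hypothetical stand-in for a single-byte character set bitmap. */
typedef struct {
    unsigned int words[SKETCH_WORDS];
} sketch_bitset;

/* Hypothetical stand-in for bitset_set(): returns -1 for an index that
   an unchecked implementation would write out of bounds. */
static int sketch_bitset_set(sketch_bitset *set, int bit)
{
    if (bit < 0 || bit >= SKETCH_BITS)
        return -1;
    set->words[bit / WORD_BITS] |= 1u << (bit % WORD_BITS);
    return 0;
}

/* Report whether a bit is set; the caller must pass a valid index. */
static int sketch_bitset_contains(const sketch_bitset *set, int bit)
{
    assert(bit >= 0 && bit < SKETCH_BITS);
    return (int)((set->words[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1u);
}

/* A signed table entry is sign-extended when used as an int index. */
static int index_from_signed(signed char entry)
{
    return entry;
}

/* Casting to unsigned char first maps the entry into 0..UCHAR_MAX. */
static int index_from_unsigned(signed char entry)
{
    return (unsigned char)entry;
}
```

With 8-bit chars, a table entry of -23 stays -23 through index_from_signed() and is rejected by the setter, while index_from_unsigned() turns it into 233, which is a valid bit position.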

Comment 2 Karel Zak 2004-11-02 17:35:10 UTC
Created attachment 106074

Comment 3 Karel Zak 2004-11-02 17:44:04 UTC
Note: maybe instead of the strange casetable[] in eval.c, build the
translation table (in re.c: make_regexp()) the same way as "sed" does:

       for (i = 0; i < sizeof(translate) / sizeof(char); i++)
          translate[i] = tolower (i);

       new_regex->translate = translate;

To me this looks better than the static definition in "awk"...
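
A self-contained sketch of the sed-style approach quoted above, assuming the table is declared with unsigned char elements so that no entry can be negative. build_lowercase_table() and TRANSLATE_SIZE are hypothetical names for illustration; this is neither the sed nor the gawk source.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* One entry per possible unsigned char value. */
#define TRANSLATE_SIZE (UCHAR_MAX + 1)

/* Fill table[] so that table[c] is the lowercase form of c in the
   current locale. Every entry is stored as an unsigned char, so it is
   always in the range 0..UCHAR_MAX and safe to use as an index. */
static void build_lowercase_table(unsigned char *table, size_t size)
{
    assert(size >= (size_t)TRANSLATE_SIZE);
    for (int i = 0; i < TRANSLATE_SIZE; i++)
        table[i] = (unsigned char)tolower(i);
}
```

A caller would declare `unsigned char translate[TRANSLATE_SIZE];`, call `build_lowercase_table(translate, sizeof translate);` and then hand the table to the regex code. Because tolower() follows the current locale, the table would have to be built after the locale is set.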

Comment 5 Karel Zak 2004-11-04 16:42:13 UTC
Florian, thanks for the link. The problem is definitely the
unsigned/signed casetable (RE_TRANSLATE_TYPE).

Fixed in the devel tree (FC4).

Comment 6 Karel Zak 2004-11-04 18:02:19 UTC
Fixed in RHEL-4-HEAD too.
