Bug 45685 - Incomplete/broken MBS_SUPPORT in regexp.c
Alias: None
Product: Red Hat Raw Hide
Classification: Retired
Component: glibc   
Version: 1.0
Hardware: i386
OS: Linux
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Aaron Brown
Depends On:
Reported: 2001-06-24 18:15 UTC by Leonid Kanter
Modified: 2016-11-24 15:22 UTC

Doc Type: Bug Fix
Last Closed: 2001-06-29 16:03:04 UTC

Attachments
this test program dumps core on glibc with MBS_SUPPORT (918 bytes, patch)
2001-06-24 18:21 UTC, Leonid Kanter
Modified test program that run OK. (1.06 KB, patch)
2001-06-24 18:28 UTC, Leonid Kanter

Description Leonid Kanter 2001-06-24 18:15:41 UTC
Description of Problem:

Incomplete/broken MBS_SUPPORT in regexp.c makes the regexp functions
unusable if the pattern contains national characters (> 0x80)

How Reproducible:


Steps to Reproduce:

# rpm -qf /bin/grep
# LC_ALL=en_US.ISO8859-1  egrep -i '?[???]' /etc/passwd
Segmentation fault (core dumped)

(bugzilla may show 8-bit characters in the egrep argument as '?')

Here is the beginning of the regex_compile function:
#ifdef MBS_SUPPORT
regex_compile (cpattern, csize, syntax, bufp)
     const char *cpattern;
     size_t csize;
#else
regex_compile (pattern, size, syntax, bufp)
     const char *pattern;
     size_t size;
#endif /* MBS_SUPPORT */
Clearly, if MBS_SUPPORT is defined, the meaning of the variables is completely different.
However, none of the places where this function is called has been changed,
and they even call strlen on a pattern string which could be in multibyte format.

This effectively breaks grep and possibly other programs when pattern
contains national characters (> 0x80). The problem was actually found while
trying gnumeric-0.65.
What is the recommended solution?

Additional Information:
See the two attached examples. If glibc is compiled with MBS_SUPPORT, the first
example dumps core. Without MBS_SUPPORT, both examples run OK.

Comment 1 Leonid Kanter 2001-06-24 18:21:45 UTC
Created attachment 21666
this test program dumps core on glibc with MBS_SUPPORT

Comment 2 Leonid Kanter 2001-06-24 18:28:46 UTC
Created attachment 21667
Modified test program that run OK.

Comment 3 Leonid Kanter 2001-06-24 18:32:57 UTC
one more attempt to submit 8-bit string to bugzilla:
LC_ALL=en_US.ISO8859-1  egrep -i '?[???]' /etc/passwd

Comment 4 Eugene Kanter 2001-06-24 21:44:07 UTC
From the code example in this bug report it is not clear how regex_compile
treats its arguments differently. Further down in the code one can see that
cpattern is actually expected in wchar_t format, csize is ignored and the real
size is computed from scratch, and bufp is also returned in wchar_t format.

Again, no call to regex_compile has been changed, thus creating problems for
unsuspecting regex users.

Even if the patch was made for a good purpose, it is extremely strange to define
function arguments as const char * and process them as const wchar_t * ....

Comment 5 Jakub Jelinek 2001-06-25 08:41:31 UTC
There is a bug, but not where you think.
If you read the code carefully, you'll notice that the pattern argument to regex_compile
really is a const char * multi-byte string, not wchar_t *, and that regex_compile
converts it internally to a wide string using convert_mbs_to_wc.
The bug is that convert_mbs_to_wc optimizes for MB_CUR_MAX == 1 by skipping
the mbrtowc conversion (it only sign-extends into wchar_t), but when computing
the fastmap, truncate_wchar does not check for MB_CUR_MAX == 1.
Since June 18th, glibc's regex.c does things differently (it is compiled three times:
once for MB_CUR_MAX == 1, once for MB_CUR_MAX > 1, and once as glue code which picks
the right implementation), so this bug should supposedly be fixed already.
I am just building the current CVS snapshot to see if that's the case.

Comment 6 Jakub Jelinek 2001-06-25 10:43:40 UTC
The post-2001-06-18 regex.c code seems to fix it, as I expected.

Comment 7 Leonid Kanter 2001-06-29 15:59:24 UTC
2.2.3-12 solves the problem. But there is another problem in this rpm.

I tried to rebuild glibc-2.2.3-12 for target=i686 on a box running kernel 2.2.19
and found that patch glibc-kernel-2.4.patch is already applied to glibc tarball.
+ /bin/chmod -Rf a+rX,g-w,o-w .
++ uname -r
+ echo 'Patch #0 (glibc-kernel-2.4.patch):'
Patch #0 (glibc-kernel-2.4.patch):
+ patch -p1 -s
Reversed (or previously applied) patch detected!  Assume -R? [n]

Comment 8 Jakub Jelinek 2001-06-29 16:03:01 UTC
Yep, that's a bug which will be fixed in glibc-2.2.3-13 (together with
regex fixes for multi-byte character sets).
