Bug 1993930
| Summary: | glibc: Comparison problems with strcasecmp in locales with ISO-8859-* encoding. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Paulo Andrade <pandrade> |
| Component: | glibc | Assignee: | Florian Weimer <fweimer> |
| Status: | CLOSED ERRATA | QA Contact: | Sergey Kolosov <skolosov> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.9 | CC: | ashankar, codonell, dj, fweimer, jreznik, mcermak, mnewsome, pfrankli, sipoyare |
| Target Milestone: | rc | Keywords: | Bugfix, Patch, Triaged, ZStream |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | glibc-2.17-325.el7_9 | Doc Type: | Bug Fix |
| Doc Text: |
A defect in strcasecmp can cause non-ASCII single-byte characters from specific character sets, including ISO-8859-1 and ISO-8859-15, to be incorrectly sorted. To preserve sorted data at rest, the default comparison is left unchanged and consistent with existing configurations. A new /etc/sysconfig/strcasecmp-nonascii configuration file enables the newer, correct comparison, which is compatible with Red Hat Enterprise Linux 6 and Red Hat Enterprise Linux 8. Deployments which need a strcasecmp that matches previous and new releases should create the file /etc/sysconfig/strcasecmp-nonascii.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-10-12 15:30:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
Paulo, Thanks for the report. I'm reviewing.
Created attachment 1815464 [details]
Alternative reproducer
This version removes the encoding ambiguity (no more 8-bit characters in the source), and it logs _NL_CTYPE_NONASCII_CASE.
This was fixed upstream in:
commit 45b8acccaf43ec06d31413c75a8f1737ae3ff0e2
Author: Andreas Schwab <schwab>
Date: Wed Jul 17 10:26:58 2013 +0200
Fix missing declaration of LC_CTYPE nonascii-case element
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=15736
This commit went into glibc 2.19.
This commit changes the return value of nl_langinfo (_NL_CTYPE_NONASCII_CASE) from a pointer to a uint32_t to the uint32_t value itself (cast to char *). This is not a backportable change.
What we can do is change the assembler implementations to follow the additional indirection. The POWER8 implementation appears impacted as well. If we can trust the locale data, this should be quite safe.
strcasecmp remains utterly broken in multi-byte locales because it is based on tolower applied to individual bytes, not characters. This is something that would have to be fixed upstream. I doubt we would backport such a change.
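To illustrate the byte-wise problem described above, here is a minimal sketch. It is not glibc's implementation: bytewise_casecmp is a hypothetical name, and the sketch assumes the program runs in the default "C" locale (no setlocale call), where tolower leaves bytes above 0x7f unchanged. In UTF-8, "ä" is the byte sequence C3 A4 and "Ä" is C3 84, so folding each byte on its own cannot make the two compare equal.

```c
#include <ctype.h>

/* Hypothetical model of a byte-at-a-time case-insensitive comparison.
   Each byte is folded with tolower independently, so a multi-byte
   character is never treated as one unit.  */
int
bytewise_casecmp (const char *a, const char *b)
{
  const unsigned char *ap = (const unsigned char *) a;
  const unsigned char *bp = (const unsigned char *) b;

  for (;; ++ap, ++bp)
    {
      int ac = tolower (*ap);
      int bc = tolower (*bp);
      /* Stop at the first difference or at the common terminator.  */
      if (ac != bc || ac == 0)
        return ac - bc;
    }
}
```

With this model, bytewise_casecmp ("abc", "ABC") returns 0, while bytewise_casecmp ("\xc3\xa4", "\xc3\x84") (UTF-8 "ä" against "Ä") returns a non-zero value.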
Created attachment 1815739 [details]
0001-Improve-non-ASCII-locales-in-string-comparison-19939.patch
This is what I've got so far, but it's not ready yet. There is a posix/tst-fnmatch failure on i686 which I still need to investigate.
We also need to backport this additional fix:
commit 5d228a436a8257f082e84671bf8c89b79a2c3853
Author: Andreas Schwab <schwab>
Date: Mon Jul 29 14:58:41 2013 +0200
Fix handling LC_CTYPE nonascii-case fallback in i686 SSE4.2 and SSSE3 strcasecmp/strncasecmp
Created attachment 1816116 [details]
1815739: 0001-Improve-non-ASCII-locales-in-string-comparison-19939.patch
New patch with i686 crash fix with additional (straight from upstream) backport.
Carlos pointed out that this change may invalidate on-disk database indexes because the reported ordering from strcasecmp changes. We really avoid doing that in minor releases. So we probably should not release this fix to 7.9.z because it is way too risky.
I thought about this some more. There is no way we can ship the requested behavioral change as a general update to all customers. Even if we successfully mitigate all the risk from unwanted side effects (getting the assembler implementation right was surprisingly difficult), the *expected* behavioral change will be extremely surprising to some customers. Most customers probably will not notice because they use UTF-8 and the (broken) case conversion in strcasecmp does nothing for the non-ASCII parts of UTF-8, but it is very likely that some customers rely on the current (broken) behavior. So I think the best we can do here is to offer a switch to change behavior, but leave the default unchanged.
Paulo, would you please ask the customer whether it solves their issue if we provided a glibc update that only conditionally implements the requested behavior? For example, they would have to create a file /etc/sysconfig/strcasecmp-nonascii to activate the different strcasecmp implementation. Thanks.
Customer is fine with the suggested workaround, for example, having a file named /etc/sysconfig/strcasecmp-nonascii and when this file exists, rhel-7.9 glibc works like rhel6 or rhel8 glibc, that is, they do not need to special case their code for rhel7.
Created attachment 1817195 [details]
0001-Support-etc-sysconfig-strcasecmp-nonascii-1993930.patch
This is the patch I currently have. My testing shows that the function selection is as expected.
It's x86-only (x86_64 and i686). I hope this will be sufficient.
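As a rough illustration of the opt-in switch, the sketch below picks a comparison variant based on whether a marker file exists. This is only an assumption about the general shape of such a check, not the code in the attached patch (which selects among glibc's internal implementations). The names select_casecmp_variant, CASECMP_LEGACY and CASECMP_NONASCII are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical labels for the two behaviors discussed in this bug.  */
enum casecmp_variant
{
  CASECMP_LEGACY,    /* Unchanged RHEL 7 default ordering.  */
  CASECMP_NONASCII   /* Ordering compatible with RHEL 6 and RHEL 8.  */
};

/* Return CASECMP_NONASCII if the marker file exists, and
   CASECMP_LEGACY otherwise.  Only existence is checked; the file
   contents are ignored.  */
enum casecmp_variant
select_casecmp_variant (const char *marker_path)
{
  assert (marker_path != NULL);
  return access (marker_path, F_OK) == 0 ? CASECMP_NONASCII : CASECMP_LEGACY;
}
```

A caller would pass "/etc/sysconfig/strcasecmp-nonascii" as marker_path. Keeping the default at CASECMP_LEGACY mirrors the decision above to leave existing deployments untouched unless the administrator creates the file.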
My test program looks like this:
#define _GNU_SOURCE
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
/* Take the addresses through pointers so the resolved implementations are printed. */
int (*ptr_strcasecmp) (const char *, const char *) = strcasecmp;
int (*ptr_strncasecmp) (const char *, const char *, size_t) = strncasecmp;
int (*ptr_strcasecmp_l) (const char *, const char *, locale_t) = strcasecmp_l;
int (*ptr_strncasecmp_l) (const char *, const char *, size_t, locale_t) =
  strncasecmp_l;
int
main (void)
{
  /* %p expects a void *, so cast the function pointers explicitly. */
  printf ("strcasecmp = %p\n", (void *) ptr_strcasecmp);
  printf ("strncasecmp = %p\n", (void *) ptr_strncasecmp);
  printf ("strcasecmp_l = %p\n", (void *) ptr_strcasecmp_l);
  printf ("strncasecmp_l = %p\n", (void *) ptr_strncasecmp_l);
  return 0;
}
I run it under GDB, and then check if the addresses are reasonable.
This is without the file:
strcasecmp = 0xf7f4ee00
strncasecmp = 0xf7f4f300
strcasecmp_l = 0xf7f4ee30
strncasecmp_l = 0xf7f4f330
[Inferior 1 (process 37) exited normally]
(gdb) print (void*) 0xf7f4ee00
$1 = (void *) 0xf7f4ee00 <__strcasecmp_sse4_2>
(gdb) print (void*) 0xf7f4f300
$2 = (void *) 0xf7f4f300 <__strncasecmp_sse4_2>
(gdb) print (void*) 0xf7f4ee30
$3 = (void *) 0xf7f4ee30 <__strcasecmp_l_sse4_2>
(gdb) print (void*) 0xf7f4f330
$4 = (void *) 0xf7f4f330 <__strncasecmp_l_sse4_2>
And with the file:
strcasecmp = 0xf7e9eb60
strncasecmp = 0xf7e9ec20
strcasecmp_l = 0xf7e9eb10
strncasecmp_l = 0xf7e9ebc0
[Inferior 1 (process 30) exited normally]
(gdb) print (void*) 0xf7e9eb60
$1 = (void *) 0xf7e9eb60 <__strcasecmp_nonascii>
(gdb) print (void*) 0xf7e9ec20
$2 = (void *) 0xf7e9ec20 <__strncasecmp_nonascii>
(gdb) print (void*) 0xf7e9eb10
$3 = (void *) 0xf7e9eb10 <__strcasecmp_l_nonascii>
(gdb) print (void*) 0xf7e9ebc0
$4 = (void *) 0xf7e9ebc0 <__strncasecmp_l_nonascii>
Thanks for the test packages! User confirms the problem is corrected, and is happy with the provided patch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (glibc bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:3803
Created attachment 1814415 [details]
testchr.c
Issue apparently specific to rhel7 or glibc-2.17. Does not happen with rhel6, rhel8 or fedora.
With the attached testchr.c:
"""
$ gcc -o testchr testchr.c
$ ./testchr de_DE.iso885915@euro
Locale de_DE.iso885915@euro codeset is ISO-8859-15
Before toupper [swadmin;Testä3@philde|R;R;U;L;131072;64861]
After toupper [SWADMIN;TESTÄ3@PHILDE|R;R;U;L;131072;64861]
no match
$ ./testchr de_DE.iso88591
Locale de_DE.iso88591 codeset is ISO-8859-1
Before toupper [swadmin;Testä3@philde|R;R;U;L;131072;64861]
After toupper [SWADMIN;TESTÄ3@PHILDE|R;R;U;L;131072;64861]
no match
$ ./testchr de_DE.utf8
Locale de_DE.utf8 codeset is UTF-8
Before toupper [swadmin;Testä3@philde|R;R;U;L;131072;64861]
After toupper [SWADMIN;TESTä3@PHILDE|R;R;U;L;131072;64861]
Match
"""
Workaround possible for user is to use an UTF8 locale.
Suggested workaround is to not mix toupper and strcasecmp, that is, use the same approach to convert and compare characters, to avoid possible issues with special multibyte sequences, and use the patch:
"""
--- testchr.c.orig 2021-08-10 08:41:58.830009510 -0400
+++ testchr.c 2021-08-10 09:04:06.716204990 -0400
@@ -21,17 +21,27 @@
 {
     int result;
+#if 0
     if ( (result = strcasecmp(a, b)) > 0)
         return 1;
     else if (result < 0)
         return -1;
     else
         return 0;
+#else
+    char *ap, *bp;
+    for (ap = a, bp = b; *ap && *bp; ++ap, ++bp) {
+        int ac = toupper(*ap);
+        int bc = toupper(*bp);
+        if (ac != bc)
+            return ac > bc ? 1 : -1;
+    }
+    return *ap ? 1 : *bp ? -1 : 0;
+#endif
 }
 
 int main(int argc, char **argv)
 {
-
     if (argc !=2) {
         printf("Usage %s <lcoale>\n",argv[0]);
"""
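For reference, the toupper-based comparison from the workaround patch can be written as a standalone function. This is a sketch rather than the reporter's exact code: the name compare_upper is hypothetical, and the bytes are read as unsigned char before being passed to toupper, since passing a negative plain char value to toupper is undefined behavior in ISO C (this matters for bytes such as 0xE4 in ISO-8859-1).

```c
#include <ctype.h>

/* Compare two strings after folding each byte with toupper, using the
   current LC_CTYPE locale.  Returns -1, 0 or 1.  Suitable only for
   single-byte encodings such as ISO-8859-1 or ISO-8859-15.  */
int
compare_upper (const char *a, const char *b)
{
  const unsigned char *ap = (const unsigned char *) a;
  const unsigned char *bp = (const unsigned char *) b;

  for (; *ap != '\0' && *bp != '\0'; ++ap, ++bp)
    {
      int ac = toupper (*ap);
      int bc = toupper (*bp);
      if (ac != bc)
        return ac > bc ? 1 : -1;
    }
  /* One string is a prefix of the other, or both have ended.  */
  return *ap != '\0' ? 1 : *bp != '\0' ? -1 : 0;
}
```

Because the conversion and the comparison both go through toupper, the result stays consistent with strings that the application has already upper-cased with toupper, which is the point of the suggested workaround.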