1993930 – glibc: Comparison problems with strcasecmp in locales with ISO-8859-* encoding.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	7.9
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Florian Weimer
QA Contact:	Sergey Kolosov
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-08-16 12:28 UTC by Paulo Andrade
Modified:	2024-12-20 20:43 UTC
CC List:	9 users
Fixed In Version:	glibc-2.17-325.el7_9
Doc Type:	Bug Fix
Doc Text:	A defect in strcasecmp can cause non-ASCII single-byte characters from specific character sets, including ISO-8859-1 and ISO-8859-15, to be sorted incorrectly. To preserve sorted data at rest, the default comparison is left unchanged and consistent with existing configurations. A new /etc/sysconfig/strcasecmp-nonascii configuration file enables the newer, correct comparison, which is compatible with Red Hat Enterprise Linux 6 and Red Hat Enterprise Linux 8. Deployments which need a strcasecmp that matches previous and new releases should create the file /etc/sysconfig/strcasecmp-nonascii.
Clone Of:
Environment:
Last Closed:	2021-10-12 15:30:46 UTC
Target Upstream Version:
Embargoed:

Attachments
testchr.c (1.56 KB, text/x-csrc) 2021-08-16 12:28 UTC, Paulo Andrade
Alternative reproducer (1.65 KB, text/x-csrc) 2021-08-19 06:11 UTC, Florian Weimer
0001-Improve-non-ASCII-locales-in-string-comparison-19939.patch (8.58 KB, patch) 2021-08-19 15:35 UTC, Florian Weimer
1815739: 0001-Improve-non-ASCII-locales-in-string-comparison-19939.patch (10.79 KB, patch) 2021-08-20 16:38 UTC, Florian Weimer
0001-Support-etc-sysconfig-strcasecmp-nonascii-1993930.patch (11.46 KB, patch) 2021-08-24 17:36 UTC, Florian Weimer

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-93641	None	None	None	2021-08-16 12:29:48 UTC
Red Hat Product Errata	RHBA-2021:3803	None	None	None	2021-10-12 15:30:48 UTC
Sourceware	15736	None	None	None	2021-08-19 06:40:52 UTC

Description Paulo Andrade 2021-08-16 12:28:45 UTC

Created attachment 1814415 [details]
testchr.c

Issue apparently specific to rhel7 or glibc-2.17.
  Does not happen with rhel6, rhel8 or fedora.

  With the attached testchr.c:

"""
$ gcc -o testchr testchr.c

$ ./testchr de_DE.iso885915@euro
Locale de_DE.iso885915@euro
codeset is ISO-8859-15
Before toupper [swadmin;Testä3@philde|R;R;U;L;131072;64861]
After toupper [SWADMIN;TESTÄ3@PHILDE|R;R;U;L;131072;64861]
no match

$ ./testchr de_DE.iso88591
Locale de_DE.iso88591
codeset is ISO-8859-1
Before toupper [swadmin;Testä3@philde|R;R;U;L;131072;64861]
After toupper [SWADMIN;TESTÄ3@PHILDE|R;R;U;L;131072;64861]
no match

$ ./testchr de_DE.utf8
Locale de_DE.utf8
codeset is UTF-8
Before toupper [swadmin;Testä3@philde|R;R;U;L;131072;64861]
After toupper [SWADMIN;TESTä3@PHILDE|R;R;U;L;131072;64861]
Match
"""
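
Since testchr.c is only attached and not quoted here, the following is a rough sketch of what such a reproducer might look like, reconstructed from the output above. The function name check_locale and its return convention are assumptions, not taken from the attachment.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper, not taken from testchr.c: upper-case a copy of
   `input` byte by byte with toupper, then compare the copy against the
   original with strcasecmp.  Returns 1 on match, 0 on mismatch, and -1
   if the locale cannot be selected or the input is too long.  */
int
check_locale (const char *locale_name, const char *input)
{
  char upper[256];
  size_t len = strlen (input);
  size_t i;

  if (len >= sizeof upper || setlocale (LC_ALL, locale_name) == NULL)
    return -1;

  /* Copy including the terminating NUL; toupper leaves NUL unchanged.  */
  for (i = 0; i <= len; ++i)
    upper[i] = (char) toupper ((unsigned char) input[i]);

  printf ("codeset is %s\n", nl_langinfo (CODESET));
  return strcasecmp (input, upper) == 0;
}
```

Writing the non-ASCII byte as an escape (for example "Test\xe4" for ä in ISO-8859-1) keeps the source file itself free of encoding ambiguity. Going by the report above, a call with de_DE.iso88591 on an affected RHEL 7 system would presumably return 0.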

  A possible workaround for the user is to use a UTF-8 locale.

  The suggested workaround is to not mix toupper and strcasecmp, that is,
to use the same approach to convert and compare characters, to avoid
possible issues with special multibyte sequences, and to apply this patch:

"""
--- testchr.c.orig	2021-08-10 08:41:58.830009510 -0400
+++ testchr.c	2021-08-10 09:04:06.716204990 -0400
@@ -21,17 +21,27 @@
 {
 int     result;
 
+#if 0
         if ( (result = strcasecmp(a, b)) > 0)
                 return 1;
         else if (result < 0)
                 return -1;
         else
                 return 0;
+#else
+	char *ap, *bp;
+	for (ap = a, bp = b; *ap && *bp; ++ap, ++bp) {
+		int ac = toupper(*ap);
+		int bc = toupper(*bp);
+		if (ac != bc)
+			return ac > bc ? 1 : -1;
+	}
+	return *ap ? 1 : *bp ?  -1 : 0;
+#endif
 }
 
 int main(int argc, char **argv)
 {
-
         if (argc !=2)
           {
             printf("Usage %s <lcoale>\n",argv[0]);
"""
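
One caveat about the patch above: it passes a plain char to toupper, which is undefined behavior for negative values, that is, for any non-ASCII byte on platforms where char is signed. A standalone variant with the cast added might look like the sketch below. The name compare_upper is hypothetical and not part of testchr.c.

```c
#include <ctype.h>

/* Sketch of a byte-wise, case-insensitive comparison that relies only on
   toupper, so conversion and comparison use the same locale tables.
   The name is hypothetical.  The cast to unsigned char matters: passing
   a negative char value (any byte >= 0x80 where char is signed) to
   toupper is undefined behavior.  */
int
compare_upper (const char *a, const char *b)
{
  for (;; ++a, ++b)
    {
      int ac = toupper ((unsigned char) *a);
      int bc = toupper ((unsigned char) *b);
      if (ac != bc)
        return ac > bc ? 1 : -1;
      if (ac == 0)
        return 0;   /* Both strings ended at the same position.  */
    }
}
```

Like the patch, this returns -1, 0, or 1 and treats a shorter string as smaller than a longer string that it prefixes.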

Comment 4 Carlos O'Donell 2021-08-18 03:40:50 UTC

Paulo,

Thanks for the report. I'm reviewing.

Comment 6 Florian Weimer 2021-08-19 06:11:34 UTC

Created attachment 1815464 [details]
Alternative reproducer

This version removes the encoding ambiguity (no more 8-bit characters in the source), and it logs _NL_CTYPE_NONASCII_CASE.

Comment 7 Florian Weimer 2021-08-19 06:40:53 UTC

This was fixed upstream in:

commit 45b8acccaf43ec06d31413c75a8f1737ae3ff0e2
Author: Andreas Schwab <schwab>
Date:   Wed Jul 17 10:26:58 2013 +0200

    Fix missing declaration of LC_CTYPE nonascii-case element

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=15736

This commit went into glibc 2.19.

This commit changes the return value of nl_langinfo (_NL_CTYPE_NONASCII_CASE) from a pointer to a uint32_t to a uint32_t (cast to char *). This is not a backportable change.

What we can do is change the assembler implementations to follow the additional indirection. The POWER8 implementation appears impacted as well. If we can trust the locale data, this should be quite safe.

strcasecmp remains utterly broken in multi-byte locales because it is based on tolower applied to individual bytes, not characters. This is something that would have to be fixed upstream. I doubt we would backport such a change.
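
For applications that need case-insensitive comparison in a multi-byte locale today, one option is to decode to wide characters first and compare those. The following is only a sketch under that assumption, not something proposed for glibc; the helper name and the fixed buffer size are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <wchar.h>

#define MAX_WCHARS 256

/* Hypothetical helper: decode both strings with the current LC_CTYPE
   locale and compare them case-insensitively per character rather than
   per byte.  Returns 0 and stores the comparison result in *result on
   success, or -1 if either string fails to decode or does not fit the
   fixed buffers.  */
int
compare_multibyte_nocase (const char *a, const char *b, int *result)
{
  wchar_t wa[MAX_WCHARS], wb[MAX_WCHARS];
  size_t na = mbstowcs (wa, a, MAX_WCHARS);
  size_t nb = mbstowcs (wb, b, MAX_WCHARS);

  /* A return value equal to MAX_WCHARS means the output was not
     NUL-terminated, so treat it as "does not fit".  */
  if (na == (size_t) -1 || nb == (size_t) -1
      || na >= MAX_WCHARS || nb >= MAX_WCHARS)
    return -1;

  *result = wcscasecmp (wa, wb);
  return 0;
}
```

The caller is expected to have selected the locale with setlocale beforehand. The ordering this produces is not guaranteed to agree with strcasecmp, so it is not a drop-in replacement for data that is already sorted.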

Comment 8 Florian Weimer 2021-08-19 15:35:26 UTC

Created attachment 1815739 [details]
0001-Improve-non-ASCII-locales-in-string-comparison-19939.patch

This is what I've got so far, but it's not ready yet. There is a posix/tst-fnmatch failure on i686 which I still need to investigate.

Comment 10 Florian Weimer 2021-08-20 09:41:21 UTC

We also need to backport this additional fix:

commit 5d228a436a8257f082e84671bf8c89b79a2c3853
Author: Andreas Schwab <schwab>
Date:   Mon Jul 29 14:58:41 2013 +0200

    Fix handling LC_CTYPE nonascii-case fallback in i686 SSE4.2 and SSSE3 strcasecmp/strncasecmp

Comment 11 Florian Weimer 2021-08-20 16:38:53 UTC

Created attachment 1816116 [details]
1815739: 0001-Improve-non-ASCII-locales-in-string-comparison-19939.patch

New patch with the i686 crash fix, via an additional backport taken straight from upstream.

Comment 13 Florian Weimer 2021-08-20 16:43:29 UTC

Carlos pointed out that this change may invalidate on-disk database indexes because the reported ordering from strcasecmp changes. We really avoid doing that in minor releases. So we probably should not release this fix to 7.9.z because it is way too risky.

Comment 14 Florian Weimer 2021-08-23 07:40:10 UTC

I thought about this some more. There is no way we can ship the requested behavioral change as a general update to all customers. Even if we successfully mitigate all the risk from unwanted side effects (getting the assembler implementation right was surprisingly difficult), the *expected* behavioral change will be extremely surprising to some customers. Most customers probably will not notice because they use UTF-8 and the (broken) case conversion in strcasecmp does nothing for the non-ASCII parts of UTF-8, but it is very likely that some customers rely on the current (broken) behavior.

So I think the best we can do here is to offer a switch to change behavior, but leave the default unchanged. Paulo, would you please ask the customer whether it solves their issue if we provided a glibc update that only conditionally implements the requested behavior? For example, they would have to create a file /etc/sysconfig/strcasecmp-nonascii to activate the different strcasecmp implementation. Thanks.
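
To illustrate the idea of a file-based switch, here is a minimal application-level sketch. It is not the glibc patch, and how glibc itself performs the check is not described in this comment; the enum and function names are hypothetical.

```c
#include <unistd.h>

/* Hypothetical names; only the marker-file idea comes from this bug.  */
enum casecmp_mode { CASECMP_LEGACY, CASECMP_NONASCII };

/* Return which behavior to use, based on whether `marker_path` exists.
   The default (no file) keeps the legacy behavior, so existing systems
   see no change unless the administrator opts in.  */
enum casecmp_mode
select_casecmp_mode (const char *marker_path)
{
  return access (marker_path, F_OK) == 0 ? CASECMP_NONASCII : CASECMP_LEGACY;
}
```

A caller would pass "/etc/sysconfig/strcasecmp-nonascii" once at startup and pick its comparison function accordingly. Making the decision only once keeps the ordering stable for the lifetime of the process.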

Comment 15 Paulo Andrade 2021-08-23 16:31:46 UTC

Customer is fine with the suggested workaround, for example, having a file named /etc/sysconfig/strcasecmp-nonascii and when this file exists, rhel-7.9 glibc works like rhel6 or rhel8 glibc, that is, they do not need to special case their code for rhel7.

Comment 17 Florian Weimer 2021-08-24 17:36:33 UTC

Created attachment 1817195 [details]
0001-Support-etc-sysconfig-strcasecmp-nonascii-1993930.patch

This is the patch I currently have. My testing shows that the function selection is as expected.

It's x86-only (x86_64 and i686). I hope this will be sufficient.

My test program looks like this:

#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>

int (*ptr_strcasecmp) (const char *, const char *) = strcasecmp;
int (*ptr_strncasecmp) (const char *, const char *, size_t) = strncasecmp;
int (*ptr_strcasecmp_l) (const char *, const char *, locale_t) = strcasecmp_l;
int (*ptr_strncasecmp_l) (const char *, const char *, size_t, locale_t) =
  strncasecmp_l;

int
main (void)
{
  printf ("strcasecmp = %p\n", ptr_strcasecmp);
  printf ("strncasecmp = %p\n", ptr_strncasecmp);
  printf ("strcasecmp_l = %p\n", ptr_strcasecmp_l);
  printf ("strncasecmp_l = %p\n", ptr_strncasecmp_l);
  return 0;
}

I run it under GDB, and then check if the addresses are reasonable.

This is without the file:

strcasecmp = 0xf7f4ee00
strncasecmp = 0xf7f4f300
strcasecmp_l = 0xf7f4ee30
strncasecmp_l = 0xf7f4f330
[Inferior 1 (process 37) exited normally]
(gdb) print (void*) 0xf7f4ee00
$1 = (void *) 0xf7f4ee00 <__strcasecmp_sse4_2>
(gdb) print (void*) 0xf7f4f300
$2 = (void *) 0xf7f4f300 <__strncasecmp_sse4_2>
(gdb) print (void*) 0xf7f4ee30
$3 = (void *) 0xf7f4ee30 <__strcasecmp_l_sse4_2>
(gdb) print (void*) 0xf7f4f330
$4 = (void *) 0xf7f4f330 <__strncasecmp_l_sse4_2>

And with the file:

strcasecmp = 0xf7e9eb60
strncasecmp = 0xf7e9ec20
strcasecmp_l = 0xf7e9eb10
strncasecmp_l = 0xf7e9ebc0
[Inferior 1 (process 30) exited normally]
(gdb) print (void*) 0xf7e9eb60
$1 = (void *) 0xf7e9eb60 <__strcasecmp_nonascii>
(gdb) print (void*) 0xf7e9ec20
$2 = (void *) 0xf7e9ec20 <__strncasecmp_nonascii>
(gdb) print (void*) 0xf7e9eb10
$3 = (void *) 0xf7e9eb10 <__strcasecmp_l_nonascii>
(gdb) print (void*) 0xf7e9ebc0
$4 = (void *) 0xf7e9ebc0 <__strncasecmp_l_nonascii>

Comment 20 Paulo Andrade 2021-08-26 15:14:33 UTC

  Thanks for the test packages!

  User confirms the problem is corrected, and is happy with the provided patch.

Comment 30 errata-xmlrpc 2021-10-12 15:30:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glibc bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3803
