1320356 – strxfrm results do not match strcoll


Bug 1320356 - strxfrm results do not match strcoll

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	7.2
Hardware:	x86_64
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	7.3
Assignee:	glibc team
QA Contact:	qe-baseos-tools-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-03-23 00:46 UTC by Tom Lane
Modified:	2019-01-21 15:19 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-11-20 09:44:54 UTC
Target Upstream Version:
Embargoed:

Attachments
shell script to run strcolltest.c in all utf8 locales (145 bytes, application/x-shellscript) 2016-03-23 00:46 UTC, Tom Lane
test program (4.31 KB, text/plain) 2016-03-23 00:47 UTC, Tom Lane
trivial program to show strcoll and strxfrm results for strings on command line (1.54 KB, text/plain) 2016-03-24 16:25 UTC, Tom Lane

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Sourceware	18940	0	None	None	None	2019-01-21 13:49:56 UTC

Description Tom Lane 2016-03-23 00:46:10 UTC

Created attachment 1139298 [details]
shell script to run strcolltest.c in all utf8 locales

Description of problem:
According to the POSIX standard, strxfrm() should produce results that sort the same as strcoll().  This can be shown to fail in some locales, most notably de_DE.utf8.

Version-Release number of selected component (if applicable):
glibc-2.12-1.166.el6_7.7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run the attached shell script, which invokes the attached program, on a machine with a reasonably complete set of locales installed.

Actual results:
Multiple complaints about "inconsistency between strcoll and strxfrm orders" in
some locales.

Expected results:
It shouldn't say that.

Additional info:
I have personally observed this on an up-to-date RHEL6 machine.  It reportedly happens on RHEL7 as well.  Note that the test program only considers UTF8-encoding locales; I do not know if there are similar problems in other encodings.

There is additional background information in this Postgres bug report thread:
http://www.postgresql.org/message-id/flat/111D0E27-A8F3-4A84-A4E0-B0FB703863DF@s24.com

Comment 1 Tom Lane 2016-03-23 00:47:09 UTC

Created attachment 1139299 [details]
test program

Comment 3 Carlos O'Donell 2016-03-24 06:02:13 UTC

If I understand correctly there are two problems.

(a) Sorting strxfrm-transformed strings does not result in the same answer as using strcoll. This is a requirement of POSIX:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/strxfrm.html
~~~
The transformation is such that if strcmp() is applied to two transformed strings, it shall return a value greater than, equal to, or less than 0, corresponding to the result of strcoll() [CX] [Option Start]  or strcoll_l(), [Option End]  respectively, applied to the same two original strings [CX] [Option Start]  with the same locale. [Option End]
~~~
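
For readers following along, the requirement quoted above can be exercised mechanically: sort a set of strings by their strxfrm() keys, then confirm that strcoll() sees the result as non-decreasing. The following is only a sketch and is not one of the programs attached to this bug; `struct keyed`, `cmp_by_key` and `xfrm_sort_agrees_with_strcoll` are names made up for illustration, and the check only looks at adjacent pairs.

```c
#include <assert.h>
#include <locale.h>   /* callers select the locale with setlocale() */
#include <stdlib.h>
#include <string.h>

/* A string paired with its strxfrm() sort key (hypothetical helper type). */
struct keyed {
    const char *orig;
    char *key;
};

/* qsort() comparator: plain strcmp() on the transformed keys. */
static int cmp_by_key(const void *a, const void *b)
{
    return strcmp(((const struct keyed *)a)->key,
                  ((const struct keyed *)b)->key);
}

/* Sort the strings by strxfrm() key, then verify that strcoll() agrees with
   the resulting order for every adjacent pair.
   Returns 1 if it agrees, 0 if not, -1 on allocation failure. */
int xfrm_sort_agrees_with_strcoll(const char *const *strings, size_t n)
{
    struct keyed *items;
    size_t i, built = 0;
    int ok = 1;

    assert(strings != NULL || n == 0);
    items = malloc((n ? n : 1) * sizeof *items);
    if (items == NULL)
        return -1;

    for (i = 0; i < n && ok == 1; i++) {
        /* With n == 0, strxfrm() only reports the length it needs. */
        size_t need = strxfrm(NULL, strings[i], 0) + 1;
        items[i].orig = strings[i];
        items[i].key = malloc(need);
        if (items[i].key == NULL) {
            ok = -1;
        } else {
            strxfrm(items[i].key, strings[i], need);
            built++;
        }
    }

    if (ok == 1) {
        qsort(items, n, sizeof *items, cmp_by_key);
        for (i = 0; i + 1 < n; i++)
            if (strcoll(items[i].orig, items[i + 1].orig) > 0)
                ok = 0;   /* strcoll() disagrees with the key order */
    }

    for (i = 0; i < built; i++)
        free(items[i].key);
    free(items);
    return ok;
}
```

A caller would first select the locale under test, for example with `setlocale(LC_ALL, "de_DE.utf8")`, and check that the call succeeded before trusting the result.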

(b) Error caused in Postgresql by (a) *or* by release-to-release collation changes when used with on-disk indexes?

Which of these is actually the problem you're trying to solve?

Regarding (a), your test randomly generates UTF-8 data within an 8-bit subset of the 11-bit 2-byte range (U+0080 -> U+07ff). There are some really esoteric things in this range including combining characters. While it may be theoretically correct that strxfrm and strcoll are not returning the right ordering it might also be an academic issue that "COMBINING GREEK KORONIS" is not sorted the same between both algorithms (strxfrm vs. strcoll).
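
To make the "11-bit 2-byte range" concrete: a code point between U+0080 and U+07FF is encoded in UTF-8 as two bytes carrying 5 + 6 payload bits. The sketch below shows that encoding only; it is not taken from the attached test program, and `utf8_encode_2byte` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Encode a code point in U+0080..U+07FF as two UTF-8 bytes.
   Layout: 110xxxxx 10xxxxxx, i.e. 5 + 6 = 11 payload bits. */
void utf8_encode_2byte(unsigned cp, unsigned char out[2])
{
    assert(cp >= 0x80 && cp <= 0x7FF);
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* top 5 bits */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* low 6 bits */
}
```

For example, U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS) comes out as the bytes 0xC3 0xA4.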

It is likely that fixing this would require rewriting strxfrm or strcoll, and both of those are sufficiently risky changes for production phase 2 that this will likely never get fixed in RHEL6. Therefore I'm moving this to RHEL7. Users have been using these functions with these defects since RHEL6 was first shipped. We would need a concrete application issue to consider any fixes like this in RHEL6.

On that topic, do you have any concrete examples that we might discuss? The goal here would be to try to determine the real risk this poses to RHEL customers. I note that we still have some inconsistent sortings in RHEL7 also, and that locale LC_COLLATE data makes a difference, and that the recent strxfrm CVE fixes make a difference too.

Regarding (b), the collations in glibc may change from build to build depending on changes in the algorithms or locales. You cannot rely on the collation staying the same once the process exits (nor can you rely upon it via a shared memory mapping to another process sorting strings in memory). Documenting strxfrm stability is part of upstream bug 18940:
https://sourceware.org/bugzilla/show_bug.cgi?id=18940

Given that this is not fixed upstream, the fix will have to get worked on there before being backported. It's also not likely to get solved easily. While we've been updating to Unicode 8.0, collation is the last holdout given the complexity of the update.

Comment 5 Florian Weimer 2016-03-24 06:26:59 UTC

(In reply to Carlos O'Donell from comment #3)
> We would need a concrete application issue to consider
> any fixes like this in RHEL6.

The PostgreSQL 9.5 index lookup failure described in the referenced mailing list thread qualifies as this, I think.  It does not even involve any funky Unicode features.

Comment 7 Tom Lane 2016-03-24 15:19:26 UTC

The problem is (a), that strxfrm() fails to satisfy the POSIX requirement of consistency with strcoll().  AFAICS, strxfrm() has no conceivable use if it doesn't satisfy that requirement, so I'm not sure why you'd consider this debatable.

I do realize that a fix might be blocked if fixing the bug might change strcoll's behavior rather than strxfrm's.  In Postgres' usage, at least, we expect strcoll's sort ordering to hold still within a given locale definition, and I imagine lots of other applications do too.

That test program only generates printable characters in the ISO8859-1 range, and if you consult the referenced Postgres thread there are examples of inconsistent behavior with strings containing only printable ASCII, so I think you're incorrect to worry about whether strange Unicode features are involved.  One thing I would like to get out of this discussion though is a precise characterization of what strings make it go wrong, as that knowledge would aid the Postgres project in determining how to deal with the bug on unpatched systems.

Comment 8 Carlos O'Donell 2016-03-24 15:35:48 UTC

(In reply to Tom Lane from comment #7)
> I do realize that a fix might be blocked if fixing the bug might change
> strcoll's behavior rather than strxfrm's.  In Postgres' usage, at least, we
> expect strcoll's sort ordering to hold still within a given locale
> definition, and I imagine lots of other applications do too.

Define "within a given locale definition?"

As locales change, either for bug fixes, or political reasons, the strcoll and strxfrm result will change over time.

Given a system, with a fixed set of locales, you are right, the results between strcoll and strxfrm should be the same.

Today there are examples where this isn't the case, and it will not be easy to fix. You argue that this is a black and white scenario, but I think there is some grey in here and that the average language day-to-day use works. From a robustness perspective, you're correct, you can't rely on all sortings to work, nor is it possible to test it (too many combinations), so you have to rely on the library authors to give you that guarantee.

Today there is no such guarantee from glibc, and not in any distribution using glibc. We have two distinct algorithms for strxfrm and strcoll, and while they mostly produce the same results, given some locales, and some Unicode code points, they are producing different sortings. This will take significant work to fix correctly. The shades of grey exist because this is the way it has always been and it works well for most of the input data users are using. If you want bulletproof, we don't have it. Sorry.
 
> That test program only generates printable characters in the ISO8859-1
> range, and if you consult the referenced Postgres thread there are examples
> of inconsistent behavior with strings containing only printable ASCII, so I
> think you're incorrect to worry about whether strange Unicode features are
> involved.  One thing I would like to get out of this discussion though is a
> precise characterization of what strings make it go wrong, as that knowledge
> would aid the Postgres project in determining how to deal with the bug on
> unpatched systems.

We have no idea what strings cause differences or why.

Could you please provide examples that cause strxfrm and strcoll to return different results for ASCII? Note that ISO-8859-1 is not ASCII and has characters beyond the 7-bit range.

Comment 9 Carlos O'Donell 2016-03-24 15:38:39 UTC

When I ask for examples I'm looking for self-contained programs with input and output which I can use upstream when discussing this issue. The testcase that is posted is random and not deterministic, so produces what appears to be varying results (we have similar problems with test-sort.sh in glibc which is fuzzing the sorting). Giving us a deterministic reproducer is the best first step.
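
One way a random search can be made repeatable is to drive it from a generator whose seed is fixed and reported, so that any failing input can be regenerated later. The sketch below assumes printable ASCII input and uses a small linear congruential generator (constants from Numerical Recipes) instead of rand(), whose sequence differs between C libraries. It is not how the attached test program works; `lcg_next` and `fill_random_ascii` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal linear congruential generator; the same seed always yields
   the same sequence on every platform. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fill buf with len printable ASCII characters (0x20..0x7E) and a
   terminating NUL. buf must have room for len + 1 bytes. */
void fill_random_ascii(char *buf, size_t len, uint32_t *state)
{
    size_t i;

    assert(buf != NULL && state != NULL);
    for (i = 0; i < len; i++)
        buf[i] = (char)(0x20 + (lcg_next(state) >> 16) % 95);
    buf[len] = '\0';
}
```

Printing the seed alongside each reported inconsistency would then be enough to turn a random hit into a fixed test case.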

Comment 10 Tom Lane 2016-03-24 16:25:13 UTC

Created attachment 1140050 [details]
trivial program to show strcoll and strxfrm results for strings on command line

Comment 11 Tom Lane 2016-03-24 16:29:10 UTC

Well, here's a couple of deterministic examples using the program I just uploaded:

$ LANG=de_DE.utf8 ./trivialtest " & )s"  "s "
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "de_DE.utf8"
strcoll(" & )s","s ") = 1
strcmp(strxfrm results) = -1

$ LANG=de_DE.utf8 ./trivialtest "   x" "X"   
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "de_DE.utf8"
strcoll("   x","X") = 1
strcmp(strxfrm results) = -7

However, I do not know whether there is more than one bug contributing to the failures found by the random-search program, so I wouldn't recommend ignoring it altogether ...

Comment 12 Tom Lane 2016-03-24 16:35:13 UTC

some even simpler variants:

$ LANG=de_DE.utf8 ./trivialtest "xxx" "x xx"
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "de_DE.utf8"
strcoll("xxx","x xx") = 6
strcmp(strxfrm results) = -1

$ LANG=de_DE.utf8 ./trivialtest "x x x" "xxx"
Using LC_COLLATE = "de_DE.utf8"
Using LC_CTYPE = "de_DE.utf8"
strcoll("x x x","xxx") = -6
strcmp(strxfrm results) = 1
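
The source of the attached trivialtest program is not reproduced on this page. A minimal sketch of a pairwise check of the same shape, assuming the output format shown above, might look like this; `sign_of` and `report_pair` are hypothetical names, and only the sign of each result is compared, since POSIX does not promise matching magnitudes.

```c
#include <assert.h>
#include <locale.h>   /* callers select the locale with setlocale() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or +1 so that only the sign matters. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Print strcoll(a, b) and strcmp() of the strxfrm() results.
   Returns 1 if the signs agree, 0 if they disagree,
   and -1 on allocation failure. */
int report_pair(FILE *out, const char *a, const char *b)
{
    /* With n == 0, strxfrm() only reports the length it needs. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *xa = malloc(na);
    char *xb = malloc(nb);
    int coll, xfrm, agree = -1;

    assert(out != NULL);
    if (xa != NULL && xb != NULL) {
        strxfrm(xa, a, na);
        strxfrm(xb, b, nb);
        coll = strcoll(a, b);
        xfrm = strcmp(xa, xb);
        fprintf(out, "strcoll(\"%s\",\"%s\") = %d\n", a, b, coll);
        fprintf(out, "strcmp(strxfrm results) = %d\n", xfrm);
        agree = sign_of(coll) == sign_of(xfrm);
    }
    free(xa);
    free(xb);
    return agree;
}
```

A main() wrapper would call `setlocale(LC_ALL, "")` so that LANG is honoured, then pass `argv[1]` and `argv[2]` to `report_pair(stdout, ...)`.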

Comment 13 Carlos O'Donell 2016-03-24 18:11:47 UTC

For testing purposes I'm using a chroot with only de_DE.UTF-8 installed, libc, ld.so, and the binary.

> $ LANG=de_DE.utf8 ./trivialtest "xxx" "x xx"
> Using LC_COLLATE = "de_DE.utf8"
> Using LC_CTYPE = "de_DE.utf8"
> strcoll("xxx","x xx") = 6
> strcmp(strxfrm results) = -1

Works fine with Rawhide (2.24), F24 (2.23), F23 (2.22), RHEL 7.3 (devel).

e.g.
./chroot-run.sh 'xxx' 'x xx'
INFO: Using LC_COLLATE = "de_DE.utf8"
INFO: Using LC_CTYPE = "de_DE.utf8"
INFO: Inputs are "xxx" and "x xx".
PASS: -1 and -1 are both less than zero.

> $ LANG=de_DE.utf8 ./trivialtest "x x x" "xxx"
> Using LC_COLLATE = "de_DE.utf8"
> Using LC_CTYPE = "de_DE.utf8"
> strcoll("x x x","xxx") = -6
> strcmp(strxfrm results) = 1

Works fine with Rawhide (2.24), F24 (2.23), and F23 (2.22), RHEL 7.3 (2.17).

> $ LANG=de_DE.utf8 ./trivialtest " & )s"  "s "
> Using LC_COLLATE = "de_DE.utf8"
> Using LC_CTYPE = "de_DE.utf8"
> strcoll(" & )s","s ") = 1
> strcmp(strxfrm results) = -1

Works fine with Rawhide (2.24), F24 (2.23), and F23 (2.22).

Fails on RHEL 7.3 (2.17)

INFO: Using LC_COLLATE = "de_DE.utf8"
INFO: Using LC_CTYPE = "de_DE.utf8"
INFO: Inputs are " & )s" and "s ".
FAIL: 1 and -1 are not the same.


> $ LANG=de_DE.utf8 ./trivialtest "   x" "X"   
> Using LC_COLLATE = "de_DE.utf8"
> Using LC_CTYPE = "de_DE.utf8"
> strcoll("   x","X") = 1
> strcmp(strxfrm results) = -7

Works fine with Rawhide (2.24), F24 (2.23), and F23 (2.22).

Fails on RHEL 7.3 (2.17)

INFO: Using LC_COLLATE = "de_DE.utf8"
INFO: Using LC_CTYPE = "de_DE.utf8"
INFO: Inputs are "   x" and "X".
FAIL: 1 and -7 are not the same.

The biggest change in this area occurred in the 2015-01-13 commit 0f9e585480edcdf1e30dc3d79e24b84aeee516fa, which rewrote most of strxfrm to fix BZ #16009 and went into glibc 2.21.

What glibc are you using? If you have glibc >= 2.21 and you can reproduce ASCII sorting discrepancies between strxfrm and strcoll then we'd like to know that.

It seems like the above results are probably from glibc == 2.12 from RHEL6, and in a production phase 2 setting I've already argued we are not going to fix this since it would change existing sortings that customers are already handling in their systems.

However, for RHEL7 it is more feasible that we backport the strxfrm rewrite to fix the ASCII sorting discrepancies.

Comment 14 Tom Lane 2016-03-24 19:37:51 UTC

> What glibc are you using?

The one currently shipped in RHEL6, as I stated in the initial report.

Your results match some reports on the Postgres lists that people were unable to reproduce any problem with glibc 2.22 and later.  However, I'm unsure that that means this is a known/fixed problem, as I couldn't find anything in the glibc bugzilla suggesting it was known that strxfrm might fail to match strcoll.  It might be only accidentally masked by other changes.  In any case, I'd sure like to know exactly what fixed it, as we still need to characterize the problem better for our users.

Comment 15 Peter Geoghegan 2016-04-13 06:47:02 UTC

(In reply to Tom Lane from comment #14)
> Your results match some reports on the Postgres lists that people were
> unable to reproduce any problem with glibc 2.22 and later.  However, I'm
> unsure that that means this is a known/fixed problem, as I couldn't find
> anything in the glibc bugzilla suggesting it was known that strxfrm might
> fail to match strcoll.  It might be only accidentally masked by other
> changes.  In any case, I'd sure like to know exactly what fixed it, as we
> still need to characterize the problem better for our users.

The test program reports no issues with glibc 2.22 due to this commit, it seems:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=8540f6d2a74fe9d67440535ebbcfa252180a3172

I determined this mechanically, using git-bisect. I have no idea why this might be, although I note that the commit does touch localedata.

Comment 16 Carlos O'Donell 2016-04-13 14:15:04 UTC

(In reply to Peter Geoghegan from comment #15)
> (In reply to Tom Lane from comment #14)
> > Your results match some reports on the Postgres lists that people were
> > unable to reproduce any problem with glibc 2.22 and later.  However, I'm
> > unsure that that means this is a known/fixed problem, as I couldn't find
> > anything in the glibc bugzilla suggesting it was known that strxfrm might
> > fail to match strcoll.  It might be only accidentally masked by other
> > changes.  In any case, I'd sure like to know exactly what fixed it, as we
> > still need to characterize the problem better for our users.
> 
> The test program reports no issues with glibc 2.22 due to this commit, it
> seems:
> 
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;
> h=8540f6d2a74fe9d67440535ebbcfa252180a3172
> 
> I determined this mechanically, using git-bisect. I have no idea why this
> might be, although I note that the commit does touch localedata.

I think you have made a mistake somewhere. The above change doesn't make any real changes to strxfrm or strcoll results.

You do provide a very good data point, that in glibc 2.22 you were not able to reproduce the problem. The glibc 2.22 release includes the strxfrm rewrite that went into 2.21.

So the next steps would be:
* Backport strxfrm fixes to rhel7
* Verify that for the test case we have that strxfrm and strcoll sort the same way in ASCII and for some non-ASCII examples.

We would evaluate this for rhel-7.4.

Comment 17 Peter Geoghegan 2016-04-13 22:52:45 UTC

(In reply to Carlos O'Donell from comment #16)
> (In reply to Peter Geoghegan from comment #15)
> > (In reply to Tom Lane from comment #14)
> > > Your results match some reports on the Postgres lists that people were
> > > unable to reproduce any problem with glibc 2.22 and later.  However, I'm
> > > unsure that that means this is a known/fixed problem, as I couldn't find
> > > anything in the glibc bugzilla suggesting it was known that strxfrm might
> > > fail to match strcoll.  It might be only accidentally masked by other
> > > changes.  In any case, I'd sure like to know exactly what fixed it, as we
> > > still need to characterize the problem better for our users.
> > 
> > The test program reports no issues with glibc 2.22 due to this commit, it
> > seems:
> > 
> > https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;
> > h=8540f6d2a74fe9d67440535ebbcfa252180a3172
> > 
> > I determined this mechanically, using git-bisect. I have no idea why this
> > might be, although I note that the commit does touch localedata.
> 
> I think you have made a mistake somewhere. The above change doesn't make any
> real changes to strxfrm or strcoll results.

Well, this was black-box debugging, and I'm not sure that there were no other factors, but that commit did seem to be where the problem went away. We have a reliable test case, and since 2.22 is not believed to be affected, it must follow that it's possible to finger one commit using git bisect.

I didn't rerun locale-gen. Could that be a factor?

If I'm not competent to write a test case that fingers the commit that made things behave consistently again, someone else should.

> You do provide a very good data point, that in glibc 2.22 you were not able
> to reproduce the problem. The glibc 2.22 release includes the strxfrm
> rewrite that went into 2.21.
> 
> So the next steps would be:
> * Backport strxfrm fixes to rhel7
> * Verify that for the test case we have that strxfrm and strcoll sort the
> same way in ASCII and for some non-ASCII examples.
> 
> We would evaluate this for rhel-7.4.

Why are you so confident that the problem is strxfrm(), and not strcoll()?

If we assume for the sake of argument that this is a strxfrm() bug and strcoll() is a reliable source of truth, then I find it very curious that Germany's Austrian neighbors differ on this point about how text should be collated (this is 2.12, fwiw):

[vagrant@localhost ~]$ ./trivialtest de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6
[vagrant@localhost ~]$ ./trivialtest de_AT.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1

The "broken" strxfrm *agrees* with the "happens to be unaffected by bug" strxfrm() for a distinct though culturally very similar locale in this example. The "working" strcoll() results *disagree*, though.

I'm not an expert, but based on this it seems likely that the bug is actually in strcoll(). This seems implausible as a legitimate cultural difference between Austria and Germany. From a Postgres POV, our concern may actually be strxfrm()'s failure to be bug compatible with strcoll(), and not simply that strxfrm() gives wrong answers.

I'm not going to argue with you about what version of RHEL you should target, which in any case seems premature, but I don't think there is any ambiguity about the result that either strcoll() or strxfrm() should give. The locale definitions are unambiguous, and cleanly separate the implementation from the expected final sort order.
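
The hex dumps in this comment come from a variant of the test program that is not attached to the bug. A minimal sketch of how such a dump could be produced, assuming the key is simply printed as two lowercase hex digits per byte, is shown below; `xfrm_hex` is a hypothetical name.

```c
#include <assert.h>
#include <locale.h>   /* callers select the locale with setlocale() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd lowercase hex rendering of the strxfrm() key for s
   under the current LC_COLLATE locale, storing the key length in *len_out
   when len_out is not NULL. Returns NULL on allocation failure.
   The caller frees the result. */
char *xfrm_hex(const char *s, size_t *len_out)
{
    size_t len, i;
    char *key, *hex;

    assert(s != NULL);
    len = strxfrm(NULL, s, 0);   /* length query only */
    key = malloc(len + 1);
    hex = malloc(2 * len + 1);
    if (key == NULL || hex == NULL) {
        free(key);
        free(hex);
        return NULL;
    }
    strxfrm(key, s, len + 1);
    for (i = 0; i < len; i++)
        snprintf(hex + 2 * i, 3, "%02x", (unsigned char)key[i]);
    hex[2 * len] = '\0';
    free(key);
    if (len_out != NULL)
        *len_out = len;
    return hex;
}
```

In the "C" locale the key is just the input bytes, so "xxx" yields 787878 (3 bytes); the longer keys quoted above are what the de_DE and de_AT collation tables produced on that glibc 2.12 system.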

Comment 18 Peter Geoghegan 2016-04-14 03:16:19 UTC

(In reply to Peter Geoghegan from comment #17)
> If we assume for the sake of argument that this is a strxfrm() bug and
> strcoll() is a reliable source of truth, then I find it very curious that
> Germany's Austrian neighbors differ on this point about how text should be
> collated (this is 2.12, fwiw):

> I'm not an expert, but based on this it seems likely that the bug is
> actually in strcoll(). This seems implausible as a legitimate cultural
> difference between Austria and Germany.

Note that on the same glibc 2.12 system, there are sometimes different answers when one replaces an ASCII-safe Latin alphabet character with a punctuation character in *both* strings being compared. This seems rather surprising, at least to me.  (A UCA-style algorithm gives only "level 4" weight to punctuation [1]).

Example of this:

[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'x xxf' 'xxxf'
"x xxf" -> 2323231101080808080102020202010235 (17 bytes)
"xxxf" -> 2323231101080808080102020202 (14 bytes)
strcmp(arg1, arg2) result: 1
strcoll(arg1, arg2) result: -6

When we swap the f character with a " character in both strings:

[vagrant@localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'x xx"' 'xxx"'
"x xx"" -> 2323230108080801020202010235034b (16 bytes)
"xxx"" -> 232323010808080102020201044b (14 bytes)
strcmp(arg1, arg2) result: -2
strcoll(arg1, arg2) result: -6

Notice that the first case where a Latin alphabet character is used sees strxfrm() and strcoll() disagree. Whereas, if we swap one ascii-safe Latin alphabet character with a double quote character (or some other punctuation character), they agree. Again, this doesn't seem like it has much to do with German collation rules.

Although I could be wrong, my suspicion is that the second case's "failure to fail" is indicative of the nature of the underlying problem. There may be an incorrectly applied optimization here for strcoll() only. In general, strcoll() seems more likely to have bugs than strxfrm(). See http://unicode.org/faq/collation.html#5, where it states:

"""
...implementers must be careful to produce implementations that accurately reproduce the results of the Unicode Collation Algorithm as they optimize their own algorithms. It is easy to perform careless optimizations — especially with Incremental Comparison algorithms — that fail this test.
 
"""

(The point is that strxfrm() is *never* incremental, of course.)

It would be rather bad for Postgres users if there was a bug in strcoll() which led to comparisons that lack "transitive consistency" -- we are critically reliant on that [2]. So far, there is no evidence of that ever happening, thankfully.

If there is a bug in strcoll(), in general we would prefer that that bug *not* be fixed in released versions of glibc. Although, it is conceivable that changing strcoll() behavior in a stable release would be the lesser evil for us, if and only if the "transitive consistency" of strcoll() needed to be repaired (which, as I said, is not in evidence here, to my great relief).

Thanks

[1] http://unicode.org/reports/tr10/#Multi_Level_Comparison
[2] https://github.com/postgres/postgres/blob/REL9_4_STABLE/src/backend/access/nbtree/README#L589
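
The "transitive consistency" property mentioned in this comment can be spot-checked by brute force over a small set of strings: whenever strcoll() says a <= b and b <= c, it must also say a <= c. The following is only a sketch for small inputs, since it examines every ordered triple; `count_transitivity_violations` is a hypothetical name, and passing the check on a sample proves nothing about other inputs.

```c
#include <assert.h>
#include <locale.h>   /* callers select the locale with setlocale() */
#include <stddef.h>
#include <string.h>

/* Count ordered triples (i, j, k) for which strcoll() reports
   s[i] <= s[j] and s[j] <= s[k] but s[i] > s[k].
   Runs in O(n^3) strcoll() calls, so keep n small. */
size_t count_transitivity_violations(const char *const *s, size_t n)
{
    size_t i, j, k, bad = 0;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                if (strcoll(s[i], s[j]) <= 0 &&
                    strcoll(s[j], s[k]) <= 0 &&
                    strcoll(s[i], s[k]) > 0)
                    bad++;
    return bad;
}
```

A result of zero for the strings discussed in this bug would be consistent with the observation that no transitivity failure has been seen so far.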

Comment 19 Carlos O'Donell 2016-04-14 15:48:17 UTC

(In reply to Peter Geoghegan from comment #17)
> > I think you have made a mistake somewhere. The above change doesn't make any
> > real changes to strxfrm or strcoll results.
> 
> Well, this was black-box debugging, and I'm not sure that there were no
> other factors, but that commit did seem to be where the problem went away.
> We have a reliable test case, and since 2.22 is not believed to be affected;
> it must follow that it's possible to finger one commit using git bisect.
> 
> I didn't rerun locale-gen. Could that be a factor?
> 
> If I'm not competent to write a test case that fingers the commit that made
> things behave consistently again, someone else should.

If you aren't running in a chroot/container/VM with the newly installed glibc and the newly built locales, then you haven't isolated your environment sufficiently for testing.

The use of locale-gen is an Ubuntu-specific operation that has nothing to do with testing a pristine glibc 2.22. If you're not testing a pristine glibc 2.22, then other patches might impact the results. If you're not installing locale sources in the right place for Ubuntu then locale-gen won't recompile from the right sources etc. etc.

Right now I'm talking about pristine glibc builds (no distribution changes).

I don't know that tracking down _exactly_ which commit made changes really matters.

What matters is two things:

* Going forward we need testing to verify we meet the POSIX requirement for strxfrm/strcoll equivalence.
* Make strxfrm/strcoll return results as required for equivalence.

> I'm not going to argue with you about what version of RHEL you should
> target, which in any case seems premature, but I don't think there is any
> ambiguity about the result that either strcoll() or strxfrm() should give.
> The locale definitions are unambiguous, and cleanly separate the
> implementation from the expected final sort order.

I agree that an audit of both is really required. Starting with strxfrm since it has been updated recently.

Comment 20 Carlos O'Donell 2016-04-14 15:54:57 UTC

(In reply to Peter Geoghegan from comment #18)
> It would be rather bad for Postgres users if there was a bug in strcoll()
> which led to comparisons that lack "transitive consistency" -- we are
> critically reliant on that [2]. So far, there is no evidence of that ever
> happening, thankfully.

Glad to hear it.

> If there is a bug in strcoll(), in general we would prefer that that bug
> *not* be fixed in released versions of glibc. Although, it is conceivable
> that changing strcoll() behavior in a stable release would be the lesser
> evil for us, if and only if the "transitive consistency" of strcoll() needed
> to be repaired (which, as I said, is not in evidence here, to my great
> relief).

The decision to upgrade a released version of glibc depends on several factors which are non-technical in nature, particularly for RHEL.

In this case RHEL6 is entering production phase 2, and so changing strcoll or strxfrm is a non-starter. This means that existing Postgres packages for those distributions have to tolerate the above discussed issues.

In RHEL7 we have more flexibility to fix these things, particularly given the fact that strxfrm and strcoll results are subject to change: just as a bug fix might change the results, so too might a locale update.

Thanks for your feedback.

Comment 25 Carlos O'Donell 2018-04-03 03:09:38 UTC

We've updated a lot of collation rules to match CLDR and ISO 14651. Along with that I'm working on some strcoll testing, particularly for full code-point sorting with C.UTF-8, which should let us validate more of these issues. We aren't done here yet, but I wanted to note that we are making progress.

Comment 26 Florian Weimer 2018-11-20 09:44:54 UTC

We believe that this issue will be addressed in Red Hat Enterprise Linux 8 with an import from CLDR, as part of the rebase to glibc 2.28.
