Bug 1276711 - sort: sort results don't match when processing same source data set while cs_CZ.utf8 lang set
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: glibc
Version: 23
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Duplicates: 1269895
Reported: 2015-10-30 11:18 EDT by Ondrej Kozina
Modified: 2015-11-12 00:22 EST

Fixed In Version: glibc-2.22-5.fc23
Doc Type: Bug Fix
Last Closed: 2015-11-12 00:22:08 EST
Type: Bug


Attachments
First source file (10.83 KB, text/plain)
2015-10-30 11:18 EDT, Ondrej Kozina
Second source file (10.83 KB, text/plain)
2015-10-30 11:19 EDT, Ondrej Kozina
Not matching res.00 (doesn't match with res.01) cz_CS.utf-8 set (10.83 KB, text/plain)
2015-10-30 11:47 EDT, Ondrej Kozina
Not matching res.01 (doesn't match with res.00) cz_CS.utf-8 set (10.83 KB, text/plain)
2015-10-30 11:47 EDT, Ondrej Kozina
matching res.00 (with LC_ALL=C set) (10.83 KB, text/plain)
2015-10-30 11:54 EDT, Ondrej Kozina


External Trackers
Tracker ID Priority Status Summary Last Updated
Sourceware 18589 None None None Never

Description Ondrej Kozina 2015-10-30 11:18:28 EDT
Created attachment 1087964
First source file

Description of problem:

I've experienced strange behaviour while sorting the following data. I have two source files: 'src.00' and 'src.01'.

Both source files contain the same set of lines. They differ only in the order in which the lines are stored.

When I sort the source files with LC_ALL=C set, the sorted output is always the same. In other words, the sorted output files res.00 and res.01 are identical.

To get the files res.00 and res.01 I used the following commands:
cat src.00 | sort > res.00
cat src.01 | sort > res.01

On the other hand, when I sort the source files with the following locale settings:
LANG=cs_CZ.utf8
LC_CTYPE="cs_CZ.utf8"
LC_NUMERIC="cs_CZ.utf8"
LC_TIME="cs_CZ.utf8"
LC_COLLATE="cs_CZ.utf8"
LC_MONETARY="cs_CZ.utf8"
LC_MESSAGES=en_US.utf8
LC_PAPER="cs_CZ.utf8"
LC_NAME="cs_CZ.utf8"
LC_ADDRESS="cs_CZ.utf8"
LC_TELEPHONE="cs_CZ.utf8"
LC_MEASUREMENT="cs_CZ.utf8"
LC_IDENTIFICATION="cs_CZ.utf8"
LC_ALL=

the files res.00 and res.01 generated with the same procedure as above differ. Is that expected?

Version-Release number of selected component (if applicable):
coreutils-8.24-4.fc24.x86_64
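
The comparison described above can be sketched as a small shell helper. This is only an illustration: the function name `same_sorted` is hypothetical, and it forces the locale through `LC_ALL` rather than through the individual `LANG`/`LC_*` variables listed in the report. Note that if the requested locale is not installed, the check may pass trivially because sort can end up using the C locale.

```shell
# Hypothetical helper: sort two files under one locale and compare the results.
# Usage: same_sorted LOCALE FILE_A FILE_B
# Returns 0 if both sorted outputs are byte-identical, 1 if they differ,
# 2 if temporary files could not be created.
same_sorted() {
    loc=$1
    file_a=$2
    file_b=$3
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    # LC_ALL overrides LANG and every LC_* category for these two runs only.
    LC_ALL=$loc sort "$file_a" > "$tmp_a"
    LC_ALL=$loc sort "$file_b" > "$tmp_b"
    cmp -s "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Example calls, using the file names from the report:
#   same_sorted C          src.00 src.01 && echo "C: identical"
#   same_sorted cs_CZ.utf8 src.00 src.01 || echo "cs_CZ.utf8: results differ"
```

Since both inputs hold the same set of lines, a non-zero result under a single locale means the output depends on input order, which is the behaviour being reported.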
Comment 1 Ondrej Kozina 2015-10-30 11:19 EDT
Created attachment 1087965
Second source file
Comment 2 Ondrej Kozina 2015-10-30 11:24:36 EDT
I hope this makes it clearer:

I don't expect the result files to be equal when sorted under different locales, but I would expect the same input data to sort into the same result whenever it is processed under one specific locale.
Comment 3 Kamil Dudka 2015-10-30 11:38:15 EDT
I am not able to reproduce it with coreutils-8.24-4.fc24.x86_64 and cs_CZ.utf8. Please also attach the output of sort you are getting in both cases.
Comment 4 Ondrej Kozina 2015-10-30 11:47 EDT
Created attachment 1087967
Not matching res.00 (doesn't match with res.01) cz_CS.utf-8 set
Comment 5 Ondrej Kozina 2015-10-30 11:47 EDT
Created attachment 1087969
Not matching res.01 (doesn't match with res.00) cz_CS.utf-8 set
Comment 6 Ondrej Kozina 2015-10-30 11:54 EDT
Created attachment 1087971
matching res.00 (with LC_ALL=C set)
Comment 7 Ondrej Kozina 2015-10-30 11:57:51 EDT
res.00 and res.01 don't match. These are the results of sort run with cs_CZ.utf-8 set as stated above.

res.00.C is the result created by sort run with
export LC_ALL=C
cat src.00 | sort > res.00.C
Comment 8 Ondrej Kozina 2015-10-30 11:59:18 EDT
(res.00.C and res.01.C matched, so I didn't upload the identical file res.01.C)
Comment 9 Kamil Dudka 2015-10-30 12:13:08 EDT
The attached output looks broken to me. Do you have valid locale data installed?

Please paste output of the following commands:
rpm -q glibc-common
rpm -V glibc-common
Comment 10 Kamil Dudka 2015-10-30 12:42:30 EDT
I have just reproduced it on a rawhide machine. It happened only if unavailable locales were requested (like the incorrectly spelled cz_CS.utf-8 string you use somewhere in this bug report). After upgrading my rawhide machine, the issue went away. I guess it was the glibc update that fixed it; I will try to confirm.

Anyway, we should consider an explicit fallback to POSIX locales in case the requested locales are not available, as Pádraig suggests in bug #1270480 comment #8.
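
Whether a requested locale is actually installed can be checked before sorting. The following is a minimal sketch, not part of any fix discussed here: the helper names `normalise_locale` and `locale_available` are hypothetical, and the codeset normalisation assumes that `locale -a` lists UTF-8 locales with a spelling such as `utf8` or `UTF-8`, which can vary between systems.

```shell
# Hypothetical helper: rewrite "UTF-8", "utf-8", "UTF8" etc. to "utf8"
# so that differently spelled codeset suffixes compare equal.
normalise_locale() {
    printf '%s\n' "$1" | sed -e 's/[Uu][Tt][Ff]-\{0,1\}8/utf8/'
}

# Hypothetical helper: succeed only if the named locale is listed by `locale -a`.
# Usage: locale_available LOCALE
locale_available() {
    want=$(normalise_locale "$1")
    locale -a 2>/dev/null \
        | sed -e 's/[Uu][Tt][Ff]-\{0,1\}8/utf8/' \
        | grep -qxF -- "$want"
}

# Example calls:
#   locale_available cs_CZ.utf8  || echo "cs_CZ.utf8 is not installed" >&2
#   locale_available cz_CS.utf-8 || echo "cz_CS.utf-8 is not installed" >&2
```

Only the codeset spelling is normalised; a misspelled language or territory, as in cz_CS, is still reported as unavailable.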
Comment 11 Kamil Dudka 2015-10-30 12:53:13 EDT
Unfortunately, a downgrade of glibc did not reintroduce the issue.  It could be that it happens only if the process generating locale-archive crashes during the installation of glibc (bug #1270480 comment #7), which I failed to reproduce either.

Please update glibc* packages to the latest available version and check whether this bug is still reproducible.
Comment 12 Pádraig Brady 2015-10-30 22:06:01 EDT
Maybe this is the recent issue with cs_CZ I analyzed at:
https://bugzilla.opensuse.org/show_bug.cgi?id=948165#c10

Re falling back to "C" upon setlocale() failure.
That's what we do, but this is silent.
We really should be bleating for sort(1) at least.
Comment 13 Ondrej Kozina 2015-11-02 03:24:05 EST
[root@frawhide ~]# rpm -q glibc-common
glibc-common-2.22.90-8.fc24.x86_64

[root@frawhide ~]# rpm -V glibc-common; echo $?
0

The typo in the locale name (utf8 -> utf-8) was only in my answers (and bug description) but not in my system settings. The only relevant and correct names are in comment #1. My apologies for any confusion in this matter.

Perhaps you've found yet another way to reproduce the bug...

I'm going to try to reproduce it with an updated glibc (if any update is available), as requested.
Comment 14 Ondrej Kozina 2015-11-02 03:50:09 EST
After updating to glibc-common-2.22.90-13 the bug is gone.
Comment 15 Ondrej Kozina 2015-11-02 04:58:10 EST
(Accidentally closed this bug. Letting maintainers decide its fate...)
Comment 16 Pádraig Brady 2015-11-02 06:56:43 EST
While I couldn't find the commit fixing the bug in Fedora, I'm fairly sure this is the bug mentioned in comment #12.
Comment 17 Ondrej Oprala 2015-11-03 10:32:21 EST
I came to the same conclusion... Andreas (on the SUSE bz) also notes this should be fixed by a glibc patch...
Comment 18 Pádraig Brady 2015-11-06 12:34:12 EST
Actually I don't see an update for this issue in F23, so reassigning component.

This is to track the backport of this to F23:
https://sourceware.org/git/?p=glibc.git;a=commit;h=87701a58
Comment 19 Carlos O'Donell 2015-11-09 21:24:42 EST
Fixed in f23. Waiting for final builds before bodhi.

http://koji.fedoraproject.org/koji/taskinfo?taskID=11762533
Comment 20 Fedora Update System 2015-11-09 22:54:01 EST
glibc-2.22-5.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-4563ef63aa
Comment 21 Fedora Update System 2015-11-10 21:23:24 EST
glibc-2.22-5.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update glibc'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-4563ef63aa
Comment 22 Pavel Raiskup 2015-11-11 09:08:15 EST
*** Bug 1269895 has been marked as a duplicate of this bug. ***
Comment 23 Pavel Raiskup 2015-11-11 09:50:14 EST
I'm just curious, have we installed a test case for this issue? I mean
something like: 'assert(strcoll("cx", "ch") < 0)' in cs_CZ?
Comment 24 Pádraig Brady 2015-11-11 09:57:43 EST
glibc now has:
https://sourceware.org/git/?p=glibc.git;a=blob;f=string/bug-strcoll2.c
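
A rough shell-level counterpart of the assertion suggested in comment 23 is sketched below. It is not the glibc test itself and may not exercise the same code path; the helper name `sorts_first` is hypothetical, and the cs_CZ check only means something if that locale is installed. It relies on the Czech collation rule that the digraph "ch" sorts after "h", so "cx" is expected to precede "ch".

```shell
# Hypothetical helper: print whichever of two strings sorts first
# under the given locale.
# Usage: sorts_first LOCALE STRING_A STRING_B
sorts_first() {
    printf '%s\n%s\n' "$2" "$3" | LC_ALL=$1 sort | head -n 1
}

# In the C locale strings are compared byte by byte, so "ch" precedes "cx":
#   sorts_first C ch cx
#
# Expected ordering with a working cs_CZ.utf8 locale ("cx" before "ch"):
#   [ "$(sorts_first cs_CZ.utf8 ch cx)" = "cx" ] && echo "cs_CZ collation looks right"
```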
Comment 25 Fedora Update System 2015-11-12 00:21:54 EST
glibc-2.22-5.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
