Red Hat Bugzilla – Bug 1276711
sort: sort results don't match when processing same source data set while cs_CZ.utf8 lang set
Last modified: 2015-11-12 00:22:08 EST
Created attachment 1087964 [details]
First source file
Description of problem:
I've experienced strange behaviour while sorting the following data. I have two source files: 'src.00' and 'src.01'.
Both source files contain the same set of data (lines). They differ only in the order in which the lines are stored.
When I sort the source files with LC_ALL=C set, the sorted output is always the same. In other words, the sorted output files res.00 and res.01 are equal.
To get files res.00 and res.01 I used the following commands:
cat src.00 | sort > res.00
cat src.01 | sort > res.01
On the other hand, when I sort the source files with the following locales set:
the res.00 and res.01 generated with the same procedure as above differ. Is this expected?
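The comparison described above can be sketched as a small shell helper. This is only an illustration, not the reporter's actual procedure: the function name sorted_outputs_match is hypothetical, and it assumes mktemp and cmp are available. Note that if the requested locale is not installed, sort may silently use "C" collation, in which case the outputs would be expected to match.

```sh
# sorted_outputs_match LOCALE FILE_A FILE_B
# Sorts both files under LOCALE and succeeds only if the two
# sorted results are byte-identical.
sorted_outputs_match() {
    loc=$1
    file_a=$2
    file_b=$3
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    LC_ALL=$loc sort "$file_a" > "$tmp_a"
    LC_ALL=$loc sort "$file_b" > "$tmp_b"
    cmp -s "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

For example: sorted_outputs_match cs_CZ.utf8 src.00 src.01 && echo same || echo differ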
Version-Release number of selected component (if applicable):
Created attachment 1087965 [details]
Second source file
I hope this makes it clearer:
I don't expect the result files to be equal when they are sorted under different locales, but I would expect both source files to sort into the same result when processed under one specific locale.
I am not able to reproduce it with coreutils-8.24-4.fc24.x86_64 and cs_CZ.utf8. Please also attach the output of sort you are getting in both cases.
Created attachment 1087967 [details]
Not matching res.00 (doesn't match with res.01) cz_CS.utf-8 set
Created attachment 1087969 [details]
Not matching res.01 (doesn't match with res.00) cz_CS.utf-8 set
Created attachment 1087971 [details]
matching res.00 (with LC_ALL=C set)
res.00 and res.01 don't match. These are the results of sort run with cs_CZ.utf-8 set, as stated above.
res.00.C is the result created by running sort with:
cat src.00 | sort > res.00.C
(res.00.C and res.01.C matched so I didn't upload the identical matching file res.01.C)
The attached output looks broken to me. Do you have valid locale data installed?
Please paste output of the following commands:
rpm -q glibc-common
rpm -V glibc-common
I have just reproduced it on a rawhide machine. It happened only if unavailable locales were requested (like the incorrectly spelled cz_CS.utf-8 string you use somewhere in this bug report). After upgrading my rawhide machine, the issue went away. I guess it was the glibc update that fixed it; I will try to confirm.
Anyway, we should consider an explicit fallback to POSIX locales in case the requested locales are not available as Pádraig suggests in bug #1270480 comment #8.
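As an illustration of how an unavailable or misspelled locale name could be detected before sorting, here is a minimal shell sketch. It is not what coreutils or glibc do internally. The helper names normalize_locale and locale_is_listed are hypothetical, and the sketch assumes the list of installed locales arrives on standard input in the form printed by `locale -a` on glibc systems (for example cs_CZ.utf8).

```sh
# normalize_locale NAME
# Folds the codeset part of a locale name to lower case without
# hyphens, so that cs_CZ.UTF-8 and cs_CZ.utf8 compare equal.
normalize_locale() {
    case $1 in
        *.*)
            codeset=$(printf '%s' "${1#*.}" | tr '[:upper:]' '[:lower:]' | tr -d '-')
            printf '%s.%s\n' "${1%%.*}" "$codeset"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# locale_is_listed NAME
# Reads locale names from standard input, one per line, and succeeds
# if NAME matches one of them after normalization.
locale_is_listed() {
    wanted=$(normalize_locale "$1")
    while IFS= read -r name; do
        if [ "$(normalize_locale "$name")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}
```

A caller could then warn explicitly instead of relying on a silent fallback, for example: locale -a | locale_is_listed cs_CZ.utf8 || echo "warning: requested locale is not available" >&2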
Unfortunately, a downgrade of glibc did not reintroduce the issue. It could be that it happens only if the process generating locale-archive crashes during the installation of glibc (bug #1270480 comment #7), which I failed to reproduce either.
Please update glibc* packages to the latest available version and check whether this bug is still reproducible.
Maybe this is the recent issue with cs_CZ I analyzed at:
Re falling back to "C" upon setlocale() failure.
That's what we do, but this is silent.
We really should be bleating for sort(1) at least.
[root@frawhide ~]# rpm -q glibc-common
[root@frawhide ~]# rpm -V glibc-common; echo $?
The typo in locale name (utf8 -> utf-8) was only in my answers (and bug description) but not in my system settings. The only relevant and correct names are in comment #1. My apologies for any confusion in this matter.
Perhaps you've found yet another way to reproduce the bug...
I'm going to try to reproduce it with an updated glibc (if any update is available), as requested.
After update to glibc-common-2.22.90-13 the bug is gone.
(Accidentally closed this bug. Letting the maintainers decide its fate...)
While I couldn't find the commit fixing the bug in Fedora, I'm fairly sure this is the bug mentioned in comment #12.
I came to the same conclusion...Andreas (on the SUSE bz) also notes this should be fixed by a glibc patch...
Actually I don't see an update for this issue in F23, so reassigning component.
This is to track the backport of this to F23:
Fixed in f23. Waiting for final builds before bodhi.
glibc-2.22-5.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-4563ef63aa
glibc-2.22-5.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update glibc'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-4563ef63aa
*** Bug 1269895 has been marked as a duplicate of this bug. ***
I'm just curious, have we installed a test case for this issue? I mean something like: 'assert(strcoll("cx", "ch") < 0)' in cs_CZ?
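A shell-level analogue of such a check can be sketched with sort itself. This is only a sketch: the function name first_in_locale is hypothetical, and it assumes the named locale is actually installed. In line with the assertion quoted above, a correct cs_CZ locale should print "cx" for the pair cx/ch, whereas the "C" locale orders by byte value and prints "ch".

```sh
# first_in_locale LOCALE A B
# Prints whichever of the strings A and B sorts first under LOCALE.
first_in_locale() {
    printf '%s\n%s\n' "$2" "$3" | LC_ALL=$1 sort | head -n 1
}
```

For example: first_in_locale cs_CZ.utf8 cx ch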
glibc now has:
glibc-2.22-5.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.