| Summary: | sort in F15 behaves different than sort in CentOS and debian for de_DE.UTF-8 locale | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Till Maas <opensource> |
| Component: | glibc | Assignee: | Jeff Law <law> |
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rawhide | CC: | fweimer, jakub, kdudka, law, maxamillion, opensource, ovasik, pfrankli, schwab, twaugh |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-07-18 22:15:16 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Till Maas
2011-10-31 09:11:30 UTC
Well, that would be hard, as the multibyte support in sort varies between Linux distributions. In Fedora it is added by coreutils-i18n.patch. As there is no upstream for this patch, it may vary (and does vary) between distributions. Sorting also depends on the LC_COLLATE and LC_NUMERIC settings from glibc, which may differ between systems as well. I would say this is not a bug, and my only recommendation here is to use the C locale, where the output is predictable and more consistent between systems.

Thank you for the fast reply. IMHO there can be only one order that is correct for the shown lists. Also, no multibyte characters are included, so the sort order for de_DE.UTF-8 should match the order for the other de_DE.* locales on Fedora, which is also not the case. And afaics coreutils is still developed upstream, so why won't they accept the patch?

It doesn't matter; locales affect the sorting order. LC_COLLATE and LC_NUMERIC affect how sort behaves. Additionally, the multibyte patch is quite "stupid": it sorts everything via the multibyte path in multibyte locales (and the multibyte path is 2-20+ times slower in the case of sort). I really recommend using LC_ALL=C for consistent results. As for the second part: yes, coreutils upstream is active, but the multibyte patch has the wrong design and would have to be rewritten from scratch to be accepted upstream (too much duplicate code, too big a performance impact, almost no test coverage; in fact, activating only one 'cut' test for multibyte discovered two bugs in the patch). It's far from being acceptable for upstream, but I have to keep it in Fedora for legacy reasons.

Cleanup: as this is caused by the locale-specific collation order from glibc, I am moving it there. There is nothing I can do about it in coreutils. Still, likely notabug.

As far as I know, the F15 collation order is the most correct. CentOS 5 is probably using the slightly out-of-date bits from RHEL 5.
DIACRIT_FORWARD is one of the changes that are probably missing from the glibc of that era. Can't speak for why Debian differs.
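
For anyone who wants to see the difference between the two approaches discussed in this thread, here is a rough Python sketch (not taken from the bug itself). It assumes that comparing the raw UTF-8 bytes of each line is a fair approximation of what `LC_ALL=C sort` does, and that `locale.strxfrm` reflects the collation tables of the local C library. The function names and the sample input are made up for illustration.

```python
import locale


def sort_c_locale(lines):
    """Approximate `LC_ALL=C sort`: order lines by their raw UTF-8 bytes."""
    return sorted(lines, key=lambda s: s.encode("utf-8"))


def sort_with_locale(lines, name):
    """Sort with the LC_COLLATE rules of locale `name` from the C library.

    Returns None if that locale is not installed on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore it


if __name__ == "__main__":
    sample = ["b", "a-1", "a 1", "a1", "B", "A"]  # hypothetical input
    print("C:          ", sort_c_locale(sample))
    # This result depends on the installed locale data and may differ
    # between systems, which is the behaviour reported in this bug.
    print("de_DE.UTF-8:", sort_with_locale(sample, "de_DE.UTF-8"))
```

The byte-order result should be the same on every system. The locale-aware result depends on whichever collation data the C library ships, so it is not expected to be portable.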