From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030703

Description of problem:
Is there some "upstream" for glibc nowadays? I tried using glibcbug first, but my message bounced.

The way to sort a phrase in Swedish depends on the context. Under one principle, called the "dictionary principle", phrases are compared letter by letter and spaces between words are ignored. That is, no surprise, the common way to sort a dictionary. Under the other principle, called the "word principle", phrases are compared word by word: the second word is taken into consideration only if the first word is the same in both phrases. This is used in phone books, libraries, and other places. Technically, you could do this by sorting a space first of all, before all letters.

The different principles are used in different contexts. The current locale definitions that come with glibc apply the dictionary principle. I believe this is a good default; in most places where the definition is used it is appropriate. But what would be the "correct" way to make it possible to choose the word principle where THAT is appropriate?

The background for this request is a letter I got from a user working at a library (Rolf Johansson <rojo>). They use Linux for their databases. Their database system, PostgreSQL, leaves collation order to the system's locale definition. This means they get dictionary order. They would need word order, since that is well established among libraries.

What is the correct thing for him to do? What kind of patch or program modification should be made to make it possible?

Version-Release number of selected component (if applicable):
glibc-2.3.1-36

How reproducible:
Always

Steps to Reproduce:
1. cat > apa
a conto
a priori
apparat
^D
2. env LANG=sv_SE sort apa

Actual Results:
a conto
apparat
a priori

Expected Results:
In most contexts, the order I get. But in some application areas, I expect
a conto
a priori
apparat

Additional info:
Defining a new collation order in the locale is obviously one way to do this. But I'm uncertain whether it is the best way. What would you suggest?

I don't know if this problem is applicable to other languages too.
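The two principles described in the report can be sketched as two sort keys. This is only an illustration in Python, not how glibc implements collation: the helper names `dictionary_key` and `word_key` are made up for this example, and the comparison uses plain code point order, which is enough for the three ASCII phrases from the report but does not handle the Swedish letters å, ä and ö.

```python
def dictionary_key(phrase):
    # Dictionary principle: spaces are ignored, so the phrase is
    # compared letter by letter as if it were a single word.
    return phrase.replace(" ", "")


def word_key(phrase):
    # Word principle: compare the first words; later words only
    # matter when all earlier words are equal. Python compares
    # lists element by element, which gives exactly that.
    return phrase.split()


phrases = ["a conto", "a priori", "apparat"]

dictionary_order = sorted(phrases, key=dictionary_key)
word_order = sorted(phrases, key=word_key)

print(dictionary_order)
print(word_order)
```

With the phrases from the report, `dictionary_order` reproduces the "Actual Results" listing and `word_order` reproduces the listing the library expects.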
I was rather baffled by the new sort order as well, but have recently realized that sort appears to be sorting without regard to non-alphanumeric characters in non-C locales on the first pass. So we currently sort to:

aaaaaaa
A and G motor vehicles
abalone
Andersen, Hans Christian
$$$ and no sense
$$$ and sense
Antigone

Is this -really- the specified behavior for UTF-8 locales? I don't personally know of anyone who wants or expects this behavior. Can we at least get switches added to sort and join that will selectively disable this behavior and pay attention to non-alphanumerics in the sort?
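The behavior observed in this comment can be approximated with a sort key that drops every non-alphanumeric character and ignores case before comparing. This is a rough Python sketch of the observed effect only; the helper name `first_pass_key` is made up, and the real glibc collation works with multi-level weights from the locale definition rather than by stripping characters.

```python
def first_pass_key(entry):
    # Keep only letters and digits and fold case, mimicking a first
    # comparison pass that disregards spaces and punctuation.
    letters_and_digits = "".join(ch for ch in entry if ch.isalnum())
    # The original entry is a tie-breaker for entries that differ
    # only in the ignored characters.
    return (letters_and_digits.casefold(), entry)


titles = [
    "$$$ and sense",
    "Antigone",
    "abalone",
    "A and G motor vehicles",
    "$$$ and no sense",
    "aaaaaaa",
    "Andersen, Hans Christian",
]

locale_like_order = sorted(titles, key=first_pass_key)
code_point_order = sorted(titles)  # comparable to the C locale

for title in locale_like_order:
    print(title)
```

`locale_like_order` matches the listing in the comment, while `code_point_order` puts the "$$$" entries first and all uppercase entries before the lowercase ones, which is what a byte-wise comparison in the C locale would give.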
This is not about UTF-8 locales, but about what sorting is common for various languages. If you look into a dictionary, you'll see the order you get. And it is certainly not something recent; sort has been behaving like that for a few years already. As for the original request, I think such non-standard handling belongs in the applications which need such handling.
In this particular case that would make the application significantly more complex. It currently uses a PostgreSQL database, and functions like sorting are done by the database. It is not tempting to have to redo that in the application. Currently (or last I heard), they had defined a non-standard locale instead. That was deemed to be less complicated. To me it feels unfortunate that one should have to do that.
If the different sorting order can be expressed using the specification language localedef understands (and I think it can, just give whitespace a high enough priority), then define your own locale sv_SE@wordorder or so. This data need not come with glibc, just put it in a separate package and use localedef at installation time to create the binary form. I have no interest in glibc getting into these kinds of details. We provide a good default, anything else is up to specialized "localization" packages. I'm closing this bug as WONTFIX since something like this will get into neither the upstream nor the RH glibc package.
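An application could pick up such a separately installed locale roughly as follows. This is a sketch under assumptions: `sv_SE@wordorder` is only the name suggested above and exists on a system only if someone has built it with localedef, and the helper `sort_with_locale` is made up for this example. When the locale is missing, the sketch falls back to plain code point comparison, in which the space character already sorts before every letter.

```python
import locale


def sort_with_locale(phrases, locale_name="sv_SE@wordorder"):
    # Remember the current collation setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # The hypothetical locale is not installed. Fall back to code
        # point order, where " " (U+0020) precedes all letters.
        return sorted(phrases)
    try:
        # strxfrm transforms each phrase so that ordinary comparison
        # follows the collation rules of the selected locale.
        return sorted(phrases, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


print(sort_with_locale(["apparat", "a priori", "a conto"]))
```

Because the locale is selected by name, the same application code keeps working with the default sv_SE definition where dictionary order is wanted.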