+++ This bug was initially created as a clone of Bug #236212 +++

Description of problem:
In French, the thousands separator is a space, for example, "1 024". The fr_FR locale in RHEL 5 has the thousands separator incorrectly defined as a null. This was fixed with upstream commit bfe6bf1:
http://sourceware.org/git/?p=glibc.git;a=commit;h=bfe6bf17a38b407eb1210f6f7e6d2561c7da05aa

Please update the RHEL 5 fr_FR locale definitions.

Version-Release number of selected component (if applicable):
glibc-2.5-58

How reproducible:
Always

Steps to Reproduce:
env LC_ALL=fr_FR.UTF-8 /usr/bin/printf "%'d\n" 4294967296

Actual Results:
4294967296

Expected Results:
4 294 967 296
Commit b632f9a also updates the self-tests when building glibc: http://sourceware.org/git/?p=glibc.git;a=commit;h=b632f9a81640db676905250257e677b415c963f9
Created attachment 477140
patch to correct French thousands separator

The upstream patches combined for RHEL 5.
Hmm, even with the patch, it's not printing correctly:

[user@localhost ~]$ rpm -q glibc
glibc-2.5-58.bz675259.x86_64
glibc-2.5-58.bz675259.i686
[user@localhost ~]$ env LC_ALL=fr_FR.UTF-8 /usr/bin/printf "%'d\n" 4294967296
4294967296

However, another quick test shows that the thousands separator is correctly set to a space:

[user@localhost ~]$ cat thousands_sep.c
#include<locale.h>
#include<stdio.h>
int main(void)
{
    struct lconv locale_structure;
    struct lconv *locale_ptr=&locale_structure;
    setlocale(LC_ALL, "fr_FR.UTF-8");
    locale_ptr=localeconv();
    printf("Thousands Separator: '%s'\n",locale_ptr->thousands_sep);
}
[user@localhost ~]$ gcc -o thousands_sep thousands_sep.c
[user@localhost ~]$ ./thousands_sep
Thousands Separator: ' '

I'll do some more research next week.
Oh, I think I found the problem: the grouping also has to be changed.

 LC_NUMERIC
 decimal_point "<U002C>"
 thousands_sep "<U0020>"
-grouping 0;0
+grouping 3
 END LC_NUMERIC

This was fixed with commit d03eba1:
http://sourceware.org/git/?p=glibc.git;a=commit;h=d03eba121c430068fe97f3f85495b2d1fe9b694f

Reported upstream at:
http://sourceware.org/bugzilla/show_bug.cgi?id=6040

I'll try fixing the grouping too.
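To illustrate why a correct separator alone is not enough, here is a minimal shell sketch of digit grouping. The function name group_digits is hypothetical and is not part of glibc or coreutils. It models only a single repeating group size, whereas a real locale can list several sizes. A size of 0 stands in for the old "grouping 0;0" definition, which, as seen in the previous comment, left the number ungrouped even though the separator was already a space.

```shell
#!/bin/sh
# group_digits DIGITS SEP SIZE
# Hypothetical helper: inserts SEP between groups of SIZE digits, counting
# from the right. SIZE=0 means "no grouping at all".
group_digits() {
    digits=$1
    sep=$2
    size=$3
    # With a group size of 0 the separator is never used.
    if [ "$size" -le 0 ]; then
        printf '%s\n' "$digits"
        return 0
    fi
    # Build a glob of SIZE question marks, e.g. "???" for SIZE=3.
    pat=
    i=0
    while [ "$i" -lt "$size" ]; do
        pat="$pat?"
        i=$((i + 1))
    done
    out=
    # Peel SIZE digits off the right-hand end while more than SIZE remain.
    while [ "${#digits}" -gt "$size" ]; do
        rest=${digits%$pat}
        group=${digits#"$rest"}
        out="$sep$group$out"
        digits=$rest
    done
    printf '%s\n' "$digits$out"
}

group_digits 4294967296 ' ' 3   # prints: 4 294 967 296
group_digits 4294967296 ' ' 0   # prints: 4294967296
```

The second call mirrors the state after the first patch: the separator is a space, but nothing is inserted because grouping is still disabled.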
Yes, changing the grouping fixed the problem. I did a quick rebuild of the fr_FR locales instead of rebuilding all of glibc to verify it.

1. First, as root, edit the fr_FR locale file, changing the grouping in the LC_NUMERIC section from 0;0 to 3:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vim /usr/share/i18n/locales/fr_FR
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2. Next, still as root, compile the fr_FR and fr_FR@euro locale files:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zcat /usr/share/i18n/charmaps/ISO-8859-1.gz > \
     /usr/share/i18n/charmaps/ISO-8859-1
zcat /usr/share/i18n/charmaps/UTF-8.gz > \
     /usr/share/i18n/charmaps/UTF-8
localedef -c -f /usr/share/i18n/charmaps/ISO-8859-1 \
          -i /usr/share/i18n/locales/fr_FR \
          /usr/lib/locale/fr_FR
localedef -c -f /usr/share/i18n/charmaps/ISO-8859-1 \
          -i /usr/share/i18n/locales/fr_FR@euro \
          /usr/lib/locale/fr_FR@euro
localedef -c -f /usr/share/i18n/charmaps/UTF-8 \
          -i /usr/share/i18n/locales/fr_FR@euro \
          /usr/lib/locale/fr_FR.utf8
build-locale-archive
rm /usr/share/i18n/charmaps/ISO-8859-1
rm /usr/share/i18n/charmaps/UTF-8
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3. Finally, test it out:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[user@localhost ~]$ env LC_ALL=fr_FR /usr/bin/printf "%'d\n" 4294967296
4 294 967 296
[user@localhost ~]$ env LC_ALL=fr_FR.UTF-8 /usr/bin/printf "%'d\n" 4294967296
4 294 967 296
[user@localhost ~]$ env LC_ALL=fr_FR.ISO-8859-1 /usr/bin/printf "%'d\n" 4294967296
4 294 967 296
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It works!
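The manual edit in step 1 could probably be scripted instead. The following is only a sketch, assuming the LC_NUMERIC section has the same shape as the one quoted in the earlier comment. The helper name set_numeric_grouping is hypothetical, and it writes the modified text to stdout rather than touching /usr/share/i18n/locales/fr_FR, so the result can be reviewed before it is copied into place and compiled with localedef.

```shell
#!/bin/sh
# set_numeric_grouping FILE
# Hypothetical helper: prints FILE to stdout with the "grouping" line inside
# the LC_NUMERIC section replaced by "grouping 3". FILE itself is unchanged.
# The sed range keeps the edit away from other sections such as LC_MONETARY.
set_numeric_grouping() {
    sed '/^LC_NUMERIC/,/^END LC_NUMERIC/s/^grouping[[:space:]].*$/grouping 3/' "$1"
}

# Demonstration on a scratch file that mimics the section quoted earlier.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
LC_NUMERIC
decimal_point "<U002C>"
thousands_sep "<U0020>"
grouping 0;0
END LC_NUMERIC
EOF
set_numeric_grouping "$tmp"
rm -f "$tmp"
```

Working on a copy is a deliberate choice here: a broken locale source only shows up when localedef runs, so it seems safer to inspect the output first.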
Created attachment 477151
patch to correct French thousands separator and grouping
Created attachment 478946
patch for French, Spanish, and German locales for LC_NUMERIC

We've discovered more problems with LC_NUMERIC settings in some Spanish and German locales. I've compared the fr_*, es_*, and de_* files in RHEL 5 glibc against the latest Unicode CLDR (Common Locale Data Repository), version 1.9, and patched the LC_NUMERIC section where appropriate. See the attached patch.

http://unicode.org/Public/cldr/1.9.0/posix.zip
http://unicode.org/cldr/trac/browser/tags/release-1-9/posix/
Testing the patch from comment 11. All French, Spanish and German locales have a grouping of 3 now and non-null thousands separator.

[username@localhost ~]$ rpm -q glibc-common
glibc-common-2.5-58.bz675259.2.x86_64
[username@localhost ~]$ cat numeric.sh
#!/bin/bash
for L in /usr/share/locale/{fr,es,de}_* ; do
    loc=$(basename $L).UTF-8
    echo $loc
    env LC_ALL=$loc printf "%d = %'d\n" 4294967296 4294967296
    env LC_ALL=$loc printf "%f = %'f\n" 1234.1234 1234.1234
    echo
done
[username@localhost ~]$ ./numeric.sh
fr_BE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

fr_CA.UTF-8
4294967296 = 4 294 967 296
1234,123400 = 1 234,123400

fr_CH.UTF-8
4294967296 = 4'294'967'296
1234.123400 = 1'234.123400

fr_FR.UTF-8
4294967296 = 4 294 967 296
1234,123400 = 1 234,123400

es_AR.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CL.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CO.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CR.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_DO.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_EC.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_ES.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_GT.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_HN.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_MX.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_NI.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_PA.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_PE.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_PR.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_SV.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_UY.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_VE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

de_AT.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

de_CH.UTF-8
4294967296 = 4'294'967'296
1234.123400 = 1'234.123400

de_DE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400
Miroslav, you are correct: Andreas failed to include the fix for several locales which were noted as incorrect in c13. However, the desired output in c13 is itself incorrect (and thus the patch is also incorrect) in that many of the thousands separators for the es_* locales are wrong. I had a discussion about these problems with Uli a month or so ago.

I'm not going to try to backport the exact changes Uli made, as they include a variety of unrelated fixes. However, I do have a patch which fixes the thousands separator and grouping for all the locales mentioned in this BZ. I expect I'll have those builds today. I'll also include the correct output for the numeric.sh test script.
Here's the proper output for numeric.sh.

fr_BE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

fr_CA.UTF-8
4294967296 = 4 294 967 296
1234,123400 = 1 234,123400

fr_CH.UTF-8
4294967296 = 4'294'967'296
1234.123400 = 1'234.123400

fr_FR.UTF-8
4294967296 = 4 294 967 296
1234,123400 = 1 234,123400

es_AR.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CL.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CO.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CR.UTF-8
4294967296 = 4 294 967 296
1234,123400 = 1 234,123400

es_DO.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_EC.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_ES.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_GT.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_HN.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_MX.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_NI.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_PA.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_PE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_PR.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_SV.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_UY.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_VE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

de_AT.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

de_CH.UTF-8
4294967296 = 4'294'967'296
1234.123400 = 1'234.123400

de_DE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400
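With this many locales, spotting which separators differ between two numeric.sh runs by eye is error-prone. One possible aid is to reduce each run to one line per locale and then diff the two summaries. The sketch below assumes the numeric.sh output layout of locale name, integer line, float line and blank line, and assumes single-byte separators as seen in this report. The function name summarize_separators is hypothetical.

```shell
#!/bin/sh
# summarize_separators: reads numeric.sh output on stdin and prints one line
# per locale: the locale name, a tab, and the character that follows the
# first digit of the grouped integer. "none" means that character is a
# digit (no grouping); a blank separator is spelled out as "space".
summarize_separators() {
    awk 'BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one locale per record
    {
        grouped = $2
        sub(/^[0-9]+ = /, "", grouped)  # drop the ungrouped left-hand side
        sep = substr(grouped, 2, 1)
        if (sep ~ /[0-9]/) sep = "none"
        else if (sep == " ") sep = "space"
        printf "%s\t%s\n", $1, sep
    }'
}

# Example with two blocks taken from the output above.
printf '%s\n' \
    'es_CR.UTF-8' '4294967296 = 4 294 967 296' '1234,123400 = 1 234,123400' '' \
    'es_DO.UTF-8' '4294967296 = 4.294.967.296' '1234,123400 = 1.234,123400' '' |
    summarize_separators
```

Saving the summary from each glibc build to a file and running diff on the two files would then list only the locales whose separator changed.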
For future reference, what sources did you use for the locales? I based my patch on the POSIX data at unicode.org. Is this not a reliable source?
I'm not sure what source Uli used for all of them; however, he explicitly noted that CLDR is not considered accurate or authoritative. Generally, gov't specs are considered authoritative.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0260.html