675259 – incorrect numeric settings for French, Spanish, and German locales

Summary: incorrect numeric settings for French, Spanish, and German locales

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	5.6
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Jeff Law
QA Contact:	qe-baseos-tools-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:	236212
Blocks:	688720 1047909

Reported:	2011-02-04 18:13 UTC by Jeff Bastian
Modified:	2019-04-16 13:59 UTC
CC List:	2 users
Fixed In Version:	glibc-2.5-79
Doc Type:	Bug Fix
Doc Text:
Clone Of:	236212
Clones:	688720
Environment:
Last Closed:	2012-02-21 06:32:55 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
patch to correct French thousands separator (1.47 KB, patch) 2011-02-04 23:09 UTC, Jeff Bastian
patch to correct French thousands separator and grouping (1.16 KB, patch) 2011-02-05 00:09 UTC, Jeff Bastian
patch for French, Spanish, and German locales for LC_NUMERIC (9.71 KB, patch) 2011-02-15 18:48 UTC, Jeff Bastian

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2012:0260	0	normal	SHIPPED_LIVE	glibc bug fix update	2012-02-20 15:06:42 UTC

Description Jeff Bastian 2011-02-04 18:13:06 UTC

+++ This bug was initially created as a clone of Bug #236212 +++

Description of problem:
In French, the thousands separator is a space, for example, "1 024".  The fr_FR locale in RHEL 5 has the thousands separator incorrectly defined as a null.

This was fixed with upstream commit bfe6bf1:
http://sourceware.org/git/?p=glibc.git;a=commit;h=bfe6bf17a38b407eb1210f6f7e6d2561c7da05aa

Please update the RHEL 5 fr_FR locale definitions.


Version-Release number of selected component (if applicable):
glibc-2.5-58

How reproducible:
Always

Steps to Reproduce:
env LC_ALL=fr_FR.UTF-8 /usr/bin/printf "%'d\n" 4294967296

Actual Results:
4294967296

Expected Results:
4 294 967 296

Comment 1 Jeff Bastian 2011-02-04 18:17:27 UTC

Commit b632f9a also updates the self-tests when building glibc:

http://sourceware.org/git/?p=glibc.git;a=commit;h=b632f9a81640db676905250257e677b415c963f9

Comment 3 Jeff Bastian 2011-02-04 23:09:11 UTC

Created attachment 477140 [details]
patch to correct French thousands separator

The upstream patches combined for RHEL 5

Comment 5 Jeff Bastian 2011-02-04 23:17:03 UTC

Hmm, even with the patch, it's not printing correctly:

[user@localhost ~]$ rpm -q glibc
glibc-2.5-58.bz675259.x86_64
glibc-2.5-58.bz675259.i686

[user@localhost ~]$ env LC_ALL=fr_FR.UTF-8 /usr/bin/printf "%'d\n" 4294967296
4294967296


However, another quick test shows that the thousands separator is correctly set to a space:

[user@localhost ~]$ cat thousands_sep.c
#include <locale.h>
#include <stdio.h>

int main(void)
{
    /* localeconv() returns a pointer to static storage that describes
       the numeric formatting of the current locale. */
    struct lconv *locale_ptr;

    setlocale(LC_ALL, "fr_FR.UTF-8");

    locale_ptr = localeconv();
    printf("Thousands Separator: '%s'\n", locale_ptr->thousands_sep);
    return 0;
}

[user@localhost ~]$ gcc -o thousands_sep thousands_sep.c

[user@localhost ~]$ ./thousands_sep
Thousands Separator: ' '


I'll do some more research next week.
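
The same check can be made from the shell without compiling anything. This is a minimal sketch that assumes the standard locale utility is installed; numeric_fields is a hypothetical helper name, not something shipped with glibc. Unlike the C program above, it also prints the grouping keyword.

```bash
#!/bin/bash
# Print the LC_NUMERIC keywords that affect digit grouping for one locale.
# numeric_fields is a hypothetical helper name; "locale -k" is the standard
# utility's keyword=value output mode.
numeric_fields() {
    LC_ALL="$1" locale -k decimal_point thousands_sep grouping
}

# Run only when a locale name is given on the command line.
if [ "$#" -gt 0 ]; then
    numeric_fields "$1"
fi
```

Saved as, say, numeric_fields.sh (an assumed file name), it would be run as "bash numeric_fields.sh fr_FR.UTF-8". If the requested locale is not installed, locale falls back to the C locale values, so an empty thousands_sep can also mean the locale is missing.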

Comment 6 Jeff Bastian 2011-02-04 23:27:58 UTC

Oh, I think I found the problem: the grouping also has to be changed.
 LC_NUMERIC
 decimal_point             "<U002C>"
 thousands_sep             "<U0020>"
-grouping                  0;0
+grouping                  3
 END LC_NUMERIC

This was fixed with commit d03eba1:
http://sourceware.org/git/?p=glibc.git;a=commit;h=d03eba121c430068fe97f3f85495b2d1fe9b694f

Reported upstream at:
http://sourceware.org/bugzilla/show_bug.cgi?id=6040

I'll try fixing the grouping too.
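
To make the role of grouping concrete, here is a minimal sketch in plain bash. It only imitates the observable behaviour (a separator is inserted every N digits, and never when the group size is 0); it is not how glibc implements the ' flag. group_digits is a hypothetical helper and handles unsigned digit strings only.

```bash
#!/bin/bash
# Hypothetical helper (not part of glibc): insert SEP between groups of SIZE
# digits, counting from the right. SIZE=0 stands in for a locale with no
# usable grouping, so the separator is never inserted.
group_digits() {
    local digits=$1 sep=$2 size=${3:-3} out="" len
    if [ "$size" -le 0 ]; then
        printf '%s\n' "$digits"
        return
    fi
    while [ "${#digits}" -gt "$size" ]; do
        len=${#digits}
        # Peel the last SIZE digits off and prepend them to the result.
        out="${sep}${digits:len-size}${out}"
        digits=${digits:0:len-size}
    done
    printf '%s\n' "${digits}${out}"
}

group_digits 4294967296 ' ' 0   # prints 4294967296
group_digits 4294967296 ' ' 3   # prints 4 294 967 296
```

This mirrors what comment 5 observed: the separator was already a space, yet nothing was grouped until the group size changed.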

Comment 7 Jeff Bastian 2011-02-05 00:04:23 UTC

Yes, changing the grouping fixed the problem.

I did a quick rebuild of the fr_FR locales instead of rebuilding all of glibc to verify it.

1. First, as root, edit the fr_FR locale file and change the grouping in
   the LC_NUMERIC section from 0;0 to 3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vim /usr/share/i18n/locales/fr_FR
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2. Next, still as root, compile the fr_FR and fr_FR@euro locale files:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
zcat /usr/share/i18n/charmaps/ISO-8859-1.gz > \
     /usr/share/i18n/charmaps/ISO-8859-1
zcat /usr/share/i18n/charmaps/UTF-8.gz > \
     /usr/share/i18n/charmaps/UTF-8
localedef -c -f /usr/share/i18n/charmaps/ISO-8859-1 \
             -i /usr/share/i18n/locales/fr_FR \
             /usr/lib/locale/fr_FR
localedef -c -f /usr/share/i18n/charmaps/ISO-8859-1 \
             -i /usr/share/i18n/locales/fr_FR@euro \
             /usr/lib/locale/fr_FR@euro
localedef -c -f /usr/share/i18n/charmaps/UTF-8 \
             -i /usr/share/i18n/locales/fr_FR@euro \
             /usr/lib/locale/fr_FR.utf8
build-locale-archive
rm /usr/share/i18n/charmaps/ISO-8859-1
rm /usr/share/i18n/charmaps/UTF-8
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

3. Finally, test it out:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[user@localhost ~]$ env LC_ALL=fr_FR /usr/bin/printf "%'d\n" 4294967296 
4 294 967 296

[user@localhost ~]$ env LC_ALL=fr_FR.UTF-8 /usr/bin/printf "%'d\n" 4294967296
4 294 967 296

[user@localhost ~]$ env LC_ALL=fr_FR.ISO-8859-1 /usr/bin/printf "%'d\n" 4294967296
4 294 967 296
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It works!
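
Step 1 above is a manual edit in vim. A non-interactive equivalent might look like the sketch below. It assumes the locale source has a literal "grouping" line inside its LC_NUMERIC section (a file that uses "copy" there would be left unchanged); set_numeric_grouping is a hypothetical helper name.

```bash
#!/bin/bash
# Print a copy of a locale source file with "grouping" set to 3.
# Only the LC_NUMERIC section is touched, so mon_grouping in LC_MONETARY
# is left alone. set_numeric_grouping is a hypothetical helper name.
set_numeric_grouping() {
    sed '/^LC_NUMERIC/,/^END LC_NUMERIC/ s/^grouping[[:space:]].*/grouping                  3/' "$1"
}

# Example (writes a patched copy for review instead of editing in place):
# set_numeric_grouping /usr/share/i18n/locales/fr_FR > /tmp/fr_FR.patched
```

Writing to a separate file keeps the original locale source intact until the result has been checked; the localedef commands in step 2 would then take the patched copy as their -i argument.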

Comment 8 Jeff Bastian 2011-02-05 00:09:48 UTC

Created attachment 477151 [details]
patch to correct French thousands separator and grouping

Comment 11 Jeff Bastian 2011-02-15 18:48:01 UTC

Created attachment 478946 [details]
patch for French, Spanish, and German locales for LC_NUMERIC

We've discovered more problems with LC_NUMERIC settings in some Spanish and German locales.

I've compared the fr_*, es_*, and de_* files in RHEL 5 glibc against the latest Unicode CLDR (Common Locale Data Repository), version 1.9, and patched the LC_NUMERIC section where appropriate.  See the attached patch.

http://unicode.org/Public/cldr/1.9.0/posix.zip
http://unicode.org/cldr/trac/browser/tags/release-1-9/posix/

Comment 13 Jeff Bastian 2011-02-15 22:32:22 UTC

Testing the patch from comment 11: all French, Spanish, and German locales now have a grouping of 3 and a non-null thousands separator.

[username@localhost ~]$ rpm -q glibc-common
glibc-common-2.5-58.bz675259.2.x86_64

[username@localhost ~]$ cat numeric.sh 
#!/bin/bash

for L in /usr/share/locale/{fr,es,de}_* ; do
  loc="$(basename "$L").UTF-8"
  echo "$loc"
  env LC_ALL="$loc" printf "%d = %'d\n" 4294967296 4294967296
  env LC_ALL="$loc" printf "%f = %'f\n" 1234.1234 1234.1234
  echo
done

[username@localhost ~]$ ./numeric.sh 
fr_BE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

fr_CA.UTF-8
4294967296 = 4 294 967 296
1234,123400 = 1 234,123400

fr_CH.UTF-8
4294967296 = 4'294'967'296
1234.123400 = 1'234.123400

fr_FR.UTF-8
4294967296 = 4 294 967 296
1234,123400 = 1 234,123400

es_AR.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CL.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CO.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CR.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_DO.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_EC.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_ES.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_GT.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_HN.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_MX.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_NI.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_PA.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_PE.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_PR.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_SV.UTF-8
4294967296 = 4,294,967,296
1234.123400 = 1,234.123400

es_UY.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_VE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

de_AT.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

de_CH.UTF-8
4294967296 = 4'294'967'296
1234.123400 = 1'234.123400

de_DE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

Comment 21 Jeff Law 2011-12-13 20:52:50 UTC

Miroslav,

You are correct: Andreas did not include the fix for several locales that were noted as incorrect in comment 13.

However, the desired output in comment 13 is incorrect (and thus the patch is also incorrect) in that many of the thousands separators for the es_* locales are wrong.  I had a discussion about these problems with Uli a month or so ago.

I'm not going to try to backport the exact changes Uli made, as they include a variety of unrelated fixes.  However, I do have a patch which fixes the thousands separator and grouping for all the locales mentioned in this BZ.  I expect I'll have those builds today.  I'll also include the correct output for the numeric.sh test script.

Comment 22 Jeff Law 2011-12-14 19:49:56 UTC

Here's the proper output for numeric.sh.

fr_BE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

fr_CA.UTF-8
4294967296 = 4 294 967 296
1234,123400 = 1 234,123400

fr_CH.UTF-8
4294967296 = 4'294'967'296
1234.123400 = 1'234.123400

fr_FR.UTF-8
4294967296 = 4 294 967 296
1234,123400 = 1 234,123400

es_AR.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CL.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CO.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_CR.UTF-8
4294967296 = 4 294 967 296
1234,123400 = 1 234,123400

es_DO.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_EC.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_ES.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_GT.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_HN.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_MX.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_NI.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_PA.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_PE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_PR.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_SV.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_UY.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

es_VE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

de_AT.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400

de_CH.UTF-8
4294967296 = 4'294'967'296
1234.123400 = 1'234.123400

de_DE.UTF-8
4294967296 = 4.294.967.296
1234,123400 = 1.234,123400
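
For regression checks, the listing above can be condensed into a lookup. The sketch below is transcribed from this comment's output only and assumes the separator shown as a blank is a plain ASCII space, as in the <U0020> definition from comment 6. The names expected_sep, expected_rendering, and check_locale are hypothetical.

```bash
#!/bin/bash
# Expected thousands separator per locale, transcribed from the listing
# in this comment. Locales not listed there return a non-zero status.
expected_sep() {
    case $1 in
        fr_CA|fr_FR|es_CR) printf ' ' ;;
        fr_CH|de_CH)       printf "'" ;;
        fr_BE|de_AT|de_DE|es_AR|es_CL|es_CO|es_DO|es_EC|es_ES|es_GT|es_HN|es_MX|es_NI|es_PA|es_PE|es_PR|es_SV|es_UY|es_VE)
                           printf '.' ;;
        *) return 1 ;;
    esac
}

# Expected rendering of 4294967296 for a locale such as fr_FR.
expected_rendering() {
    local sep
    sep=$(expected_sep "$1") || return 1
    printf '4%s294%s967%s296\n' "$sep" "$sep" "$sep"
}

# Compare the expectation with what the installed locale produces.
check_locale() {
    local want got
    want=$(expected_rendering "$1") || { echo "$1: no expectation"; return 2; }
    got=$(env LC_ALL="$1.UTF-8" printf "%'d\n" 4294967296 2>/dev/null)
    if [ "$got" = "$want" ]; then
        echo "$1: ok"
    else
        echo "$1: got '$got', want '$want'"
        return 1
    fi
}
```

A loop such as "for l in fr_FR es_CR de_CH; do check_locale "$l"; done" would then flag any locale whose installed data differs from this listing.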

Comment 23 Jeff Bastian 2011-12-14 21:04:26 UTC

For future reference, what sources did you use for the locales?

I based my patch on the POSIX data at unicode.org.  Is this not a reliable source?

Comment 24 Jeff Law 2011-12-14 21:10:09 UTC

I'm not sure what source Uli used for all of them; however, he explicitly noted that CLDR is not considered accurate or authoritative.  Generally, gov't specs are considered authoritative.

Comment 25 errata-xmlrpc 2012-02-21 06:32:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0260.html
