Bug 524223

Summary: incomplete/wrong LC_NUMERIC handling
Product: [Fedora] Fedora
Reporter: Karel Volný <kvolny>
Component: glibc
Assignee: Andreas Schwab <schwab>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 11
CC: fweimer, jakub, petr.pisar, schwab
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2010-06-28 14:43:26 UTC
Attachments:
numberstest.c (flags: none)

Description Karel Volný 2009-09-18 13:00:47 UTC
Created attachment 361645 [details]
numberstest.c

Description of problem:
While experimenting with localised input and output, I found a few problems in the handling of numbers.
Although I think this is an upstream problem, I am reporting it here as per http://www.gnu.org/software/libc/bugs.html#what_to_report

Version-Release number of selected component (if applicable):
glibc-2.10.1-5.x86_64

How reproducible:
always

Steps to Reproduce:
1. compile the attached reproducer (gcc -o numberstest numberstest.c)
2. ./numberstest
  
Actual results:
*** C ***                                                     
Negative sign test:
1) U002D -1 = -1.000000
2) U2013 –1 = 0.000000
3) U2212 −1 = 0.000000
Decimal point test:
1) 123456.7890 = 123456.789062
2) 123456,7890 = 123456.000000
Thousands separator test:
1)       123456 = 123456.000000
2) U002C 123,456 = 123.000000
3) U002E 123.456 = 123.456001
4) U0020 123 456 = 123.000000
5) U00A0 123 456 = 123.000000
6) U202F 123 456 = 123.000000

*** cs_CZ.UTF-8 ***
Negative sign test:
1) U002D -1 = -1,000000
2) U2013 –1 = 0,000000
3) U2212 −1 = 0,000000
Decimal point test:
1) 123456.7890 = 123456,000000
2) 123456,7890 = 123456,789062
Thousands separator test:
1)       123456 = 123456,000000
2) U002C 123,456 = 123,456001
3) U002E 123.456 = 123,000000
4) U0020 123 456 = 123,000000
5) U00A0 123 456 = 123,000000
6) U202F 123 456 = 123,000000

Expected results:
*** C ***
Negative sign test:
1) U002D -1 = -1.000000
2) U2013 –1 = 0.000000
3) U2212 −1 = 0.000000
Decimal point test:
1) 123456.7890 = 123456.789000
2) 123456,7890 = 123456.000000
Thousands separator test:
1)       123456 = 123456.000000
2) U002C 123,456 = 123.000000
3) U002E 123.456 = 123.456000
4) U0020 123 456 = 123.000000
5) U00A0 123 456 = 123.000000
6) U202F 123 456 = 123.000000

*** cs_CZ.UTF-8 ***
Negative sign test:
1) U002D -1 = -1,000000
2) U2013 –1 = -1,000000
3) U2212 −1 = -1,000000
Decimal point test:
1) 123456.7890 = 123 456,000000
2) 123456,7890 = 123 456,789000
Thousands separator test:
1)       123456 = 123 456,000000
2) U002C 123,456 = 123,456001
3) U002E 123.456 = 123,000000
4) U0020 123 456 = 123 456,000000
5) U00A0 123 456 = 123 456,000000
6) U202F 123 456 = 123 456,000000
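
The trailing digits in the actual results above (123456.789062, 123.456001) look like single-precision rounding rather than a locale effect. The attachment is not reproduced here, so the sketch below only assumes that the reproducer stored the parsed value in a float; format_as_float and format_as_double are hypothetical helper names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse the text, narrow it to float, print with %f. */
static void format_as_float(const char *text, char *out, size_t outlen)
{
    float f = strtof(text, NULL);    /* single precision: about 7 significant digits */
    snprintf(out, outlen, "%f", f);  /* promoted to double, but precision is already lost */
}

/* Hypothetical helper: same as above, but keeping double precision. */
static void format_as_double(const char *text, char *out, size_t outlen)
{
    double d = strtod(text, NULL);
    snprintf(out, outlen, "%f", d);
}
```

In the C locale, format_as_float("123.456", ...) yields 123.456001, while format_as_double gives 123.456000, which is the value listed under the expected results.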

Additional info:
As for the negative sign:
- the C locale uses ASCII (or, to be precise, ANSI_X3.4-1968), so it is OK for it not to recognize the extended characters
- on the other hand, *.UTF-8 locales use an encoding that maps to Unicode and should be able to handle the extended characters properly; in this case, according to chapter 6.2 of the Unicode standard, all those characters should be understood as a minus sign (and the file translit_neutral already treats them this way)

As for the decimal point test:
- the cs_CZ locale defines a non-breaking space as the thousands separator, so both results should be grouped into 3-digit groups separated by a space
- note that this test serves mainly as a check that locale handling is involved, as we can see that the point<->comma exchange works properly

As for the thousands separator:
- the same as for the decimal point test applies
- as Unicode is involved, all the space variants should be handled the same (again, in translit_neutral there is already a relation saying that U202F has the equivalents U00A0 and U0020)
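
The space handling requested above can be approximated in application code today. This is a minimal sketch, not anything glibc does: it assumes UTF-8 input, and strip_group_spaces and parse_grouped are hypothetical helpers that drop U+0020, U+00A0 and U+202F only when they sit directly between two ASCII digits, then hand the result to strtod.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Byte length of a space-like group separator at s (UTF-8), or 0 if none. */
static size_t group_space_len(const char *s)
{
    if (s[0] == ' ') return 1;                         /* U+0020 */
    if (strncmp(s, "\xC2\xA0", 2) == 0) return 2;      /* U+00A0 */
    if (strncmp(s, "\xE2\x80\xAF", 3) == 0) return 3;  /* U+202F */
    return 0;
}

/* Copy in to out, skipping separators that sit between two ASCII digits.
   out must have room for strlen(in) + 1 bytes. */
static void strip_group_spaces(const char *in, char *out)
{
    size_t i = 0, o = 0;
    while (in[i] != '\0') {
        size_t n = group_space_len(in + i);
        if (n > 0 && i > 0
            && isdigit((unsigned char)in[i - 1])
            && isdigit((unsigned char)in[i + n])) {
            i += n;              /* drop the separator */
            continue;
        }
        out[o++] = in[i++];      /* keep every other byte unchanged */
    }
    out[o] = '\0';
}

/* Hypothetical wrapper: normalize, then parse with the current locale. */
static double parse_grouped(const char *text)
{
    char buf[128];
    if (strlen(text) >= sizeof buf)
        return 0.0;              /* too long for this sketch */
    strip_group_spaces(text, buf);
    return strtod(buf, NULL);
}
```

The sketch does not check that groups are three digits wide, so it treats any digit, space, digit run as one number; a stricter version would validate the grouping against localeconv().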

Comment 1 Petr Písař 2009-09-21 13:29:15 UTC
The thousands separator is applied to output only if the apostrophe flag is used, e.g. printf("%'d\n", 1024).
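
A small sketch of the apostrophe flag in use. format_grouped is a hypothetical helper; it switches LC_NUMERIC to the named locale, formats the value, and resets LC_NUMERIC to "C" afterwards (a real program would save and restore the previous setting). Whether a given locale such as cs_CZ.UTF-8 is installed is an assumption about the system.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format value with the ' flag under locale_name.
   Returns 0 on success, -1 if the locale is not available. */
static int format_grouped(const char *locale_name, long value,
                          char *out, size_t outlen)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return -1;                         /* locale not installed */
    snprintf(out, outlen, "%'ld", value);  /* ' asks for locale grouping */
    setlocale(LC_NUMERIC, "C");            /* simplification: reset to "C" */
    return 0;
}
```

In the "C" locale no grouping is defined, so 1024 is formatted as 1024; in a locale that defines a thousands separator the same call is expected to insert it.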

The question is whether it is applied, and whether it can be applied, to scanf(3) input.

The thousand separator and record separator could clash in some locales:

setlocale(LC_ALL, "cs_CZ.UTF-8");
scanf("%'d %'d", &number1, &number2);

scanf(3) could be confused when reading "123 456" or "1 024 2 048". We could restrict the input thousands separator to exactly the output thousands separator and not apply translit_neutral. However, this introduces further exceptions and more complicated processing.
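
The ambiguity is easy to see with a plain "%d %d" format, where a space can only be a field separator. The sketch below deliberately leaves out the apostrophe flag and any locale; read_two is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: read up to two whitespace-separated integers.
   Returns the number of fields converted, as sscanf does. */
static int read_two(const char *text, int *first, int *second)
{
    return sscanf(text, "%d %d", first, second);
}
```

With "123 456" this converts two fields, 123 and 456. With "1 024 2 048" it converts 1 and 24 and leaves the rest unread, so a parser that also accepted the space as a group separator would have no way to tell two grouped numbers from four ungrouped ones.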

Comment 2 Andreas Schwab 2009-09-21 14:15:19 UTC
Your expectations are wrong. scanf is consistent with strtod.
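
The consistency claim can be checked with a short sketch that parses the same text both ways. same_result and bytes_consumed are hypothetical helpers; the comparison uses "%lf" so that both paths produce a double.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 when strtod and sscanf("%lf") agree. */
static int same_result(const char *text)
{
    double via_strtod = strtod(text, NULL);
    double via_sscanf = 0.0;
    if (sscanf(text, "%lf", &via_sscanf) != 1)
        via_sscanf = 0.0;        /* no conversion: mirror strtod's 0 */
    return via_strtod == via_sscanf;
}

/* Hypothetical helper: how many bytes strtod accepted as a number. */
static size_t bytes_consumed(const char *text)
{
    char *end;
    (void)strtod(text, &end);
    return (size_t)(end - text);
}
```

In the "C" locale, bytes_consumed("123,456") is 3: parsing stops at the comma, which is why the report shows 123.000000 for that input.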

Comment 3 Petr Písař 2009-09-21 15:16:21 UTC
Wait a moment. strtod says:

> The expected form of the subject sequence is an optional plus or minus sign,
> then…

> In other than the C or POSIX locales, other implementation-defined subject
> sequences may be accepted.

This does not restrict the `minus' sign to the U+002D character. Why not allow the real arithmetic minus U+2212 and the digit-width figure dash U+2012 as well? Unicode says:

> When interpreting formulas, U+002D hyphen-minus, U+2012 figure dash,
> and U+2212 minus sign should each be taken as indicating a minus sign

The proposed change is motivated by this idea: if we allow printf to output numbers in a locale-specific format, we should allow scanf to read numbers in a locale-specific format as well. Otherwise simple select-and-paste will not work.

As I demonstrated before, there can be problems with the thousands separator in input. However, I can't see any problem with a Unicode minus sign in input in a UTF-8 locale.
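
What such an extension might look like can be sketched as a wrapper around strtod. This is an illustration only, assuming UTF-8 input; strtod_unicode_minus is a hypothetical name, not a glibc function, and it accepts U+2212 and U+2012 only when a digit follows immediately.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Byte length of a Unicode minus-like sign at s (UTF-8), or 0 if none. */
static size_t unicode_minus_len(const char *s)
{
    if (strncmp(s, "\xE2\x88\x92", 3) == 0) return 3;  /* U+2212 MINUS SIGN */
    if (strncmp(s, "\xE2\x80\x92", 3) == 0) return 3;  /* U+2012 FIGURE DASH */
    return 0;
}

/* Hypothetical wrapper: like strtod, but also accepts a Unicode minus. */
static double strtod_unicode_minus(const char *text, char **endptr)
{
    const char *p = text;
    while (*p == ' ' || *p == '\t')
        p++;                                 /* skip leading blanks */

    size_t n = unicode_minus_len(p);
    if (n == 0)
        return strtod(text, endptr);         /* ordinary input: defer to strtod */

    if (!isdigit((unsigned char)p[n])) {     /* reject a second sign or blanks */
        if (endptr) *endptr = (char *)text;
        return 0.0;
    }
    return -strtod(p + n, endptr);           /* parse the magnitude, negate it */
}
```

Ordinary input such as "-1" still goes straight to strtod, so existing behaviour is unchanged; only input that starts with one of the two Unicode signs takes the new path.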

Comment 4 Karel Volný 2009-09-22 08:50:22 UTC
(In reply to comment #2)
> Your expectations are wrong. scanf is consistent with strtod.  

sorry to bother your circles, but I cannot accept such resolution

if there is a problem with strtod that results in a problem with scanf, please do not close this bug just because I've used scanf in my example

I admit that I'm not a programmer and I might get things wrong - for example, as Petr has said, I omitted using the apostrophe character to get the output grouped ... 'cause what I have found regarding the matter was this: http://www.gnu.org/software/libc/manual/html_node/Formatting-Numbers.html#Formatting-Numbers that suggests using ‘^’ to explicitly disable the grouping, and it adds "By default grouping is enabled." - yes, this talks about strfmon and not printf, but the manual simply does not bother to say there's a difference or to describe all the cases of locale handling (and the new C standard - http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf - does not seem to be more verbose on the subject)

while I understand you cannot waste your time teaching people C, I'd like to kindly ask you to improve my testcase, so that it works as desired - i.e. to demonstrate proper reading of different forms of the input, so we can make sure that we are able to properly read back what we have printed ... to my best knowledge, this is currently impossible if locales are involved

reading the abovementioned standard on strtod:

> In other than the "C" locale, additional locale-specific subject sequence
> forms may be accepted.

- so, while it does not require the implementation to really accept localised input, it allows the possibility and I (and maybe a few others :-)) would like to see this working (maybe adding "RFE" to the subject would be appropriate ...)

Comment 5 Andreas Schwab 2009-09-22 11:40:43 UTC
Please file an enhancement request upstream.  There is nothing wrong with the current implementation.

Comment 6 Bug Zapper 2010-04-28 10:27:19 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Bug Zapper 2010-06-28 14:43:26 UTC
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.