Bug 487844

Summary: Aliasing and Optimizations cause numeric conversion errors
Product: [Fedora] Fedora Reporter: Mamoru TASAKA <mtasaka>
Component: nsprAssignee: Kai Engert (:kaie) (inactive account) <kengert>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: rawhideCC: adam, ascii79, bgamari, bnocera, bugs, caillon, craig, dirtyepic, felix, jakub, jlayton, kengert, kevin, moneta.mace, nicolas.mailhot, sangu.fedora, seandarcy, thaytan, tmraz, twoerner, vonbrand, wtc, xgl-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-03-31 20:05:28 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 446452    
Attachments:
Description Flags
screenshot of thunderbird with new nspr
none
screenshot of thunderbird with old nspr
none
rh487844.patch
none
my trial patch
none
Patch v3
none
Patch v4
none
Patch v5
none
wtc's patch (using union U) none

Description Mamoru TASAKA 2009-02-28 06:49:02 UTC
Created attachment 333591 [details]
screenshot of thunderbird with new nspr

Description of problem:
After F-11 mass rebuild, all of firefox/thunderbird/kazehakase
shows too long spaces between each lines (see attached),
which renders these applications quite unusable.

Either chaning nspr-4.7.3-4.fc11 to
- the previous nspr-4.7.3-3.fc11
- or the recompiled one with using "-O0" compilation flags instead
  of "-O2"
seems to resolve this issue, so perhaps this is related
to gcc optimization

Version-Release number of selected component (if applicable):
nspr-4.7.3-4.fc11.i586

affected:
firefox-3.1-0.7.beta2.fc11.i586
thunderbird-2.0.0.19-1.fc10.i386
kazehakase-0.5.6-6.svn3761_trunk.fc11.i586

How reproducible:
100%

Steps to Reproduce:
1. see above
2.
3.
  
Actual results:
See attached

Expected results:
See attached

Comment 1 Mamoru TASAKA 2009-02-28 06:49:56 UTC
Created attachment 333592 [details]
screenshot of thunderbird with old nspr

Comment 2 Sachin Garg 2009-03-01 05:41:31 UTC
similar behavior.  Downgrading nspr helps. Thnx Mamoru

Comment 3 Horst H. von Brand 2009-03-01 18:33:35 UTC
Same here, with x86_64. Thanks!

Comment 4 Ben Gamari 2009-03-01 22:07:48 UTC
*** Bug 487969 has been marked as a duplicate of this bug. ***

Comment 5 Mamoru TASAKA 2009-03-02 13:39:40 UTC
After some investigation, 
it seems that compiling mozilla/nsprpub/pr/src/misc/prdtoa.c
with -O0 instead of -O2 and reinstalling libnspr4.so works.

Comment 6 Thomas Woerner 2009-03-02 13:43:28 UTC
I have an additional problem: Fonts are very huge in firefox and thunderbird
(only two characters visible on a 1920x1200 screen).

Downgrading nspr solves my problem, too.

Comment 7 Mamoru TASAKA 2009-03-02 14:32:22 UTC
even -O1 works. I cannot debug anything further for now.

Comment 8 Jakub Jelinek 2009-03-02 17:30:33 UTC
Try -O2 -fno-strict-aliasing, if that helps, look at -O2 -Wall build to see if there were any aliasing violations.  You can also try -O2 -fno-inline, if the problem is still reproduceable with it, try to add __attribute__((optimize (0)))
on various routines to narrow it to one routine.

Comment 9 François Cami 2009-03-02 20:40:13 UTC
*** Bug 488082 has been marked as a duplicate of this bug. ***

Comment 10 Mamoru TASAKA 2009-03-02 20:58:11 UTC
Actually:

mozilla/nsprpub/pr/src/misc/prdtoa.c creates a lot of warnings like:
-----------------------------------------------------------
../../.././mozilla/nsprpub/pr/src/misc/prdtoa.c:1254: warning: dereferencing pointer 'd.65' does break strict-aliasing rules
../../.././mozilla/nsprpub/pr/src/misc/prdtoa.c:1254: note: initialized from here
../../.././mozilla/nsprpub/pr/src/misc/prdtoa.c:1256: warning: dereferencing pointer 'd.65' does break strict-aliasing rules
../../.././mozilla/nsprpub/pr/src/misc/prdtoa.c:1256: note: initialized from here
../../.././mozilla/nsprpub/pr/src/misc/prdtoa.c:1261: warning: dereferencing pointer 'd.65' does break strict-aliasing rules
../../.././mozilla/nsprpub/pr/src/misc/prdtoa.c:1261: note: initialized from here
(and so many)
-----------------------------------------------------------

by something like:
-----------------------------------------------------------
typedef union { double d; unsigned int L[2]; } U;
#define word0(x) ((U*)&x)->L[1]

int main(){
	double d;
#define d0 word0(d)
	d0 = 3;
	return 0;
}
-----------------------------------------------------------

and -fno-strict-aliasing workarounds this issue.

Comment 11 Mamoru TASAKA 2009-03-02 21:45:58 UTC
*** Bug 488150 has been marked as a duplicate of this bug. ***

Comment 12 Kevin Kofler 2009-03-02 23:22:47 UTC
Presumably the same dtoa code which is also used in KJS. See also bug 485968.

Compiling just that file with -fno-strict-aliasing should be enough.

But what I think is the proper fix is to use this for __GNUC__:
#define word0(x) ((U)x).L[1]
instead of:
#define word0(x) ((U*)&x)->L[1]

Cast to union is a GNU extension, but it's the aliasing-safe way to do this. I think the code used right now is not actually valid, even if older GCCs accepted it. That said, I wouldn't complain if GCC restored backwards compatibility there... So should this stay filed against gcc or be reassigned to nspr?

I haven't had the time to test the above fix (the cast to union) on KJS yet though (currently using -fno-strict-aliasing on the dtoa code).

Comment 13 Jakub Jelinek 2009-03-03 10:00:56 UTC
GCC isn't going to change to make invalid code work as the author intended.
So, either use -fno-strict-aliasing to compile this, or fix the aliasing bugs.
The ((U*)&x)->L[1] casts might have shut up aliasing warnings in earlier GCCs, but doesn't make it valid code; the effective type of the objects as per ISO C99, 6.5/6 is still double, not U, so it can't be accessed using an lvalue of an incompatible type.

Comment 14 Jakub Jelinek 2009-03-03 10:03:48 UTC
Created attachment 333857 [details]
rh487844.patch

Completely untested patch.  Both the word{0,1} and Storeinc macros were violating aliasing rules when used on double (resp. ULong) objects.

Comment 15 Mamoru TASAKA 2009-03-03 10:55:46 UTC
(In reply to comment #14)
> Created an attachment (id=333857) [details]
> rh487844.patch
> 
> Completely untested patch.  Both the word{0,1} and Storeinc macros were
> violating aliasing rules when used on double (resp. ULong) objects.

Mostly works, however with thunderbird (both
thunderbird-2.0.0.19-1.fc10.i386 and thunderbird-2.0.0.18-2.fc11.i386)
when I
- click "Get Mail"
- some mails are received
- popup window appears which shows like "local holder has 2 new
  messages"
then this popup window "flickers", while with previous nspr this popup
window does not flicker.

Comment 16 Mamoru TASAKA 2009-03-03 11:01:56 UTC
Created attachment 333865 [details]
my trial patch

Rather I tried to use union U to shut up all warnings related
to aliasing warning and some warnings related to uninitialized
value.

This patch seems to work both for firefox/thunderbird, and flickering
issue on popup window of thunderbird does not appear (as before).
Is this patch reasonable?

Comment 17 Jan Schmidt 2009-03-03 15:10:51 UTC
This patch the problem for me.

Comment 18 Wan-Teh Chang 2009-03-03 18:26:30 UTC
I maintain NSPR and am familiar with the history of prdtoa.c.
prdtoa.c is based on David Gay's dtoa.c (http://www.netlib.org/fp/dtoa.c)
with only trivial changes.

The union U at
http://mxr.mozilla.org/nspr/source/nsprpub/pr/src/misc/prdtoa.c#359
was actually a previous attempt at fixing aliasing problems.  It is
described in the "changes" file under "Fri Sep 17 01:39:25 EDT 1999":
http://www.netlib.org/fp/changes

Does this bug mean the union U fix is still not correct?

Re: fixing the bug: I'd prefer a fix that doesn't use GCC extensions.
I am not embarassed to admit that I don't fully understand the
aliasing rules, and there seems to be widespread ignorance about
aliasing rules.  Is it possible to come up with a portable fix for
this bug?  If not, I'm willing to compile prdtoa.c or the entire
NSPR with -fno-strict-aliasing.

Comment 19 Kevin Kofler 2009-03-03 18:56:35 UTC
> I maintain NSPR and am familiar with the history of prdtoa.c.
> prdtoa.c is based on David Gay's dtoa.c (http://www.netlib.org/fp/dtoa.c)
> with only trivial changes.

This is the same as the one used in KJS.

> The union U at
> http://mxr.mozilla.org/nspr/source/nsprpub/pr/src/misc/prdtoa.c#359
> was actually a previous attempt at fixing aliasing problems.
>
> Does this bug mean the union U fix is still not correct?

Exactly. It's not correct because you're only casting a pointer to x to refer to the union and not x itself.

> Re: fixing the bug: I'd prefer a fix that doesn't use GCC extensions.
> I am not embarassed to admit that I don't fully understand the
> aliasing rules, and there seems to be widespread ignorance about
> aliasing rules.  Is it possible to come up with a portable fix for
> this bug?

Well, it's possible, but it'd have to look like this:
union U tmp;
tmp.D=x;
foo=tmp.L[1];
so you'd have to change how these macros work entirely.

Cast to union is a GCC extension.

> If not, I'm willing to compile prdtoa.c or the entire NSPR with
> -fno-strict-aliasing.

I don't think that's a good solution when a better fix exists. The GCC-specific code is under #ifdef __GNUC__ already. Other compilers could be added if they support cast to union. And -fno-strict-aliasing is also a GCC-specific flag, so using it doesn't really increase portability.

FWIW, as far as KJS is concerned, I plan to test Jakub's patch and commit it to KJS's copy of that dtoa code (in upstream KDE).

Comment 20 Jakub Jelinek 2009-03-03 19:06:39 UTC
See http://c0x.coding-guidelines.com/6.5.html (if you don't have official C99 standard copy), particularly lines 948 and 960-966 on that page or
info gcc --index-search=fstrict-aliasing
Yes, the previous attempt to fix the aliasing issues was insufficient, because the effective type is still double and is accessed through lvalue of type U.

Without GCC extensions, you could e.g. use inlines as in:
#ifdef IEEE_8087
static inline ULong word0 (double d) { U u; u.d = d; return u.L[1]; }
static inline ULong word1 (double d) { U u; u.d = d; return u.L[0]; }
#else
static inline ULong word0 (double d) { U u; u.d = d; return u.L[0]; }
static inline ULong word1 (double d) { U u; u.d = d; return u.L[1]; }
#endif

Note that both what I have in the patch above and this would mean word0 and word1 are rvalues, so in the few places where word0 and word1 is actually used to set or change part of a double value, you'd need to use a different macro/inline,
say have
static inline double set_words (ULong word0, ULong word1)
{
  U u;
#ifdef IEEE_8087
  u.L[1] = word0;
  u.L[0] = word1;
#else
  u.L[0] = word0;
  u.L[1] = word1;
#endif
  return u.d;
}

and perhaps also have macro/inlines for setting just half of the word, though
obviously setting both together will be more efficient.
Say
static inline double set_word0 (double d, ULong word0)
{
  U u;
  u.d = d;
#ifdef IEEE_8087
  u.L[1] = word0;
#else
  u.L[0] = word0;
#endif
  return u.d;
}
and similarly for set_word1.

See e.g.
http://sourceware.org/cgi-bin/cvsweb.cgi/~checkout~/libc/sysdeps/ieee754/ldbl-128/math_ldbl.h?rev=1.2&content-type=text/plain&cvsroot=glibc
for what macros uses glibc for similar purposes.

Comment 21 Christopher Aillon 2009-03-03 20:54:36 UTC
*** Bug 488054 has been marked as a duplicate of this bug. ***

Comment 22 Kai Engert (:kaie) (inactive account) 2009-03-03 21:14:20 UTC
Created attachment 333929 [details]
Patch v3

This is an attempt to combine Mamoru Tasaka's patch from comment 16, but removing the _GNUC_ sections, and using Jakub's proposals from comment 20.

... But I have not yet changed the dval macro to an inline function.

I don't have a fedora rawhide / gcc 4.4.0 installation now.
With this patch, do you still get the aliasing warnings?

Comment 23 Kevin Kofler 2009-03-03 22:08:10 UTC
I tried applying Jakub's patch from comment #14 to the copy of the dtoa code in KJS, but it doesn't work (the aliasing warnings disappear, but Konqueror still crashes, whereas when I compile the file with -fno-strict-aliasing, it doesn't). Chances are it'll also not work properly for NSPR as the code is essentially the same.

We haven't tested the effects of the patch v2 or v3 on the KJS code yet.

Comment 24 Wan-Teh Chang 2009-03-03 22:54:05 UTC
Jakub: thanks a lot for the info.  I've asked Kai Engert
to work on a patch using that approach.  Since NSPR treats
prdtoa.c as third-party code, I'll need to ask the author
(David Gay) how he wants to fix this bug.  I will compile
prdtoa.c with -fno-strict-aliasing until this bug is fixed
in the upstream dtoa.c.

Comment 25 Kai Engert (:kaie) (inactive account) 2009-03-04 01:46:37 UTC
Created attachment 333961 [details]
Patch v4

v3 had bugs.

This patch works for me, it passes the dtoa test program in optimized mode.

Comment 26 Kai Engert (:kaie) (inactive account) 2009-03-04 02:58:18 UTC
Created attachment 333965 [details]
Patch v5

Patch v4 still used "cast to union" for the dval macro, and we don't want that.

The "use a function" approach doesn't work for the dval macro, in my understanding. This would require to pass union U by value, and I don't like that idea.

So I propose to simply refer to union member d directly.

This unveiled another strange property of this code: Macro dval is used on plain doubles. I think we should not cast a double to a union pointer and take out the double member... Let's stop using the macro when we have a double already.

Also, I tried to make the code work with both YES_ALIAS, this require to make function set_words available unconditionally.

This patch has some more context lines (diff -u8)

Comment 27 Kai Engert (:kaie) (inactive account) 2009-03-04 03:03:22 UTC
Changing bug summary from
  Too long spaces between each lines
to
  Aliasing and Optimizations cause numeric conversion errors

Comment 28 Jakub Jelinek 2009-03-04 10:02:44 UTC
Changing the double variables in the functions (except for the inlines) to have U type seems unnecessary to me, even without that you are with the new word0/word1/set_words/set_word0/set_word1 inlines accessing them only through double type.
And dval(x) could very well be defined to (x) in that case, again, there is no aliasing violation.

On the other side, if you change all the variables that you apply {word{0,1},set_word{s,0,1}} on to U (for function arguments, at least if the function is exported from the CU, you'd need to keep them double and just have a corresponding automatic U type variable you'd uparam.d = param; in the beginning of the function), then you could avoid introducing set_word{s,0,1} inlines and just define word0/word1/dval as macros that would work on U type values instead of double.  In that case
#ifdef IEEE_8087
#define word0(x) ((x).L[1])
#define word1(x) ((x).L[0])
#else
#define word0(x) ((x).L[0])
#define word1(x) ((x).L[1])
#endif
#define dval(x) ((x).d)

I'd say you should choose one of these (or some other you come up with), but it is
completely unnecessary to use the first one and have U type variables from the second solution in many places.

Comment 29 Wan-Teh Chang 2009-03-04 19:27:52 UTC
Created attachment 334043 [details]
wtc's patch (using union U)

This patch implements the second solution Jakub described
in comment 28 (using the union U).  Please see the upstream
NSPR bug for a description of this patch:
https://bugzilla.mozilla.org/show_bug.cgi?id=439144#c15

Since I don't have gcc 4.4, I don't know if there are
any remaining type-punned pointer warnings after this
patch is applied.  When I wrote this patch, I only tried
to make the code compile, without considering the
type-punned pointer warnings.

Comment 30 Kai Engert (:kaie) (inactive account) 2009-03-04 22:51:09 UTC
For those who suffer from the bug, yesterday a new NSPR package got built into Rawhide which uses a workaround. Hopefully this workaround helps you. If it doesn't, please let us know. The new package version is 4.7.3-5

The workaround uses compiler options -ffloat-store -fno-strict-aliasing

We'll remove these options once we have a real fix (code changes) added upstream.

Comment 31 Wan-Teh Chang 2009-03-05 04:05:52 UTC
Comment on attachment 334043 [details]
wtc's patch (using union U)

Jakub reminded me that the Storeinc macro also has
aliasing problems.

I studied the code and found that Storeinc is used
only if NO_LONG_LONG is defined.  Since all the platforms
we support have 'long long' (required by C99), I don't
think we'll ever compile prdtoa.c with -DNO_LONG_LONG,
so I decided to not fix Storeinc.

Comment 32 Horst H. von Brand 2009-03-08 19:09:54 UTC
(In reply to comment #30)
> For those who suffer from the bug, yesterday a new NSPR package got built into
> Rawhide which uses a workaround. Hopefully this workaround helps you. If it
> doesn't, please let us know. The new package version is 4.7.3-5

It works here (x86_64)

Comment 33 Thomas Woerner 2009-03-09 09:41:51 UTC
It is also working for me.

Comment 34 Mamoru TASAKA 2009-03-12 19:25:52 UTC
Surprisingly, the same codes are also used in ruby
and now ruby does some very odd behavior (bug 489990)

Comment 35 Wan-Teh Chang 2009-03-19 13:39:59 UTC
Comment on attachment 334043 [details]
wtc's patch (using union U)

I submitted this patch to the author of dtoa.c on March 5.  He happened
to be dealing with a couple of other issues related to dtoa.c, and switched
to a use of union U similar to our patches.  He released a new version of
dtoa.c on Monday (2009-03-16) that fixes not only the strict aliasing issues
but also the parentheses and some (but not all) "may be used uninitialized"
gcc warnings.  You can get the new dtoa.c from http://www.netlib.org/fp/dtoa.c
and read the change log at http://www.netlib.org/fp/changes.

So I'm going to apply this patch to NSPR's prdtoa.c and back out the
-fno-strict-aliasing workaround.  This patch is not exactly a backport
of the author's fix, but this patch is smaller so I think it's more suitable.
I will upgrade to the new dtoa.c later because that requires more testing.

I'm going to submit NSPR's other dtoa.c patch (see
https://bugzilla.mozilla.org/show_bug.cgi?id=362134) to the
author of dtoa.c.  If you have patched your copy of dtoa.c, I
encourage you to submit your patches to the author.

Comment 36 Kai Engert (:kaie) (inactive account) 2009-03-31 20:05:28 UTC
I'll close this bug based on the feedback we've received.
We'll get the better fix with a future nspr version update (and then remove the patch).