Bug 175462

Summary: wrong code generated on x86_64 for test involving LONGLONG_MIN
Product: Fedora
Component: gcc
Reporter: Tom Lane <tgl>
Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Version: rawhide
CC: hhorak
Hardware: x86_64
OS: Linux
Last Closed: 2005-12-12 09:07:53 UTC
Attachments: source code to reproduce bug

Description Tom Lane 2005-12-11 04:10:11 UTC
Description of problem:
The attached program generates an incorrect result when compiled with the current buildfarm x86_64 compiler.

Version-Release number of selected component (if applicable):
gcc version 4.1.0 20051207 (Red Hat 4.1.0-0.6)

How reproducible:
100%

Steps to Reproduce:
1. gcc -O2 -mtune=nocona decimal.c
2. ./a.out

Actual results:
Program prints
dec2ll: 0
ll result: -9223372036854775808

Expected results:
Should print
ll result: 9223372036854775807

Additional info:
The expected result is obtained if I omit either -O2 or -mtune=nocona from gcc command.

The problem appears to be generation of incorrect code for the if-test labeled "boundary case"
in decimal2longlong.

Test case is extracted from mysql 5.0.16, which is currently unable to build in FC5 because of this bug :-(

Comment 1 Tom Lane 2005-12-11 04:10:12 UTC
Created attachment 122104 [details]
source code to reproduce bug

Comment 2 Jakub Jelinek 2005-12-12 09:07:53 UTC
This code has undefined behaviour.
  /* boundary case: 9223372036854775808 */
  if (unlikely(from->sign==0 && x < 0 && -x < 0))
  {
    *to= -1-x;
    return E_DEC_OVERFLOW;
  }
This is relying on -x wrapping around, so you must either use -fwrapv,
in which case you get the behaviour it relies on, or, better yet, just write
if (unlikely(from->sign == 0 && x == LONGLONG_MIN))
...
Another thing is that LONGLONG_MIN definition is also wrong:
#define LONGLONG_MIN    ((long long) 0x8000000000000000LL)
should have been
#define LONGLONG_MIN    (-LONGLONG_MAX - 1)
Another question is why it doesn't use the LLONG_MIN and LLONG_MAX macros
instead (or at least define LONGLONG_MIN and LONGLONG_MAX to LLONG_MIN and
LLONG_MAX when those are available), as LLONG_M{IN,AX} are macros required
by ISO C99 (or use LONG_LONG_M{IN,AX}, the GNU variants of the same).

Comment 3 Tom Lane 2005-12-12 21:12:45 UTC
The proposed dependence on LONGLONG_MIN is a really sucky answer, mainly because
it forces the code to make specific assumptions about the actual width of "x".
I hadn't seen anyone use the "x < 0 && -x < 0" trick before, but I think it's a
pretty nice solution to the problem of checking for the minimum possible value
of a signed type.

Should I presume that gcc will also feel free to break every other test for
integer overflow that programmers might think to use?  For instance I see this
code in Postgres for the integer minus operator:
	result = -arg;
	if (arg != 0 && SAMESIGN(result, arg))
		... report error ...
A compiler that is prepared to make the deduction that broke MySQL seems
entirely capable of concluding that this test must always fail, too.

I have a feeling that -fwrapv is about to become part of the standard CFLAGS.

Comment 4 Jakub Jelinek 2005-12-12 21:23:19 UTC
Why is dependence on LONGLONG_MIN a sucky answer?  The code already depends on
it, just look 7 lines above this.  Also, you know you are using long long type,
which has LLONG_MIN as minimum, so it is nothing hackish.
The snippet you wrote might or might not be problematic. It depends first of
all on whether result is unsigned; even if it is signed (and the test
therefore triggers undefined behaviour when arg is LLONG_MIN), it still
might not be optimized out if SAMESIGN is defined in a way that doesn't
let the compiler see through it.

Now, -fwrapv can be an answer if you are unwilling to fix the broken code,
but be prepared that the performance will be terrible, as GCC will not be able
to optimize many loops in a way that it is allowed by the standard.

Comment 5 Tom Lane 2005-12-12 22:17:56 UTC
The reason it's sucky is that when you're dealing with a thicket of typedefs
that equate to different things on different machines, it's messy to keep track
of which underlying type a variable really is.  It's nicer to be able to write
code that works without any specific assumptions about how wide a datatype is. 
Furthermore, for more-general problems such as detecting whether an addition
overflowed, "use LONGLONG_MIN" simply isn't an answer.

As for the other argument: in the words of the sage, "I can make your program
arbitrarily fast ... if it doesn't have to give the right answer."  This sort of
willingness to play games with time-honored C semantics isn't helping your
cause.  Everyone who has a large program to maintain is using
-fno-strict-aliasing, because it's impractical if not impossible to figure out
what not using it might break. I think that -fwrapv will soon be in that
category too.  If you were adopting a less invasive policy or at least issuing
reasonably complete warnings, perhaps there would be more uptake for these
optimizations.

Comment 6 Jakub Jelinek 2005-12-12 22:33:21 UTC
You are wrong, most large programs don't use -fno-strict-aliasing.  And as
I said, GCC and every other compiler I know of has been taking advantage of
signed overflow undefinedness for years, it is nothing new.
Please define what "the right answer" above means. You have undefined
code, so anything is the right answer.
And calling (x < 0 && -x < 0) nicer code sounds really weird; that's
really disgusting code even if it were ever supposed to work as mysql expected.