1864107 – m4: FTBFS in Fedora rawhide/f33

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	m4
Sub Component:
Version:	33
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Vitezslav Crhonek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	F33FTBFS

Reported:	2020-08-03 18:00 UTC by Fedora Release Engineering
Modified:	2020-10-13 09:47 UTC
CC List:	4 users
Fixed In Version:	m4-1.4.18-16.fc34
Clone Of:
Environment:
Last Closed:	2020-10-13 09:47:44 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
build.log (32.00 KB, text/plain) 2020-08-03 18:00 UTC, Fedora Release Engineering
root.log (32.00 KB, text/plain) 2020-08-03 18:00 UTC, Fedora Release Engineering
state.log (945 bytes, text/plain) 2020-08-03 18:00 UTC, Fedora Release Engineering
test-float.i (105.05 KB, text/plain) 2020-08-11 03:11 UTC, Carlos O'Donell

Links
System	ID	Private	Priority	Status	Summary	Last Updated
GNU Compiler Collection	95450	0	P3	RESOLVED	[10/11 regression] Wrong long double folding	2020-10-13 09:19:32 UTC

Internal Links: 1863737

Description Fedora Release Engineering 2020-08-03 18:00:23 UTC

m4 failed to build from source in Fedora rawhide/f33

https://koji.fedoraproject.org/koji/taskinfo?taskID=47996902


For details on the mass rebuild see:

https://fedoraproject.org/wiki/Fedora_33_Mass_Rebuild
Please fix m4 at your earliest convenience and set the bug's status to
ASSIGNED when you start fixing it. If the bug remains in NEW state for 8 weeks,
m4 will be orphaned. Before branching of Fedora 34,
m4 will be retired, if it still fails to build.

For more details on the FTBFS policy, please visit:
https://fedoraproject.org/wiki/Fails_to_build_from_source

Comment 1 Fedora Release Engineering 2020-08-03 18:00:25 UTC

Created attachment 1705778
build.log

file build.log too big, will only attach last 32768 bytes

Comment 2 Fedora Release Engineering 2020-08-03 18:00:26 UTC

Created attachment 1705779
root.log

file root.log too big, will only attach last 32768 bytes

Comment 3 Fedora Release Engineering 2020-08-03 18:00:27 UTC

Created attachment 1705780
state.log

Comment 4 Vitezslav Crhonek 2020-08-04 07:43:23 UTC

make check fails on ppc64le:

../build-aux/test-driver: line 107: 3320412 Aborted                 (core dumped) "$@" > $log_file 2>&1
FAIL: test-float

Comment 5 Vitezslav Crhonek 2020-08-04 09:10:57 UTC

test-float.c:318: assertion 'm + m > m' failed

Program received signal SIGABRT, Aborted.
0x00007ffff7d88f04 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7d88f04 in raise () from /lib64/libc.so.6
#1  0x00007ffff7d69868 in abort () from /lib64/libc.so.6
#2  0x0000000100000e20 in test_float () at test-float.c:165
#3  main () at test-float.c:359

Comment 6 Vitezslav Crhonek 2020-08-06 13:53:27 UTC

Worked around by disabling %check on ppc64le for now.

Comment 7 Carlos O'Donell 2020-08-10 21:21:24 UTC

Reopening. This is a bug in gnulib's detection of a working float.h. You're going to need to update this.

When run on POWER9 hardware I see the following:

cat test-float.log
test-float.c:318: assertion 'm + m > m' failed
FAIL test-float (exit status: 134)

Adding instrumentation I see the following:

LDBL_MAX = inf
m = inf
test-float.c:320: assertion 'm + m > m' failed
Aborted (core dumped)

This can't be right and looks like a compiler issue.

We should not have ended up with LDBL_MAX being equal to INF to start with.

Unfortunately I can't reduce this; the appropriate extracted code works as intended.

Leading up to the assert:

=> 0x0000000100000c8c <+876>:	addis   r9,r2,-2
   0x0000000100000c90 <+880>:	addi    r9,r9,-25152
   0x0000000100000c94 <+884>:	lfd     f0,0(r9)
   0x0000000100000c98 <+888>:	lfd     f1,8(r9)
   0x0000000100000c9c <+892>:	stfd    f0,304(r1)
   0x0000000100000ca0 <+896>:	stfd    f1,312(r1)
   0x0000000100000ca8 <+904>:	lfd     f1,304(r1)
   0x0000000100000cac <+908>:	lfd     f2,312(r1)
   0x0000000100000cb0 <+912>:	lfd     f3,304(r1)
   0x0000000100000cb4 <+916>:	lfd     f4,312(r1)

Parameters should be f1-f4.

$f1 == inf
$f2 == 0

$f3 == inf
$f4 == 0

So we are about to do "m + m" and the value of m is already wrong.

   0x0000000100000cb8 <+920>:	bl      0x100001758 <__gcc_qadd+8>

Do the add.

   0x0000000100000cbc <+924>:	nop
   0x0000000100000cc0 <+928>:	lfd     f0,304(r1)

Reload half of m.

   0x0000000100000cc4 <+932>:	fmr     f12,f1
   0x0000000100000cc8 <+936>:	fmr     f13,f2

Move result from f1/f2 to f12/f13.

   0x0000000100000ccc <+940>:	lfd     f1,312(r1)

Reload other half of m.

=> 0x0000000100000cd0 <+944>:	fcmpu   cr0,f12,f0

Compare INF to INF and the assert (INF + INF > INF) fails.

What is odd is that 304/312 + r1 is stored to by this earlier sequence (you see it in the original disassembly):

=> 0x0000000100000c8c <+876>:	addis   r9,r2,-2
   0x0000000100000c90 <+880>:	addi    r9,r9,-25152
   0x0000000100000c94 <+884>:	lfd     f0,0(r9)
   0x0000000100000c98 <+888>:	lfd     f1,8(r9)

Address 0+$r9 is 0x100001cc0 and it's here:

100000000-100010000 r-xp 00000000 fd:00 1774810                          /root/rpmbuild/BUILD/m4-1.4.18/tests/test-float
100010000-100020000 r--p 00000000 fd:00 1774810                          /root/rpmbuild/BUILD/m4-1.4.18/tests/test-float
100020000-100030000 rw-p 00010000 fd:00 1774810                          /root/rpmbuild/BUILD/m4-1.4.18/tests/test-float

That value is in the executable image, probably a constant pool.

It's odd that we'd load INF from a constant pool that should contain LDBL_MAX.

The pre-processed source is more interesting:

  {
    volatile long double m =
# 315 "test-float.c" 3
                            (gl_LDBL_MAX.ld)
# 315 "test-float.c"
                                    ;
    int n;

    do { if (!(m + m > m)) { fprintf (
# 318 "test-float.c" 3 4
   stderr
# 318 "test-float.c"
   , "%s:%d: assertion '%s' failed\n", "test-float.c", 318, "m + m > m");
# 318 "test-float.c" 3
   rpl_fflush
# 318 "test-float.c"
   (
# 318 "test-float.c" 3 4
   stderr
# 318 "test-float.c"
   ); abort (); } } while (0);

It looks like we're triggering the generation and inclusion of lib/float.h, and that doesn't work.

union gl_long_double_union
  {
    struct { double hi; double lo; } dd;
    long double ld;
  };
extern const union gl_long_double_union gl_LDBL_MAX;
# define LDBL_MAX (gl_LDBL_MAX.ld)

const union gl_long_double_union gl_LDBL_MAX =
  { { DBL_MAX, DBL_MAX / (double)134217728UL / (double)134217728UL } };

By the time it is loaded at run time, this value is invalid (it reads back as INF).

This is either a compiler problem or a problem in the gnulib float.h headers.

Comment 8 Carlos O'Donell 2020-08-11 03:11:39 UTC

Created attachment 1711030
test-float.i

Attaching pre-processed test-float.i

Comment 9 Carlos O'Donell 2020-08-11 03:52:50 UTC

Removing float.h from inclusion reveals the next problem.

test-float.c:324: assertion 'x + x == x' failed
Aborted (core dumped)

This is what I was expecting given my review of the code.

#include <stdio.h>
#include <assert.h>
#include <float.h>
#include <math.h>

int
main (void)
{
  int n = 107;
  volatile long double m = LDBL_MAX;
  volatile long double pow2_n = powl (2, n);
  volatile long double x = m + (m / pow2_n);

  printf ("n = %d\n", n);
  printf ("m = %Lf (%La)\n", m, m);
  printf ("pow2_n = %Lf (%La)\n", pow2_n, pow2_n);
  printf ("m / pow2_n = %Lf (%La)\n", (m / pow2_n), (m / pow2_n));
  printf ("x = %Lf (%La)\n", x, x);

  if (x > m)
    assert (x + x == x);
  return 0;
}

gcc -o ~/test-ldbl-max ~/test-ldbl-max.c -lm

~/test-ldbl-max
n = 107
m = 179769313486231580793728971405301199252069012264752390332004544495176179865349768338004270583473493681874097135387894924752516923758125018237039690323659469736010689648748751591634331824498526377862231967249520608291850653495428451067676993116107021027413767397958053860876625383538022115414866471826801819648.000000 (0x1.fffffffffffff7ffffffffffff8p+1023)
pow2_n = 162259276829213363391578010288128.000000 (0x1p+107)
m / pow2_n = 1107913932560222581216724223049124694376931327937918798971295069363205703164244740389102844506567402654244799528342026118673562844811584683014545030137100678976901567468093855075985516353544747282849589098225960074532039651619564827101237983225846137075291097947344654582153216.000000 (0x1.fffffffffffff7ffffffffffff8p+916)
x = 179769313486231580793728971405301199252069012264752390332004544495176179865349768338004270583473493681874097135387894924752516923758125018237039690323659469736010689648748751591634331824498526377862231967249520608291850653495428451067676993116107021027413767397958053860876625383538022115414866471826801819648.000000 (0x1.fffffffffffff7ffffffffffffcp+1023)
test-ldbl-max: /root/test-ldbl-max.c:21: main: Assertion `x + x == x' failed.
Aborted (core dumped)

There is a representable value that is in theory larger than LDBL_MAX and so we assert.

Note that x > m, because 0x1.fffffffffffff7ffffffffffffcp+1023 > 0x1.fffffffffffff7ffffffffffff8p+1023, but x + x is most certainly INF, not x.

Is this a problem with __LDBL_MAX__ as defined by the compiler?

Comment 10 Carlos O'Donell 2020-08-11 03:54:01 UTC

Marek, What do you make of the test case in comment #9?

Comment 11 Ben Cotton 2020-08-11 14:16:32 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 12 Marek Polacek 2020-08-11 16:06:30 UTC

(In reply to Carlos O'Donell from comment #10)
> Marek, What do you make of the test case in comment #9?

Looks like there indeed is a bug in GCC: https://gcc.gnu.org/PR95450.  It hasn't been fixed yet.  I'll try to bisect it.

Comment 13 Vitezslav Crhonek 2020-08-18 10:30:53 UTC

Thank you very much for your investigation of the issue. I'll remove the workaround when the bug is fixed in GCC.

Comment 14 Vitezslav Crhonek 2020-10-13 09:47:44 UTC

The bug in GCC has been fixed, so the workaround is no longer needed.
