Bug 1864107

Summary:

m4: FTBFS in Fedora rawhide/f33

Product:

[Fedora] Fedora

Reporter:

Fedora Release Engineering <releng>

Component:

Assignee:

Vitezslav Crhonek <vcrhonek>

Status:

CLOSED RAWHIDE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

CC:

codonell, mpolacek, praiskup, vcrhonek

Target Milestone:

---

Keywords:

Reopened

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

m4-1.4.18-16.fc34

Last Closed:

2020-10-13 09:47:44 UTC

Bug Blocks:

1803234

Attachments:

Description	Flags
build.log	none
root.log	none
state.log	none
test-float.i	none

Description Fedora Release Engineering 2020-08-03 18:00:23 UTC

m4 failed to build from source in Fedora rawhide/f33

https://koji.fedoraproject.org/koji/taskinfo?taskID=47996902


For details on the mass rebuild see:

https://fedoraproject.org/wiki/Fedora_33_Mass_Rebuild
Please fix m4 at your earliest convenience and set the bug's status to
ASSIGNED when you start fixing it. If the bug remains in NEW state for 8 weeks,
m4 will be orphaned. Before branching of Fedora 34,
m4 will be retired, if it still fails to build.

For more details on the FTBFS policy, please visit:
https://fedoraproject.org/wiki/Fails_to_build_from_source

Comment 1 Fedora Release Engineering 2020-08-03 18:00:25 UTC

Created attachment 1705778 [details]
build.log

file build.log too big, will only attach last 32768 bytes

Comment 2 Fedora Release Engineering 2020-08-03 18:00:26 UTC

Created attachment 1705779 [details]
root.log

file root.log too big, will only attach last 32768 bytes

Comment 3 Fedora Release Engineering 2020-08-03 18:00:27 UTC

Created attachment 1705780 [details]
state.log

Comment 4 Vitezslav Crhonek 2020-08-04 07:43:23 UTC

make check fails on ppc64le:

../build-aux/test-driver: line 107: 3320412 Aborted                 (core dumped) "$@" > $log_file 2>&1
FAIL: test-float

Comment 5 Vitezslav Crhonek 2020-08-04 09:10:57 UTC

test-float.c:318: assertion 'm + m > m' failed

Program received signal SIGABRT, Aborted.
0x00007ffff7d88f04 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7d88f04 in raise () from /lib64/libc.so.6
#1  0x00007ffff7d69868 in abort () from /lib64/libc.so.6
#2  0x0000000100000e20 in test_float () at test-float.c:165
#3  main () at test-float.c:359

Comment 6 Vitezslav Crhonek 2020-08-06 13:53:27 UTC

Worked around by disabling %check on ppc64le for now.

Comment 7 Carlos O'Donell 2020-08-10 21:21:24 UTC

Reopening. This is a bug in gnulib's detection of a working float.h. You're going to need to update this.

When run on POWER9 hardware I see the following:

cat test-float.log
test-float.c:318: assertion 'm + m > m' failed
FAIL test-float (exit status: 134)

Adding instrumentation I see the following:

LDBL_MAX = inf
m = inf
test-float.c:320: assertion 'm + m > m' failed
Aborted (core dumped)

This can't be right and looks like a compiler issue.

We should not have ended up with LDBL_MAX being equal to INF to start with.

Unfortunately I can't reduce this, the appropriate extracted code works as intended.

Leading up to the assert:

=> 0x0000000100000c8c <+876>:	addis   r9,r2,-2
   0x0000000100000c90 <+880>:	addi    r9,r9,-25152
   0x0000000100000c94 <+884>:	lfd     f0,0(r9)
   0x0000000100000c98 <+888>:	lfd     f1,8(r9)
   0x0000000100000c9c <+892>:	stfd    f0,304(r1)
   0x0000000100000ca0 <+896>:	stfd    f1,312(r1)
   0x0000000100000ca8 <+904>:	lfd     f1,304(r1)
   0x0000000100000cac <+908>:	lfd     f2,312(r1)
   0x0000000100000cb0 <+912>:	lfd     f3,304(r1)
   0x0000000100000cb4 <+916>:	lfd     f4,312(r1)

Parameters should be f1-f4.

$f1 == inf
$f2 == 0

$f3 == inf
$f4 == 0

So we are about to do "m + m" and the value of m is already wrong.

   0x0000000100000cb8 <+920>:	bl      0x100001758 <__gcc_qadd+8>

Do the add.

   0x0000000100000cbc <+924>:	nop
   0x0000000100000cc0 <+928>:	lfd     f0,304(r1)

Reload half of m.

   0x0000000100000cc4 <+932>:	fmr     f12,f1
   0x0000000100000cc8 <+936>:	fmr     f13,f2

Move result from f1/f2 to f12/f13.

   0x0000000100000ccc <+940>:	lfd     f1,312(r1)

Reload other half of m.

=> 0x0000000100000cd0 <+944>:	fcmpu   cr0,f12,f0

Compare INF to INF and the assert (INF + INF > INF) fails.

What is odd is that 304/312 + r1 is stored to by this earlier sequence (you see it in the original disassembly):

=> 0x0000000100000c8c <+876>:	addis   r9,r2,-2
   0x0000000100000c90 <+880>:	addi    r9,r9,-25152
   0x0000000100000c94 <+884>:	lfd     f0,0(r9)
   0x0000000100000c98 <+888>:	lfd     f1,8(r9)

Address 0+$r9 is 0x100001cc0 and it's here:

100000000-100010000 r-xp 00000000 fd:00 1774810                          /root/rpmbuild/BUILD/m4-1.4.18/tests/test-float
100010000-100020000 r--p 00000000 fd:00 1774810                          /root/rpmbuild/BUILD/m4-1.4.18/tests/test-float
100020000-100030000 rw-p 00010000 fd:00 1774810                          /root/rpmbuild/BUILD/m4-1.4.18/tests/test-float

That value is in the executable image, probably a constant pool.

It's odd that we'd load INF from a constant pool that should contain LDBL_MAX.

The pre-processed source is more interesting:

  {
    volatile long double m =
# 315 "test-float.c" 3
                            (gl_LDBL_MAX.ld)
# 315 "test-float.c"
                                    ;
    int n;

    do { if (!(m + m > m)) { fprintf (
# 318 "test-float.c" 3 4
   stderr
# 318 "test-float.c"
   , "%s:%d: assertion '%s' failed\n", "test-float.c", 318, "m + m > m");
# 318 "test-float.c" 3
   rpl_fflush
# 318 "test-float.c"
   (
# 318 "test-float.c" 3 4
   stderr
# 318 "test-float.c"
   ); abort (); } } while (0);

It looks like we're triggering the generation and inclusion of lib/float.h, and that doesn't work.

union gl_long_double_union
  {
    struct { double hi; double lo; } dd;
    long double ld;
  };
extern const union gl_long_double_union gl_LDBL_MAX;
# define LDBL_MAX (gl_LDBL_MAX.ld)

const union gl_long_double_union gl_LDBL_MAX =
  { { DBL_MAX, DBL_MAX / (double)134217728UL / (double)134217728UL } };

The value eventually loaded from this constant is invalid.

This is either a compiler problem or a problem in the gnulib float.h headers.

Comment 8 Carlos O'Donell 2020-08-11 03:11:39 UTC

Created attachment 1711030 [details]
test-float.i

Attaching pre-processed test-float.i

Comment 9 Carlos O'Donell 2020-08-11 03:52:50 UTC

Removing float.h from inclusion reveals the next problem.

test-float.c:324: assertion 'x + x == x' failed
Aborted (core dumped)

This is what I was expecting given my review of the code.

#include <stdio.h>
#include <assert.h>
#include <float.h>
#include <math.h>

int
main (void)
{
  int n = 107;
  volatile long double m = LDBL_MAX;
  volatile long double pow2_n = powl (2, n);
  volatile long double x = m + (m / pow2_n);

  printf ("n = %d\n", n);
  printf ("m = %Lf (%La)\n", m, m);
  printf ("pow2_n = %Lf (%La)\n", pow2_n, pow2_n);
  printf ("m / pow2_n = %Lf (%La)\n", (m / pow2_n), (m / pow2_n));
  printf ("x = %Lf (%La)\n", x, x);

  if (x > m)
    assert (x + x == x);
  return 0;
}

gcc -o ~/test-ldbl-max ~/test-ldbl-max.c -lm

~/test-ldbl-max
n = 107
m = 179769313486231580793728971405301199252069012264752390332004544495176179865349768338004270583473493681874097135387894924752516923758125018237039690323659469736010689648748751591634331824498526377862231967249520608291850653495428451067676993116107021027413767397958053860876625383538022115414866471826801819648.000000 (0x1.fffffffffffff7ffffffffffff8p+1023)
pow2_n = 162259276829213363391578010288128.000000 (0x1p+107)
m / pow2_n = 1107913932560222581216724223049124694376931327937918798971295069363205703164244740389102844506567402654244799528342026118673562844811584683014545030137100678976901567468093855075985516353544747282849589098225960074532039651619564827101237983225846137075291097947344654582153216.000000 (0x1.fffffffffffff7ffffffffffff8p+916)
x = 179769313486231580793728971405301199252069012264752390332004544495176179865349768338004270583473493681874097135387894924752516923758125018237039690323659469736010689648748751591634331824498526377862231967249520608291850653495428451067676993116107021027413767397958053860876625383538022115414866471826801819648.000000 (0x1.fffffffffffff7ffffffffffffcp+1023)
test-ldbl-max: /root/test-ldbl-max.c:21: main: Assertion `x + x == x' failed.
Aborted (core dumped)

There is a representable value that is, in theory, larger than LDBL_MAX, so the assertion fails.

Note that x > m, because 0x1.fffffffffffff7ffffffffffffcp+1023 > 0x1.fffffffffffff7ffffffffffff8p+1023, but x + x is most certainly INF, not x.

Is this a problem with __LDBL_MAX__ as defined by the compiler?

Comment 10 Carlos O'Donell 2020-08-11 03:54:01 UTC

Marek, What do you make of the test case in comment #9?

Comment 11 Ben Cotton 2020-08-11 14:16:32 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 12 Marek Polacek 2020-08-11 16:06:30 UTC

(In reply to Carlos O'Donell from comment #10)
> Marek, What do you make of the test case in comment #9?

Looks like there is indeed a bug in GCC: https://gcc.gnu.org/PR95450. It hasn't been fixed yet. I'll try to bisect it.

Comment 13 Vitezslav Crhonek 2020-08-18 10:30:53 UTC

Thank you very much for your investigation of the issue. I'll remove the workaround when the bug is fixed in GCC.

Comment 14 Vitezslav Crhonek 2020-10-13 09:47:44 UTC

The bug in GCC has been fixed; the workaround is no longer needed.