Bug 2136459

Summary: test failure in rawhide/s390x - float does not match!
Product: [Fedora] Fedora
Reporter: Dan Horák <dan>
Component: libreoffice
Assignee: Caolan McNamara <caolanm>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: caolanm, dtardon, erack, sbergman, sgallagh
Target Milestone: ---
Target Release: ---
Hardware: s390x
OS: Unspecified
Last Closed: 2022-12-06 15:35:54 UTC
Type: Bug
Bug Blocks: 467765    

Description Dan Horák 2022-10-20 10:41:56 UTC
Description of problem:
There appears to be a real build issue with the latest libreoffice package: an actual test failure is breaking the build. The other problem is increased memory requirements ...

From my local rebuild in mock:
...
[build CXX] sal/qa/osl/socket.cxx
/builddir/build/BUILD/libreoffice-7.4.2.3/sal/qa/osl/setthreadname/test-setthreadname.cxx: warning: -D_FORTIFY_SOURCE not defined
/builddir/build/BUILD/libreoffice-7.4.2.3/sal/qa/osl/socket.cxx: warning: -D_FORTIFY_SOURCE not defined
### float does not match! failed
struct comparison test failed
[build LNK] Library/libsaxlo.so
[build CXX] sax/source/expatwrap/sax_expat.cxx
[build CXX] sax/source/expatwrap/saxwriter.cxx
### float does not match! failed
recursive test results failed
/builddir/build/BUILD/libreoffice-7.4.2.3/sax/source/expatwrap/sax_expat.cxx: warning: -D_FORTIFY_SOURCE not defined
standard test failed
exception occurred: error: test failed! at /builddir/build/BUILD/libreoffice-7.4.2.3/testtools/source/bridgetest/bridgetest.cxx:1268

> error: error: test failed! at /builddir/build/BUILD/libreoffice-7.4.2.3/testtools/source/bridgetest/bridgetest.cxx:1268
> dying...make[1]: *** [/builddir/build/BUILD/libreoffice-7.4.2.3/testtools/CustomTarget_uno_test.mk:25: /builddir/build/BUILD/libreoffice-7.4.2.3/workdir/CustomTarget/testtools/uno_test.done] Error 1
make[1]: *** Waiting for unfinished jobs....

The same issue was seen in a recent build attempt in koji; I need to find the link ...

Version-Release number of selected component (if applicable):
libreoffice-7.4.2.3-1.fc38

How reproducible:
100%

Comment 1 Dan Horák 2022-10-20 10:48:52 UTC
https://koji.fedoraproject.org/koji/taskinfo?taskID=93214547 has the test failure

Comment 2 Dan Horák 2022-10-20 10:49:49 UTC
s/the test failure/the same test failure/

Comment 3 Dan Horák 2022-10-20 11:13:32 UTC
Regarding the memory consumption, it seems to require ~7GB per CPU in the %check phase. The build gets killed due to OOM with -j6 on a machine with 32GB RAM + 8GB swap. The above test failure was (re-)produced with "-j4".
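
The arithmetic behind those figures can be sketched as follows. This is only a back-of-the-envelope estimate: it assumes memory use scales linearly with the job count and that every job peaks at once, neither of which is confirmed here. The function names are hypothetical helpers, not part of any build tooling.

```python
def estimated_peak_gb(jobs: int, per_job_gb: float) -> float:
    """Estimated peak memory if every parallel job peaks at the same time."""
    return jobs * per_job_gb


def max_parallel_jobs(ram_gb: float, swap_gb: float, per_job_gb: float) -> int:
    """Largest job count whose estimated peak fits in RAM plus swap (at least 1)."""
    if per_job_gb <= 0:
        raise ValueError("per_job_gb must be positive")
    return max(1, int((ram_gb + swap_gb) // per_job_gb))


# Figures reported above: ~7 GB per job, 32 GB RAM + 8 GB swap.
print(estimated_peak_gb(6, 7))      # 42, more than the 40 GB available
print(estimated_peak_gb(4, 7))      # 28, fits in RAM alone
print(max_parallel_jobs(32, 8, 7))  # 5
```

Under these assumptions -j6 overshoots the available 40 GB, which matches the reported OOM kill, while -j4 stays within physical RAM.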

Comment 4 Caolan McNamara 2022-10-20 11:39:30 UTC
"### float does not match! failed" has appeared intermittently in the past.

While the F38 build is failing, the F37 one with the same source passed: https://koji.fedoraproject.org/koji/buildinfo?buildID=2074985
It's entirely possible that we have float/double passing done wrong for s390x and that the passes are arbitrary luck.

Comment 5 Dan Horák 2022-10-20 11:46:20 UTC
Let me see whether I can reproduce it (more) consistently under rawhide. The OOM-killed builds might have been getting past this test and failing much later ...

Comment 6 Stephan Bergmann 2022-10-20 11:55:26 UTC
When I last looked into this well-known-on-s390x sporadic failure in September 2020, I inconclusively noted that it "looks more like a heisenbug related to floating-point behavior".  (I.e., it passes some hardcoded floating-point value around and then compares with ==.  IIRC, the values were always printing identically in a debugger when I tried to debug that, but still == occasionally failed.  So it wasn't like the values were wildly off and thus clearly indicating an actual bug somewhere in the LibreOffice code.)  The back-then disabled-for-s390x `make unitcheck slowcheck` has since been enabled with <https://src.fedoraproject.org/rpms/libreoffice/c/5be3141a5b44a2d2fc236a676ec2e8325a7a0036> "renable check for s390x".  If this particular sporadic failure hits frequently enough, we might want to disable that one test for s390x for the time being?
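
One way to narrow down a "prints identically but == fails" symptom is to compare raw bit patterns instead of printed values. The sketch below is purely illustrative: it is not the bridgetest code, the value 3.14 is a hypothetical stand-in for whatever constant the test passes around, and a one-ULP difference is only one possible explanation, not a diagnosis of the s390x failure.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float32_from_bits(bits: int) -> float:
    """Return the binary32 value that has the given bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Hypothetical test value: 3.14 rounded to binary32, and its next neighbour.
a = float32_from_bits(float32_bits(3.14))
b = float32_from_bits(float32_bits(3.14) + 1)

# With six significant digits both values print the same...
print("%g %g" % (a, b))
# ...yet they are different numbers, so == is False.
print(a == b)
# The bit patterns make the difference visible.
print(hex(float32_bits(a)), hex(float32_bits(b)))
```

If the bit patterns on both sides of the failing comparison turn out to be identical, print rounding is ruled out and the cause has to lie elsewhere, for example in how the value is passed between the two sides.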

Comment 7 Dan Horák 2022-10-26 14:56:35 UTC
I can confirm that the test failure is intermittent, as I was able to successfully build LO locally.

What remains is the (much) increased memory consumption during the build (in %check to be precise; isn't it due to https://fedoraproject.org/wiki/Changes/SetBuildFlagsBuildCheck ?). It used --with-parallelism=2 on a system with 32 GB + 8 GB memory ...

Comment 8 Caolan McNamara 2022-11-16 08:54:04 UTC
https://koji.fedoraproject.org/koji/taskinfo?taskID=94215888 built successfully after the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106355 became available.

That fix allowed me to drop the filtering of -O2 from the CFLAGS, which may have reduced the memory needed to link. It may also be coincidence, but that makes two successful builds in a row.

Comment 9 Dan Horák 2022-11-16 09:09:28 UTC
Thanks for the update, I think they will be related.

Comment 10 Caolan McNamara 2022-12-06 15:35:54 UTC
I built libreoffice with:
fedpkg build --scratch --arches s390x
20 times in a row with an F38 target without failure. I'm not really convinced the bug is gone, but I can't trigger it.
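
The caution above can be quantified with a small sketch. It assumes each scratch build fails independently with the same probability, which may not hold for this bug; the function names are hypothetical.

```python
def chance_of_clean_run(failure_rate: float, builds: int) -> float:
    """Probability that `builds` independent builds all pass."""
    return (1.0 - failure_rate) ** builds


def max_failure_rate(passes: int, confidence: float = 0.95) -> float:
    """Largest per-build failure rate still consistent, at the given
    confidence level, with `passes` consecutive successes.

    Solves (1 - p) ** passes == 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / passes)


# A bug that hits 5% of builds would still slip through 20 builds
# more than a third of the time.
print("%.3f" % chance_of_clean_run(0.05, 20))  # 0.358
# 20 clean builds only bound the failure rate below roughly 14%.
print("%.3f" % max_failure_rate(20))           # 0.139
```

Under that independence assumption, 20 clean builds are still compatible with a failure that shows up in about one build out of seven.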