Bug 2136459 - test failure in rawhide/s390x - float does not match!
Summary: test failure in rawhide/s390x - float does not match!
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: libreoffice
Version: rawhide
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Caolan McNamara
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker
 
Reported: 2022-10-20 10:41 UTC by Dan Horák
Modified: 2022-12-06 15:35 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-06 15:35:54 UTC
Type: Bug
Embargoed:


Links
System ID                  | Private | Priority | Status | Summary | Last Updated
Document Foundation 125978 | 0       | None     | None   | None    | 2022-11-22 12:08:05 UTC

Description Dan Horák 2022-10-20 10:41:56 UTC
Description of problem:
Looks like there is a real build issue with the latest libreoffice package: an actual test failure is breaking the build. The other problem is increased memory requirements ...

From my local rebuild in mock:
...
[build CXX] sal/qa/osl/socket.cxx
/builddir/build/BUILD/libreoffice-7.4.2.3/sal/qa/osl/setthreadname/test-setthreadname.cxx: warning: -D_FORTIFY_SOURCE not defined
/builddir/build/BUILD/libreoffice-7.4.2.3/sal/qa/osl/socket.cxx: warning: -D_FORTIFY_SOURCE not defined
### float does not match! failed
struct comparison test failed
[build LNK] Library/libsaxlo.so
[build CXX] sax/source/expatwrap/sax_expat.cxx
[build CXX] sax/source/expatwrap/saxwriter.cxx
### float does not match! failed
recursive test results failed
/builddir/build/BUILD/libreoffice-7.4.2.3/sax/source/expatwrap/sax_expat.cxx: warning: -D_FORTIFY_SOURCE not defined
standard test failed
exception occurred: error: test failed! at /builddir/build/BUILD/libreoffice-7.4.2.3/testtools/source/bridgetest/bridgetest.cxx:1268

> error: error: test failed! at /builddir/build/BUILD/libreoffice-7.4.2.3/testtools/source/bridgetest/bridgetest.cxx:1268
> dying...make[1]: *** [/builddir/build/BUILD/libreoffice-7.4.2.3/testtools/CustomTarget_uno_test.mk:25: /builddir/build/BUILD/libreoffice-7.4.2.3/workdir/CustomTarget/testtools/uno_test.done] Error 1
make[1]: *** Waiting for unfinished jobs....

The same issue was seen in a recent build attempt in koji, need to find the link ...

Version-Release number of selected component (if applicable):
libreoffice-7.4.2.3-1.fc38

How reproducible:
100%

Comment 1 Dan Horák 2022-10-20 10:48:52 UTC
https://koji.fedoraproject.org/koji/taskinfo?taskID=93214547 has the test failure

Comment 2 Dan Horák 2022-10-20 10:49:49 UTC
s/the test failure/the same test failure/

Comment 3 Dan Horák 2022-10-20 11:13:32 UTC
And regarding the memory consumption, it seems to require ~7GB per cpu in the %check phase. The build gets killed due to OOM with -j6 and 32GB RAM + 8GB swap. The above test failure was (re-)produced with "-j4".
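
A rough back-of-the-envelope check of the numbers in the comment above, as a sketch only. It assumes memory use scales linearly with the job count at about 7 GB per job, which is just the estimate quoted above and not a measured property of the build. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Estimated peak memory (GB) if every parallel job needs gbPerJob.
// Linear scaling is an assumption made for this sketch.
constexpr double estimatedPeakGb(int jobs, double gbPerJob) {
    return jobs * gbPerJob;
}

// True if the estimate fits into RAM plus swap.
constexpr bool fitsInMemory(int jobs, double gbPerJob, double ramGb, double swapGb) {
    return estimatedPeakGb(jobs, gbPerJob) <= ramGb + swapGb;
}

// Largest job count whose estimate still fits (0 if even one job does not).
inline int maxJobs(double gbPerJob, double ramGb, double swapGb) {
    assert(gbPerJob > 0.0);
    return static_cast<int>(std::floor((ramGb + swapGb) / gbPerJob));
}
```

With the figures from the comment, -j6 is estimated at 42 GB against 40 GB of RAM plus swap, while -j4 is estimated at 28 GB, which is consistent with -j6 being OOM-killed and -j4 getting far enough to hit the test failure.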

Comment 4 Caolan McNamara 2022-10-20 11:39:30 UTC
"### float does not match! failed" has appeared intermittently in the past.

While the F38 build is failing, the F37 one with the same source passed: https://koji.fedoraproject.org/koji/buildinfo?buildID=2074985
It's entirely possible we have float/double passing done wrong for s390x and that the passes are arbitrary luck.

Comment 5 Dan Horák 2022-10-20 11:46:20 UTC
Let me see if I can reproduce it (more) consistently under rawhide. The OOM-killed builds might have been getting past this test and failing much later ...

Comment 6 Stephan Bergmann 2022-10-20 11:55:26 UTC
When I last looked into this well-known-on-s390x sporadic failure in September 2020, I inconclusively noted that it "looks more like a heisenbug related to floating-point behavior".  (I.e., it passes some hardcoded floating-point value around and then compares with ==.  IIRC, the values were always printing identically in a debugger when I tried to debug that, but still == occasionally failed.  So it wasn't like the values were wildly off and thus clearly indicating an actual bug somewhere in the LibreOffice code.)  The back-then disabled-for-s390x `make unitcheck slowcheck` has since been enabled with <https://src.fedoraproject.org/rpms/libreoffice/c/5be3141a5b44a2d2fc236a676ec2e8325a7a0036> "renable check for s390x".  If this particular sporadic failure hits frequently enough, we might want to disable that one test for s390x for the time being?
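
One generic way to investigate two floats that print identically yet fail `==` is to compare their raw bit patterns and to print them with round-trip precision. The following is a minimal sketch of that idea, not code from bridgetest.cxx. The helper names are hypothetical, and it assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Raw bit pattern of a float (assumes float is 32 bits wide).
inline std::uint32_t floatBits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Value printed with the stream's default precision (6 significant digits).
inline std::string defaultPrint(float value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Value printed with round-trip precision plus its bit pattern in hex.
inline std::string describeFloat(float value) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<float>::max_digits10) << value
        << " (0x" << std::hex << std::setw(8) << std::setfill('0')
        << floatBits(value) << ")";
    return out.str();
}

// Two adjacent floats: unequal, one bit apart, yet identical in default output.
inline void selfCheck() {
    float a = 3.14f;
    float b = std::nextafter(a, 4.0f);
    assert(a != b);
    assert(floatBits(b) == floatBits(a) + 1);
    assert(defaultPrint(a) == defaultPrint(b));
}
```

Logging `describeFloat()` for both sides of a failing comparison would show whether the values differ in their low bits or whether the bit patterns really are the same, in which case the comparison itself (for example, a value held at a different precision) would be the suspect.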

Comment 7 Dan Horák 2022-10-26 14:56:35 UTC
I can confirm that the test failure is intermittent, as I was able to successfully build LO locally.

What remains is the (much) increased memory consumption during the build (in %check to be precise, isn't it due to https://fedoraproject.org/wiki/Changes/SetBuildFlagsBuildCheck ?). It used --with-parallelism=2 on a system with 32 GB + 8 GB memory ...

Comment 8 Caolan McNamara 2022-11-16 08:54:04 UTC
https://koji.fedoraproject.org/koji/taskinfo?taskID=94215888 built successfully

This was after the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106355 became available, allowing me to drop the filtering of -O2 from the CFLAGS. That may have reduced the memory needed to link, or it may be coincidence, but that is two successful builds in a row.

Comment 9 Dan Horák 2022-11-16 09:09:28 UTC
Thanks for the update, I think they will be related.

Comment 10 Caolan McNamara 2022-12-06 15:35:54 UTC
I built libreoffice with:
fedpkg build --scratch --arches s390x
20 times in a row with an F38 target without failure. I'm not really convinced the bug is gone, but I can't trigger it.
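
To put "20 times in a row" in perspective, here is a small sketch of how likely a run of clean builds is for a given per-build failure rate. It assumes the builds are independent and that the failure rate is constant, neither of which is established in this bug. The failure rates used with it are purely hypothetical, as are the function names.

```cpp
#include <cassert>
#include <cmath>

// Probability that `runs` independent builds all pass when each one
// fails with probability `failRate`.
inline double probAllPass(double failRate, int runs) {
    assert(failRate >= 0.0 && failRate <= 1.0 && runs >= 0);
    return std::pow(1.0 - failRate, runs);
}

// Smallest number of consecutive passing builds such that, if the true
// failure rate were `failRate`, seeing them all pass would have
// probability <= alpha.
inline int runsNeeded(double failRate, double alpha) {
    assert(failRate > 0.0 && failRate < 1.0);
    assert(alpha > 0.0 && alpha < 1.0);
    return static_cast<int>(std::ceil(std::log(alpha) / std::log(1.0 - failRate)));
}
```

For a hypothetical 5% per-build failure rate, 20 clean builds in a row would still happen about 36% of the time, and it would take 59 consecutive passes before such a streak became less than 5% likely. That matches the caution in the comment: 20 passes cannot rule out a rare intermittent failure.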

