Bug 2045367 - fasttrack: AutoLevelTest.AutoLevel fails on ppc64le since GCC 12 and/or “long double” change
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: fasttrack
Version: 36
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ben Beasley
QA Contact:
URL:
Whiteboard:
Depends On: 2048723
Blocks: PPCTracker 1649936
Reported: 2022-01-25 16:31 UTC by Fedora Release Engineering
Modified: 2022-02-28 13:54 UTC (History)

Fixed In Version: fasttrack-6.1.1-4.fc36
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-28 13:54:33 UTC
Type: ---


Attachments
build.log (32.00 KB, text/plain), 2022-01-25 16:31 UTC, Fedora Release Engineering
root.log (32.00 KB, text/plain), 2022-01-25 16:31 UTC, Fedora Release Engineering
state.log (978 bytes, text/plain), 2022-01-25 16:31 UTC, Fedora Release Engineering
reduced test (1.67 KB, text/x-csrc), 2022-02-15 11:41 UTC, Dan Horák

Description Fedora Release Engineering 2022-01-25 16:31:24 UTC
fasttrack failed to build from source in Fedora rawhide/f36

https://koji.fedoraproject.org/koji/taskinfo?taskID=81771440


For details on the mass rebuild see:

https://fedoraproject.org/wiki/Fedora_36_Mass_Rebuild
Please fix fasttrack at your earliest convenience and set the bug's status to
ASSIGNED when you start fixing it. If the bug remains in NEW state for 8 weeks,
fasttrack will be orphaned. Before branching of Fedora 37,
fasttrack will be retired, if it still fails to build.

For more details on the FTBFS policy, please visit:
https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/

Comment 1 Fedora Release Engineering 2022-01-25 16:31:28 UTC
Created attachment 1854103 [details]
build.log

file build.log too big, will only attach last 32768 bytes

Comment 2 Fedora Release Engineering 2022-01-25 16:31:31 UTC
Created attachment 1854104 [details]
root.log

file root.log too big, will only attach last 32768 bytes

Comment 3 Fedora Release Engineering 2022-01-25 16:31:34 UTC
Created attachment 1854105 [details]
state.log

Comment 4 Ben Beasley 2022-02-01 01:30:58 UTC
Right now the chief problem is that qt6-qtsvg is FTI.

-----

There was previously a test failure in AutoLevelTest.AutoLevel on ppc64le, which is probably not resolved since it was appearing in Koschei as recently as 2022-01-31[1]. It worked fine with GCC 12.0.1-0.3.fc36, but started failing after a number of dependencies were rebuilt with that GCC in the mass rebuild. It’s not immediately clear what the problem is. The values are wildly wrong.

> [----------] 2 tests from AutoLevelTest
> [ RUN      ] AutoLevelTest.AutoLevel
> [ INFO:0@16.139] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/videoio_registry.cpp (223) VideoBackendRegistry VIDEOIO: Enabled backends(8, sorted by priority): FFMPEG(1000); GSTREAMER(990); INTEL_MFX(980); V4L2(970); CV_IMAGES(960); CV_MJPEG(950); FIREWIRE(940); UEYE(930)
> [ INFO:0@16.139] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/backend_plugin.cpp (369) getPluginCandidates VideoIO plugin (FFMPEG): glob is 'libopencv_videoio_ffmpeg*.so', 1 location(s)
> [ INFO:0@16.147] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/backend_plugin.cpp (379) getPluginCandidates     - /lib64: 0
> [ INFO:0@16.147] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/backend_plugin.cpp (383) getPluginCandidates Found 0 plugin(s) for FFMPEG
> [ INFO:0@16.398] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/cap_gstreamer.cpp (1104) open OpenCV | GStreamer: ../dataSet/images/frame_%06d.pgm
> [ WARN:0@16.398] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/cap_gstreamer.cpp (1127) open OpenCV | GStreamer warning: Error opening bin: syntax error
> [ WARN:0@16.399] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/cap_gstreamer.cpp (862) isPipelinePlaying OpenCV | GStreamer warning: GStreamer: pipeline have not been created
> [ INFO:0@16.399] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/backend_plugin.cpp (369) getPluginCandidates VideoIO plugin (INTEL_MFX): glob is 'libopencv_videoio_intel_mfx*.so', 1 location(s)
> [ INFO:0@16.407] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/backend_plugin.cpp (379) getPluginCandidates     - /lib64: 0
> [ INFO:0@16.407] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/backend_plugin.cpp (383) getPluginCandidates Found 0 plugin(s) for INTEL_MFX
> [ INFO:0@16.413] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/cap_gstreamer.cpp (1104) open OpenCV | GStreamer: ../dataSet/images/frame_%06d.pgm
> [ WARN:0@16.413] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/cap_gstreamer.cpp (1127) open OpenCV | GStreamer warning: Error opening bin: syntax error
> [ WARN:0@16.414] global /builddir/build/BUILD/opencv-4.5.5/modules/videoio/src/cap_gstreamer.cpp (862) isPipelinePlaying OpenCV | GStreamer warning: GStreamer: pipeline have not been created
> trackingTest.cpp:711: Failure
> Expected equality of these values:
>   std::lround(test.value("normArea"))
>     Which is: -9223372036854775808
>   10
> trackingTest.cpp:712: Failure
> Expected equality of these values:
>   std::lround(test.value("normPerim"))
>     Which is: -9223372036854775808
>   8
> trackingTest.cpp:713: Failure
> Expected equality of these values:
>   std::lround(test.value("normAngle"))
>     Which is: -9223372036854775808
>   20
> trackingTest.cpp:714: Failure
> Expected equality of these values:
>   std::lround(test.value("normDist"))
>     Which is: -9223372036854775808
>   2
> [  FAILED  ] AutoLevelTest.AutoLevel (283 ms)

[1] https://koschei.fedoraproject.org/package/fasttrack
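
For reference, -9223372036854775808 is LONG_MIN for a 64-bit long. The C and C++ standards leave the result of std::lround unspecified when the rounded value does not fit in long, which includes NaN and infinite arguments, so a value like this usually means the argument was already bad before rounding. A minimal sketch of a guard that makes such a case explicit follows; checked_lround is a hypothetical helper, not part of FastTrack or its test suite.

```cpp
#include <cmath>
#include <limits>
#include <optional>

// Hypothetical helper (not from FastTrack): round to long only when the
// input is finite and safely inside the range of long.
std::optional<long> checked_lround(double v) {
    if (!std::isfinite(v)) {
        return std::nullopt;  // NaN or +/-infinity
    }
    // Conservative bounds, compared as doubles. LONG_MAX may round up when
    // converted to double, so the upper comparison is strict.
    const double lo = static_cast<double>(std::numeric_limits<long>::min());
    const double hi = static_cast<double>(std::numeric_limits<long>::max());
    if (v < lo || v >= hi) {
        return std::nullopt;
    }
    return std::lround(v);
}
```

With a guard like this, a test could report "value was not finite" instead of comparing LONG_MIN against the expected integer.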

Comment 5 Dan Horák 2022-02-01 08:52:34 UTC
I believe it is related to the "long double" change, where the argument passed to std::lround() will be already incorrect ...

Comment 6 Ben Beasley 2022-02-03 20:42:32 UTC
(In reply to Dan Horák from comment #5)
> I believe it is related to the "long double" change, where the argument
> passed to std::lround() will be already incorrect ...

I’ll take another look at it after https://bugzilla.redhat.com/show_bug.cgi?id=2048723 is fixed in qt6-qtsvg. (I just sent a PR.)

Still, I don’t see anything in AutoLevel::level() from src/autolevel.cpp or in TEST_F(AutoLevelTest, AutoLevel) from Test/trackingTest.cpp doing anything conspicuously non-portable with floating-point data. I don’t even see explicit use of “long double” anywhere (std::lround takes double and returns long int).

Then again, since the problem appeared after dependencies were rebuilt, the problem is probably somewhere in the dependency tree rather than in FastTrack. I can always deploy ExcludeArch, which won’t be too disruptive since this is an application and a leaf package, but it would be nice to find the real problem since other packages are likely affected too.

Comment 7 Ben Beasley 2022-02-03 20:47:37 UTC
Link to the first Koschei build that failed on ppc64le: https://koschei.fedoraproject.org/build/11971178

Comment 8 Ben Beasley 2022-02-04 13:25:57 UTC
Now we’re back to just the ppc64le failure: https://koji.fedoraproject.org/koji/taskinfo?taskID=82371952

For now, I’m going to change the bug title to reference the ppc64le failure, and proceed by adding “ExcludeArch: ppc64le”, since the problem is likely in a dependency and there’s not much to go on in debugging it. I’d still like to figure it out eventually.

Comment 9 Fedora Update System 2022-02-04 13:40:39 UTC
FEDORA-2022-01eb9a8715 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-01eb9a8715

Comment 10 Fedora Update System 2022-02-04 13:43:32 UTC
FEDORA-2022-01eb9a8715 has been pushed to the Fedora 36 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 11 Ben Beasley 2022-02-04 13:47:47 UTC
I did a successful build with “ExcludeArch: ppc64le”. It was only supposed to reference this issue, not close it.

Reopening, and unblocking everything except PPCTracker.

Comment 12 Ben Cotton 2022-02-08 20:15:37 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle.
Changing version to 36.

Comment 13 Dan Horák 2022-02-11 17:31:42 UTC
we might be hitting a similar issue to the one in bug 2045186, something wrong in libstdc++ with regard to the new "long double" ABI

Comment 14 Dan Horák 2022-02-15 11:41:12 UTC
Created attachment 1861212 [details]
reduced test

a couple notes
- seems only the "double" type is being used, so perhaps this is not related to the "long double" ABI change
- the problem seems to be that the computations (computeStdArea(), ...) used in AutoLevel::level() (in src/autolevel.cpp) return NaN for the input data

Comment 15 Ben Beasley 2022-02-15 16:41:45 UTC
(In reply to Dan Horák from comment #14)
> Created attachment 1861212 [details]
> reduced test […]

Thanks. I had isolated that calculation, too, but I hadn’t tracked down that it was returning NaN.

I agree that if this were a “long double” problem then it would have to be buried inside a dependency behind a “double” API. I can also exclude the most obvious candidate, OpenCV, as it uses “long double” only in cv::text::Minibox and cv::text::HCluster. Neither class is used transitively by others, except cv::text::MaxMeaningfulClustering, and none of those three classes is used in FastTrack.

I’m going to try again after the GCC and OpenCV updates that are currently building. If the various bug fixes in those builds don’t resolve this, then I guess I’ll plan to fire up a Rawhide VM and step through this test case in a debugger side-by-side with Fedora 35. It will take me a little while to get around to that, though.

Comment 16 Dan Horák 2022-02-15 17:07:41 UTC
and it looks to me like the condition below is querying a non-existent parameter (normParam instead of normPerim)

diff -up FastTrack-6.2.1/src/autolevel.cpp.orig FastTrack-6.2.1/src/autolevel.cpp
--- FastTrack-6.2.1/src/autolevel.cpp.orig	2022-02-15 17:04:20.437532458 +0000
+++ FastTrack-6.2.1/src/autolevel.cpp	2022-02-15 11:38:07.723863179 +0000
@@ -82,7 +82,7 @@ QMap<QString, double> AutoLevel::level()
       m_spotSuffix = "Body";
     }
     int counter = 0;
-    while (abs(stdAngle - m_parameters.value("normAngle").toDouble()) > 1E-3 && abs(stdDist - m_parameters.value("normDist").toDouble()) > 1E-3 && abs(stdArea - m_parameters.value("normArea").toDouble()) > 1E-3 && abs(stdPerimeter - m_parameters.value("normParam").toDouble()) > 1E-3) {
+    while (abs(stdAngle - m_parameters.value("normAngle").toDouble()) > 1E-3 && abs(stdDist - m_parameters.value("normDist").toDouble()) > 1E-3 && abs(stdArea - m_parameters.value("normArea").toDouble()) > 1E-3 && abs(stdPerimeter - m_parameters.value("normPerim").toDouble()) > 1E-3) {
       m_parameters.insert("normAngle", QString::number(stdAngle));
       m_parameters.insert("normDist", QString::number(stdDist));
       m_parameters.insert("normArea", QString::number(stdArea));

Comment 17 Ben Beasley 2022-02-15 18:18:27 UTC
Good catch! That’s certainly a FastTrack bug, and I’ll submit a PR for it.

m_parameters is a QMap<QString, QString>, so m_parameters.value()[1] returns a default-constructed QString value[2], which is, in Qt semantics, both null and empty. According to the documentation[3], calling the toDouble() method on such a string should return 0.0. If we’re getting NaN, then that is probably a qt6-qtbase bug that could be reproduced in a much simpler test case.

Still, correcting the key name will probably make the Qt bug irrelevant to FastTrack. I’m doing a scratch build now to verify that this change will fix this bug.

Thanks for your help in investigating this.

[1] https://doc.qt.io/qt-5/qmap.html#value
[2] https://doc.qt.io/qt-5/qstring.html#QString
[3] https://doc.qt.io/qt-5/qstring.html#toDouble

Comment 18 Ben Beasley 2022-02-15 18:31:15 UTC
On closer inspection, normPerim is not in the map the first time through the loop, so the code is relying on accessing a non-existent key returning 0.0. The typo fix (while correct and necessary) is therefore not, by itself, a fix for this bug.
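
To spell out the behaviour the loop depends on: a lookup of a missing key yields an empty string, and converting an empty string yields 0.0, so a misspelled key silently compares against 0.0 on every pass. The sketch below is a standard-library analogue, not Qt code; value_or_empty and to_double are hypothetical stand-ins that mimic the documented behaviour of QMap::value() and QString::toDouble().

```cpp
#include <cstdlib>
#include <map>
#include <string>

using Params = std::map<std::string, std::string>;

// Stand-in for QMap::value(): a missing key yields a default-constructed
// (empty) string and does not modify the map.
std::string value_or_empty(const Params& params, const std::string& key) {
    const auto it = params.find(key);
    return it == params.end() ? std::string() : it->second;
}

// Stand-in for QString::toDouble(): returns 0.0 when the conversion fails,
// which includes the empty string.
double to_double(const std::string& text) {
    const char* begin = text.c_str();
    char* end = nullptr;
    const double parsed = std::strtod(begin, &end);
    const bool ok = end != begin && *end == '\0';
    return ok ? parsed : 0.0;
}
```

Under these assumptions, looking up "normParam" in a map that only contains "normPerim" gives 0.0, which is the same result as looking up "normPerim" before it has been inserted.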

Comment 19 Ben Beasley 2022-02-15 19:06:53 UTC
I’ve sent a PR upstream[1] for the normPerim/normParam typo and applied the corresponding patch in dist-git.

I’ll follow up by filing a bug on qt6-qtbase and blocking PPCTracker and bug 2050761.

[1] https://github.com/FastTrackOrg/FastTrack/pull/39

Comment 20 Ben Beasley 2022-02-15 19:14:03 UTC
Hmm. I crafted this test case:

> g++ -I/usr/include/qt6 -lQt6Core -o demo demo.cpp

demo.cpp:

> #include <iostream>
> #include <QtCore/QString>
> 
> int main(int, char *[]) {
> 	std::cout << QString().toDouble() << std::endl;
> 	return 0;
> }

but it prints 0, not NaN, in an emulated ppc64le mock chroot.

Comment 21 Dan Horák 2022-02-15 19:24:47 UTC
I have not checked deeper, but it is e.g. https://github.com/FastTrackOrg/FastTrack/blob/master/src/autolevel.cpp#L142 that returns the NaN, as seen with a debugger in the stdAngle variable at https://github.com/FastTrackOrg/FastTrack/blob/master/src/autolevel.cpp#L66
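
Whatever the origin of the NaN, it also interacts with the loop condition quoted in comment 16: every ordered comparison involving NaN is false, so each term of the form abs(std - previous) > 1E-3 evaluates to false once std is NaN. A minimal sketch of that behaviour follows; still_changing and classify are hypothetical names for illustration, not functions in FastTrack.

```cpp
#include <cmath>

// Hypothetical name for one term of the loop condition:
// true while the statistic still differs from the previous value.
bool still_changing(double current, double previous) {
    return std::abs(current - previous) > 1e-3;
}

// A NaN-aware variant that lets the caller tell "converged" apart from
// "the computation produced NaN".
enum class Step { Converged, Changing, Invalid };

Step classify(double current, double previous) {
    if (std::isnan(current) || std::isnan(previous)) {
        return Step::Invalid;
    }
    return still_changing(current, previous) ? Step::Changing : Step::Converged;
}
```

A check along these lines inside the loop would surface a NaN at the point where it is first computed rather than later in the test assertions.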

Comment 22 Ben Beasley 2022-02-16 13:49:42 UTC
There is no change[1] with gcc-12.0.1-0.8, so it looks like I will be studying this in the debugger when I have a chance.

[1] https://koji.fedoraproject.org/koji/taskinfo?taskID=82889879

Comment 23 Ben Beasley 2022-02-28 13:54:33 UTC
I have retired this package in F36 and later. It’s increasingly clear that ffmpeg-enabled OpenCV should be considered a hard requirement, but OpenCV in Fedora has no plans to build with ffmpeg-free.

https://github.com/FastTrackOrg/FastTrack/issues/43
https://bugzilla.redhat.com/show_bug.cgi?id=2058684
https://pagure.io/neuro-sig/NeuroFedora/issue/423

