Since the latest update to OpenBLAS 0.3.28 in rawhide, FlexiBLAS fails to build on aarch64 because OpenBLAS crashes in the LAPACK-xeigtstc_cec_in test. Note that OpenBLAS itself does not fail only because its build does not include the LAPACK test suite.

See:
- The first failure in Koschei after the 0.3.28 update: https://koschei.fedoraproject.org/package/flexiblas
- The build log: https://koji.fedoraproject.org/koji/taskinfo?taskID=125998498

Reproducible: Always

-- Set TEST BLAS to /builddir/build/BUILD/flexiblas-3.4.4-build/BUILDROOT/usr/lib64/flexiblas/libflexiblas_openblas-openmp.so
Running: /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc
ARGS= OUTPUT_FILE;/builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/build/test/lapack-3.12.0/cec.out;ERROR_FILE;/builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/build/test/lapack-3.12.0/cec.out.err;INPUT_FILE;/builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/cec.in
Test OUTPUT:
Test ERROR:
corrupted size vs. prev_size
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0xffffb6a8d157 in ???
#1 0xffffb6a8c03f in ???
#2 0xffffb70c183f in ???
#3 0xffffb692e420 in ???
#4 0xffffb68db23f in ???
#5 0xffffb68c5a97 in ???
#6 0xffffb6920da3 in ???
#7 0xffffb6939947 in ???
#8 0xffffb693a347 in ???
#9 0xffffb693a59f in ???
#10 0xffffb693bbab in ???
#11 0xffffb693bde3 in ???
#12 0xffffb693e90f in ???
#13 0xaaaac3a05673 in csyl01_
    at /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/EIG/csyl01.f:308
#14 0xaaaac3a0980b in cchkec_
    at /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/EIG/cchkec.f:129
#15 0xaaaac3a14e27 in cchkee
    at /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/EIG/cchkee.F:1271
#16 0xaaaac3a043e7 in main
    at /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/EIG/cchkee.F:2553
CMake Error at /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/runtest.cmake:51 (message):
  Test /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc returned Subprocess aborted
*** Bug 2330586 has been marked as a duplicate of this bug. ***
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.
Any progress on this matter?
Well, I tried building it in mock on aarch64-test01.fedorainfracloud.org to try to get some better debug information - but it built fine there. So something strange is going on.
I tried in Copr and it builds fine there too, but Koschei reports consistent failures in Koji. I'm not sure what to do next. Any suggestions?
I have attempted to collect some more debug info via the following:
https://src.fedoraproject.org/fork/orion/rpms/flexiblas/tree/debug

But the valgrind run just seems to hang with no output from valgrind:
https://kojipkgs.fedoraproject.org//work/tasks/3875/127513875/build.log

 Tests of the Nonsymmetric eigenproblem condition estimation routines
 CTRSYL, CTREXC, CTRSNA, CTRSEN

 Relative machine precision (EPS) = 0.119209E-06
 Safe minimum (SFMIN) = 0.117549E-37

 Routines pass computational tests if test ratio is less than 20.00

 CEC routines passed the tests of the error exits ( 41 tests done)

And the crash seems to occur only after the memory corruption has already happened, so the crash location seems to be of limited utility. So I'm at a loss myself.
Waiting longer and removing some earlier tests gives:

==44481== Thread 10:
==44481== Invalid read of size 4
==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid read of size 4
==44481==    at 0x6182DCC: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid write of size 4
==44481==    at 0x6182DF4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid write of size 4
==44481==    at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid read of size 4
==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid read of size 4
==44481==    at 0x6182DCC: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid write of size 4
==44481==    at 0x6182DF4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid write of size 4
==44481==    at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481==
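
The valgrind output above is repetitive, so it may be easier to read in condensed form. The following is only a rough sketch of one way to tally such a log in Python. The function name `summarize` and both regular expressions are made up for illustration and are matched against the message wording seen in this log only. Every access reported above is in cgemm_beta_NEOVERSEN1 and lands 0 or 8 bytes past the end of the 111,504-byte block allocated at csyl01.f:151.

```python
import re
from collections import Counter

# Matches e.g. "Invalid write of size 4 ==44481== at 0x6182DF4: cgemm_beta_NEOVERSEN1".
# \s+ accepts both the original line breaks and a flattened paste.
ACCESS = re.compile(
    r"Invalid (read|write) of size (\d+)\s+==\d+==\s+at 0x[0-9A-Fa-f]+: (\S+)"
)
# Matches e.g. "Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd".
BLOCK = re.compile(
    r"Address 0x[0-9A-Fa-f]+ is (\d+) bytes after a block of size ([\d,]+) alloc'd"
)


def summarize(log):
    """Count invalid accesses and how far past the heap block they land."""
    # Key: (kind, access size in bytes, innermost function name).
    accesses = Counter(
        (kind, int(size), func) for kind, size, func in ACCESS.findall(log)
    )
    # Key: (bytes past the end of the block, block size in bytes).
    overruns = Counter(
        (int(offset), int(size.replace(",", "")))
        for offset, size in BLOCK.findall(log)
    )
    return accesses, overruns


# Short excerpt taken from the log above, used here only as demo input.
SAMPLE = (
    "==44481== Invalid read of size 4 "
    "==44481== at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so) "
    "==44481== Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd"
)

accesses, overruns = summarize(SAMPLE)
for key, count in accesses.items():
    print(count, key)
for key, count in overruns.items():
    print(count, key)
```

To use it on a full run, read the saved build log into a string and pass it to `summarize` in place of `SAMPLE`.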
FWIW I have reproduced the issue on our bare-metal Ampere MtSnow system (80 cpus) doing a rawhide mock build for flexiblas.
It seems that upstream is on the right path. I've limited the concurrency for FlexiBLAS testing to a maximum of 10 threads to avoid this crash for now. It would also be great to enable the LAPACK test suite in OpenBLAS, to detect this kind of issue earlier.
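
For anyone who wants to try a similar cap locally, here is a minimal sketch. It is not the actual packaging change, which is not shown in this report. It assumes the cap is applied through the standard OpenMP environment variable OMP_NUM_THREADS, which the OpenMP build of OpenBLAS should honor. The helper name `capped_thread_env` and its parameters are hypothetical.

```python
import os


def capped_thread_env(max_threads=10, base_env=None):
    """Return a copy of the environment with the OpenMP thread count capped.

    Assumption: the test binaries take their thread count from
    OMP_NUM_THREADS. The default of 10 mirrors the limit mentioned above.
    """
    env = dict(os.environ if base_env is None else base_env)
    available = os.cpu_count() or 1  # cpu_count() may return None
    env["OMP_NUM_THREADS"] = str(min(available, max_threads))
    return env


env = capped_thread_env()
print("OMP_NUM_THREADS =", env["OMP_NUM_THREADS"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the test suite, so the cap applies only to that run and not to the rest of the build.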
This bug appears to have been reported against 'rawhide' during the Fedora Linux 42 development cycle. Changing version to 42.