Bug 2329491 - LAPACK-xeigtstc_cec_in test crash
Summary: LAPACK-xeigtstc_cec_in test crash
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: openblas
Version: 42
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Ali Erdinc Koroglu
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2330586
Depends On:
Blocks:
Reported: 2024-11-29 14:19 UTC by Iñaki Ucar
Modified: 2025-02-26 13:18 UTC
CC List: 8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:



Links
System: GitHub OpenMathLib OpenBLAS issues
ID: 5050
Private: 0
Priority: None
Status: open
Summary: LAPACK test failure with 3.28 on aarch64
Last Updated: 2025-01-05 19:03:00 UTC

Description Iñaki Ucar 2024-11-29 14:19:57 UTC
Since the update to OpenBLAS 0.3.28 in Rawhide, FlexiBLAS fails to build on aarch64 because OpenBLAS crashes in the LAPACK-xeigtstc_cec_in test. Note that OpenBLAS itself builds fine only because its package does not run the LAPACK test suite.

See:
- The first failure in Koschei after the 0.3.28 update: https://koschei.fedoraproject.org/package/flexiblas
- The build log: https://koji.fedoraproject.org/koji/taskinfo?taskID=125998498


Reproducible: Always




-- Set TEST BLAS to /builddir/build/BUILD/flexiblas-3.4.4-build/BUILDROOT/usr/lib64/flexiblas/libflexiblas_openblas-openmp.so
Running: /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc
ARGS= OUTPUT_FILE;/builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/build/test/lapack-3.12.0/cec.out;ERROR_FILE;/builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/build/test/lapack-3.12.0/cec.out.err;INPUT_FILE;/builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/cec.in
Test OUTPUT:
Test ERROR:
corrupted size vs. prev_size
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0  0xffffb6a8d157 in ???
#1  0xffffb6a8c03f in ???
#2  0xffffb70c183f in ???
#3  0xffffb692e420 in ???
#4  0xffffb68db23f in ???
#5  0xffffb68c5a97 in ???
#6  0xffffb6920da3 in ???
#7  0xffffb6939947 in ???
#8  0xffffb693a347 in ???
#9  0xffffb693a59f in ???
#10  0xffffb693bbab in ???
#11  0xffffb693bde3 in ???
#12  0xffffb693e90f in ???
#13  0xaaaac3a05673 in csyl01_
	at /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/EIG/csyl01.f:308
#14  0xaaaac3a0980b in cchkec_
	at /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/EIG/cchkec.f:129
#15  0xaaaac3a14e27 in cchkee
	at /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/EIG/cchkee.F:1271
#16  0xaaaac3a043e7 in main
	at /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/EIG/cchkee.F:2553
CMake Error at /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/test/lapack-3.12.0/runtest.cmake:51 (message):
  Test
  /builddir/build/BUILD/flexiblas-3.4.4-build/flexiblas-3.4.4/build/test/lapack-3.12.0/EIG/xeigtstc
  returned Subprocess aborted
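The `ARGS=` line shows the test driver being run with an input file, an output file and an error file, and CMake reports "Subprocess aborted" when the child dies from a signal (here SIGABRT, raised after glibc detects heap corruption). The Python sketch below mimics that flow. It is a hypothetical helper, not FlexiBLAS's runtest.cmake, and it assumes a POSIX system, where a negative return code from `subprocess` means the child was killed by a signal.

```python
import signal
import subprocess
from pathlib import Path


def describe_result(returncode: int) -> str:
    """Map a subprocess return code to a short status string.

    On POSIX, subprocess reports death by signal N as -N, so a glibc
    heap-corruption abort shows up as -SIGABRT.
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:  # signal number without a name in the enum
            return f"killed by signal {-returncode}"
    return f"exit status {returncode}"


def run_lapack_test(binary: Path, input_file: Path,
                    output_file: Path, error_file: Path) -> str:
    """Run a LAPACK test driver (e.g. xeigtstc) with the .in file on stdin
    and stdout/stderr captured to separate files (hypothetical helper)."""
    with open(input_file, "rb") as fin, \
            open(output_file, "wb") as fout, \
            open(error_file, "wb") as ferr:
        proc = subprocess.run([str(binary)], stdin=fin, stdout=fout,
                              stderr=ferr, check=False)
    return describe_result(proc.returncode)
```

For the failing case in this report, running `xeigtstc` with `cec.in` this way should report "killed by SIGABRT", the same condition CMake prints as "Subprocess aborted".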

Comment 1 Iñaki Ucar 2024-12-05 15:26:16 UTC
*** Bug 2330586 has been marked as a duplicate of this bug. ***

Comment 2 Fedora Admin user for bugzilla script actions 2024-12-06 01:51:19 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 3 Iñaki Ucar 2024-12-16 16:01:18 UTC
Any progress on this matter?

Comment 4 Orion Poplawski 2025-01-01 05:54:22 UTC
Well, I tried building it in mock on aarch64-test01.fedorainfracloud.org to get better debug information, but it built fine there. So something strange is going on.

Comment 5 Iñaki Ucar 2025-01-01 12:27:18 UTC
I tried it in Copr and it builds fine there too, but Koschei reports consistent failures in Koji. I'm not sure what to do next. Any suggestions?

Comment 6 Orion Poplawski 2025-01-04 18:12:52 UTC
I have attempted to collect more debug information with this branch: https://src.fedoraproject.org/fork/orion/rpms/flexiblas/tree/debug

But the valgrind run just seems to hang, with no output from valgrind: https://kojipkgs.fedoraproject.org//work/tasks/3875/127513875/build.log

 Tests of the Nonsymmetric eigenproblem condition estimation routines
 CTRSYL, CTREXC, CTRSNA, CTRSEN
 Relative machine precision (EPS) =     0.119209E-06
 Safe minimum (SFMIN)             =     0.117549E-37
 Routines pass computational tests if test ratio is less than   20.00
 CEC routines passed the tests of the error exits ( 41 tests done)

The crash also seems to happen well after the memory corruption itself, so the crash backtrace is of limited use. I'm at a loss myself.

Comment 7 Orion Poplawski 2025-01-05 18:15:36 UTC
Waiting longer and removing some earlier tests gives:

==44481== Thread 10:
==44481== Invalid read of size 4
==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481== 
==44481== Invalid read of size 4
==44481==    at 0x6182DCC: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481== 
==44481== Invalid write of size 4
==44481==    at 0x6182DF4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481== 
==44481== Invalid write of size 4
==44481==    at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481== 
==44481== Invalid read of size 4
==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481== 
==44481== Invalid read of size 4
==44481==    at 0x6182DCC: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481== 
==44481== Invalid write of size 4
==44481==    at 0x6182DF4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481== 
==44481== Invalid write of size 4
==44481==    at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481==    by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481==    by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481==    by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481==  Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481==    at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481==    by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481==    by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481==    by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481==    by 0x10C327: main (cchkee.F:2553)
==44481==
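Every report in the log above points at one of four instructions in `cgemm_beta_NEOVERSEN1`, reading or writing 0 or 8 bytes past the same 111,504-byte buffer allocated at csyl01.f:151. A small Python sketch like the following can collapse such a log into its distinct faults. The regular expressions are written against the message format shown here, and `elem_size=8` assumes the kernel treats the buffer as single-precision complex values (two 4-byte floats), which fits a cgemm kernel but is not stated in the log.

```python
import re
from collections import Counter

# "==44481== Invalid read of size 4"
KIND_RE = re.compile(r"==\d+== (Invalid (?:read|write) of size \d+)")
# "==44481==    at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in ...)"
AT_RE = re.compile(r"==\d+==\s+at (0x[0-9A-Fa-f]+): (\S+)")
# "==44481==  Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd"
ADDR_RE = re.compile(r"is (\d+) bytes after a block of size ([\d,]+) alloc'd")


def summarize(log: str, elem_size: int = 8) -> Counter:
    """Count distinct heap overruns in a valgrind (memcheck) log.

    Keys are (kind, pc, function, bytes_past_end, element_index), where
    element_index is the elem_size-sized slot, counted from the start of
    the block, that contains the faulting address.
    """
    counts: Counter = Counter()
    kind = at = None
    for line in log.splitlines():
        if m := KIND_RE.search(line):
            kind, at = m.group(1), None          # start of a new error report
        elif kind and at is None and (m := AT_RE.search(line)):
            at = (m.group(1), m.group(2))        # keep the innermost frame only
        elif kind and at and (m := ADDR_RE.search(line)):
            past = int(m.group(1))
            block = int(m.group(2).replace(",", ""))
            counts[(kind, at[0], at[1], past, (block + past) // elem_size)] += 1
            kind = at = None                     # skip the allocation stack
    return counts
```

Applied to the full log in this comment, it should yield four keys with a count of 2 each (one hit per caller, 0x5E29A03 and 0x5E36BC7). Since 111,504 / 8 = 13,938, the faulting slots are indices 13,938 and 13,939, i.e. the first two complex elements past the end of the buffer.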

Comment 8 Dan Horák 2025-01-07 08:38:22 UTC
FWIW, I have reproduced the issue on our bare-metal Ampere MtSnow system (80 CPUs) with a Rawhide mock build of flexiblas.

Comment 9 Iñaki Ucar 2025-01-08 15:54:59 UTC
It seems that upstream is on the right track. For now, I've limited FlexiBLAS testing to at most 10 threads to avoid this crash. It would also be great to enable the LAPACK test suite in OpenBLAS, so that issues like this are caught earlier.
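The comment does not say how the 10-thread limit was applied to the FlexiBLAS tests. One possible approach, sketched here under that assumption, is to run each test driver with the OpenBLAS thread variables capped in its environment. `cap_threads.py`, `capped_threads` and `capped_env` are hypothetical names, and capping threads only sidesteps the crash; it does not fix the out-of-bounds access in the kernel.

```python
import os
import sys
from typing import Mapping, Optional

MAX_TEST_THREADS = 10  # the cap mentioned in the comment above


def capped_threads(cap: int = MAX_TEST_THREADS,
                   ncpu: Optional[int] = None) -> int:
    """Return min(online CPUs, cap), but never less than 1."""
    if ncpu is None:
        ncpu = os.cpu_count() or 1
    return max(1, min(ncpu, cap))


def capped_env(cap: int = MAX_TEST_THREADS,
               base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment with OpenBLAS thread counts capped.

    OpenMP builds of OpenBLAS (libopenblaso, as in the valgrind trace)
    take their thread count from OMP_NUM_THREADS; OPENBLAS_NUM_THREADS
    is set as well to cover pthread builds.
    """
    env = dict(os.environ if base is None else base)
    n = str(capped_threads(cap))
    env["OMP_NUM_THREADS"] = n
    env["OPENBLAS_NUM_THREADS"] = n
    return env


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 cap_threads.py ./xeigtstc < cec.in
    # exec replaces this process, so the test's exit status or signal is kept.
    os.execvpe(sys.argv[1], sys.argv[1:], capped_env())
```

Using `exec` rather than a child process keeps a SIGABRT visible to the caller (such as CMake) exactly as if the test had been run directly.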

Comment 10 Aoife Moloney 2025-02-26 13:18:24 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 42 development cycle.
Changing version to 42.

