Bug 2329491
Summary: | LAPACK-xeigtstc_cec_in test crash | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Iñaki Ucar <i.ucar86> |
Component: | openblas | Assignee: | Ali Erdinc Koroglu <aekoroglu> |
Status: | NEW --- | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 42 | CC: | aekoroglu, dan, jeremy.linton, jhughes, nforro, orion, psimovec, susi.lehtola |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | aarch64 | ||
OS: | Linux | ||
Description
Iñaki Ucar
2024-11-29 14:19:57 UTC
*** Bug 2330586 has been marked as a duplicate of this bug. ***

This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Any progress on this matter?

Well, I tried building it in mock on aarch64-test01.fedorainfracloud.org to try to get some better debug information - but it built fine there. So something strange is going on.

Tried in Copr and it builds fine too, but Koschei reports consistent failures in Koji. Not sure what to do next. Any suggestion?

I have attempted to collect some more debug info via the following - https://src.fedoraproject.org/fork/orion/rpms/flexiblas/tree/debug

But the valgrind run just seems to hang with no output from valgrind - https://kojipkgs.fedoraproject.org//work/tasks/3875/127513875/build.log

Tests of the Nonsymmetric eigenproblem condition estimation routines
CTRSYL, CTREXC, CTRSNA, CTRSEN
Relative machine precision (EPS) = 0.119209E-06
Safe minimum (SFMIN) = 0.117549E-37
Routines pass computational tests if test ratio is less than 20.00
CEC routines passed the tests of the error exits ( 41 tests done)

And the crash seems to occur after memory corruption has already occurred so seems to be of limited utility. So I'm at a loss myself.

Waiting longer and removing some earlier tests gives:

==44481== Thread 10:
==44481== Invalid read of size 4
==44481== at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481== by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481== by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481== Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481== at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481== by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481== by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481== by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481== by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid read of size 4
==44481== at 0x6182DCC: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481== by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481== by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481== Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481== at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481== by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481== by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481== by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481== by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid write of size 4
==44481== at 0x6182DF4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481== by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481== by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481== Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481== at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481== by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481== by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481== by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481== by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid write of size 4
==44481== at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E29A03: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481== by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481== by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481== Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481== at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481== by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481== by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481== by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481== by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid read of size 4
==44481== at 0x6182DC4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481== by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481== by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481== Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481== at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481== by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481== by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481== by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481== by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid read of size 4
==44481== at 0x6182DCC: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481== by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481== by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481== Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481== at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481== by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481== by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481== by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481== by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid write of size 4
==44481== at 0x6182DF4: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481== by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481== by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481== Address 0x53cd9e0 is 0 bytes after a block of size 111,504 alloc'd
==44481== at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481== by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481== by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481== by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481== by 0x10C327: main (cchkee.F:2553)
==44481==
==44481== Invalid write of size 4
==44481== at 0x6182DF8: cgemm_beta_NEOVERSEN1 (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E36BC7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A0FB: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x5E9A4D7: ??? (in /usr/lib64/libopenblaso-r0.3.28.so)
==44481== by 0x70AEF1F: ??? (in /usr/lib64/libgomp.so.1.0.0)
==44481== by 0x4F1D187: start_thread (in /usr/lib64/libc.so.6)
==44481== by 0x4F8729B: thread_start (in /usr/lib64/libc.so.6)
==44481== Address 0x53cd9e8 is 8 bytes after a block of size 111,504 alloc'd
==44481== at 0x48854F0: malloc (vg_replace_malloc.c:446)
==44481== by 0x10C6CB: csyl01_ (csyl01.f:151)
==44481== by 0x1104B3: cchkec_.constprop.0 (cchkec.f:129)
==44481== by 0x119D4F: MAIN__ (cchkee.F:1271)
==44481== by 0x10C327: main (cchkee.F:2553)
==44481==

FWIW I have reproduced the issue on our bare-metal Ampere MtSnow system (80 cpus) doing a rawhide mock build for flexiblas.

It seems that upstream is on the right path. I've limited the concurrency for FlexiBLAS testing to a maximum of 10 threads to avoid this crash for now. And it would be great to enable the LAPACK test suite in OpenBLAS, to detect this kind of issue earlier.

This bug appears to have been reported against 'rawhide' during the Fedora Linux 42 development cycle. Changing version to 42.
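
For anyone wanting to try the same mitigation locally, the thread cap described above might be sketched roughly as follows. This is only an illustration, not the change that was actually made to the FlexiBLAS packaging, which is not shown in this report. The helper names `capped_thread_count` and `capped_test_env` are hypothetical; `OMP_NUM_THREADS` and `OPENBLAS_NUM_THREADS` are the standard OpenMP and OpenBLAS environment variables, and the assumption here is that the test run picks up its thread count from them.

```python
import os


def capped_thread_count(available: int, cap: int = 10) -> int:
    """Return the thread count to use: at most `cap`, at least 1."""
    return max(1, min(available, cap))


def capped_test_env(base_env, available: int, cap: int = 10) -> dict:
    """Copy `base_env` and pin the thread count for a test run.

    Hypothetical helper: the cap of 10 mirrors the workaround mentioned
    in this report, not a documented limit of OpenBLAS itself.
    """
    env = dict(base_env)
    threads = str(capped_thread_count(available, cap))
    env["OMP_NUM_THREADS"] = threads       # read by the OpenMP runtime (libgomp)
    env["OPENBLAS_NUM_THREADS"] = threads  # read by OpenBLAS
    return env


if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to a single thread.
    env = capped_test_env(os.environ, os.cpu_count() or 1)
    print("OMP_NUM_THREADS =", env["OMP_NUM_THREADS"])
```

On an 80-CPU machine like the one mentioned above this yields a thread count of 10, while smaller builders keep their full CPU count. The resulting dictionary could be passed as `env=` to `subprocess.run` when launching the test binary.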