It appears that the P&Z fixes for openblas in https://github.com/OpenMathLib/OpenBLAS/issues/4475 are now causing serious issues when running the flexiblas tests.

First off, LAPACK-xeigtsts_nep_in hangs, e.g.:
https://koji.fedoraproject.org/koji/taskinfo?taskID=113669862
https://kojihub.stream.centos.org/koji/taskinfo?taskID=3637887

If that test is skipped, then a bunch of other tests fail:
https://koji.fedoraproject.org/koji/taskinfo?taskID=113671107
https://kojihub.stream.centos.org/koji/taskinfo?taskID=3639302

All other architectures are still building. Previous builds using openblas-0.3.26-1 (built with GCC 13 without those patches) built fine, even when flexiblas itself was built with GCC 14.

Please note that this is blocking the c10s bootstrap of flexiblas.

Reproducible: Always
One thing to try would be rebuilding openblas-0.3.26-1 (with gcc 14) with -Wno-error=incompatible-pointer-types if possible, to see what happens (does it compile, does it work with flexiblas, ...).
You're right, same thing happens with openblas-0.3.26-1 rebuilt with `%global build_type_safety_c 1`:
https://download.copr.fedorainfracloud.org/results/yselkowitz/openblas-bz2264712/fedora-40-ppc64le/07031762-flexiblas/builder-live.log
https://download.copr.fedorainfracloud.org/results/yselkowitz/openblas-bz2264712/fedora-40-ppc64le/07031785-flexiblas/builder-live.log

So it's not your patches, and therefore possibly a GCC bug. Reassigning.

Tools team: please note that this is blocking the c10s bootstrap of openblas and its dependents. Also, I wonder if this is related to the ppc64le test failures that we saw in scipy as well:
https://gitlab.com/redhat/centos-stream/rpms/scipy/-/commit/1edeb822949e56c06940b3a17ea83481401c30e9
Just a wild guess: the backtraces look like #2261826 / gcc.gnu.org/PR113503, which I fixed recently; the fix should be in gcc-14.0.1-0.7.fc41. Can you try that?
Looks to me the latest gcc makes no difference in up-to-date Rawhide unfortunately, it still hangs in
...
Start 26: LAPACK-xlintsts_stest_in
25/115 Test #26: LAPACK-xlintsts_stest_in ......... Passed 13.10 sec
Start 27: LAPACK-xlintstrfs_stest_rfp_in
26/115 Test #27: LAPACK-xlintstrfs_stest_rfp_in ... Passed 0.62 sec
Start 28: LAPACK-xeigtsts_nep_in
LAPACK-xeigtsts_nep_in is doing something, so probably it entered an endless loop
my results are
- endless loop(?)
LAPACK-xeigtsts_nep_in + LAPACK-xeigtsts_sed_in
- failures
The following tests FAILED:
29 - LAPACK-xeigtsts_se2_in (Failed)
31 - LAPACK-xeigtsts_sec_in (Failed)
34 - LAPACK-xeigtsts_ssb_in (Failed)
35 - LAPACK-xeigtsts_ssg_in (Failed)
48 - LAPACK-xeigtstd_nep_in (Failed)
50 - LAPACK-xeigtstd_se2_in (Failed)
52 - LAPACK-xeigtstd_dec_in (Failed)
53 - LAPACK-xeigtstd_ded_in (Failed)
56 - LAPACK-xeigtstd_dsb_in (Failed)
57 - LAPACK-xeigtstd_dsg_in (Failed)
72 - LAPACK-xeigtstc_se2_in (Failed)
78 - LAPACK-xeigtstc_csb_in (Failed)
79 - LAPACK-xeigtstc_csg_in (Failed)
94 - LAPACK-xeigtstz_se2_in (Failed)
100 - LAPACK-xeigtstz_zsb_in (Failed)
Fortunately reproduces without LTO, that will make investigation easier. Trying now -O0 build...
Seems it fails even in -O0 build.
To be precise, reproduces even when both flexiblas and openblas are built with -O0.

In that case (with OMP_NUM_THREADS=1), I see that zhb2st_kernels stores close to the start of the function into %r29 (call saved register) the value of ~(long)*lda, i.e. 0xfffffffffffffff8 and uses it later in the function. Now, at some point it calls zlarfx_ C wrapper which calls zlarfx. zlarfx_ saves into stack %r29 but later on zlarfx overwrites that in the prologue:

#0 zlarfx (side=..., m=3, n=1, v=..., tau=(0,-0), c=..., ldc=6, work=..., _side=7) at zlarfx.f:118
#1 0x00007ffff7d91074 in zlarfx_ (side=0x7ffff6a1b068 'Left\000', m=0x7ffffffdd4e4, n=0x7ffffffdd4e0, v=0x7fffe3db0110, tau=0x7ffffffdd528, c=0x7fffe3db04d0, ldc=0x7ffffffdd4e8, work=0x7fffe3db0510) at /home/nfs/jakub/rpmbuild/BUILD/flexiblas-3.4.1/src/lapack_interface/wrapper/zlarfx.c:61
#2 0x00007ffff669d954 in zhb2st_kernels (uplo=..., wantz=.FALSE., ttype=2, st=2, ed=4, sweep=1, n=5, nb=3, ib=16, a=..., lda=7, v=..., tau=..., ldvt=19, work=..., _uplo=140737354097904) at zhb2st_kernels.f:271
#3 0x00007ffff7d64dcc in zhb2st_kernels_ (uplo=0x7ffffffde9a8 'UH\021\000', wantz=0x7ffffffdda50, ttype=0x7ffffffdd704, st=0x7ffffffdd4e8, ed=0x7fffe3db04d0, sweep=0x7ffffffdd528, n=0x7fffe3db0110, nb=0x7ffffffdd4e0, ib=0x7ffffffdda44, a=0x7fffe3db02e0, lda=0x7ffffffdda48, v=0x7fffe3db0100, tau=0x7fffe3db0060, ldvt=0x7ffffffdda4c, work=0x7fffe3db0510) at /home/nfs/jakub/rpmbuild/BUILD/flexiblas-3.4.1/src/lapack_interface/wrapper/zhb2st_kernels.c:61

Dump of assembler code for function zlarfx_:
0x00007ffff659d0b8 <+0>: addis r2,r12,91
0x00007ffff659d0bc <+4>: addi r2,r2,24392
0x00007ffff659d0c0 <+8>: mflr r0
0x00007ffff659d0c4 <+12>: std r0,16(r1)
0x00007ffff659d0c8 <+16>: std r29,-24(r1)
0x00007ffff659d0cc <+20>: std r30,-16(r1)
0x00007ffff659d0d0 <+24>: std r31,-8(r1)
0x00007ffff659d0d4 <+28>: stdu r1,-832(r1)
0x00007ffff659d0d8 <+32>: mr r31,r1
0x00007ffff659d0dc <+36>: std r3,864(r31)
0x00007ffff659d0e0 <+40>: std r4,872(r31)
0x00007ffff659d0e4 <+44>: std r5,880(r31)
0x00007ffff659d0e8 <+48>: std r6,888(r31)
0x00007ffff659d0ec <+52>: std r7,896(r31)
0x00007ffff659d0f0 <+56>: std r8,904(r31)
0x00007ffff659d0f4 <+60>: std r9,912(r31)
=> 0x00007ffff659d0f8 <+64>: std r10,920(r31)
0x00007ffff659d0fc <+68>: ld r9,912(r31)

The above std r10 overwrote it. The caller saved it in

Dump of assembler code for function zlarfx_:
0x00007ffff7d90ef0 <+0>: addis r2,r12,26
0x00007ffff7d90ef4 <+4>: addi r2,r2,25872
0x00007ffff7d90ef8 <+8>: mflr r0
0x00007ffff7d90efc <+12>: std r23,-72(r1)
0x00007ffff7d90f00 <+16>: std r24,-64(r1)
0x00007ffff7d90f04 <+20>: std r25,-56(r1)
0x00007ffff7d90f08 <+24>: std r26,-48(r1)
0x00007ffff7d90f0c <+28>: std r27,-40(r1)
0x00007ffff7d90f10 <+32>: std r28,-32(r1)
=> 0x00007ffff7d90f14 <+36>: std r29,-24(r1)
0x00007ffff7d90f18 <+40>: std r30,-16(r1)
0x00007ffff7d90f1c <+44>: std r31,-8(r1)
0x00007ffff7d90f20 <+48>: std r0,16(r1)
0x00007ffff7d90f24 <+52>: stdu r1,-112(r1)

and it happens to be the same address.

Now, I believe that is because flexiblas is buggy. The caller is:

#1 0x00007ffff7d91074 in zlarfx_ (side=0x7ffff6a1b068 "Left", m=0x7ffffffdd4e4, n=0x7ffffffdd4e0, v=0x7fffe3db0110, tau=0x7ffffffdd528, c=0x7fffe3db04d0, ldc=0x7ffffffdd4e8, work=0x7fffe3db0510) at /home/nfs/jakub/rpmbuild/BUILD/flexiblas-3.4.1/src/lapack_interface/wrapper/zlarfx.c:61
61 fn((void*) side, (void*) m, (void*) n, (void*) v, (void*) tau, (void*) c, (void*) ldc, (void*) work);

56 current_backend->post_init = 0;
57 }
58 *(void **) & fn = current_backend->lapack.zlarfx.f77_blas_function;
59 *(void **) & fn_hook = __flexiblas_hooks->zlarfx.f77_hook_function[0];
60 if ( fn_hook == NULL ) {
61 fn((void*) side, (void*) m, (void*) n, (void*) v, (void*) tau, (void*) c, (void*) ldc, (void*) work);
62 return;
63 } else {
64 hook_pos_zlarfx = 0;
65 fn_hook((void*) side, (void*) m, (void*) n, (void*) v, (void*) tau, (void*) c, (void*) ldc, (void*) work);

so calls fn with 8 arguments. Except that the function it calls is

#0 zlarfx (side=..., m=3, n=1, v=..., tau=(0,-0), c=..., ldc=6, work=..., _side=7) at zlarfx.f:118
118 SUBROUTINE ZLARFX( SIDE, M, N, V, TAU, C, LDC, WORK )

124 * .. Scalar Arguments ..
125 CHARACTER SIDE
126 INTEGER LDC, M, N
127 COMPLEX*16 TAU
128 * ..
129 * .. Array Arguments ..
130 COMPLEX*16 C( LDC, * ), V( * ), WORK( * )

As can be seen even from gdb, the callee takes 8 Fortran arguments, but that in the Fortran calling conventions is actually 9 because one of them is CHARACTER, so there is additional _side argument, one needs to pass not just the pointer to the first character but also the length because Fortran doesn't use zero terminated strings but explicit lengths.

See e.g. /home/nfs/jakub/rpmbuild/BUILD/flexiblas-3.4.1/src/lapack_interface/wrapper/cgejsv.c how it expects from the callers fortran_charlen_t extra arguments and passes them down to what it wraps.

grep 'char[*].*fortran_charlen_t' flexiblas-3.4.1/src/lapack_interface/wrapper/*.c | wc -l
110

Those are the correctly handled cases.

grep 'char[*]' flexiblas-3.4.1/src/lapack_interface/wrapper/*.c | grep -v fortran_charlen_t | wc -l
6240

shows the likely mishandled ones.

Now, if nothing on the callee side actually looks at the passed in CHARACTER arguments, maybe it works fine on other arches. But I believe PowerPC is very sensitive to this, if you lie on the number of arguments to non-varargs function and pass fewer than the callee expects, the caller allocates smaller stack area for the saving of the arguments but callee might use the stack area of expected size for saving/restoring stuff.

https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#STACK
The parameter save area shall be allocated by the caller. It shall be doubleword aligned, and shall be at least 8 doublewords in length. If a function needs to pass more than 8 doublewords of arguments, the parameter save area shall be large enough to contain the arguments that the caller stores in it. Its contents are not preserved across function calls.

So, when caller lies here, it expects to pass just 8 doubleword arguments and sizes the stack frame accordingly, but the callee expects 9 arguments and expects different size and stores into it.
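The argument-count mismatch described above can be sketched in plain C. This is only an illustration under stated assumptions, not the flexiblas code: `callee_zlarfx_like`, `wrapper_fixed` and `get_last_side_len` are hypothetical names, the callee is a C stand-in for a gfortran-compiled routine, and `fortran_charlen_t` is assumed to be `size_t` (what gfortran 8 and later use for hidden string lengths).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the hidden CHARACTER length is passed as size_t. */
typedef size_t fortran_charlen_t;

/* Records the hidden length that the stand-in callee received. */
static fortran_charlen_t last_side_len;

/* Hypothetical C stand-in for a Fortran SUBROUTINE with 8 visible
 * arguments, one of them CHARACTER. gfortran appends one hidden length
 * per CHARACTER argument, so the real signature has 9 parameters. */
void callee_zlarfx_like(const char *side, const int *m, const int *n,
                        const void *v, const void *tau, void *c,
                        const int *ldc, void *work,
                        fortran_charlen_t side_len)
{
    assert(side != NULL);
    (void)m; (void)n; (void)v; (void)tau; (void)c; (void)ldc; (void)work;
    last_side_len = side_len;
}

/* Wrapper in the corrected style: it accepts the hidden length from its
 * own caller and forwards it, so the function pointer type, the call and
 * the callee all agree on 9 parameters. */
void wrapper_fixed(const char *side, const int *m, const int *n,
                   const void *v, const void *tau, void *c,
                   const int *ldc, void *work,
                   fortran_charlen_t side_len)
{
    void (*fn)(const char *, const int *, const int *, const void *,
               const void *, void *, const int *, void *,
               fortran_charlen_t) = callee_zlarfx_like;

    fn(side, m, n, v, tau, c, ldc, work, side_len);
}

/* Accessor so the forwarded length can be inspected after a call. */
fortran_charlen_t get_last_side_len(void)
{
    return last_side_len;
}
```

Calling `wrapper_fixed("L", &m, &n, v, tau, c, &ldc, work, 1)` hands the length 1 through to the callee. The buggy variant would declare `fn` with only the 8 pointer parameters, which is the case where the caller and callee disagree about the stack layout.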
@dan based on that, would you be able to make a patch to fix flexiblas?
Although I still wonder why this didn't fail with GCC 13 and is failing with 14?
Did something change in GFortran? AFAIK, this "ghost" last CHARACTER argument is a calling convention from BLAS/LAPACK, and not something specific to FlexiBLAS. See e.g. this discussion:
- https://blog.r-project.org/2019/05/15/gfortran-issues-with-lapack/index.html
- https://blog.r-project.org/2019/09/25/gfortran-issues-with-lapack-ii/

Anyway, I'll bring this upstream. Thanks for the detailed analysis.
Thanks Jakub for the detailed analysis. For the purposes of unblocking builds of flexiblas in the meantime (e.g. c10s bootstrap), are there any workarounds?
(In reply to Yaakov Selkowitz from comment #10)
> @dan based on that, would you be able to make a patch to fix
> flexiblas?

Anyone (including me) could probably prepare one "blindly". Due to the amount of changes needed, it would require writing some kind of "generator" first. Also, the function bodies need to be updated, if I see it right; in fact the individual source files would be rewritten almost completely. So not a simple task ...
Nothing really changed on the GFortran side. The above mentioned blogs are about tail call optimization, for which I've added an ugly workaround and that one is still in gfortran. What I see above is not about tail calls: at -O0 tail calls are disabled and at -O2 they are prevented by that workaround. It is about the sizes of the powerpc ELFv2 stack frame for different numbers of arguments and the use of that area in functions.

Consider:

void f7 (int, int, int, int, int, int, int);
void f8 (int, int, int, int, int, int, int, int);
void f9 (int, int, int, int, int, int, int, int, int);
void f10 (int, int, int, int, int, int, int, int, int, int);
void f11 (int, int, int, int, int, int, int, int, int, int, int);
int c7 (void) { f7 (0, 1, 2, 3, 4, 5, 6); return 0; }
int c8 (void) { f8 (0, 1, 2, 3, 4, 5, 6, 7); return 0; }
int c9 (void) { f9 (0, 1, 2, 3, 4, 5, 6, 7, 8); return 0; }
int c10 (void) { f10 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9); return 0; }
int c11 (void) { f11 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10); return 0; }

GCC (various versions) as well as clang on powerpc64le use a very small stack frame for the c7 and c8 functions above:
stdu 1,-32(1)
but for c9/c10/c11 use a significantly larger one:
stdu 1,-112(1)

So, if you want a quick fix, I'd think it might be enough to look through all 8 argument wrappers which have any CHARACTER arguments among them and don't have fortran_charlen_t arguments yet. That is still a lot:

grep 'char[*]' flexiblas-3.4.1/src/lapack_interface/wrapper/*.c | grep -v fortran_charlen_t | grep '(\([^),]*,\)\{7\}[^),]*)' | awk '{print $1}' | sort -u | wc -l
132

but less than all the other argument counts:

grep 'char[*]' flexiblas-3.4.1/src/lapack_interface/wrapper/*.c | grep -v fortran_charlen_t | awk '{print $1}' | sort -u | wc -l
1248

But I bet the fix can best be done by some simple script, verifying the results afterwards. Out of those 132, 8 files have 3 char* arguments, 22 have 2 char* arguments and the rest just one.
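The selection rule behind the grep pipeline above (8 parameters, at least one `char*`, no `fortran_charlen_t`) can also be written as a small C check on a single prototype string. This is a rough sketch only: `count_params`, `count_occurrences` and `is_quick_fix_candidate` are hypothetical helper names, and the prototype strings in the usage note are made up for illustration rather than copied from the flexiblas sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts comma-separated parameters between the first '(' and its
 * matching ')'. Returns 0 when there is no parameter list or it is
 * empty. Note that "f(void)" counts as one parameter here. */
int count_params(const char *proto)
{
    assert(proto != NULL);
    const char *p = strchr(proto, '(');
    if (p == NULL)
        return 0;

    int depth = 0, commas = 0, seen_token = 0;
    for (; *p != '\0'; ++p) {
        if (*p == '(') {
            depth++;
        } else if (*p == ')') {
            if (--depth == 0)
                break;              /* end of the outer parameter list */
        } else if (*p == ',' && depth == 1) {
            commas++;               /* separator at the top level only */
        } else if (*p != ' ' && *p != '\t') {
            seen_token = 1;
        }
    }
    return seen_token ? commas + 1 : 0;
}

/* Counts non-overlapping occurrences of needle in haystack. */
int count_occurrences(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    size_t len = strlen(needle);
    assert(len > 0);

    int count = 0;
    for (const char *p = strstr(haystack, needle); p != NULL;
         p = strstr(p + len, needle))
        count++;
    return count;
}

/* Returns 1 when the prototype has exactly 8 parameters, at least one
 * "char*" and no fortran_charlen_t, mirroring the grep heuristic. */
int is_quick_fix_candidate(const char *proto)
{
    return count_params(proto) == 8
        && count_occurrences(proto, "char*") > 0
        && count_occurrences(proto, "fortran_charlen_t") == 0;
}
```

For example, an illustrative prototype such as `void zlarfx_(char* side, int* m, int* n, void* v, void* tau, void* c, int* ldc, void* work)` is flagged, while the same prototype with a trailing `fortran_charlen_t len_side` is not. Like the grep, this is purely textual and would still need the results verified by hand.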
Upstream bug: https://github.com/mpimd-csc/flexiblas/issues/45
Anyway, I'd recommend starting by hand with fixing flexiblas-3.4.1/src/lapack_interface/wrapper/zlarfx.c and maybe flexiblas-3.4.1/src/lapack_interface/wrapper/zlarfy.c; all the testsuite crashes I saw have the same address in the innermost frame and that is inside of ZHB2ST_KERNELS I believe, and ZLARFX and ZLARFY seem to be the only calls that function makes which have 8 arguments with one of them CHARACTER.
(In reply to Jakub Jelinek from comment #17)
> Anyway, I'd recommend to start by hand with fixing
> flexiblas-3.4.1/src/lapack_interface/wrapper/zlarfx.c and maybe
> flexiblas-3.4.1/src/lapack_interface/wrapper/zlarfy.c; all the testsuite
> crashes I saw have the same address in the innermost frame and that is
> inside of ZHB2ST_KERNELS I believe, and
> ZLARFX and ZLARFY seems to be the only calls that function calls which have
> 8 arguments with one of them CHARACTER.

ack, will try that ASAP
with ctest -E "LAPACK-xeigtsts_nep_in|LAPACK-xeigtsts_sed_in" (to skip the endless looping tests) the list is a bit shorter

The following tests FAILED:
29 - LAPACK-xeigtsts_se2_in (Failed)
31 - LAPACK-xeigtsts_sec_in (Failed)
34 - LAPACK-xeigtsts_ssb_in (Failed)
35 - LAPACK-xeigtsts_ssg_in (Failed)
48 - LAPACK-xeigtstd_nep_in (Failed)
50 - LAPACK-xeigtstd_se2_in (Failed)
52 - LAPACK-xeigtstd_dec_in (Failed)
53 - LAPACK-xeigtstd_ded_in (Failed)
56 - LAPACK-xeigtstd_dsb_in (Failed)
57 - LAPACK-xeigtstd_dsg_in (Failed)
72 - LAPACK-xeigtstc_se2_in (Failed)
78 - LAPACK-xeigtstc_csb_in (Failed)
79 - LAPACK-xeigtstc_csg_in (Failed)

It fixed these 2 "z-class"
94 - LAPACK-xeigtstz_se2_in (Failed)
100 - LAPACK-xeigtstz_zsb_in (Failed)
Created attachment 2017838
WIP fix
As it appears that any fix for this will be extensive, it has been decided that this will not block the c10s bootstrap. It does still need to be fixed in both Fedora and c10s though, just not this week.
Comment on attachment 2017838
WIP fix

Yeah, this looks right. The (fortran_charlen_t) len_* casts are useless, but guess it depends on whether the other files that already have fortran_charlen_t do it like that or not.

Anyway, I'd think the above changes at least for the single char* argument could be scripted using sed, awk or python or something like that pretty easily.

It is possible the other tests fail because of the 130 affected other files. Or of course it could be some other bug, who knows. But, given that everything misbehaves even with -O0, the probability of bugs in compiler optimizations is lower.
There is some good discussion in the upstream ticket as well and my understanding is that the wrappers are generated and the developer is going to regenerate them based on your analysis. The current state was based on an assumption described in https://github.com/mpimd-csc/flexiblas/issues/45#issuecomment-1954873423 I have modelled my patch after the upstream code, even with the superfluous casts :-)
This bug is the exact same issue reported in https://gcc.gnu.org/PR100799 which we closed as INVALID, meaning user error on the C caller part when calling the fortran function and not passing the hidden param. Surya mentioned two possible fixes to the flexiblas code, but the reporter must have dropped the ball.
A short summary of the bug is that the C caller and Fortran callee disagree on whether a parameter save area has been allocated or not. The param save area is actually allocated in the caller's stack frame and the callee is allowed to read/write to that space. The caller can omit allocating the param save area if it knows all params will fit in registers (i.e., 8 doublewords/args or fewer). Jakub's analysis above is correct. The mismatch in the expected number of params is only a problem when the caller thinks there are 8 or fewer params (i.e., no param save area needed) and the callee thinks there are more than 8 params, so it thinks there is a param save area... i.e., this exact case. :-(
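The rule in this summary can be written down as a tiny C model. It is a deliberate simplification based only on the description above: it counts doubleword arguments, ignores varargs and unprototyped calls, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of doubleword arguments that fit in registers, per the
 * summary above. */
enum { REGISTER_ARG_DOUBLEWORDS = 8 };

/* True when a caller that believes it passes `nargs` doubleword
 * arguments has to allocate a parameter save area. With 8 or fewer it
 * may omit the area. */
bool caller_allocates_save_area(int nargs)
{
    assert(nargs >= 0);
    return nargs > REGISTER_ARG_DOUBLEWORDS;
}

/* True when a callee declared with `nargs` doubleword parameters may
 * assume a parameter save area exists in its caller's frame. */
bool callee_assumes_save_area(int nargs)
{
    assert(nargs >= 0);
    return nargs > REGISTER_ARG_DOUBLEWORDS;
}

/* The dangerous combination: the callee reads or writes a save area
 * that the caller never allocated. */
bool save_area_mismatch(int caller_nargs, int callee_nargs)
{
    return callee_assumes_save_area(callee_nargs)
        && !caller_allocates_save_area(caller_nargs);
}
```

In this model `save_area_mismatch(8, 9)` is true, which corresponds to an 8 argument wrapper calling a routine with one hidden CHARACTER length, while `save_area_mismatch(7, 8)` and `save_area_mismatch(9, 10)` are both false.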
fixed in git (3.4.2-rc1) per https://github.com/mpimd-csc/flexiblas/issues/45#issuecomment-1966048540 IMO it makes sense to rebase to the RC for both F-40+ and downstream
I'll check the new RC. I'll need to reproduce the issue first, because I'm confused by the fact that I don't see any failures in Koschei (https://koschei.fedoraproject.org/package/flexiblas?collection=f41). Any idea why?
I think it's because it needs openblas rebuilt with gcc14 in the buildroot, which became available on Feb 11 and the last Koschei build of flexiblas is from Feb 09. A local rawhide/ppc64le rebuild looks good here - 100% tests passed, 0 tests failed out of 115
Thanks, scratch-building now in Koji to confirm.
Tests are clean, but there are issues with the naming of the 64-bit files now. Working with upstream to fix this too.
It was my mistake, wrong variable name. I'll recheck just in case, but this looks good. I'll tell Martin to make the final release and I'll publish the corresponding updates.
Thanks, I have seen the file naming issue as well, but wasn't sure what's wrong there.
FEDORA-2024-b92bee063f (flexiblas-3.4.2-1.fc41) has been submitted as an update to Fedora 41. https://bodhi.fedoraproject.org/updates/FEDORA-2024-b92bee063f
FEDORA-2024-b92bee063f (flexiblas-3.4.2-1.fc41) has been pushed to the Fedora 41 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2024-7d7f63b8da (flexiblas-3.4.2-1.fc40) has been submitted as an update to Fedora 40. https://bodhi.fedoraproject.org/updates/FEDORA-2024-7d7f63b8da
FEDORA-2024-6ff1eeab91 (flexiblas-3.4.2-1.fc39) has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2024-6ff1eeab91
FEDORA-2024-cb96c80ab5 (flexiblas-3.4.2-1.fc38) has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2024-cb96c80ab5
FEDORA-2024-7d7f63b8da has been pushed to the Fedora 40 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-7d7f63b8da` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-7d7f63b8da See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-6ff1eeab91 has been pushed to the Fedora 39 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-6ff1eeab91` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-6ff1eeab91 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-cb96c80ab5 has been pushed to the Fedora 38 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-cb96c80ab5` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-cb96c80ab5 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-6ff1eeab91 (flexiblas-3.4.2-1.fc39) has been pushed to the Fedora 39 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2024-cb96c80ab5 (flexiblas-3.4.2-1.fc38) has been pushed to the Fedora 38 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2024-7d7f63b8da (flexiblas-3.4.2-1.fc40) has been pushed to the Fedora 40 stable repository. If problem still persists, please make note of it in this bug report.