Bug 1752241
Summary: | octave test fails with illegal instruction on s390x | | 
---|---|---|---
Product: | Red Hat Enterprise Linux 8 | Reporter: | Orion Poplawski <orion>
Component: | openblas | Assignee: | Nikola Forró <nforro>
Status: | CLOSED ERRATA | QA Contact: | RHEL CS Apps Subsystem QE <rhel-cs-apps-subsystem-qe>
Severity: | low | Priority: | unspecified
Version: | 8.0 | CC: | alex, bugproxy, cbm, dan, fkluknav, hannsj_uhl, jaromir.capik, jkejda, mmahut, orion, rakesh.pandit, susi.lehtola
Target Milestone: | rc | Target Release: | 8.2
Hardware: | s390x | OS: | Linux
Fixed In Version: | openblas-0.3.3-4.el8 | Type: | Bug
Last Closed: | 2020-04-28 15:55:31 UTC | Bug Blocks: | 467765, 1711971
Description
Orion Poplawski
2019-09-15 02:30:12 UTC
Tried to get a backtrace with libSegFault.so to no avail.

I'll try to look. Is it octave from the epel8 branch?

Yes.

Thanks. OK, reproduced locally; below is the backtrace.

(gdb) where
#0 0x000003ffa852e1a8 in izamax_k () from /lib64/libopenblas.so.0
#1 0x000003ffa83a7d46 in izamax_ () from /lib64/libopenblas.so.0
#2 0x000003ffa8a3e564 in zlatrs_ () from /lib64/libopenblas.so.0
#3 0x000003ffa8a80f2e in ztrcon_ () from /lib64/libopenblas.so.0
#4 0x000003ffab157ef4 in ComplexMatrix::utsolve (this=this@entry=0x3ffc53f1278, mattype=..., b=..., info=@0x3ffc53f0e54: 0, rcon=@0x3ffc53f0e58: 0, sing_handler=0x0, calc_cond=true, transt=blas_no_trans) at liboctave/array/CMatrix.cc:1566
#5 0x000003ffab15b8b2 in ComplexMatrix::solve (this=this@entry=0x3ffc53f1278, mattype=..., b=..., info=@0x3ffc53f0e54: 0, rcon=@0x3ffc53f0e58: 0, sing_handler=0x0, singular_fallback=true, transt=blas_no_trans) at liboctave/array/CMatrix.cc:1977
#6 0x000003ffab47e478 in lusolve<ComplexMatrix, ComplexMatrix> (L=..., U=..., m=..., m@entry=<error reading variable: value has been optimized out>) at ./liboctave/array/dim-vector.h:285
#7 0x000003ffab48f75c in EigsComplexNonSymmetricMatrixShift<ComplexMatrix> (m=..., sigma=..., k_arg=k_arg@entry=10, p_arg=<optimized out>, info=@0x3ffc53f1734: 0, eig_vec=..., eig_val=..., _b=..., permB=..., cresid=..., os=..., tol=<optimized out>, tol@entry=2.2204460492503131e-16, rvec=false, cholB=false, disp=0, maxit=7) at /usr/include/c++/8/complex:1307
#8 0x000003ff66c8e726 in F__eigs__ (interp=..., args=..., nargout=<optimized out>) at libinterp/dldfcn/__eigs__.cc:457
#9 0x000003ffac6beb5a in octave_builtin::call (this=0x2aa73c688a0, tw=..., nargout=<optimized out>, args=...) at libinterp/octave-value/ov-builtin.cc:71
(gdb) disas
Dump of assembler code for function izamax_k:
...
   0x000003ffa852e182 <+978>: vrepg %v5,%v7,1
   0x000003ffa852e188 <+984>: wfcdb %v26,%v6
   0x000003ffa852e18e <+990>: jne 0x3ffa852e1a8 <izamax_k+1016>
   0x000003ffa852e192 <+994>: vsteg %v6,160(%r15),0
   0x000003ffa852e198 <+1000>: vmnlg %v1,%v5,%v7
   0x000003ffa852e19e <+1006>: vlgvg %r5,%v1,0
   0x000003ffa852e1a4 <+1012>: j 0x3ffa852e1a6 <izamax_k+1014>
=> 0x000003ffa852e1a8 <+1016>: wfchdb %v16,%v26,%v6
   0x000003ffa852e1ae <+1022>: vsel %v1,%v5,%v7,%v16
   0x000003ffa852e1b4 <+1028>: vsel %v0,%v26,%v6,%v16
   0x000003ffa852e1ba <+1034>: vlgvg %r5,%v1,0
   0x000003ffa852e1c0 <+1040>: std %f0,160(%r15)
   0x000003ffa852e1c4 <+1044>: cgrjh %r2,%r11,0x3ffa852e1ce <izamax_k+1054>
   0x000003ffa852e1ca <+1050>: j 0x3ffa852dede <izamax_k+302>
   0x000003ffa852e1ce <+1054>: sllg %r4,%r11,1
   0x000003ffa852e1d4 <+1060>: ld %f4,160(%r15)
   0x000003ffa852e1d8 <+1064>: j 0x3ffa852de8c <izamax_k+220>
   0x000003ffa852e1dc <+1068>: lghi %r2,1
   0x000003ffa852e1e0 <+1072>: j 0x3ffa852de4e <izamax_k+158>
   0x000003ffa852e1e4 <+1076>: brasl %r14,0x3ffa837b5d8 <__stack_chk_fail@plt>
End of assembler dump.

Could be a z14 instruction slipping into z13 code, or a similar issue. I haven't checked the openblas build for rhel8/epel8 yet.

"0x000003ffa852e1a4 <+1012>: j 0x3ffa852e1a6 <izamax_k+1014>" looks suspicious: it jumps into the middle of an instruction, while it should jump much further, to right after the "std" instruction.

https://github.com/xianyi/OpenBLAS/blob/v0.3.3/kernel/zarch/izamax.c#L188 is the source code in question.

Reassigned to RHEL, trying to figure out whether it's an openblas issue or a toolchain issue.

With the fixed openblas I've got:

Summary:
  PASS                         15407
  FAIL                             5
  REGRESSION                       1
  XFAIL (reported bug)            28
  SKIP (missing feature)         124
  SKIP (run-time condition)       34

Created attachment 1615586
fix izamax
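When checking a rebuilt kernel such as the patched izamax_k, it can help to have a plain scalar model of what BLAS IZAMAX is specified to return. IZAMAX gives the 1-based index of the first element with the largest |Re(x)| + |Im(x)|, which is not the true complex modulus. The helper `izamax_ref` below is hypothetical and is not part of OpenBLAS. It is a minimal sketch that, as an assumption, handles only positive strides.

```c
#include <math.h>

/* Hypothetical scalar reference for BLAS IZAMAX semantics (not OpenBLAS code).
 * x holds n complex values as interleaved (re, im) doubles; incx is the stride
 * in complex elements. Returns the 1-based index of the first element with the
 * largest |re| + |im|, or 0 when n < 1 or incx < 1. */
long izamax_ref(long n, const double *x, long incx)
{
    if (n < 1 || incx < 1)
        return 0;

    long best = 1;
    double best_val = fabs(x[0]) + fabs(x[1]);
    for (long i = 1; i < n; i++) {
        const double *z = x + 2 * i * incx;   /* i-th complex element */
        double v = fabs(z[0]) + fabs(z[1]);
        if (v > best_val) {                   /* strict '>' keeps the first maximum on ties */
            best_val = v;
            best = i + 1;
        }
    }
    return best;
}
```

Take the vector (1,1), (-3,0.5), (0,-3.5), (2,2), whose |re| + |im| values are 2, 3.5, 3.5 and 4. For this vector the helper returns 4. If only the first three elements are passed, it returns 2, because the tie goes to the earlier element. Provided the input contains no NaNs, the Fortran-interface `izamax_` shown in the backtrace should agree with this model.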
I strongly recommend explicitly setting TARGET=Z13 during the build, so the rpms won't get a different default when the builders move to another machine.

Thanks Dan.
> I strongly recommend explicitly setting TARGET=Z13 during the build, so the rpms won't get a different default when the builders move to another machine.
Do you think I should also disable DYNAMIC_ARCH as is the case with other non-x86_64 arches?
AFAIK using DYNAMIC_ARCH is OK, because it builds all variants and selects the right one at runtime. What we should fix are the builds that don't support DYNAMIC_ARCH and don't set TARGET explicitly (like s390x).

------- Comment From Andreas.Krebbel.com 2019-11-21 07:04 EDT-------
    "vlgvg %[index],%%v1,0 \n\t"
    "j 3 \n\t"
    "2: \n\t"

"j 3" is wrong. It must be either "j 3f" or "j 3b". This problem has been fixed in OpenBLAS in February this year.

> "j 3" is wrong. It must be either "j 3f" or "j 3b". This problem has been fixed in OpenBLAS in February this year.
Yes, the patch changes "j 3" to "j 3f".
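The dump can be used to check why the bare "j 3" fails. In GNU as, `3f` and `3b` refer to the nearest numeric local label `3:` forward or backward. A bare `3` is just a number, so the encoded offset no longer points at the intended label. On z/Architecture, J is an extended mnemonic of BRC, and its signed 16-bit immediate counts halfwords relative to the address of the branch itself.

The two helpers below are hypothetical names used only for illustration. They recompute the target from the dump. The `j` at +1012 occupies 4 bytes, since the next instruction starts at +1016. Its target, +1014, therefore lands inside the `j`'s own encoding, so the CPU would decode the immediate field as an opcode. This is a sketch of the arithmetic and is not a decoder.

```c
#include <stdint.h>

/* Hypothetical helper: target of a z/Architecture relative branch such as
 * J/BRC (RI format). The signed 16-bit immediate i2 counts halfwords
 * (2 bytes) relative to the address of the branch instruction itself. */
uint64_t ri_branch_target(uint64_t insn_addr, int16_t i2)
{
    return insn_addr + (uint64_t)((int64_t)i2 * 2);
}

/* Hypothetical helper: nonzero if addr falls strictly inside an instruction
 * of len bytes starting at start, i.e. on any byte other than its first. */
int lands_mid_instruction(uint64_t addr, uint64_t start, unsigned len)
{
    return addr > start && addr < start + len;
}
```

For example, `ri_branch_target(0x3ffa852e1a4, 1)` yields `0x3ffa852e1a6`, the `<izamax_k+1014>` shown by gdb. `lands_mid_instruction` reports that address as falling inside the 4-byte `j` at `0x3ffa852e1a4`.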
------- Comment From Andreas.Krebbel.com 2019-11-21 07:53 EDT-------
(In reply to comment #12)
> > "j 3" is wrong. It must be either "j 3f" or "j 3b". This problem has been fixed in OpenBLAS in February this year.
>
> Yes, the patch changes "j 3" to "j 3f".

Oh right. I missed that. In upstream OpenBLAS there are a bunch of patches to add z14 support. These also fix a couple of issues with the z13 support. There should be no testsuite failures anymore with the upstream level. We will check what needs to be backported and open a separate Bugzilla for this.

IIRC s390x is the only (RHEL) arch in openblas that can't build with DYNAMIC_ARCH (aka runtime CPU level detection). Without that we can only build the z13 variant for RHEL-8, as it's the minimum supported arch.

------- Comment From arnez.com 2019-11-21 09:06 EDT-------
> IIRC s390x is the only (RHEL) arch in openblas that can't build with
> DYNAMIC_ARCH (aka runtime CPU level detection).

Right. There was a proposed project as part of the OpenMainframeProject's internship program to fix that: https://github.com/openmainframeproject-internship/resources/blob/master/proposed_projects/OpenBLAS.mdp But it hasn't been picked up by anyone yet.

------- Comment From arnez.com 2019-11-21 09:14 EDT-------
Oops, please replace "OpenBLAS.mdp" by "OpenBLAS.md" in the URL above.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1664