|Summary:||Octave is linked against multiple blas packages|
|Product:||[Fedora] Fedora||Reporter:||Dmitri A. Sergatskov <dasergatskov>|
|Component:||octave||Assignee:||Orion Poplawski <orion>|
|Status:||CLOSED EOL||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||27||CC:||alexl, cbm, cse.cem+redhatbugz, fkluknav, jaromir.capik, orion, rakesh.pandit, susi.lehtola|
|Last Closed:||2018-11-30 21:50:34 UTC||Type:||Bug|
Description Dmitri A. Sergatskov 2017-10-31 04:27:28 UTC
Created attachment 1345698: gdb backtrace

Description of problem:
openblas-0.2.20-3 broke "make check" during the Octave build.

Version-Release number of selected component (if applicable):
openblas-0.2.20-3

How reproducible:
100%

Steps to Reproduce:
1. Get the Octave development source.
2. Build it against openblas.
3. Run "make check".

Actual results:
"make check" hangs with an octave process running at 100% CPU load during the "classes.tst" test.

Expected results:
It should pass.

Additional info:
Building against reference blas, atlas, or a self-compiled openblas-0.2.2 from the upstream tarball all works fine. I assume the previous rpm (0.2.20-2) would work as well, since I noticed the problem soon after the update. This affects Fedora 27/26 and EPEL 7.

A gdb backtrace taken from the hung octave process is attached. It appears to hang in dgemm. Unfortunately I cannot provide a simple test case at this moment.

Dmitri.
Comment 1 Susi Lehtola 2017-10-31 06:35:00 UTC
The backtrace doesn't show any info about what is crashing. Please install openblas-debuginfo.
Comment 2 Dmitri A. Sergatskov 2017-10-31 14:33:27 UTC
It is not crashing; it is stuck in an infinite loop.
Comment 3 Susi Lehtola 2017-10-31 15:40:33 UTC
Stable Octave builds and passes its test suite fine with openblas-0.2.0-3:
https://koji.fedoraproject.org/koji/taskinfo?taskID=22828343

Please test with an older version of openblas.
Comment 4 Dmitri A. Sergatskov 2017-10-31 16:55:42 UTC
The bug is obscure. Octave links against some other libraries, which in turn are linked to different blas(es), so it is possible that 4.2.1 ends up using a different dgemm.

ldd /usr/bin/octave-cli | grep blas
    libopenblas.so.0 => /lib64/libopenblas.so.0 (0x00007f0c41879000)
    libopenblasp.so.0 => /lib64/libopenblasp.so.0 (0x00007f0c39aa6000)
ldd /usr/bin/octave-cli | grep atlas
    libsatlas.so.3 => /usr/lib64/atlas/libsatlas.so.3 (0x00007f48b2d6f000)

Anyway, here is the trace with debug symbols (just the top lines):

#0 dgemm_ (TRANSA=<optimized out>, TRANSB=<optimized out>, M=<optimized out>, N=<optimized out>, K=<optimized out>, alpha=<optimized out>, a=0x139b5f0, ldA=0x7ffffcdb0f68, b=0x654f540, ldB=0x7ffffcdb0f60, beta=0x7ffffcdb1068, c=0x2e6d5e0, ldC=0x7ffffcdb0f7c) at gemm.c:403
#1 0x00007f4cf770dc9c in xgemm (a=..., b=..., transa=blas_no_trans, transb=blas_no_trans) at ../liboctave/array/dMatrix.cc:3008
#2 0x00007f4cf770dd70 in operator* (a=..., b=...) at ../liboctave/array/dMatrix.cc:3023
#3 0x00007f4cf887311c in oct_binop_mul (a1=..., a2=...) at ../libinterp/operators/op-m-m.cc:63
#4 0x00007f4cf8b289b7 in do_binary_op (op=octave_value::op_mul, v1=..., v2=...) at ../libinterp/octave-value/ov.cc:2186

Dmitri.
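The ldd check above can be automated. The following is a minimal sketch, not part of the original report: it scans `ldd` output text and lists every distinct BLAS/LAPACK-looking soname. The function name `blas_providers` and the name pattern are assumptions made for illustration, and the pattern is not exhaustive.

```python
import re

# Assumed list of name fragments for BLAS/LAPACK providers (not exhaustive).
BLAS_PATTERN = re.compile(r"lib(openblas\w*|[st]atlas|blas|cblas|lapack)\.so")


def blas_providers(ldd_output):
    """Return the distinct BLAS/LAPACK sonames found in `ldd` output text."""
    found = []
    for line in ldd_output.splitlines():
        # Each ldd line starts with the soname, e.g. "libfoo.so.1 => /path (0x...)".
        soname = line.strip().split(" ", 1)[0]
        if BLAS_PATTERN.match(soname) and soname not in found:
            found.append(soname)
    return found


# Lines copied from the ldd output quoted in this comment.
sample = """\
    libopenblas.so.0 => /lib64/libopenblas.so.0 (0x00007f0c41879000)
    libopenblasp.so.0 => /lib64/libopenblasp.so.0 (0x00007f0c39aa6000)
    libsatlas.so.3 => /usr/lib64/atlas/libsatlas.so.3 (0x00007f48b2d6f000)
"""

providers = blas_providers(sample)
print(providers)
if len(providers) > 1:
    print("warning: more than one BLAS/LAPACK provider is linked")
```

To check a real binary, the text could come from `subprocess.run(["ldd", path], capture_output=True, text=True).stdout` instead of the embedded sample.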
Comment 5 Dmitri A. Sergatskov 2017-10-31 18:24:46 UTC
Also, using the libraries from 0.2.20-3:

libopenblaso.so -- passes
libopenblasp.so -- hangs
libopenblas.so (default) -- hangs

Maybe that gives some additional clues...

Also, all my computers are fairly old (the i7-2600k is perhaps the newest); perhaps it would work on newer CPUs...

Dmitri.
Comment 6 Susi Lehtola 2017-10-31 19:52:39 UTC
Not surprising: libopenblas, libopenblasp and libopenblaso all provide the same symbols, but are different libraries. It's even worse if you're also linking to some other library.

This is clearly a bug, but not in OpenBLAS. Reassigning to Octave.

I can see on my Fedora 26 computer with

$ rpm -q octave
octave-4.2.1-3.fc26.x86_64

that the linkage is all over the place:

$ ldd /usr/bin/octave-cli|grep -i atla
    libtatlas.so.3 => /usr/lib64/atlas/libtatlas.so.3 (0x00007fd1b45ab000)
    libsatlas.so.3 => /usr/lib64/atlas/libsatlas.so.3 (0x00007fd1acd11000)
$ ldd /usr/bin/octave-cli|grep -i blas
    libblas.so.3 => /lib64/libblas.so.3 (0x00007fcca3aab000)
$ ldd /usr/bin/octave-cli|grep -i lapack
    liblapack.so.3 => /lib64/liblapack.so.3 (0x00007f266d0c3000)

It's not linking to two, but three(!) mutually incompatible implementations: sequential ATLAS, threaded ATLAS, as well as reference BLAS and LAPACK, all of which supply the same symbols.

I'm afraid even to look at what the libraries that are linked to Octave are using, since they will only add to the mess.
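The claim that several linked libraries supply the same symbols can be checked mechanically. Below is a hedged sketch: it assumes each library's exports were listed with GNU `nm -D --defined-only` and reports the symbols defined by more than one library. The helper names `defined_symbols` and `clashing_symbols` and the nm listings are illustrative and made up, not taken from the libraries discussed here.

```python
from collections import defaultdict


def defined_symbols(nm_output):
    """Extract global text symbols from `nm -D --defined-only` output text."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Expected shape: "<address> <type> <name>"; 'T' marks a global text symbol.
        if len(parts) == 3 and parts[1] == "T":
            symbols.add(parts[2])
    return symbols


def clashing_symbols(exports):
    """Map each symbol defined by more than one library to its sorted providers."""
    providers = defaultdict(list)
    for lib, symbols in exports.items():
        for sym in symbols:
            providers[sym].append(lib)
    return {s: sorted(libs) for s, libs in providers.items() if len(libs) > 1}


# Made-up nm listings for illustration; real libraries export far more symbols.
exports = {
    "libopenblas.so.0": defined_symbols(
        "0000000000001000 T dgemm_\n0000000000002000 T openblas_get_config\n"
    ),
    "libsatlas.so.3": defined_symbols("0000000000003000 T dgemm_\n"),
    "libblas.so.3": defined_symbols("0000000000004000 T dgemm_\n"),
}

print(clashing_symbols(exports))
```

A non-empty result only shows that the dynamic linker has more than one candidate for a symbol; which definition is actually used depends on the load order.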
Comment 7 Dmitri A. Sergatskov 2017-10-31 20:38:50 UTC
Why do you think they are mutually incompatible implementations? In fact, you can link octave against, e.g., reference blas and then override it with the LD_PRELOAD=/your/optimized/blas/library.so method.

Dmitri.
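The LD_PRELOAD override mentioned above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names `preload_env` and `run_with_blas` are hypothetical, and the sketch says nothing about whether mixing BLAS implementations is safe.

```python
import os
import subprocess


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with library_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a colon-separated list in LD_PRELOAD.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


def run_with_blas(command, library_path):
    """Run command (a list of arguments) with the given library preloaded."""
    return subprocess.run(command, env=preload_env(library_path), check=False)
```

For example, `run_with_blas(["octave-cli", "--version"], "/your/optimized/blas/library.so")` would start Octave with that library preloaded; the path is the placeholder from the comment above, not a real file.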
Comment 8 Susi Lehtola 2017-11-01 06:17:06 UTC
(In reply to Dmitri A. Sergatskov from comment #7)
> Why do you think they are mutually incompatible implementations?
> In fact you can link octave against e.g. reference blas and then override
> it with LD_PRELOAD=/your/optimized/blas/library.so method.

That's my experience. If you link to multiple libraries that provide the same symbols, you'll get weird behavior and crashes.
Comment 9 Ben Cotton 2018-11-27 15:19:02 UTC
This message is a reminder that Fedora 27 is nearing its end of life. On 2018-Nov-30 Fedora will stop maintaining and issuing updates for Fedora 27. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 27 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Comment 10 Ben Cotton 2018-11-30 21:50:34 UTC
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.