Bug 1507744

Summary: Octave is linked against multiple blas packages
Product: [Fedora] Fedora
Component: octave
Version: 27
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: high
Priority: unspecified
Reporter: Dmitri A. Sergatskov <dasergatskov>
Assignee: Orion Poplawski <orion>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: alex, cbm, cse.cem+redhatbugz, fkluknav, jaromir.capik, orion, rakesh.pandit, susi.lehtola
Last Closed: 2018-11-30 21:50:34 UTC
Type: Bug
Attachments: gdb backtrace (flags: none)

Description Dmitri A. Sergatskov 2017-10-31 04:27:28 UTC
Created attachment 1345698
gdb backtrace

Description of problem:
openblas-0.2.20-3 broke "make check" during octave build. 

Version-Release number of selected component (if applicable):
openblas-0.2.20-3

How reproducible:
100%

Steps to Reproduce:
1. Get the octave development source
2. Build it against openblas
3. Run "make check"

Actual results:
"make check" hangs during the "classes.tst" test, with an octave process
running at 100% CPU load.

Expected results:
It should pass


Additional info:
Building against reference blas, or atlas, or a self-compiled openblas-0.2.2 from the upstream tarball all works fine. I assume the previous rpm (0.2.20-2) would work
as well, since I noticed the problem soon after the update.

This affects Fedora 27/26 and EPEL 7.

A gdb backtrace of the hung octave process is attached.
It appears to hang in dgemm. Unfortunately, I cannot provide a simple
test case at the moment.

Dmitri.

Comment 1 Susi Lehtola 2017-10-31 06:35:00 UTC
The backtrace doesn't show any info about what is crashing. Please install openblas-debuginfo.

Comment 2 Dmitri A. Sergatskov 2017-10-31 14:33:27 UTC
It is not crashing; it is stuck in an infinite loop.

Comment 3 Susi Lehtola 2017-10-31 15:40:33 UTC
Stable Octave builds and passes the test suite fine with openblas-0.2.0-3.

https://koji.fedoraproject.org/koji/taskinfo?taskID=22828343

Please test with an older version of openblas.

Comment 4 Dmitri A. Sergatskov 2017-10-31 16:55:42 UTC
The bug is obscure. Octave links to some other libraries, which are in turn linked to different blas(es), so it is possible that 4.2.1 ends up using a different dgemm.

ldd /usr/bin/octave-cli | grep blas
	libopenblas.so.0 => /lib64/libopenblas.so.0 (0x00007f0c41879000)
	libopenblasp.so.0 => /lib64/libopenblasp.so.0 (0x00007f0c39aa6000)

ldd /usr/bin/octave-cli | grep atlas
	libsatlas.so.3 => /usr/lib64/atlas/libsatlas.so.3 (0x00007f48b2d6f000)

Anyway, the trace with debug symbols (just top lines):

#0  dgemm_ (TRANSA=<optimized out>, TRANSB=<optimized out>, M=<optimized out>, N=<optimized out>, K=<optimized out>, alpha=<optimized out>, a=0x139b5f0, ldA=0x7ffffcdb0f68, b=0x654f540, ldB=0x7ffffcdb0f60, 
    beta=0x7ffffcdb1068, c=0x2e6d5e0, ldC=0x7ffffcdb0f7c) at gemm.c:403
#1  0x00007f4cf770dc9c in xgemm (a=..., b=..., transa=blas_no_trans, transb=blas_no_trans) at ../liboctave/array/dMatrix.cc:3008
#2  0x00007f4cf770dd70 in operator* (a=..., b=...) at ../liboctave/array/dMatrix.cc:3023
#3  0x00007f4cf887311c in oct_binop_mul (a1=..., a2=...) at ../libinterp/operators/op-m-m.cc:63
#4  0x00007f4cf8b289b7 in do_binary_op (op=octave_value::op_mul, v1=..., v2=...) at ../libinterp/octave-value/ov.cc:2186



Dmitri.

Comment 5 Dmitri A. Sergatskov 2017-10-31 18:24:46 UTC
Also, using
libopenblaso.so (from 0.2.20-3) -- passes
libopenblasp.so                 -- hangs
libopenblas.so (default)        -- hangs

Maybe that gives some additional clues...

Also, all my computers are fairly old (the i7-2600k is perhaps the newest);
perhaps it would work on newer CPUs...

Dmitri.

Comment 6 Susi Lehtola 2017-10-31 19:52:39 UTC
Not surprising: libopenblas, libopenblasp and libopenblaso all provide the same symbols, but are different libraries. It's even worse if you're also linking to some other library.

This is clearly a bug, but not in OpenBLAS. Reassigning to Octave.

I can see on my Fedora 26 computer with

$ rpm -q octave
octave-4.2.1-3.fc26.x86_64

that the linkage is all over the place

$ ldd /usr/bin/octave-cli|grep -i atla
	libtatlas.so.3 => /usr/lib64/atlas/libtatlas.so.3 (0x00007fd1b45ab000)
	libsatlas.so.3 => /usr/lib64/atlas/libsatlas.so.3 (0x00007fd1acd11000)
$ ldd /usr/bin/octave-cli|grep -i blas
	libblas.so.3 => /lib64/libblas.so.3 (0x00007fcca3aab000)
$ ldd /usr/bin/octave-cli|grep -i lapack
	liblapack.so.3 => /lib64/liblapack.so.3 (0x00007f266d0c3000)

It's not linking to two, but three(!) mutually incompatible implementations: sequential ATLAS, threaded ATLAS, as well as reference BLAS and LAPACK, all of which supply the same symbols. I'm afraid even to look at what the libraries that are linked to Octave are using, since they will further confuse the mess.
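
One way to spot this kind of mixed linkage in a single pass is to filter the ldd output for every BLAS/LAPACK-family library at once. What follows is only a minimal sketch, assuming a POSIX shell with grep, sed and sort; the function name blas_providers and the list of name patterns are illustrative assumptions and are not exhaustive.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the distinct
# BLAS/LAPACK-family sonames it mentions, one per line.
# The pattern list is an assumption and may need extending (e.g. for MKL).
blas_providers() {
    grep -E '^[[:space:]]*lib(open)?blas|^[[:space:]]*lib[st]atlas|^[[:space:]]*liblapack' |
        sed -E 's/^[[:space:]]*([^[:space:]]+).*/\1/' |
        sort -u
}

# Usage: more than one line of output suggests that several libraries
# exporting the same BLAS symbols are loaded into the same process.
# ldd /usr/bin/octave-cli | blas_providers
```

This only inspects the sonames reported by ldd; it does not tell which library a given call such as dgemm_ is actually bound to at run time.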

Comment 7 Dmitri A. Sergatskov 2017-10-31 20:38:50 UTC
Why do you think they are mutually incompatible implementations?
In fact, you can link octave against e.g. reference blas and then override
it using the LD_PRELOAD=/your/optimized/blas/library.so method.
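
The override described here might look like the following. This is a minimal sketch, assuming a dynamic loader that honors LD_PRELOAD and the default symbol lookup order; the function name run_with_blas is hypothetical and the library path in the usage line is a placeholder.

```shell
# Hypothetical wrapper: run a command with the given shared library preloaded,
# so that under the default lookup rules its BLAS symbols (e.g. dgemm_) are
# found ahead of those in the library the program was linked against.
run_with_blas() {
    lib=$1
    shift
    # Refuse to run if the library is missing, rather than silently
    # falling back to whatever BLAS the program was linked against.
    if [ ! -r "$lib" ]; then
        echo "run_with_blas: cannot read $lib" >&2
        return 1
    fi
    LD_PRELOAD="$lib" "$@"
}

# Usage (placeholder path):
# run_with_blas /your/optimized/blas/library.so octave-cli
```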

Dmitri.

Comment 8 Susi Lehtola 2017-11-01 06:17:06 UTC
(In reply to Dmitri A. Sergatskov from comment #7)
> Why do you think they are mutually incompatible implementations?
> In fact, you can link octave against e.g. reference blas and then override
> it using the LD_PRELOAD=/your/optimized/blas/library.so method.

That's my experience. If you link to multiple libraries that provide the same symbols, you'll get weird behavior and crashes.

Comment 9 Ben Cotton 2018-11-27 15:19:02 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30 Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 27 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 10 Ben Cotton 2018-11-30 21:50:34 UTC
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.