Bug 1709161

Summary: SystemError in scipy.linalg.solve() with sym_pos=True
Product: [Fedora] Fedora
Component: scipy
Version: 30
Hardware: Unspecified
OS: Unspecified
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Reporter: Susi Lehtola <susi.lehtola>
Assignee: Thomas Spura <tomspur>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: cstratak, orion, python-sig, quantum.analyst, tomspur
Type: Bug
Last Closed: 2020-05-26 16:16:30 UTC
Bug Blocks: 1707957

Description Susi Lehtola 2019-05-13 06:31:30 UTC
Trying to run the code

import numpy
import scipy.linalg
A = numpy.array([[1., 0.65829205],
 [0.65829205, 1.        ]])
b = numpy.array([[0.89915184],
 [0.91979157]])
x = scipy.linalg.solve(A, b, sym_pos=True)

in
$ rpm -q python3-scipy
python3-scipy-1.2.0-1.fc30.x86_64

I get the error

>>> x = scipy.linalg.solve(A, b, sym_pos=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.7/site-packages/scipy/linalg/basic.py", line 249, in solve
    overwrite_b=overwrite_b)
SystemError: <fortran object> returned NULL without setting an error

This obviously shouldn't be happening; the system is extremely well-conditioned, the matrix having eigenvalues 0.34 and 1.66.

If I set sym_pos=False, the code works as expected.
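
As a sanity check of the claim about conditioning, the following is a minimal sketch that uses only numpy. It computes the eigenvalues of A and then solves the system by an explicit Cholesky factorization, which is the kind of factorization a symmetric positive definite solver relies on. It is not a reproduction of the failing scipy call, and the variable names are illustrative.

```python
import numpy

A = numpy.array([[1.0, 0.65829205],
                 [0.65829205, 1.0]])
b = numpy.array([[0.89915184],
                 [0.91979157]])

# Eigenvalues of a symmetric matrix, returned in ascending order.
# For [[1, a], [a, 1]] they are 1 - a and 1 + a.
eigenvalues = numpy.linalg.eigvalsh(A)
condition_number = eigenvalues[-1] / eigenvalues[0]

# Cholesky-based solve: A = L L^T, then L y = b followed by L^T x = y.
L = numpy.linalg.cholesky(A)
y = numpy.linalg.solve(L, b)
x_chol = numpy.linalg.solve(L.T, y)

# General-purpose solve for comparison.
x_general = numpy.linalg.solve(A, b)

print("eigenvalues:", eigenvalues)
print("condition number:", condition_number)
print("solutions agree:", numpy.allclose(x_chol, x_general))
```

Both eigenvalues are positive and the condition number is below 5, so a Cholesky-based solve should have no numerical difficulty with this input.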

Comment 1 Susi Lehtola 2019-05-13 06:35:22 UTC
I also see that scipy is linked to

libopenblas.so.0()(64bit)
libopenblasp.so.0()(64bit)

Note that this is wrong: linking both to the sequential as well as the pthreads version is bound to lead to problematic behavior. Furthermore, pthreads doesn't play well with OpenMP. If you want to use a parallel version of OpenBLAS, link with  -lopenblaso.
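
One way to see which OpenBLAS variants actually end up in a running process is to inspect /proc/self/maps after importing scipy.linalg. The sketch below assumes Linux; the helper name and the sample mapping lines are hypothetical and only illustrate the parsing.

```python
import os


def loaded_openblas_libs(maps_text):
    """Return the sorted base names of mapped files whose name contains 'openblas'."""
    libs = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) == 6:
            name = os.path.basename(fields[5].strip())
            if "openblas" in name:
                libs.add(name)
    return sorted(libs)


# Hypothetical excerpt of /proc/self/maps, for illustration only.
SAMPLE_MAPS = """\
7f0000000000-7f0000100000 r-xp 00000000 fd:01 1001 /usr/lib64/libopenblas.so.0
7f0000200000-7f0000300000 r-xp 00000000 fd:01 1002 /usr/lib64/libopenblasp.so.0
7f0000400000-7f0000500000 r-xp 00000000 fd:01 1003 /usr/lib64/libc.so.6
7f0000600000-7f0000700000 rw-p 00000000 00:00 0
"""

print(loaded_openblas_libs(SAMPLE_MAPS))
```

On a live system, the same function could be fed the contents of /proc/self/maps read after `import scipy.linalg`. More than one OpenBLAS variant in the result would match the double linkage described above.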

Comment 2 Orion Poplawski 2019-05-16 02:12:18 UTC
Apparently there is some weirdness with the numpy/scipy configuration system.  But the following:

diff --git a/numpy.spec b/numpy.spec
index bdd18d1..59a9ebf 100644
--- a/numpy.spec
+++ b/numpy.spec
@@ -137,8 +137,8 @@ This package provides the complete documentation for NumPy.
 # Use openblas pthreads as recommended by upstream (see comment in site.cfg.example)
 cat >> site.cfg <<EOF
 [openblas]
+libraries = openblasp
 library_dirs = %{_libdir}
-openblas_libs = openblasp
 EOF
 %else
 # Atlas 3.10 library names

does appear to make numpy link only to -lopenblasp (though it is still duplicated on the link lines: -lopenblasp -lopenblas) rather than "-lopenblasp -lopenblas".
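
To check what the patched spec file would put into site.cfg, a minimal sketch with Python's configparser follows. It assumes the build system reads the `libraries` key of the `[openblas]` section, which is what the diff above suggests. The helper name is hypothetical and /usr/lib64 stands in for the %{_libdir} macro.

```python
import configparser

# Text the patched spec file would append to site.cfg.
# "/usr/lib64" is a stand-in for the %{_libdir} RPM macro.
SITE_CFG = """\
[openblas]
libraries = openblasp
library_dirs = /usr/lib64
"""


def openblas_libraries(cfg_text):
    """Return the library names listed under [openblas] -> libraries."""
    # Interpolation is disabled so that a literal '%' in a value cannot raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("openblas"):
        return []
    raw = parser.get("openblas", "libraries", fallback="")
    return [name.strip() for name in raw.split(",") if name.strip()]


print(openblas_libraries(SITE_CFG))
```

With the old `openblas_libs = openblasp` line, the same helper returns an empty list because the `libraries` key is absent.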

I can't speak to openblasp vs openblaso.  There appears to be some level of cargo-cult knowledge going on here.  There appear to have been issues with the OpenMP version of openblas and numpy, but these may have been resolved long ago.

Comment 3 Elliott Sales de Andrade 2019-05-16 06:42:26 UTC
*** Bug 1687120 has been marked as a duplicate of this bug. ***

Comment 4 Susi Lehtola 2019-05-16 10:14:23 UTC
(In reply to Orion Poplawski from comment #2)
> I can't speak to openblasp vs openblaso.  There appears to be some level of
> cargo-cult knowledge going on here.  There appear to have been issues with
> the OpenMP version of openblas and numpy, but these may have been resolved
> long ago.

Right you are, numpy is also using the wrong version, see BZ #1710788.

Comment 5 Susi Lehtola 2019-05-16 10:30:17 UTC
(In reply to Orion Poplawski from comment #2)
> does appear to make numpy link only to -lopenblasp (though it is still
> duplicated on the link lines: -lopenblasp -lopenblas) rather than
> "-lopenblasp -lopenblas".

Which may be exactly the problem. It should only link to -lopenblasp, not -lopenblas which is the sequential version of the library. Looks like you need to hack the build system to remove the spurious duplicate linkage.

Comment 6 Orion Poplawski 2019-05-16 14:01:01 UTC
Sorry, I made a typo there. With the above change, the link line changes to: -lopenblasp -lopenblasp
which should be fine.  I'd like to get some consensus on openblasp vs openblaso before making a change though.

Comment 7 Susi Lehtola 2019-05-16 14:04:06 UTC
openblasp might be fine on second thought, as it appears that at least numpy has nothing that is OpenMP parallel. It also appears that numpy upstream recommended the pthreads version of OpenBLAS over the OpenMP one several years ago.

Comment 8 Elliott Sales de Andrade 2019-07-23 07:07:38 UTC
This seems to have been fixed somehow in Rawhide, in that I can build dask 2.1.0 now. But it seems to still be broken in Fedora 30.

Comment 9 Susi Lehtola 2019-07-23 09:53:52 UTC
Rawhide has a newer version of scipy.

Comment 10 Susi Lehtola 2019-07-23 09:55:53 UTC
... maybe they have also fixed the gcc bug described in https://lwn.net/Articles/791393/ in rawhide but not F30...

Comment 11 Ben Cotton 2020-04-30 21:30:39 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 12 Ben Cotton 2020-05-26 16:16:30 UTC
Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.