Bug 1908117

Summary: scipy: FTBFS in Fedora rawhide
Product: [Fedora] Fedora
Component: scipy
Version: rawhide
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Marek Polacek <mpolacek>
Assignee: Nikola Forró <nforro>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: cstratak, david08741, mhroncok, nforro, python-sig, tomspur
Last Closed: 2021-01-12 10:03:59 UTC
Type: Bug
Bug Blocks: 1868278, 1890881

Description Marek Polacek 2020-12-15 20:57:36 UTC
Description of problem:
scipy fails to build in F34:
https://koji.fedoraproject.org/koji/taskinfo?taskID=57522505

Version-Release number of selected component (if applicable):
scipy-1.5.4-2.fc34

Additional info:

A log is appended.  I tried rebuilding with LTO disabled, and with -O0, and with -fsanitize=undefined, but the build failed all the same.

But I noticed that re-running a failing test with --numprocesses=1 worked, so this looks like a concurrency issue to me.  It would be great if someone could take a look at this; unfortunately, I couldn't reduce it to a short testcase.  The test suite can also fail with

================================================= FAILURES ==================================================
___________________________________________ tests/test_ndimage.py ___________________________________________
[gw1] linux -- Python 3.9.1 /usr/bin/python3
worker 'gw1' crashed while running 'tests/test_ndimage.py::TestNdimage::test_map_coordinates_large_data'
========================================== short test summary info ==========================================
FAILED test_ndimage.py::TestNdimage::test_map_coordinates_large_data
================================= 1 failed, 520 passed, 1 xpassed in 13.59s =================================

and is not deterministic.
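
Because the failure is intermittent, a single passing run with --numprocesses=1 does not say much on its own. The following is a minimal sketch, not part of the original report, of a helper that repeats a command and tallies its exit codes. The helper name `tally_exit_codes` and the `FLAKY_TEST` command line are hypothetical; the test id is taken from the appended log, and `--numprocesses` assumes pytest-xdist is installed.

```python
import subprocess
import sys
from collections import Counter


def tally_exit_codes(cmd, runs=10):
    """Run `cmd` `runs` times and count how often each exit code occurs."""
    codes = Counter()
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        codes[proc.returncode] += 1
    return codes


# Hypothetical command line: one failing test from the log, run serially.
FLAKY_TEST = [
    sys.executable, "-m", "pytest", "--numprocesses=1",
    "scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py::test_parallel_threads",
]

# Example use (commented out because it needs a scipy source tree):
# print(tally_exit_codes(FLAKY_TEST, runs=20))
```

A tally with more than one distinct exit code means the test is flaky even without parallel workers, which would point away from a pure test-concurrency problem.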

e/linalg/eigen/arpack/tests/test_arpack.py:444: AssertionError
____________________________ test_parallel_threads _____________________________
[gw0] linux -- Python 3.9.1 /usr/bin/python3
    def test_parallel_threads():
        results = []
        v0 = np.random.rand(50)
    
        def worker():
            x = diags([1, -2, 1], [-1, 0, 1], shape=(50, 50))
            w, v = eigs(x, k=3, v0=v0)
            results.append(w)
    
            w, v = eigsh(x, k=3, v0=v0)
            results.append(w)
    
        threads = [threading.Thread(target=worker) for k in range(10)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    
        worker()
    
        for r in results:
>           assert_allclose(r, results[-1])
E           AssertionError: 
E           Not equal to tolerance rtol=1e-07, atol=0
E           
E           Mismatched elements: 3 / 3 (100%)
E           Max absolute difference: 127.01237297
E           Max relative difference: 1.14434017
E            x: array([-3.996207+0.j, -3.984841+0.j, -3.965946+0.j])
E            y: array([-131.00858 ,   27.60729 ,   87.214419])
scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py:902: AssertionError
=============================== warnings summary ===============================
optimize/tests/test__linprog_clean_inputs.py: 2 warnings
optimize/tests/test_linprog.py: 8 warnings
  /usr/lib64/python3.9/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
    return array(a, dtype, copy=False, order=order)
stats/tests/test_continuous_basic.py::test_cont_basic[crystalball-arg13]
stats/tests/test_continuous_basic.py::test_cont_basic[ncf-arg74]
  /builddir/build/BUILDROOT/scipy-1.5.4-2.fc34.x86_64/usr/lib64/python3.9/site-packages/scipy/stats/_distn_infrastructure.py:2444: IntegrationWarning: The occurrence of roundoff error is detected, which prevents 
    the requested tolerance from being achieved.  The error may be 
    underestimated.
    h = integrate.quad(integ, _a, _b)[0]
stats/tests/test_continuous_basic.py::test_cont_basic[crystalball-arg13]
  /builddir/build/BUILDROOT/scipy-1.5.4-2.fc34.x86_64/usr/lib64/python3.9/site-packages/scipy/stats/_distn_infrastructure.py:2459: IntegrationWarning: The occurrence of roundoff error is detected, which prevents 
    the requested tolerance from being achieved.  The error may be 
    underestimated.
    return integrate.quad(integ, lower, upper)[0]
stats/tests/test_continuous_basic.py::test_cont_basic[crystalball-arg13]
  /usr/lib64/python3.9/site-packages/numpy/lib/function_base.py:2192: RuntimeWarning: invalid value encountered in ? (vectorized)
    outputs = ufunc(*inputs)
-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py::test_symmetric_no_convergence
FAILED scipy/sparse/linalg/eigen/arpack/tests/test_arpack.py::test_parallel_threads
= 2 failed, 36340 passed, 2242 skipped, 103 xfailed, 6 xpassed, 14 warnings in 250.96s (0:04:10) =
RPM build errors:
error: Bad exit status from /var/tmp/rpm-tmp.q8RZst (%check)
    Bad exit status from /var/tmp/rpm-tmp.q8RZst (%check)
Child return code was: 1
EXCEPTION: [Error()]

Comment 1 Nikola Forró 2020-12-15 21:05:23 UTC
Yes, I'm aware of this. It only happens on i686 and x86_64 (but perhaps these two tests are skipped on other architectures, I didn't check).

Comment 2 Nikola Forró 2020-12-16 19:29:42 UTC
Here is a reproducer:

python3 -c 'import numpy as np; from scipy.sparse import diags; from scipy.sparse.linalg.eigen.arpack import eigsh; w, _ = eigsh(diags([1, -2, 1], [-1, 0, 1], shape=(50, 50)), k=3, v0=np.random.rand(50)); print(w)'

This should print "[-3.99620666 -3.98484102 -3.9659462 ]", but on i686 and x86_64, with scipy compiled with GCC 11, the result is quite different and changes on each run.
So perhaps some platform-specific optimization issue in gfortran?
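
The expected values can be cross-checked without ARPACK. The matrix in the reproducer is tridiagonal Toeplitz (-2 on the diagonal, 1 on both off-diagonals), so its eigenvalues have the closed form -2 + 2*cos(j*pi/(n+1)) for j = 1..n. The sketch below is not from the original comment, and the function names are made up for illustration. It computes the three eigenvalues of largest magnitude with the standard library only and, optionally, compares them against eigsh. It imports eigsh from the public `scipy.sparse.linalg` namespace rather than the `scipy.sparse.linalg.eigen.arpack` path used above.

```python
import math


def expected_extreme_eigenvalues(n=50, k=3):
    """Closed-form eigenvalues of the n x n tridiagonal matrix with -2 on the
    diagonal and 1 on both off-diagonals; returns the k most negative ones."""
    all_eigs = [-2.0 + 2.0 * math.cos(j * math.pi / (n + 1))
                for j in range(1, n + 1)]
    # All eigenvalues lie in (-4, 0), so the largest-magnitude ones
    # (eigsh's default which='LM') are the most negative ones.
    return sorted(all_eigs)[:k]


def eigsh_matches_closed_form(rtol=1e-7):
    """Optional check; needs numpy and scipy. Returns True if eigsh agrees."""
    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import eigsh

    x = diags([1, -2, 1], [-1, 0, 1], shape=(50, 50))
    w, _ = eigsh(x, k=3, v0=np.random.rand(50))
    return bool(np.allclose(np.sort(w), expected_extreme_eigenvalues(), rtol=rtol))


print([round(v, 8) for v in expected_extreme_eigenvalues()])
```

On an affected build, `eigsh_matches_closed_form()` would be expected to return False on most runs, while the closed-form values stay fixed because they do not depend on the compiled Fortran code.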

Comment 4 Nikola Forró 2021-01-12 10:05:27 UTC
Yes, the underlying cause has been resolved in gcc; scipy builds fine and no tests fail with gcc-11.0.0-0.12.