Description of problem:

Version-Release number of selected component (if applicable):

How reproducible: always

Steps to Reproduce:
1. Install python3-numpy on Fedora 33
2. Run the Python 3 script https://gist.githubusercontent.com/markus-beuckelmann/8bc25531b11158431a5b09a45abd6276/raw/660904cb770197c3c841ab9b7084657b1aea5f32/numpy-benchmark.py
3. Note down the execution times
# Now change the backend to openblas-threads (as this one is used directly in F32)
4. Install flexiblas-openblas-threads
5. Change the backend in /etc/flexiblasrc to "openblas-threads"
6. Run the script from step 2 again
7. Note down the execution times
8. Install python3-numpy on F32, or build it on F33 directly against threaded openblas
9. Run the script from step 2 again
10. Note down the execution times
11. Compare the results

Actual results:
Some operations, such as eigendecomposition, are much slower with flexiblas in between.

F33 with flexiblas and openblas-threads backend:
Dotted two 4096x4096 matrices in 3.36 s.
Dotted two vectors of length 524288 in 0.75 ms.
SVD of a 2048x1024 matrix in 10.47 s.
Cholesky decomposition of a 2048x2048 matrix in 1.62 s.
Eigendecomposition of a 2048x2048 matrix in 51.59 s.
This was obtained using the following NumPy configuration:
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['flexiblas', 'flexiblas']
    library_dirs = ['/usr/lib64']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/usr/lib64']
blas_opt_info:
    libraries = ['flexiblas', 'flexiblas']
    library_dirs = ['/usr/lib64']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/usr/lib64']
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['flexiblas', 'flexiblas']
    library_dirs = ['/usr/lib64']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/usr/lib64']
lapack_opt_info:
    libraries = ['flexiblas', 'flexiblas']
    library_dirs = ['/usr/lib64']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/usr/lib64']

F33 with python3-numpy linked against libopenblasp directly (or F32 default, very similar):
Dotted two 4096x4096 matrices in 2.95 s.
Dotted two vectors of length 524288 in 0.57 ms.
SVD of a 2048x1024 matrix in 1.70 s.
Cholesky decomposition of a 2048x2048 matrix in 0.22 s.
Eigendecomposition of a 2048x2048 matrix in 14.05 s.
This was obtained using the following NumPy configuration:
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblasp', 'openblasp']
    library_dirs = ['/usr/lib64']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/usr/lib64']
blas_opt_info:
    libraries = ['openblasp', 'openblasp']
    library_dirs = ['/usr/lib64']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/usr/lib64']
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblasp', 'openblasp']
    library_dirs = ['/usr/lib64']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/usr/lib64']
lapack_opt_info:
    libraries = ['openblasp', 'openblasp']
    library_dirs = ['/usr/lib64']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
    runtime_library_dirs = ['/usr/lib64']

Expected results:
Performance is similar with and without flexiblas
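For anyone who wants to compare backends without editing /etc/flexiblasrc each time, here is a rough sketch of the comparison in the steps above. It assumes that FlexiBLAS honours the FLEXIBLAS environment variable to pick a backend for a single process, and that the backend names match the ones installed on your system (check your local backend list). The names `run_with_backend` and `EIG_SNIPPET`, and the 1024x1024 matrix size, are made up for this illustration and are not part of the original benchmark script.

```python
import os
import subprocess
import sys

# Child snippet (hypothetical): times one eigendecomposition and prints seconds.
# The 1024x1024 size is an arbitrary choice to keep the run short.
EIG_SNIPPET = """
import time
import numpy as np
a = np.random.default_rng(0).standard_normal((1024, 1024))
t0 = time.perf_counter()
np.linalg.eig(a)
print(f"{time.perf_counter() - t0:.2f}")
"""


def run_with_backend(backend, code=EIG_SNIPPET):
    """Run `code` in a fresh interpreter with FLEXIBLAS set to `backend`.

    Assumption: FlexiBLAS reads the FLEXIBLAS environment variable at load
    time, so a new process is needed for every backend.
    """
    env = dict(os.environ, FLEXIBLAS=backend)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `run_with_backend("openblas-threads")` and `run_with_backend("netlib")` in turn should then give one timing per backend. A fresh interpreter is used for each run because the BLAS library is loaded once per process.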
It looks like SVD and eigendecomposition run on one core only with FlexiBLAS (both the OpenMP and Threads versions), while other operations such as matrix multiplication run on all cores with FlexiBLAS too.
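One way to check the single-core observation without watching a process monitor is to compare CPU time with wall-clock time for a single call. This is only a sketch: `cores_used` is a hypothetical helper, and the ratio is a rough estimate that system load can distort.

```python
import time


def cores_used(fn, *args):
    """Return (wall_seconds, cpu_seconds, cpu/wall) for one call of fn(*args).

    time.process_time() sums CPU time over all threads of the process, so a
    ratio near 1 suggests the call ran on one core, while a ratio near N
    suggests about N cores were busy.
    """
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    fn(*args)
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return wall, cpu, cpu / wall
```

With `a` being a 2048x2048 NumPy array, `cores_used(np.linalg.eig, a)` and `cores_used(np.dot, a, a)` can be compared directly. On an affected system I would expect the first ratio to stay close to 1 while the second approaches the number of cores.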
Confirmed here, thanks for the report. I've opened an issue upstream: https://github.com/mpimd-csc/flexiblas/issues/7
The issue has been identified and a fix is underway.
I tried the new release 3.0.4 (easy to rebuild; the only spec change required is the version), and it fixes the issue :)
FEDORA-2020-cd5d97c1e4 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-cd5d97c1e4
FEDORA-2020-cd5d97c1e4 has been pushed to the Fedora 33 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-cd5d97c1e4` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-cd5d97c1e4 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2020-cd5d97c1e4 has been pushed to the Fedora 33 stable repository. If the problem still persists, please make a note of it in this bug report.