Bug 1589823

Summary: The OpenBLAS package that comes with RHEL does not have multithreading capabilities
Product: [Fedora] Fedora
Reporter: Courtney Pacheco <cpacheco>
Component: openblas
Assignee: Susi Lehtola <susi.lehtola>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: unspecified
Version: 28
CC: prd-fedora, susi.lehtola
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2018-06-11 13:05:35 UTC
Type: Bug

Description Courtney Pacheco 2018-06-11 12:58:20 UTC
Description of problem:
OpenBLAS, when installed through yum/dnf, should automatically be capable of running on multiple threads/cores. I know there is an `openblas-threads64` threading package, but, correct me if I'm wrong, I don't *believe* you need to have multiple cores on your system to run an OpenBLAS that is pre-built with multithreading capabilities. 

Version-Release number of selected component (if applicable): 
- openblas-0.2.20-10.fc28.x86_64
- openblas-devel-0.2.20-10.fc28.x86_64
- openblas-static-0.2.20-10.fc28.x86_64
- glibc-2.27-8.fc28.x86_64
- compat-libgfortran-41-4.1.2-53.fc28.x86_64
- gcc-gfortran-8.1.1-1.fc28.x86_64
- libgfortran-8.1.1-1.fc28.x86_64
- libgfortran-static-8.1.1-1.fc28.x86_64

How reproducible: Always

Steps to Reproduce:

# Prove that the standard OpenBLAS package that comes with Fedora/RHEL does not have multithreading capabilities
1. $ sudo yum install -y openblas \
                         openblas-devel \
                         openblas-static \
                         glibc \
                         compat-libgfortran-41 \
                         gcc-gfortran \
                         libgfortran \
                         libgfortran-static

2. $ wget https://gist.githubusercontent.com/xianyi/5780018/raw/c1d93058a2f61b88b9dd4237d2cf4a963065070b/time_dgemm.c

3. $ gcc -o time_dgemm time_dgemm.c /usr/lib64/libopenblas.a -lpthread

4. $ time OPENBLAS_NUM_THREADS=1 ./time_dgemm 10000 10000 10000
test!
m=10000,n=10000,k=10000,alpha=1.200000,beta=0.001000,sizeofc=100000000

real	1m5.280s
user	1m4.231s
sys	0m0.727s

5. $ time OPENBLAS_NUM_THREADS=24 ./time_dgemm 10000 10000 10000
test!
m=10000,n=10000,k=10000,alpha=1.200000,beta=0.001000,sizeofc=100000000

real	1m6.876s
user	1m5.476s
sys	0m0.956s

# Prove that a custom-built multithreaded OpenBLAS is significantly faster
1. $ git clone https://github.com/xianyi/OpenBLAS.git

2. $ cd OpenBLAS

3. $ make FC=gfortran #this will take a bit

4. $ sudo make PREFIX=/opt/OpenBLAS install

5. $ cd /path/to/dir #the directory containing time_dgemm.c

6. $ gcc -o time_dgemm time_dgemm.c /opt/OpenBLAS/lib/libopenblas.a -lpthread

7. $ time OPENBLAS_NUM_THREADS=1 ./time_dgemm 10000 10000 10000
test!
m=10000,n=10000,k=10000,alpha=1.200000,beta=0.001000,sizeofc=100000000

real	1m1.896s
user	1m1.006s
sys	0m0.669s

8. $ time OPENBLAS_NUM_THREADS=24 ./time_dgemm 10000 10000 10000
test!
m=10000,n=10000,k=10000,alpha=1.200000,beta=0.001000,sizeofc=100000000

real	0m43.365s
user	2m30.888s
sys	0m1.171s

Actual results:
See above. With the Fedora/RHEL-installed OpenBLAS library, using multiple threads gives no speedup: the run takes 1m5.280s with 1 thread and 1m6.876s with 24, and user time stays roughly equal to real time in both cases. Thus, I am led to believe that the Fedora/RHEL OpenBLAS package is *not* multithreaded. However, with OpenBLAS built from source with multithreading capabilities, the same code drops from 1m1.896s with 1 thread to 0m43.365s with 24.
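
The comparison above can be made explicit by converting the `time` output to seconds and dividing user time by real time, which approximates how many cores were kept busy on average. The following is only a rough sketch: the helper names (`to_seconds`, `busy_cores`, `speedup`) are made up for illustration, the figures are copied from the runs in this report, and sys time is ignored for simplicity.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` value such as '1m5.280s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# (real, user) pairs copied from the runs in this report.
RUNS = {
    "distro, 1 thread": ("1m5.280s", "1m4.231s"),
    "distro, 24 threads": ("1m6.876s", "1m5.476s"),
    "custom, 1 thread": ("1m1.896s", "1m1.006s"),
    "custom, 24 threads": ("0m43.365s", "2m30.888s"),
}


def busy_cores(real: str, user: str) -> float:
    """Approximate average number of busy cores: user CPU time / wall-clock time."""
    return to_seconds(user) / to_seconds(real)


def speedup(real_one_thread: str, real_many_threads: str) -> float:
    """Wall-clock speedup of the many-thread run over the one-thread run."""
    return to_seconds(real_one_thread) / to_seconds(real_many_threads)


if __name__ == "__main__":
    for label, (real, user) in RUNS.items():
        print(f"{label:20s} real={to_seconds(real):8.3f}s "
              f"busy cores={busy_cores(real, user):.2f}")
    print("distro speedup:", round(speedup(RUNS["distro, 1 thread"][0],
                                           RUNS["distro, 24 threads"][0]), 2))
    print("custom speedup:", round(speedup(RUNS["custom, 1 thread"][0],
                                           RUNS["custom, 24 threads"][0]), 2))
```

For the distro library the ratio stays just under 1 in both runs, which is what a single-threaded run looks like. For the custom build with 24 threads it comes out at roughly 3.5, with a wall-clock speedup of about 1.4x.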

Expected results:
Adjusting the number of threads used by the Fedora/RHEL OpenBLAS should change the run time. Thus, OpenBLAS should be built with multithreading capabilities, and that build should be the standard OpenBLAS package that comes with Fedora/RHEL.

Additional info:
None

Comment 1 Susi Lehtola 2018-06-11 13:05:35 UTC
Don't use the sequential library if you want the parallel one. Use -lopenblaso for OpenMP and -lopenblasp for pthreads.
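
One possible way to check which variant a given library actually is: OpenBLAS exports `openblas_get_parallel()`, which returns 0 for a sequential build, 1 for pthreads and 2 for OpenMP. The sketch below calls it through Python's `ctypes`. It assumes the shared libraries are installed under the names implied by the linker flags above (`openblas`, `openblaso`, `openblasp`); the helper `threading_model` is hypothetical and returns `None` when a library cannot be found or loaded.

```python
import ctypes
import ctypes.util

# Return values of openblas_get_parallel(), as defined in OpenBLAS's cblas.h.
THREADING_MODELS = {0: "sequential", 1: "pthreads", 2: "OpenMP"}


def threading_model(lib_name: str):
    """Return the threading model of an installed OpenBLAS variant.

    lib_name is the name as passed to the linker without the -l prefix,
    e.g. "openblasp". Returns None if the library is not found or does
    not export openblas_get_parallel.
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        func = lib.openblas_get_parallel  # AttributeError if the symbol is missing
    except (OSError, AttributeError):
        return None
    func.restype = ctypes.c_int
    func.argtypes = []
    return THREADING_MODELS.get(func(), "unknown")


if __name__ == "__main__":
    # Library names are assumptions derived from the -l flags in this comment.
    for name in ("openblas", "openblaso", "openblasp"):
        print(f"lib{name}: {threading_model(name)}")
```

If the assumptions hold, `libopenblas` should report as sequential and the other two as OpenMP and pthreads respectively. A missing library prints `None`.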