Description of problem:
OpenBLAS, when installed through yum/dnf, should be able to run on multiple threads/cores out of the box. I know there is an `openblas-threads64` threading package but, correct me if I'm wrong, I don't *believe* a system needs multiple cores to run an OpenBLAS that is pre-built with multithreading support.

Version-Release number of selected component (if applicable):
- openblas-0.2.20-10.fc28.x86_64
- openblas-devel-0.2.20-10.fc28.x86_64
- openblas-static-0.2.20-10.fc28.x86_64
- glibc-2.27-8.fc28.x86_64
- compat-libgfortran-41-4.1.2-53.fc28.x86_64
- gcc-gfortran-8.1.1-1.fc28.x86_64
- libgfortran-8.1.1-1.fc28.x86_64
- libgfortran-static-8.1.1-1.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
# Show that the standard OpenBLAS package shipped with Fedora/RHEL does not use multiple threads
1. $ sudo yum install -y openblas \
        openblas-devel \
        openblas-static \
        glibc \
        compat-libgfortran \
        gcc-gfortran \
        libgfortran \
        libgfortran-static
2. $ wget https://gist.githubusercontent.com/xianyi/5780018/raw/c1d93058a2f61b88b9dd4237d2cf4a963065070b/time_dgemm.c
3. $ gcc -o time_dgemm time_dgemm.c /usr/lib64/libopenblas.a -lpthread
4. $ time OPENBLAS_NUM_THREADS=1 ./time_dgemm 10000 10000 10000
   test!
   m=10000,n=10000,k=10000,alpha=1.200000,beta=0.001000,sizeofc=100000000

   real    1m5.280s
   user    1m4.231s
   sys     0m0.727s
5. $ time OPENBLAS_NUM_THREADS=24 ./time_dgemm 10000 10000 10000
   test!
   m=10000,n=10000,k=10000,alpha=1.200000,beta=0.001000,sizeofc=100000000

   real    1m6.876s
   user    1m5.476s
   sys     0m0.956s

# Show that a custom-built multithreaded OpenBLAS is significantly faster
1. $ git clone https://github.com/xianyi/OpenBLAS.git
2. $ cd OpenBLAS
3. $ make FC=gfortran    # this will take a while
4. $ sudo make PREFIX=/opt/OpenBLAS install
5. $ cd /path/to/    # the directory containing time_dgemm.c
6. $ gcc -o time_dgemm time_dgemm.c /opt/OpenBLAS/lib/libopenblas.a -lpthread
7. $ time OPENBLAS_NUM_THREADS=1 ./time_dgemm 10000 10000 10000
   test!
   m=10000,n=10000,k=10000,alpha=1.200000,beta=0.001000,sizeofc=100000000

   real    1m1.896s
   user    1m1.006s
   sys     0m0.669s
8. $ time OPENBLAS_NUM_THREADS=24 ./time_dgemm 10000 10000 10000
   test!
   m=10000,n=10000,k=10000,alpha=1.200000,beta=0.001000,sizeofc=100000000

   real    0m43.365s
   user    2m30.888s
   sys     0m1.171s

Actual results:
See above. With the Fedora/RHEL-installed OpenBLAS library, running with 24 threads instead of 1 gives no speedup, so I am led to believe that the Fedora/RHEL OpenBLAS package is *not* multithreaded. By contrast, OpenBLAS built from source with multithreading enabled runs noticeably faster with 24 threads (real 0m43.365s vs. 1m1.896s with 1 thread).

Expected results:
Changing the number of threads used by the Fedora/RHEL OpenBLAS should change the run time. The standard OpenBLAS package that ships with Fedora/RHEL should therefore be built with multithreading support.

Additional info:
None
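One way to read the timings in the report: the ratio of user time to real time approximates how many cores were busy on average, and the ratio of 1-thread to 24-thread real time is the speedup S. Using only the numbers reported above (an editorial reading, not part of the original report):

```latex
\begin{aligned}
\text{Fedora build, 24 threads:}\quad & \frac{t_\text{user}}{t_\text{real}} = \frac{65.476\,\text{s}}{66.876\,\text{s}} \approx 0.98,
  & S = \frac{65.280\,\text{s}}{66.876\,\text{s}} \approx 0.98 \\
\text{Custom build, 24 threads:}\quad & \frac{t_\text{user}}{t_\text{real}} = \frac{150.888\,\text{s}}{43.365\,\text{s}} \approx 3.48,
  & S = \frac{61.896\,\text{s}}{43.365\,\text{s}} \approx 1.43
\end{aligned}
```

So the Fedora build kept roughly one core busy regardless of OPENBLAS_NUM_THREADS, while the custom build kept about 3.5 cores busy on average. The numbers alone do not explain why the custom build scales far below 24×.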
`libopenblas` is the sequential library; don't use it if you want the parallel one. Use -lopenblaso for OpenMP and -lopenblasp for pthreads.