Description of problem:
NumPy uses BLAS for the dot() function only if built against ATLAS (math-atlas.sf.net); a generic BLAS library isn't sufficient. See http://scipy.org/scipy/numpy/ticket/667

If ATLAS isn't available, dot() performance is very poor. Here are matrix multiplication benchmark results using the numpy currently in EPEL:

Double precision matrix multiplication test using NumPy.
Multiplying two NxN matrices.

   N  Gflops/s
===============
   2     0.007
   4     0.058
   8     0.301
  16     0.785
  32     1.166
  64     1.217
 128     0.839
 256     0.872
 512     0.469
1024     0.215
2048     0.125

With an optimized BLAS, the equivalent Fortran code using dgemm performs as follows:

Double precision matrix multiplication test

Matrix side size   Matmul (Gflops/s)   dgemm (Gflops/s)
=========================================================
       2                0.150               0.001
       4                0.303               0.005
       8                0.530               0.042
      16                0.829               0.245
      32                1.405               1.366
      64                1.669               3.533
     128                2.030               5.611
     256                2.369               6.792
     512                2.596               7.058
    1024                0.601               7.428
    2048                0.565               7.766

(The Matmul column is performance using the F90 MATMUL intrinsic, which in this case is the same as one would get using the generic netlib BLAS library.)

So one can see that for large matrices, NumPy without ATLAS is incredibly slow. Python benchmark code below:

#!/usr/bin/python
# Matmul benchmark in python/numpy
import numpy as npy
import time

def mm_timing(nn):
    """Matrix multiplication benchmark for nxn matrices up to nn x nn."""
    n = 2
    print "Double precision matrix multiplication test using NumPy."
    print "Multiplying two NxN matrices."
    print ""
    print "   N  Gflops/s"
    print "==============="
    while n < nn:
        a = npy.random.rand(n, n)
        b = npy.random.rand(n, n)
        flops = (2 * float(n) - 1) * float(n)**2
        # Assuming an on average 1 Gflop/s CPU, 1e9 flops takes about one
        # second and should be enough. We also do a maximum of 1e5 loops,
        # since for small arrays the overhead is large.
        loop = int(max(min(1.e9 / flops, 1e5), 1))
        t1 = time.time()
        for i in xrange(loop):
            c = npy.dot(a, b)
        t2 = time.time()
        perf = flops * loop / (t2 - t1) / 1.e9
        print "%4i" % n + "    " + "%6.3f" % perf
        n *= 2
        if n > nn:
            break

mm_timing(3000)
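The flop-count and loop-count logic from the benchmark can be sketched on its own (the helper names here are mine, not from the script above; the formulas are the same). Multiplying two NxN matrices produces n^2 entries, each needing n multiplications and n-1 additions, hence (2n-1)*n^2 flops; the loop count is then clamped so each measurement does roughly 1e9 flops of work, with a 1e5 cap for tiny matrices where per-call overhead dominates:

```python
def flop_count(n):
    # One NxN matrix product: n*n output entries, each costing
    # n multiplications and n-1 additions -> (2n-1)*n^2 flops.
    return (2.0 * n - 1.0) * n ** 2

def loop_count(n, target_flops=1e9, max_loops=1e5):
    # Repeat the product until ~target_flops of total work is done,
    # but cap the repetitions for very small matrices.
    return int(max(min(target_flops / flop_count(n), max_loops), 1))

# A 2x2 product costs (2*2-1)*2^2 = 12 flops, so the 1e5 cap applies.
assert flop_count(2) == 12.0
assert loop_count(2) == 100000
# A 2048x2048 product costs ~1.7e10 flops, so one iteration suffices.
assert loop_count(2048) == 1
```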
I've not been able to locate the RHEL or EPEL package I should BuildRequire in lieu of blas-devel. Can you point it out?
Uh, seems ATLAS is not in EPEL (yet). It's in Fedora, though: https://admin.fedoraproject.org/pkgdb/packages/name/atlas So I suppose this bug must be on hold until someone packages ATLAS for EPEL as well. If you're responsible for numpy in Fedora too, it could at least be fixed there, if it isn't already.
I'll try it out in Fedora, and I've pinged the atlas maintainer to investigate branching atlas for EPEL.
The Fedora atlas maintainer is in the process of updating atlas in Fedora. When that's done, I'll rebuild numpy against it there; atlas will then be built for EL-5, and I'll rebuild numpy there as well.
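Once numpy is rebuilt, a quick way to judge whether it picked up an optimized BLAS is to time a single large dot() and compare against the tables above (a rough sketch, using modern Python 3 syntax rather than the Python 2 of the benchmark script; the function name is mine):

```python
import time
import numpy as np

def dot_gflops(n=512, repeats=3):
    # Rough Gflop/s of one NxN np.dot(); an ATLAS-backed build should land
    # near the dgemm column above, a generic build near the NumPy column.
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.time()
        np.dot(a, b)
        best = min(best, time.time() - t0)
    return (2.0 * n - 1.0) * n ** 2 / best / 1e9

print("dot() speed: %.2f Gflops/s" % dot_gflops())
```

(numpy.show_config() also reports which BLAS/LAPACK libraries the build was linked against.)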
Great, thanks a lot!
The atlas maintainer reports that atlas has been built for EL-5. Once it's pushed, I'll rebuild numpy.
Built for EL-5; it will be pushed to testing and then stable in due course.