From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.2-2smp i686)
Description of problem:
Currently Octave uses a non-optimised BLAS library, though it could
use an optimised one (ATLAS).
Steps to Reproduce:
Not applicable: this is a feature request rather than a bug, so there
are no steps to reproduce.
Actual Results: It runs slowly.
Expected Results: It runs fast.
Debian 2.1 has an ATLAS-enabled Octave.
It's hard to do properly... ATLAS is closely tied to the specific machine it is
built on (it optimizes for a specific cache size, to give one example). So a
package built on one machine ends up as a non-optimal installation on others.
I've looked closer into it... it detects SSE, L1 and L2 cache sizes etc. during
the build, not when run. If you have any good solutions, I'm interested, but
currently I think the best approach would be to change the code from detecting
at compile time to detecting at run time.
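To illustrate what detecting at run time could look like, here is a minimal sketch, not ATLAS code. It assumes an x86 Linux system where /proc/cpuinfo has a "flags" line; the function names parse_cpu_flags and read_cpu_flags are made up for this example. It only covers feature flags such as SSE; cache sizes would need a separate probe.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text.

    Assumes the x86 Linux layout, where one line looks like
    "flags : fpu vme ... mmx sse". Returns an empty set if no such line exists.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags from the running machine; empty set if the file is unreadable."""
    try:
        with open(path) as f:
            return parse_cpu_flags(f.read())
    except OSError:
        return set()


if __name__ == "__main__":
    flags = read_cpu_flags()
    print("SSE available:", "sse" in flags)
```

A library doing this at load time could then choose between an SSE and a non-SSE kernel instead of having that choice fixed when the package was built.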
Matlab 6 (the current version) has ATLAS as a shared library. There are four
different versions, optimized for different archs (PPro, PII, PIII, Athlon).
When run, it tries to figure out the arch and use the appropriate library.
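How Matlab actually picks its library is not known here; the following is only a rough sketch of the general idea of choosing among four prebuilt variants from CPU feature flags. The variant names, the libatlas_<variant>.so file names, and the function names are hypothetical, and the heuristic is crude (for example, a K6 with 3DNow! would be treated as an Athlon).

```python
def pick_atlas_variant(flags):
    """Map a set of x86 CPU feature flags to one of four build variants.

    Crude heuristic, in order:
      3dnow present -> "athlon"  (checked first, since later Athlons also have SSE)
      sse present   -> "piii"
      mmx present   -> "pii"
      otherwise     -> "ppro"    (Pentium Pro has neither MMX nor SSE)
    """
    if "3dnow" in flags:
        return "athlon"
    if "sse" in flags:
        return "piii"
    if "mmx" in flags:
        return "pii"
    return "ppro"


def library_name(flags):
    """Build a hypothetical shared-library file name for the chosen variant."""
    return "libatlas_%s.so" % pick_atlas_variant(flags)


if __name__ == "__main__":
    # Example flag set as a Pentium III might report it.
    print(library_name({"fpu", "mmx", "sse"}))
```

The selected name would then be handed to the dynamic loader, so the application itself stays generic while the library varies per machine.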
I do not have a Debian machine handy, so I do not know how Debian does it
for Octave, but I know that Debian has ATLAS as a shared library as well
(this is somewhat special, since the ATLAS team does not support shared libraries).
For Matlab, I tried to build a custom library for the AMD-K6-III and the result
was not much better than any of the supplied libraries; in fact the scatter
between those four was within 10% or so. I assume I could have done better by
hand-tuning the cache edge etc., but I have not done so.
I am also wondering how much work it would be to make an SRPM that
rebuilds into a platform-specific version. That way the RPM could be built
for, say, PIII, and people who want better would have to rebuild the SRPM.
Actually, if ATLAS is built as a shared library, then it can be a separate
package, and that is the one to optimize. The Octave package can stay generic.
Building as a shared library isn't a problem (I did that for LAPACK/BLAS many
years ago, when I was studying - you can find some of the entries in the
official one at netlib), but having one optimized for everyone is. I'm not fond
of the idea of shipping multiple libraries.
No. I suggest picking one arch (say PIII) and shipping an ATLAS RPM optimized for it.
Ideally, rebuilding the SRPM on a particular architecture should result
in a library optimized for that architecture. So you ship a
_suboptimal_ library, but I think it is still better than plain LAPACK.
And people who care can try to optimize further for as long as they want.
Well, if you can make a nice package (preferably integrated into the current
lapack/blas one) I'll be very interested :).