From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7 Description of problem: The attached program shows the problem. In this program code, we are calculating the SVD of a m x n-dimensional matrix where m=5 and n=2. The program allocates space according to the minimal requirements of LAPACK User's manual. However, the problem goes away when m<=4 above. Most likely, it is an issue when m is sufficiently larger than n. The attached code is svdd-demo.c. It uses the attached header file (array.h).The datafile is given in test.dat. Upon installing LAPACK manually, this is not a problem. The RPMs are the issue: I have verified that every release from 3.0.25 through 3.0.30 presents this problem. Version-Release number of selected component (if applicable): lapack-3.0-31.fc4, lapack-devel-3.0-31.fc4 How reproducible: Always Steps to Reproduce: 1. Use the following to compile: (I use the tcsh shell) %gcc -ansi -Wall -pedantic svdd-demo.c -lm -llapack 2. ./a.out 3. Actual Results: The program always ends as: reading in matrix A from test.dat: 2.117079 2.992091 2.106365 1.970807 2.131384 3.130390 2.344235 2.667219 1.838804 3.243629 ** On entry to DGESDD parameter number 12 had an illegal value STOP 0 for m>=5 in the program (n=2). However, using the statistical package R as well as a compiled LAPACK provides the correct result. Expected Results: reading in matrix A from test.dat: 2.117079 2.992091 2.106365 1.970807 2.131384 3.130390 2.344235 2.667219 1.838804 3.243629 svdd returned 0 singular values are: 7.875120 0.769370 The left multiplier matrix, u, is: -0.465324 0.102357 -0.525449 -0.358025 -0.607221 -0.360284 -0.676273 0.043269 -0.489893 0.413500 -0.480519 0.194368 0.811234 -0.064990 -0.262676 -0.449331 -0.386186 -0.133477 0.791199 -0.071766 -0.469971 0.587587 -0.214743 0.040039 0.621410 The right multiplier matrix, v, is: -0.595001 -0.803725 -0.803725 0.595001 Additional info: My hunch is that this is a problem because of some funky optimization that has been used in the RPMS. Also, the statistical package R uses blas, but gives the correct result: hence my guess is that it is the lapack RPM that is the problem. This is a pretty SEVERE problem which also happens for RHEL4.
Created attachment 119323 [details] file svdd-demo.c
Created attachment 119324 [details] header file for svdd-demo.c
Created attachment 119325 [details] test.dat (data file for svdd-demo.c)
Let me look into this. :/
Removing optimization from the rpm does not resolve the issue, thus, it doesn't seem to be the problem.
OK, its this patch (lapack-20010525.patch). I inherited that patch with the rest of lapack when I took over maintainership in Fedora Extras. Time to figure out why it causes this failure.
Alright, figured this one out. Once upon a time (Fri May 25 2001), Trond Eivind Glomsrød (aka, the pickled fish eater), decided that for reasons unknown (or at least undocumented), he would apply all of the patches that upstream had pending for lapack/blas, but had not yet rolled into a formal release. Given that lapack hadn't had a major release since May 31, 2000, he was probably trying to patch bugs or just be better prepared for a future release. Upstream had made revised versions of specific files in the lapack/blas tree and put them here: http://www.netlib.org/lapack/release_notes.html Trond took those files, and made a patch diff from the original lapack to these new updated source files and named it lapack-20010525.patch. One of these changes is described by upstream as: 10/5/2000 Corrected error with LQUERY and setting of WORK(1) This touches (amongst other files): LAPACK/SRC/dgesdd.f When we look at the diff of that specific file, we see the following: diff -uNr LAPACK.orig/SRC/dgesdd.f LAPACK/SRC/dgesdd.f --- LAPACK.orig/SRC/dgesdd.f Thu Nov 11 20:32:31 1999 +++ LAPACK/SRC/dgesdd.f Fri May 25 16:07:58 2001 @@ -4,7 +4,8 @@ * -- LAPACK driver routine (version 3.0) -- * Univ. of Tennessee, Univ. of California Berkeley, NAG Ltd., * Courant Institute, Argonne National Lab, and Rice University -* October 31, 1999 +* June 30, 1999 +* 8-15-00: Improve consistency of WS calculations (eca) * * .. Scalar Arguments .. CHARACTER JOBZ @@ -116,16 +117,20 @@ * LWORK (input) INTEGER * The dimension of the array WORK. LWORK >= 1. * If JOBZ = 'N', -* LWORK >= 3*min(M,N) + max(max(M,N),6*min(M,N)). +* LWORK >= max(14*min(M,N)+4, 10*min(M,N)+2+ +* SMLSIZ*(SMLSIZ+8)) + max(M,N) +* where SMLSIZ is returned by ILAENV and is equal to the +* maximum size of the subproblems at the bottom of the +* computation tree (usually about 25). * If JOBZ = 'O', -* LWORK >= 3*min(M,N)*min(M,N) + -* max(max(M,N),5*min(M,N)*min(M,N)+4*min(M,N)). +* LWORK >= 5*min(M,N)*min(M,N) + max(M,N) + 9*min(M,N). * If JOBZ = 'S' or 'A' -* LWORK >= 3*min(M,N)*min(M,N) + -* max(max(M,N),4*min(M,N)*min(M,N)+4*min(M,N)). +* LWORK >= 4*min(M,N)*min(M,N) + max(M,N) + 9*min(M,N). * For good performance, LWORK should generally be larger. -* If LWORK < 0 but other input arguments are legal, WORK(1) -* returns the optimal LWORK. +* +* If LWORK = -1, a workspace query is assumed. The optimal +* size for the WORK array is calculated and stored in WORK(1), +* and no other work except argument checking is performed. * * IWORK (workspace) INTEGER array, dimension (8*min(M,N)) * It goes on, but the notable change is in those comments. It changes the expected definition of LWORK, supposedly to improve consistency of WC calculations. So, I went to your test case, and made the same change: --- svdd-demo.c.orig 2005-09-28 17:30:11.000000000 -0500 +++ svdd-demo.c 2005-09-28 17:31:15.000000000 -0500 @@ -23,8 +23,7 @@ void dgesdd_( char *jobz, int svdd(double **a, int m, int n, double *d, double **u, double **v) { double *A, *U, *VT; - int lwork =3*MIN(m,n)*MIN(m,n) - + MAX(MAX(m,n),4*MIN(m,n)*MIN(m,n)+4*MIN(m,n)); + int lwork = 4*MIN(m,n)*MIN(m,n) + MAX(m,n) + 9*MIN(m,n); int liwork = 8*MIN(m,n); char jobz = 'A'; double *work; With that change, your test case starts working again: [spot@swoop ~]$ ./svdd-demo reading in matrix A from test.dat: 2.117079 2.992091 2.106365 1.970807 2.131384 3.130390 2.344235 2.667219 1.838804 3.243629 svdd returned 0 singular values are: 7.875120 0.769370 The left multiplier matrix, u, is: -0.465324 0.102357 -0.525449 -0.358025 -0.607221 -0.360284 -0.676273 0.043269 -0.489893 0.413500 -0.480519 0.194368 0.811234 -0.064990 -0.262676 -0.449331 -0.386186 -0.133477 0.791199 -0.071766 -0.469971 0.587587 -0.214743 0.040039 0.621410 The right multiplier matrix, v, is: -0.595001 -0.803725 -0.803725 0.595001 So, this answers the question of why your test case fails, but it doesn't really make it clear what we should do. Unfortunately, whoever wrote that improvement upstream didn't sign their name, so we can't ask him or her whether its worthwhile to keep it. It doesn't look likely that lapack will ever have another major release (it hasn't had any touch since 2001). This has been the Red Hat shipped behavior for lapack since 2001, and no one has hit this before. It is possible that people are relying on this patched value for LWORK, and to revert it would break their existing configuration. So, we can keep this patch and correct the user manual, or we can revert it, and hope that it doesn't break existing users or make the WC calculations less consistent. Your feedback here would be very much encouraged. :)
Thanks very much! I have verified that it works. I have also verified that the querying for optimal lwork which is the ideal way of doing things also works. I think the patch should remain, since it corrects for real errors. If possible, the manpages (user manual could be upated with all the corrections in this patch). http://www.netlib.org/lapack/html/errata.lug http://www.netlib.org/lapack/release_notes.html After digging around, I came up with the above websites. Maybe, if possible, the errata could also be included in the manpages in a good way so that we can use lapack correctly? Thanks a lot for looking into this, and so quickly!! I am so proud of Fedora Core as a user!!
Closing out this bug.