Red Hat Bugzilla – Bug 479939

mvapich / scalapack: xzsep hangs forever

Last modified: 2013-11-03 19:30:08 EST

Description of problem:
the xzsep test case (scalapack) hangs

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. install mvapich
2. recompile scalapack
3. run the xzsep test case

Actual results:
xzsep hangs and needs to be killed

Expected results:
pass

Additional info:
Note: I killed it; this test case is ok with mpich1 and openmpi

[mbozzore@compute-0-11 scalapack]$ cat xzsep.out
SCALAPACK Hermitian Eigendecomposition routines.
' '

Running tests of the parallel Hermitian eigenvalue routine: PZHEEVX.
The following scaled residual checks will be computed:
 ||AQ - QL|| / ((abstol + ||A|| * eps) * N)
 ||Q^T*Q - I|| / (N * eps)

An explanation of the input/output parameters follows:
RESULT : passed; or an indication of which eigen request test failed
N      : The number of rows and columns of the matrix A.
P      : The number of process rows.
Q      : The number of process columns.
NB     : The size of the square blocks the matrix A is split into.
THRESH : If a residual value is less than THRESH, RESULT is flagged as PASSED.
       : the QTQ norm is allowed to exceed THRESH for those eigenvectors
       : which could not be reorthogonalized for lack of workspace.
TYP    : matrix type (see pZSEPtst.f).
SUB    : Subtests (see pZSEPtst).f
CHK    : ||AQ - QL|| / ((abstol + ||A|| * eps) * N)
QTQ    : ||Q^T*Q - I||/ (N * eps)
       : when the adjusted QTQ exceeds THRESH the adjusted QTQ norm is printed
       : otherwise the true QTQ norm is printed
If NT>1, CHK and QTQ are the max over all eigen request tests

    N  NB   P   Q TYP SUB     WALL      CPU       CHK       QTQ CHECK
----- --- --- --- --- --- -------- -------- --------- --------- -----
'TEST 1 - test tiny matrices - different process configurations'
    0   1   1   2   8   N     0.00    -1.00       0.0       0.0 PASSED EVX
{ 0, 0}: On entry to PZSEPCHK parameter number 19 had an illegal value
Killing remote processes...DONE
Signal 2 received.
[mbozzore@compute-0-11 scalapack]$

mvapich has been updated to 1.1.0-3143, the same as in ofed 1.4.1-rc3. Since I don't have access to Scalapack, please check to see if this bug still exists with this version of mvapich.

On RHEL 5.4 x86_64, this command still hangs:

mpirun -np 4 -machinefile /root/hosts --mca btl tcp,self ./xzsep
                                                ^^^

This command is successful:

mpirun -np 4 -machinefile /root/hosts --mca btl openib,self ./xzsep
                                                ^^^^^^

I just appended the test result for openmpi.

[root@rhhpc54 bin]# mpirun_rsh -ssh -np 4 -hostfile /root/hosts GFORTRAN_UNBUFFERED_ALL=yes ./xzsep
SCALAPACK Hermitian Eigendecomposition routines.
' '

Running tests of the parallel Hermitian eigenvalue routine: PZHEEVX.
The following scaled residual checks will be computed:
 ||AQ - QL|| / ((abstol + ||A|| * eps) * N)
 ||Q^T*Q - I|| / (N * eps)

An explanation of the input/output parameters follows:
RESULT : passed; or an indication of which eigen request test failed
N      : The number of rows and columns of the matrix A.
P      : The number of process rows.
Q      : The number of process columns.
NB     : The size of the square blocks the matrix A is split into.
THRESH : If a residual value is less than THRESH, RESULT is flagged as PASSED.
       : the QTQ norm is allowed to exceed THRESH for those eigenvectors
       : which could not be reorthogonalized for lack of workspace.
TYP    : matrix type (see pZSEPtst.f).
SUB    : Subtests (see pZSEPtst).f
CHK    : ||AQ - QL|| / ((abstol + ||A|| * eps) * N)
QTQ    : ||Q^T*Q - I||/ (N * eps)
       : when the adjusted QTQ exceeds THRESH the adjusted QTQ norm is printed
       : otherwise the true QTQ norm is printed
If NT>1, CHK and QTQ are the max over all eigen request tests

    N  NB   P   Q TYP SUB     WALL      CPU       CHK       QTQ CHECK
----- --- --- --- --- --- -------- -------- --------- --------- -----
'TEST 1 - test tiny matrices - different process configurations'
    0   1   1   2   8   N     0.00    -1.00       0.0       0.0 PASSED EVX
{ 0, 0}: On entry to PZSEPCHK parameter number 19 had an illegal value
Signal 2 received.    <-------------Hang here. I have to kill the job
Killing remote processes...DONE
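A hang like the one above has to be noticed and interrupted by hand. One way to catch it automatically is to run the test under a wall-clock limit. The following is only a minimal sketch and is not part of the original report: the helper name run_with_timeout and any time limit passed to it are hypothetical choices, and it relies solely on Python's standard subprocess module.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome as 'passed', 'failed', or 'hung'.

    Hypothetical helper for illustration. Output is discarded so that
    leftover child processes cannot keep a pipe open after the timeout.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills only the process it started (the launcher);
        # remote MPI ranks may still need to be cleaned up separately.
        return "hung"
    return "passed" if result.returncode == 0 else "failed"
```

As a usage example, calling run_with_timeout(["mpirun_rsh", "-ssh", "-np", "4", "-hostfile", "/root/hosts", "GFORTRAN_UNBUFFERED_ALL=yes", "./xzsep"], 60) with the command line from the comment above should return "hung" if the behavior reproduces. The 60-second limit is an assumed value and would need to be tuned to the normal run time of the test.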

This Bugzilla has been reviewed by Red Hat and is not planned on being addressed in Red Hat Enterprise Linux 5, and therefore is being closed. If this bug is critical to production systems, please contact your Red Hat support representative and provide a sufficient business justification in order to re-open it.