Bug 479939 - mvapich / scalapack: xzsep hangs forever
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: mvapich
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jay Fenlason
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-01-14 07:40 UTC by Mehdi Bozzo-Rey
Modified: 2013-11-04 00:30 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-04 00:30:08 UTC
Target Upstream Version:
Embargoed:



Description Mehdi Bozzo-Rey 2009-01-14 07:40:58 UTC
Description of problem: The xzsep test case (scalapack) hangs.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install mvapich.
2. Recompile scalapack.
3. Run the xzsep test case.
  
Actual results: xzsep hangs and needs to be killed


Expected results: pass


Additional info:

Note: I killed the job manually. The same test case passes with mpich1 and openmpi.

[mbozzore@compute-0-11 scalapack]$ cat xzsep.out
SCALAPACK Hermitian Eigendecomposition routines.
' '

Running tests of the parallel Hermitian eigenvalue routine:  PZHEEVX.
The following scaled residual checks will be computed:
 ||AQ - QL|| / ((abstol + ||A|| * eps) * N)
 ||Q^T*Q - I|| / (N * eps)

An explanation of the input/output parameters follows:
RESULT   : passed; or an indication of which eigen request test failed
N        : The number of rows and columns of the matrix A.
P        : The number of process rows.
Q        : The number of process columns.
NB       : The size of the square blocks the matrix A is split into.
THRESH   : If a residual value is less than THRESH, RESULT is flagged as PASSED.
         : the QTQ norm is allowed to exceed THRESH for those eigenvectors
         :  which could not be reorthogonalized for lack of workspace.
TYP      : matrix type (see pZSEPtst.f).
SUB      : Subtests (see pZSEPtst).f
CHK      : ||AQ - QL|| / ((abstol + ||A|| * eps) * N)
QTQ      : ||Q^T*Q - I||/ (N * eps)
         : when the adjusted QTQ exceeds THRESH
 the adjusted QTQ norm is printed
         : otherwise the true QTQ norm is printed
If NT>1, CHK and QTQ are the max over all eigen request tests

     N  NB   P   Q TYP SUB   WALL      CPU      CHK       QTQ    CHECK
 ----- --- --- --- --- --- -------- -------- --------- --------- -----
'TEST 1 - test tiny matrices - different process configurations'
     0   1   1   2   8   N     0.00    -1.00   0.0       0.0     PASSED   EVX
{    0,    0}:  On entry to PZSEPCHK parameter number   19 had an illegal value
Killing remote processes...DONE
Signal 2 received.
[mbozzore@compute-0-11 scalapack]$
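
A hang like the one above can be detected automatically instead of being killed by hand. The following is a minimal sketch, assuming GNU coreutils `timeout` is available on the node. The function name `run_with_limit` and the 600-second limit in the usage comment are hypothetical and do not come from this report.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and report
# whether it finished or had to be stopped.
# GNU coreutils `timeout` exits with status 124 when the limit expires.
run_with_limit() {
    limit="$1"; shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' exceeded ${limit}s"
    else
        echo "EXIT $status: '$*'"
    fi
    return "$status"
}

# Usage (the launcher and its arguments depend on the MPI stack in use;
# the 600 s limit is an assumption):
# run_with_limit 600 mpirun -np 4 -machinefile /root/hosts ./xzsep
```

Note that `timeout` signals only the process it started; whether the MPI launcher then cleans up remote ranks is not established by this report and should be checked separately.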

Comment 1 Doug Ledford 2009-04-23 00:47:34 UTC
mvapich has been updated to 1.1.0-3143, the same as in ofed 1.4.1-rc3.  Since I don't have access to Scalapack, please check to see if this bug still exists with this version of mvapich.

Comment 2 Shuai Zhang 2009-09-29 06:15:27 UTC
On RHEL 5.4 x86_64, this command still hangs:

mpirun -np 4 -machinefile /root/hosts --mca btl tcp,self ./xzsep
                                                ^^^
The command

mpirun -np 4 -machinefile /root/hosts --mca btl openib,self ./xzsep 
                                                ^^^^^^
is successful.

Comment 3 Shuai Zhang 2009-09-29 06:17:35 UTC
Note: the test results I appended in my previous comment are for openmpi.

Comment 4 Shuai Zhang 2009-09-29 06:52:52 UTC
[root@rhhpc54 bin]# mpirun_rsh -ssh -np 4 -hostfile /root/hosts GFORTRAN_UNBUFFERED_ALL=yes ./xzsep
SCALAPACK Hermitian Eigendecomposition routines.
' '                                                                             
 
Running tests of the parallel Hermitian eigenvalue routine:  PZHEEVX.
The following scaled residual checks will be computed:
 ||AQ - QL|| / ((abstol + ||A|| * eps) * N)
 ||Q^T*Q - I|| / (N * eps)

An explanation of the input/output parameters follows:
RESULT   : passed; or an indication of which eigen request test failed
N        : The number of rows and columns of the matrix A.
P        : The number of process rows.
Q        : The number of process columns.
NB       : The size of the square blocks the matrix A is split into.
THRESH   : If a residual value is less than THRESH, RESULT is flagged as PASSED.
         : the QTQ norm is allowed to exceed THRESH for those eigenvectors
         :  which could not be reorthogonalized for lack of workspace.
TYP      : matrix type (see pZSEPtst.f).
SUB      : Subtests (see pZSEPtst).f
CHK      : ||AQ - QL|| / ((abstol + ||A|| * eps) * N)
QTQ      : ||Q^T*Q - I||/ (N * eps)
         : when the adjusted QTQ exceeds THRESH
 the adjusted QTQ norm is printed
         : otherwise the true QTQ norm is printed
If NT>1, CHK and QTQ are the max over all eigen request tests
 
     N  NB   P   Q TYP SUB   WALL      CPU      CHK       QTQ    CHECK
 ----- --- --- --- --- --- -------- -------- --------- --------- -----
'TEST 1 - test tiny matrices - different process configurations'                
     0   1   1   2   8   N     0.00    -1.00   0.0       0.0     PASSED   EVX  
{    0,    0}:  On entry to PZSEPCHK parameter number   19 had an illegal value
Signal 2 received.  <------------- Hangs here; I had to kill the job.
Killing remote processes...DONE
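
Both logs show an "illegal value" message immediately before the hang. As a rough sketch for triaging saved output (for example a file such as xzsep.out), the routine name and argument number can be pulled out with `sed`. The function name `illegal_params` is hypothetical, and the pattern is an assumption based only on the message format seen in this report.

```shell
#!/bin/sh
# Hypothetical helper: print "ROUTINE ARGNUM" for every
# "On entry to ... parameter number ... had an illegal value" line
# found in the given log files (or on stdin if no file is given).
illegal_params() {
    sed -n 's/.*On entry to \([A-Za-z0-9_]*\) *parameter number *\([0-9][0-9]*\) had an illegal value.*/\1 \2/p' "$@"
}

# Usage:
# illegal_params xzsep.out
```

This only locates the message; it does not show whether the illegal argument is the cause of the hang or a separate symptom.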

Comment 5 John Feeney 2013-11-04 00:30:08 UTC
This Bugzilla has been reviewed by Red Hat and is not planned on being
addressed in Red Hat Enterprise Linux 5, and therefore is being closed.
If this bug is critical to production systems, please contact your Red
Hat support representative and provide a sufficient business justification
in order to re-open it.

