Bug 311551

Summary: openmpi programs over the tcp layer fail to start when more than one ibX interface is configured
Product: Red Hat Enterprise Linux 5
Component: openmpi
Version: 5.1
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Hardware: All
OS: Linux
Reporter: Gurhan Ozen <gozen>
Assignee: Doug Ledford <dledford>
CC: jburke
Doc Type: Bug Fix
Last Closed: 2007-10-01 15:53:57 UTC

Description Gurhan Ozen 2007-09-28 18:58:37 UTC
Description of problem:
When an openmpi program is run over the tcp layer on a host where more than one ibX
interface is configured, the program fails to start up properly:

# mpirun -np 2 -host
intel-s6e4533-01-mm.rhts.boston.redhat.com,ibm-ridgeback.rhts.boston.redhat.com
--mca btl tcp,self /usr/bin/mpitests-com 

################################################################################

  com Point-to-Point MPI Bandwidth and Latency Benchmark
  Version 1.4.0
  Run at 09/25/07  15:30:48, with rank 0 on
intel-s6e4533-01-mm.rhts.boston.redhat.com

################################################################################

  Test            Processes  Op Size (bytes)    Ops       BW (MB)
 -----------------------------------------------------------------
^C
mpirun: killing job...

mpirun noticed that job rank 0 with PID 6099 on node
intel-s6e4533-01-mm.rhts.boston.redhat.com exited on signal 15 (Terminated). 
1 additional process aborted (not shown)
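
A verbose re-run might show where the TCP BTL stalls; a minimal sketch, assuming the
standard Open MPI btl_base_verbose MCA debug parameter is honored by this 1.2.3 build:

# Same failing run with BTL debug output, to see which addresses the TCP BTL selects
mpirun -np 2 -host \
    intel-s6e4533-01-mm.rhts.boston.redhat.com,ibm-ridgeback.rhts.boston.redhat.com \
    --mca btl tcp,self --mca btl_base_verbose 30 /usr/bin/mpitests-com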

# ifconfig 
eth0      Link encap:Ethernet  HWaddr 00:14:5E:57:08:71  
          inet addr:10.12.4.159  Bcast:10.12.7.255  Mask:255.255.252.0
          inet6 addr: fe80::214:5eff:fe57:871/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14355409 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12087263 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:16347095408 (15.2 GiB)  TX bytes:10128450730 (9.4 GiB)
          Interrupt:185 Memory:e4000000-e4012100 

ib0       Link encap:InfiniBand  HWaddr
00:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:192.168.1.40  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:0:6645/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:13268797 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8092611 errors:0 dropped:18 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:24119163468 (22.4 GiB)  TX bytes:11740527791 (10.9 GiB)

ib1       Link encap:InfiniBand  HWaddr
00:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:192.168.1.41  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:0:6646/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:95306 errors:0 dropped:0 overruns:0 frame:0
          TX packets:64 errors:0 dropped:17 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:5011359 (4.7 MiB)  TX bytes:16237 (15.8 KiB)
# ibstat
CA 'mthca0'
        CA type: MT23108
        Number of ports: 2
        Firmware version: 3.5.0
        Hardware version: a1
        Node GUID: 0x0002c90200006644
        System image GUID: 0x0002c9000100d050
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 5
                LMC: 0
                SM lid: 11
                Capability mask: 0x02510a68
                Port GUID: 0x0002c90200006645
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 6
                LMC: 0
                SM lid: 11
                Capability mask: 0x02510a68
                Port GUID: 0x0002c90200006646

So, both ports are active and each has an ibX interface configured on it
when this fails. Taking down one of the interfaces seems to work:

# ifdown ib1
[root@ibm-ridgeback SPECS]# mpirun -np 2 -host
intel-s6e4533-01-mm.rhts.boston.redhat.com,ibm-ridgeback.rhts.boston.redhat.com
--mca btl tcp,self /usr/bin/mpitests-com 

################################################################################

  com Point-to-Point MPI Bandwidth and Latency Benchmark
  Version 1.4.0
  Run at 09/25/07  15:35:11, with rank 0 on
intel-s6e4533-01-mm.rhts.boston.redhat.com

################################################################################

  Test            Processes  Op Size (bytes)    Ops       BW (MB)
 -----------------------------------------------------------------
  Unidirectional          2               32     10         2.853
  Unidirectional          2               64     10         4.672
  Unidirectional          2              128     10         8.285
  Unidirectional          2              256     10        14.226
  .....

Note that it works when only one interface is up, no matter which one; I
tried with both ib0 and ib1.
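
If taking an interface down is not an option, the TCP BTL can also be pinned to a single
interface with its interface-selection MCA parameters; a minimal sketch, assuming the
standard btl_tcp_if_include/btl_tcp_if_exclude parameters work around the hang on this
build (untested here):

# Only let the TCP BTL use ib0
mpirun -np 2 -host \
    intel-s6e4533-01-mm.rhts.boston.redhat.com,ibm-ridgeback.rhts.boston.redhat.com \
    --mca btl tcp,self --mca btl_tcp_if_include ib0 /usr/bin/mpitests-com

# Or exclude ib1 instead (keep lo in the exclude list, since it is excluded by default)
mpirun -np 2 -host \
    intel-s6e4533-01-mm.rhts.boston.redhat.com,ibm-ridgeback.rhts.boston.redhat.com \
    --mca btl tcp,self --mca btl_tcp_if_exclude lo,ib1 /usr/bin/mpitests-com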

Version-Release number of selected component (if applicable):
# rpm -qa | egrep "openmpi|openib" | sort | uniq
openib-1.2-6.el5
openib-debuginfo-1.2-6.el5
openib-diags-1.2.7-6.el5
openib-mstflint-1.2-6.el5
openib-perftest-1.2-6.el5
openib-srptools-0.0.6-6.el5
openib-tvflash-0.9.2-6.el5
openmpi-1.2.3-4.el5
openmpi-debuginfo-1.2.3-4.el5
openmpi-devel-1.2.3-4.el5
openmpi-libs-1.2.3-4.el5



How reproducible:


Steps to Reproduce:
1. Have a dual port HCA with both ports active.
2. Have an ibX interface configured for each port and make sure that both are up.
3. Try to run an mpitests-* program over the tcp transport layer with --mca btl
tcp,self.
  
Actual results:
The program stalls after printing the benchmark header and never produces results.

Expected results:
The benchmark should run to completion over tcp.

Additional info:

Comment 1 Doug Ledford 2007-10-01 15:53:57 UTC
I suspect that this is a local configuration issue.  With two ib interfaces up
I'm still perfectly able to run an mpi job over tcp.
[root@pe840 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:15:C5:F6:00:FE  
          inet addr:192.168.33.125  Bcast:192.168.35.255  Mask:255.255.252.0
          inet6 addr: 2002:a00:0:1:215:c5ff:fef6:fe/64 Scope:Global
          inet6 addr: fe80::215:c5ff:fef6:fe/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1398620 errors:0 dropped:0 overruns:0 frame:0
          TX packets:196291 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:263800471 (251.5 MiB)  TX bytes:23703692 (22.6 MiB)
          Interrupt:169 

ib0       Link encap:InfiniBand  HWaddr
00:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:10.250.2.254  Bcast:10.250.2.255  Mask:255.255.255.0
          inet6 addr: fe80::205:ad00:3:491/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:3550868 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3962304 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:6146383878 (5.7 GiB)  TX bytes:6303006983 (5.8 GiB)

ib1       Link encap:InfiniBand  HWaddr
00:00:04:05:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:10.250.3.254  Bcast:10.250.3.255  Mask:255.255.255.0
          inet6 addr: fe80::205:ad00:3:492/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:3366691 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3763472 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:6130123368 (5.7 GiB)  TX bytes:6165782998 (5.7 GiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:688 errors:0 dropped:0 overruns:0 frame:0
          TX packets:688 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:82139 (80.2 KiB)  TX bytes:82139 (80.2 KiB)

[root@pe840 ~]# mpirun -np 2 -host ib0test1,ib0test2 -mca btl tcp,self
/usr/bin/mpitests-com 

################################################################################

  com Point-to-Point MPI Bandwidth and Latency Benchmark
  Version 1.4.0
  Run at 10/01/07  11:48:34, with rank 0 on ibtest1.test.redhat.com

################################################################################

  Test            Processes  Op Size (bytes)    Ops       BW (MB)
 -----------------------------------------------------------------
  Unidirectional          2               32     10         2.556
  Unidirectional          2               64     10         5.627
  Unidirectional          2              128     10        11.558
  Unidirectional          2              256     10        22.526
  Unidirectional          2              512     10        52.022
  Unidirectional          2             1024     10        94.917
  Unidirectional          2             2048     10       111.286
  Unidirectional          2             4096     10       161.933
  Unidirectional          2             8192     10       203.260
  Unidirectional          2            16384     10       242.796
  Unidirectional          2            32768     10       339.351
  Unidirectional          2            65536     10       257.030
  Unidirectional          2           131072     10       303.131
  Unidirectional          2           262144     10       349.909
  Unidirectional          2           524288     10       357.937
  Unidirectional          2          1048576     10       357.985
  Unidirectional          2          2097152     10       366.657
  Unidirectional          2          4194304     10       379.115
  Unidirectional          2          8388608     10       381.135

  Test            Processes  Op Size (bytes)    Ops       BW (MB)
 -----------------------------------------------------------------
  Bidirectional           2               32     10         1.985
  Bidirectional           2               64     10         3.360
  Bidirectional           2              128     10         7.645
  Bidirectional           2              256     10        14.463
  Bidirectional           2              512     10        29.998
  Bidirectional           2             1024     10        50.770
  Bidirectional           2             2048     10        64.068
  Bidirectional           2             4096     10        85.793
  Bidirectional           2             8192     10       107.259
  Bidirectional           2            16384     10       118.359
  Bidirectional           2            32768     10       122.310
  Bidirectional           2            65536     10       100.895
  Bidirectional           2           131072     10       122.261
  Bidirectional           2           262144     10       131.490
  Bidirectional           2           524288     10       133.315
  Bidirectional           2          1048576     10       130.882
  Bidirectional           2          2097152     10       132.207
  Bidirectional           2          4194304     10       134.658
  Bidirectional           2          8388608     10       143.842

  Test            Processes  Op Size (bytes)    Ops  Latency (us)
 -----------------------------------------------------------------
  Latency                 2                0     10        49.591

  Max Unidirectional  Bandwidth :         381.13 for message size of 8388608 bytes
  Max Bidirectional   Bandwidth :         143.84 for message size of 8388608 bytes

################################################################################

Test Parameters
---------------

Process pair allocation              : block
MB size for BW calculation           : 1000000

Barrier not included in measurement.

Bandwidth calculated as sum of process bandwidths.

MPI_Wtick returns           0.000001000
MPI_Wtime overhead          0.000000318

################################################################################
[root@pe840 ~]#
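
One configuration difference stands out between the failing and working setups: in the
failing one, ib0 (192.168.1.40) and ib1 (192.168.1.41) share the 192.168.1.0/24 subnet,
while here they sit on separate subnets (10.250.2.0/24 and 10.250.3.0/24). A minimal
sketch of moving the reporter's ib1 onto its own subnet, assuming standard RHEL 5 ifcfg
files; the 192.168.2.x addressing is hypothetical:

# /etc/sysconfig/network-scripts/ifcfg-ib1  (sketch; pick an unused subnet)
DEVICE=ib1
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.2.41
NETMASK=255.255.255.0

# then bring the interface back up on the new subnet
ifdown ib1 && ifup ib1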