Bug 479938 - openibd not restarted after IB libs installed triggers opensm failure
Description Mehdi Bozzo-Rey 2009-01-14 02:37:41 EST
Description of problem: when the IB rpms are installed using yum, openibd is not restarted, which prevents opensmd from starting

Version-Release number of selected component (if applicable): rh53rc2

How reproducible: a

Steps to Reproduce:
1. install IB libs needed
2. start opensm
Actual results: opensmd fails to start

Expected results: opensmd starts

Additional info:

[root@compute-0-11 ~]# service opensmd start
Starting IB Subnet Manager:                                [FAILED]
[root@compute-0-11 ~]# cat /sys/class/infiniband/mthca0/ports/1/state
[root@compute-0-11 ~]# cat /var/log/opensm.log
Jan 12 23:40:47 138300 [6626B670] 0x03 -> OpenSM 3.2.2
OpenSM 3.2.2
 Reading Cached Option File: /etc/ofed/opensm.conf
Command Line Arguments:
 Daemon mode
 Max wire smp's = 2147483647
 Priority = 15
 Log File: /var/log/opensm.log
OpenSM 3.2.2

Jan 12 23:40:47 138759 [6626B670] 0x80 -> OpenSM 3.2.2
Jan 12 23:40:47 139090 [6626B670] 0x01 -> osm_vendor_init: ERR 5415: Error opening UMAD
ibwarn: [26473] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module loaded?
Entering DISCOVERING state

Jan 12 23:40:47 139136 [6626B670] 0x02 -> osm_vendor_init: 1000 pending umads specified
Jan 12 23:40:47 139340 [6626B670] 0x80 -> Entering DISCOVERING state
Jan 12 23:40:47 142438 [6626B670] 0x02 -> osm_vendor_bind: Binding to port 0x2c9020023db0d
Jan 12 23:40:47 150748 [6626B670] 0x01 -> osm_vendor_open_port: ERR 542C: umad_open_port() failed
Jan 12 23:40:47 150769 [6626B670] 0x01 -> osm_vendor_bind: ERR 5424: Unable to open port 0x2c9020023db0d
Jan 12 23:40:47 150779 [6626B670] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Jan 12 23:40:47 150787 [6626B670] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Jan 12 23:40:47 150810 [6626B670] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Using default GUID 0x2c9020023db0d

Error from osm_opensm_bind (0x2A)
Perhaps another instance of OpenSM is already running
Exiting SM

Jan 12 23:40:47 208960 [6626B670] 0x80 -> Exiting SM
[root@compute-0-11 ~]# service openibd restart
Unloading OpenIB kernel modules:                           [  OK  ]
Loading OpenIB kernel modules:                             [  OK  ]
[root@compute-0-11 ~]# service opensmd start
Starting IB Subnet Manager:                                [  OK  ]
[root@compute-0-11 ~]#

What was installed (on top of a manual install) prior to the opensm startup:

Installing for dependencies:
 compat-dapl                     x86_64            2.0.13-4.el5                             rhel53            112 k
 compat-dapl-devel               x86_64            2.0.13-4.el5                             rhel53             30 k
 dapl                            x86_64            2.0.13-4.el5                             rhel53            149 k
 dapl-utils                      x86_64            2.0.13-4.el5                             rhel53             93 k
 ibutils                         x86_64            1.2-9.el5                                rhel53            1.7 M
 ibutils-libs                    x86_64            1.2-9.el5                                rhel53            1.2 M
 infiniband-diags                x86_64            1.4.1-2.el5                              rhel53            174 k
 libcxgb3                        x86_64            1.2.2-1.el5                              rhel53             14 k
 libibcm                         x86_64            1.0.3-1.el5                              rhel53             18 k
 libibcommon                     x86_64            1.1.1-1.el5                              rhel53             21 k
 libibcommon-devel               x86_64            1.1.1-1.el5                              rhel53            4.8 k
 libibmad                        x86_64            1.2.1-1.el5                              rhel53             45 k
 libibumad                       x86_64            1.2.1-1.el5                              rhel53             50 k
 libibumad-devel                 x86_64            1.2.1-1.el5                              rhel53            5.2 k
 libibverbs                      x86_64            1.1.2-1.el5                              rhel53             43 k
 libibverbs-devel                x86_64            1.1.2-1.el5                              rhel53             56 k
 libibverbs-utils                x86_64            1.1.2-1.el5                              rhel53             38 k
 libipathverbs                   x86_64            1.1-11.el5                               rhel53             12 k
 libipathverbs-static            x86_64            1.1-11.el5                               rhel53            8.8 k
 libmlx4                         x86_64            1.0-4.el5                                rhel53             24 k
 libmlx4-static                  x86_64            1.0-4.el5                                rhel53             16 k
 libmthca                        x86_64            1.0.5-1.el5                              rhel53             33 k
 libmthca-static                 x86_64            1.0.5-1.el5                              rhel53             20 k
 libnes                          x86_64            0.5-4.el5                                rhel53             12 k
 libnes-static                   x86_64            0.5-4.el5                                rhel53            9.4 k
 librdmacm                       x86_64            1.0.8-1.el5                              rhel53             22 k
 librdmacm-devel                 x86_64            1.0.8-1.el5                              rhel53             33 k
 librdmacm-utils                 x86_64            1.0.8-1.el5                              rhel53             29 k
 libsdp                          x86_64            1:1.1.99-10.el5_2                        rhel53             39 k
 mstflint                        x86_64            1.3-1.el5                                rhel53            162 k
 openib                          noarch            1.3.2-0.20080728.0355.3.el5              rhel53             20 k
 opensm                          x86_64            3.2.2-3.el5                              rhel53            326 k
 opensm-libs                     x86_64            3.2.2-3.el5                              rhel53             60 k
 perftest                        x86_64            1.2-11.el5                               rhel53             72 k
 srptools                        x86_64            0.0.4-2.el5                              rhel53             46 k
 tvflash                         x86_64            0.9.0-2.el5                              rhel53             42 k

Comment 1 Doug Ledford 2009-01-14 11:25:07 EST
System daemons, and openib in particular, do not default to enabled when installed.  For normal systems and typical network services, enabling them by default would represent a security risk.  For openibd in particular, defaulting the openibd service to on loads kernel modules that system administrators don't want loaded by default.  This is particularly problematic on systems that do an "Everything" install.  The openibd script was defaulted to remain off until enabled by the system administrator due to complaints from users.  I would recommend that the installation method on cluster machines include the following in the kickstart %post section (or a similar post-install configuration method if you aren't using kickstart):

/sbin/chkconfig --level 2345 openibd on
/sbin/service openibd start

Closing this out as NOTABUG.
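
For anyone scripting around this, a minimal sketch of a pre-start check follows. It assumes the only precondition worth testing is the ABI file named in the ibwarn message in the description (/sys/class/infiniband_mad/abi_version). The helper name umad_ready is hypothetical and is not part of the openib or opensm packages.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with openib or opensm.
# Succeeds only if the umad ABI file is readable, which the ibwarn
# message in the description ties to the ib_umad module being loaded.
# The path can be overridden by passing it as the first argument.
umad_ready() {
    abi_file="${1:-/sys/class/infiniband_mad/abi_version}"
    [ -r "$abi_file" ]
}

if umad_ready; then
    echo "umad ABI file found; opensmd should be able to open UMAD"
else
    echo "umad ABI file missing; start openibd before opensmd"
fi
```

A script could call umad_ready just before "service opensmd start" and run "service openibd start" first when it fails.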

