Description of problem:
When installing the IB RPMs using yum, openibd is not restarted, which prevents opensmd from starting.

Version-Release number of selected component (if applicable):
rh53rc2

How reproducible:
a

Steps to Reproduce:
1. Install the needed IB libraries.
2. Start opensm.

Actual results:
opensmd fails to start.

Expected results:
opensmd starts.

Additional info:

[root@compute-0-11 ~]# service opensmd start
Starting IB Subnet Manager: [FAILED]
[root@compute-0-11 ~]# cat /sys/class/infiniband/mthca0/ports/1/state
2: INIT
[root@compute-0-11 ~]# cat /var/log/opensm.log
Jan 12 23:40:47 138300 [6626B670] 0x03 -> OpenSM 3.2.2
-------------------------------------------------
OpenSM 3.2.2
Reading Cached Option File: /etc/ofed/opensm.conf
Command Line Arguments:
 Daemon mode
 Max wire smp's = 2147483647
 Priority = 15
 Log File: /var/log/opensm.log
-------------------------------------------------
OpenSM 3.2.2

Jan 12 23:40:47 138759 [6626B670] 0x80 -> OpenSM 3.2.2
Jan 12 23:40:47 139090 [6626B670] 0x01 -> osm_vendor_init: ERR 5415: Error opening UMAD
ibwarn: [26473] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module loaded?
Entering DISCOVERING state
Jan 12 23:40:47 139136 [6626B670] 0x02 -> osm_vendor_init: 1000 pending umads specified
Jan 12 23:40:47 139340 [6626B670] 0x80 -> Entering DISCOVERING state
Jan 12 23:40:47 142438 [6626B670] 0x02 -> osm_vendor_bind: Binding to port 0x2c9020023db0d
Jan 12 23:40:47 150748 [6626B670] 0x01 -> osm_vendor_open_port: ERR 542C: umad_open_port() failed
Jan 12 23:40:47 150769 [6626B670] 0x01 -> osm_vendor_bind: ERR 5424: Unable to open port 0x2c9020023db0d
Jan 12 23:40:47 150779 [6626B670] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Jan 12 23:40:47 150787 [6626B670] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Jan 12 23:40:47 150810 [6626B670] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Using default GUID 0x2c9020023db0d
Error from osm_opensm_bind (0x2A)
Perhaps another instance of OpenSM is already running
Exiting SM

Jan 12 23:40:47 208960 [6626B670] 0x80 -> Exiting SM
[root@compute-0-11 ~]# service openibd restart
Unloading OpenIB kernel modules: [ OK ]
Loading OpenIB kernel modules: [ OK ]
[root@compute-0-11 ~]# service opensmd start
Starting IB Subnet Manager: [ OK ]
[root@compute-0-11 ~]#

What was installed (on top of a manual install) prior to the opensm startup:

Installing for dependencies:
 compat-dapl           x86_64  2.0.13-4.el5                 rhel53  112 k
 compat-dapl-devel     x86_64  2.0.13-4.el5                 rhel53  30 k
 dapl                  x86_64  2.0.13-4.el5                 rhel53  149 k
 dapl-utils            x86_64  2.0.13-4.el5                 rhel53  93 k
 ibutils               x86_64  1.2-9.el5                    rhel53  1.7 M
 ibutils-libs          x86_64  1.2-9.el5                    rhel53  1.2 M
 infiniband-diags      x86_64  1.4.1-2.el5                  rhel53  174 k
 libcxgb3              x86_64  1.2.2-1.el5                  rhel53  14 k
 libibcm               x86_64  1.0.3-1.el5                  rhel53  18 k
 libibcommon           x86_64  1.1.1-1.el5                  rhel53  21 k
 libibcommon-devel     x86_64  1.1.1-1.el5                  rhel53  4.8 k
 libibmad              x86_64  1.2.1-1.el5                  rhel53  45 k
 libibumad             x86_64  1.2.1-1.el5                  rhel53  50 k
 libibumad-devel       x86_64  1.2.1-1.el5                  rhel53  5.2 k
 libibverbs            x86_64  1.1.2-1.el5                  rhel53  43 k
 libibverbs-devel      x86_64  1.1.2-1.el5                  rhel53  56 k
 libibverbs-utils      x86_64  1.1.2-1.el5                  rhel53  38 k
 libipathverbs         x86_64  1.1-11.el5                   rhel53  12 k
 libipathverbs-static  x86_64  1.1-11.el5                   rhel53  8.8 k
 libmlx4               x86_64  1.0-4.el5                    rhel53  24 k
 libmlx4-static        x86_64  1.0-4.el5                    rhel53  16 k
 libmthca              x86_64  1.0.5-1.el5                  rhel53  33 k
 libmthca-static       x86_64  1.0.5-1.el5                  rhel53  20 k
 libnes                x86_64  0.5-4.el5                    rhel53  12 k
 libnes-static         x86_64  0.5-4.el5                    rhel53  9.4 k
 librdmacm             x86_64  1.0.8-1.el5                  rhel53  22 k
 librdmacm-devel       x86_64  1.0.8-1.el5                  rhel53  33 k
 librdmacm-utils       x86_64  1.0.8-1.el5                  rhel53  29 k
 libsdp                x86_64  1:1.1.99-10.el5_2            rhel53  39 k
 mstflint              x86_64  1.3-1.el5                    rhel53  162 k
 openib                noarch  1.3.2-0.20080728.0355.3.el5  rhel53  20 k
 opensm                x86_64  3.2.2-3.el5                  rhel53  326 k
 opensm-libs           x86_64  3.2.2-3.el5                  rhel53  60 k
 perftest              x86_64  1.2-11.el5                   rhel53  72 k
 srptools              x86_64  0.0.4-2.el5                  rhel53  46 k
 tvflash               x86_64  0.9.0-2.el5                  rhel53  42 k
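The manual workaround in the transcript (restart openibd so that ib_umad gets loaded, then start opensmd) can be scripted. This is a minimal bash sketch, not something shipped in the openib or opensm packages: `umad_ready` and `start_opensm_checked` are hypothetical helper names, and the check assumes, as the opensm log suggests, that a missing /sys/class/infiniband_mad/abi_version means ib_umad is not loaded.

```shell
#!/bin/bash
# Hypothetical helpers (not part of openib/opensm): start opensmd only after
# confirming the user-space MAD interface (ib_umad) is present.

# Sysfs file that umad_init failed to read in the opensm log; assumed to be
# absent whenever the ib_umad module is not loaded.
UMAD_ABI_FILE="${UMAD_ABI_FILE:-/sys/class/infiniband_mad/abi_version}"

# umad_ready [file]: succeed if the ib_umad ABI version file is readable.
umad_ready() {
    local abi_file="${1:-$UMAD_ABI_FILE}"
    [ -r "$abi_file" ]
}

# start_opensm_checked: if ib_umad looks missing, reload the IB stack via
# openibd (the step done by hand in the report), then start the SM.
start_opensm_checked() {
    if ! umad_ready; then
        echo "ib_umad not loaded; restarting openibd" >&2
        /sbin/service openibd restart || return 1
    fi
    /sbin/service opensmd start
}
```

On an affected node, source the file as root and run `start_opensm_checked`. Testing the sysfs file rather than parsing `lsmod` keeps the check aligned with the file that umad_init reports reading in the log.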
System daemons, and openib in particular, are not enabled by default when installed. For normal systems and typical network services, enabling them by default would be a security risk. For openibd in particular, turning the service on by default loads kernel modules that system administrators don't want loaded by default. This is particularly problematic on systems that do an "Everything" install. The openibd script was set to remain off until enabled by the system administrator because of complaints from users.

I would recommend that the installation method on cluster machines include the following in the kickstart %post section (or a similar post-install configuration method if you aren't using kickstart):

  /sbin/chkconfig --level 2345 openibd on
  /sbin/service openibd start

Closing this out as NOTABUG.
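For nodes already installed without kickstart (the "similar post-install configuration method" case), the two recommended commands can be wrapped so the result is verified afterwards. This is a minimal bash sketch; the function names are hypothetical, and the parser assumes the usual `chkconfig --list` output of one whitespace-separated `N:on`/`N:off` token per runlevel.

```shell
#!/bin/bash
# Hypothetical post-install helpers; the chkconfig/service commands are the
# ones recommended above, the verification step is an added assumption.

# runlevels_on LINE: succeed if a `chkconfig --list <svc>` LINE shows the
# service "on" in runlevels 2, 3, 4 and 5.
runlevels_on() {
    local line="$1" rl
    for rl in 2 3 4 5; do
        case " $line " in
            *[[:space:]]"$rl":on[[:space:]]*) ;;  # this runlevel is enabled
            *) return 1 ;;                         # token missing or "off"
        esac
    done
}

# enable_openibd: enable openibd at boot, start it now, then confirm the
# runlevel settings took effect.
enable_openibd() {
    /sbin/chkconfig --level 2345 openibd on || return 1
    /sbin/service openibd start || return 1
    runlevels_on "$(/sbin/chkconfig --list openibd)"
}
```

Run `enable_openibd` as root once per node; a nonzero exit status means one of the steps failed or runlevels 2-5 are not all enabled.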