[2010-06-29 11:01:46] E [ib-verbs.c:1299:ib_verbs_send_completion_proc] ib-verbs: connection between client and server not working. check by running 'ibv_srq_pingpong'. also make sure subnet manager is running (eg: 'opensm'), or check if ib-verbs port is valid (or active) by running 'ibv_devinfo'. contact Gluster Support Team if the problem persists.

Details from the server

[root@jr1 ~]# /etc/init.d/opensmd status
opensm (pid 5292) is running...

[root@jr1 ~]# tail /var/log/opensm.log
Jul 14 05:03:49 206203 [47341940] 0x02 -> SUBNET UP
Jul 14 05:03:59 209427 [47341940] 0x02 -> SUBNET UP
Jul 14 05:04:09 212182 [47341940] 0x02 -> SUBNET UP

[root@jr1 ~]# ibv_devinfo
hca_id: qib0
    transport: InfiniBand (0)
    fw_ver: 0.0.0
    node_guid: 0011:7500:00ff:5898
    sys_image_guid: 0011:7500:00ff:5898
    vendor_id: 0x1175
    vendor_part_id: 29216
    hw_ver: 0x2
    board_id: InfiniPath_QLE7240
    phys_port_cnt: 1
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 4096 (5)
            sm_lid: 1
            port_lid: 1
            port_lmc: 0x00
            link_layer: IB

[root@jr2 ~]# ibv_devinfo
hca_id: qib0
    transport: InfiniBand (0)
    fw_ver: 0.0.0
    node_guid: 0011:7500:00ff:58a8
    sys_image_guid: 0011:7500:00ff:58a8
    vendor_id: 0x1175
    vendor_part_id: 29216
    hw_ver: 0x2
    board_id: InfiniPath_QLE7240
    phys_port_cnt: 1
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 4096 (5)
            sm_lid: 1
            port_lid: 2
            port_lmc: 0x00
            link_layer: IB

[root@jr1 ~]# ibv_srq_pingpong 10.100.1.2
local address: LID 0x0001, QPN 0x00014b, PSN 0x475c62, GID ::
local address: LID 0x0001, QPN 0x00014c, PSN 0xf83edb, GID ::
local address: LID 0x0001, QPN 0x00014d, PSN 0x0a9353, GID ::
local address: LID 0x0001, QPN 0x00014e, PSN 0x11786f, GID ::
local address: LID 0x0001, QPN 0x00014f, PSN 0xe02a8d, GID ::
local address: LID 0x0001, QPN 0x000150, PSN 0xcaae35, GID ::
local address: LID 0x0001, QPN 0x000151, PSN 0x1f18ba, GID ::
local address: LID 0x0001, QPN 0x000152, PSN 0x3b6bf5, GID ::
local address: LID 0x0001, QPN 0x000153, PSN 0x7f7bbf, GID ::
local address: LID 0x0001, QPN 0x000154, PSN 0xd9549e, GID ::
local address: LID 0x0001, QPN 0x000155, PSN 0xbea420, GID ::
local address: LID 0x0001, QPN 0x000156, PSN 0xd1d56a, GID ::
local address: LID 0x0001, QPN 0x000157, PSN 0x5f94d8, GID ::
local address: LID 0x0001, QPN 0x000158, PSN 0x61e66e, GID ::
local address: LID 0x0001, QPN 0x000159, PSN 0x13e8ca, GID ::
local address: LID 0x0001, QPN 0x00015a, PSN 0x97dcfc, GID ::
remote address: LID 0x0002, QPN 0x000037, PSN 0xe973be, GID ::
remote address: LID 0x0002, QPN 0x000038, PSN 0xc8d907, GID ::
remote address: LID 0x0002, QPN 0x000039, PSN 0xefc00f, GID ::
remote address: LID 0x0002, QPN 0x00003a, PSN 0x266c7b, GID ::
remote address: LID 0x0002, QPN 0x00003b, PSN 0x1edba9, GID ::
remote address: LID 0x0002, QPN 0x00003c, PSN 0x6b3f21, GID ::
remote address: LID 0x0002, QPN 0x00003d, PSN 0x40d536, GID ::
remote address: LID 0x0002, QPN 0x00003e, PSN 0x4014c1, GID ::
remote address: LID 0x0002, QPN 0x00003f, PSN 0x32a29b, GID ::
remote address: LID 0x0002, QPN 0x000040, PSN 0xda884a, GID ::
remote address: LID 0x0002, QPN 0x000041, PSN 0x3e2c5c, GID ::
remote address: LID 0x0002, QPN 0x000042, PSN 0xd4bef6, GID ::
remote address: LID 0x0002, QPN 0x000043, PSN 0x80cd74, GID ::
remote address: LID 0x0002, QPN 0x000044, PSN 0x3328da, GID ::
remote address: LID 0x0002, QPN 0x000045, PSN 0x6a38c6, GID ::
remote address: LID 0x0002, QPN 0x000046, PSN 0x845348, GID ::
8192000 bytes in 0.01 seconds = 8790.88 Mbit/sec
1000 iters in 0.01 seconds = 7.45 usec/iter

[root@jr2 ~]# ibv_srq_pingpong
local address: LID 0x0002, QPN 0x000037, PSN 0xe973be, GID ::
local address: LID 0x0002, QPN 0x000038, PSN 0xc8d907, GID ::
local address: LID 0x0002, QPN 0x000039, PSN 0xefc00f, GID ::
local address: LID 0x0002, QPN 0x00003a, PSN 0x266c7b, GID ::
local address: LID 0x0002, QPN 0x00003b, PSN 0x1edba9, GID ::
local address: LID 0x0002, QPN 0x00003c, PSN 0x6b3f21, GID ::
local address: LID 0x0002, QPN 0x00003d, PSN 0x40d536, GID ::
local address: LID 0x0002, QPN 0x00003e, PSN 0x4014c1, GID ::
local address: LID 0x0002, QPN 0x00003f, PSN 0x32a29b, GID ::
local address: LID 0x0002, QPN 0x000040, PSN 0xda884a, GID ::
local address: LID 0x0002, QPN 0x000041, PSN 0x3e2c5c, GID ::
local address: LID 0x0002, QPN 0x000042, PSN 0xd4bef6, GID ::
local address: LID 0x0002, QPN 0x000043, PSN 0x80cd74, GID ::
local address: LID 0x0002, QPN 0x000044, PSN 0x3328da, GID ::
local address: LID 0x0002, QPN 0x000045, PSN 0x6a38c6, GID ::
local address: LID 0x0002, QPN 0x000046, PSN 0x845348, GID ::
remote address: LID 0x0001, QPN 0x00014b, PSN 0x475c62, GID ::
remote address: LID 0x0001, QPN 0x00014c, PSN 0xf83edb, GID ::
remote address: LID 0x0001, QPN 0x00014d, PSN 0x0a9353, GID ::
remote address: LID 0x0001, QPN 0x00014e, PSN 0x11786f, GID ::
remote address: LID 0x0001, QPN 0x00014f, PSN 0xe02a8d, GID ::
remote address: LID 0x0001, QPN 0x000150, PSN 0xcaae35, GID ::
remote address: LID 0x0001, QPN 0x000151, PSN 0x1f18ba, GID ::
remote address: LID 0x0001, QPN 0x000152, PSN 0x3b6bf5, GID ::
remote address: LID 0x0001, QPN 0x000153, PSN 0x7f7bbf, GID ::
remote address: LID 0x0001, QPN 0x000154, PSN 0xd9549e, GID ::
remote address: LID 0x0001, QPN 0x000155, PSN 0xbea420, GID ::
remote address: LID 0x0001, QPN 0x000156, PSN 0xd1d56a, GID ::
remote address: LID 0x0001, QPN 0x000157, PSN 0x5f94d8, GID ::
remote address: LID 0x0001, QPN 0x000158, PSN 0x61e66e, GID ::
remote address: LID 0x0001, QPN 0x000159, PSN 0x13e8ca, GID ::
remote address: LID 0x0001, QPN 0x00015a, PSN 0x97dcfc, GID ::
8192000 bytes in 0.01 seconds = 8676.82 Mbit/sec
1000 iters in 0.01 seconds = 7.55 usec/iter

The volumes are disconnecting often and reconnecting

[2010-07-02 03:51:48] W [xlator.c:656:validate_xlator_volume_options] jr2-home11-2: option 'transport.remote-port' is deprecated, preferred is 'remote-port', continuing with correction
[2010-07-02 03:51:48] W [xlator.c:656:validate_xlator_volume_options] jr1-home11-2: option 'transport.remote-port' is deprecated, preferred is 'remote-port', continuing with correction
[2010-07-02 03:51:48] W [glusterfsd.c:548:_log_if_option_is_invalid] jr2-home11-2: option 'transport.socket.lowlat' is not recognized
[2010-07-02 03:51:48] W [glusterfsd.c:548:_log_if_option_is_invalid] jr1-home11-2: option 'transport.socket.lowlat' is not recognized
[2010-07-02 03:51:48] N [glusterfsd.c:1408:main] glusterfs: Successfully started
[2010-07-02 03:51:48] E [ib-verbs.c:1287:ib_verbs_send_completion_proc] transport/ib-verbs: send work request on `qib0' returned error wc.status = 12, wc.vendor_err = 0, post->buf = 0x2aaaacaa1000, wc.byte_len = 0, post->reused = 1
[2010-07-02 03:51:48] E [ib-verbs.c:1299:ib_verbs_send_completion_proc] ib-verbs: connection between client and server not working. check by running 'ibv_srq_pingpong'. also make sure subnet manager is running (eg: 'opensm'), or check if ib-verbs port is valid (or active) by running 'ibv_devinfo'. contact Gluster Support Team if the problem persists.
[2010-07-02 03:51:48] E [ib-verbs.c:1287:ib_verbs_send_completion_proc] transport/ib-verbs: send work request on `qib0' returned error wc.status = 12, wc.vendor_err = 0, post->buf = 0x2aaaaca1f000, wc.byte_len = 0, post->reused = 1
[2010-07-02 03:51:48] E [ib-verbs.c:1299:ib_verbs_send_completion_proc] ib-verbs: connection between client and server not working. check by running 'ibv_srq_pingpong'. also make sure subnet manager is running (eg: 'opensm'), or check if ib-verbs port is valid (or active) by running 'ibv_devinfo'. contact Gluster Support Team if the problem persists.
[2010-07-02 03:51:48] E [ib-verbs.c:2071:ib_verbs_event_handler] transport/ib-verbs: jr2-home11-2: pollin received on tcp socket (peer: 10.100.1.2:6998) after handshake is complete
[2010-07-02 03:51:48] E [saved-frames.c:165:saved_frames_unwind] jr2-home11-2: forced unwinding frame type(2) op(SETVOLUME)
[2010-07-02 03:51:48] E [ib-verbs.c:2071:ib_verbs_event_handler] transport/ib-verbs: jr2-home11-2: pollin received on tcp socket (peer: 10.100.1.2:6998) after handshake is complete
[2010-07-02 03:51:48] N [fuse-bridge.c:2950:fuse_init] glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
[2010-07-02 03:51:48] E [saved-frames.c:165:saved_frames_unwind] jr2-home11-2: forced unwinding frame type(2) op(SETVOLUME)
[2010-07-02 03:51:48] N [client-protocol.c:6246:client_setvolume_cbk] jr1-home11-2: Connected to 10.100.1.1:6998, attached to remote volume 'threads-home11-2'.
[2010-07-02 03:51:48] N [afr.c:2632:notify] mirror: Subvolume 'jr1-home11-2' came back up; going online.
[2010-07-02 03:51:48] N [client-protocol.c:6246:client_setvolume_cbk] jr1-home11-2: Connected to 10.100.1.1:6998, attached to remote volume 'threads-home11-2'.
[2010-07-02 03:51:48] N [afr.c:2632:notify] mirror: Subvolume 'jr1-home11-2' came back up; going online.
[2010-07-02 03:51:59] N [client-protocol.c:6246:client_setvolume_cbk] jr2-home11-2: Connected to 10.100.1.2:6998, attached to remote volume 'threads-home11-2'.
[2010-07-02 03:51:59] N [client-protocol.c:6246:client_setvolume_cbk] jr2-home11-2: Connected to 10.100.1.2:6998, attached to remote volume 'threads-home11-2'
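
For anyone reading the client log above: the ib-verbs errors report a raw numeric wc.status. Assuming that number is the libibverbs enum ibv_wc_status value (which is how the verbs API reports work completion status), 12 corresponds to IBV_WC_RETRY_EXC_ERR, i.e. the transport retry counter was exceeded. The sketch below is only an illustration of decoding such log lines; the helper name decode_wc_status and the partial lookup table are hypothetical and not part of GlusterFS.

```python
import re

# Partial copy of enum ibv_wc_status from libibverbs <infiniband/verbs.h>.
# Only a few common values are listed; this table is an illustrative subset.
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    5: "IBV_WC_WR_FLUSH_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# Matches the "wc.status = N" field printed by the ib-verbs transport.
WC_STATUS_RE = re.compile(r"wc\.status = (\d+)")


def decode_wc_status(log_line):
    """Return the symbolic status name for a log line, or None if absent."""
    match = WC_STATUS_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return IBV_WC_STATUS.get(code, "unknown ({})".format(code))


if __name__ == "__main__":
    line = (
        "[2010-07-02 03:51:48] E [ib-verbs.c:1287:ib_verbs_send_completion_proc] "
        "transport/ib-verbs: send work request on `qib0' returned error "
        "wc.status = 12, wc.vendor_err = 0"
    )
    print(decode_wc_status(line))
```

A retry-exceeded status generally means the sender got no acknowledgement from the remote queue pair, so it does not by itself distinguish between a fabric, driver, or peer-side problem.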
The issue was resolved by reinstalling the operating system and the IB software stack. Closing the bug.