Bug 762808 (GLUSTER-1076)

Summary: Connection issue between client and server on gluster 3.0.4
Product: [Community] GlusterFS
Component: core
Version: 3.0.4
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Hardware: All
OS: Linux
Doc Type: Bug Fix
Reporter: Sachidananda Urs <sac>
Assignee: Raghavendra G <raghavendra>
CC: amarts, gluster-bugs, vijay

Description Sachidananda Urs 2010-07-15 06:06:11 UTC
[2010-06-29 11:01:46] E [ib-verbs.c:1299:ib_verbs_send_completion_proc]
ib-verbs: connection between client and server not working. check by
running 'ibv_srq_pingpong'. also make sure subnet manager is running
(eg: 'opensm'), or check if ib-verbs port is valid (or active) by
running 'ibv_devinfo'. contact Gluster Support Team if the problem
persists.

Details from the servers (jr1 and jr2):
[root@jr1 ~]# /etc/init.d/opensmd status
opensm (pid 5292) is running...

[root@jr1 ~]# tail /var/log/opensm.log
Jul 14 05:03:49 206203 [47341940] 0x02 -> SUBNET UP
Jul 14 05:03:59 209427 [47341940] 0x02 -> SUBNET UP
Jul 14 05:04:09 212182 [47341940] 0x02 -> SUBNET UP

[root@jr1 ~]# ibv_devinfo
hca_id: qib0
transport: InfiniBand (0)
fw_ver: 0.0.0
node_guid: 0011:7500:00ff:5898
sys_image_guid: 0011:7500:00ff:5898
vendor_id: 0x1175
vendor_part_id: 29216
hw_ver: 0x2
board_id: InfiniPath_QLE7240
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 1
port_lmc: 0x00
link_layer: IB

[root@jr2 ~]# ibv_devinfo
hca_id: qib0
transport: InfiniBand (0)
fw_ver: 0.0.0
node_guid: 0011:7500:00ff:58a8
sys_image_guid: 0011:7500:00ff:58a8
vendor_id: 0x1175
vendor_part_id: 29216
hw_ver: 0x2
board_id: InfiniPath_QLE7240
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 2
port_lmc: 0x00
link_layer: IB
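The error message suggests checking that the IB port is active and that a subnet manager is running. A minimal Python sketch of those checks, applied to the two ibv_devinfo outputs above. The helper names (parse_devinfo, check_hosts) are hypothetical, and the parser assumes a single-port HCA, as here (phys_port_cnt: 1), so it keeps only the first value seen for each key.

```python
def parse_devinfo(text: str) -> dict[str, str]:
    """Map each 'key: value' line of ibv_devinfo output to its value.

    Hypothetical helper; keeps the first occurrence of each key, which is
    enough for a single-port HCA.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so GUIDs like 0011:7500:... survive.
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def check_hosts(outputs: dict[str, str]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    parsed = {host: parse_devinfo(text) for host, text in outputs.items()}
    problems = []
    for host, f in parsed.items():
        if not f.get("state", "").startswith("PORT_ACTIVE"):
            problems.append(f"{host}: port state is {f.get('state')!r}")
    # All ports on one subnet should report the same subnet manager LID.
    if len({f.get("sm_lid") for f in parsed.values()}) > 1:
        problems.append("hosts report different sm_lid values")
    # Port LIDs must be unique within a subnet.
    lids = [f.get("port_lid") for f in parsed.values()]
    if len(set(lids)) != len(lids):
        problems.append("duplicate port_lid values")
    if len({f.get("active_mtu") for f in parsed.values()}) > 1:
        problems.append("active_mtu differs between hosts")
    return problems


# Excerpts of the outputs captured on jr1 and jr2 in this report.
JR1 = """hca_id: qib0
node_guid: 0011:7500:00ff:5898
state: PORT_ACTIVE (4)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 1"""
JR2 = """hca_id: qib0
node_guid: 0011:7500:00ff:58a8
state: PORT_ACTIVE (4)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 2"""

print(check_hosts({"jr1": JR1, "jr2": JR2}))  # prints []
```

Both hosts pass these checks, which agrees with the report: at the verbs level both ports were active and pointed at the same subnet manager (sm_lid: 1).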


[root@jr1 ~]# ibv_srq_pingpong 10.100.1.2
local address: LID 0x0001, QPN 0x00014b, PSN 0x475c62, GID ::
local address: LID 0x0001, QPN 0x00014c, PSN 0xf83edb, GID ::
local address: LID 0x0001, QPN 0x00014d, PSN 0x0a9353, GID ::
local address: LID 0x0001, QPN 0x00014e, PSN 0x11786f, GID ::
local address: LID 0x0001, QPN 0x00014f, PSN 0xe02a8d, GID ::
local address: LID 0x0001, QPN 0x000150, PSN 0xcaae35, GID ::
local address: LID 0x0001, QPN 0x000151, PSN 0x1f18ba, GID ::
local address: LID 0x0001, QPN 0x000152, PSN 0x3b6bf5, GID ::
local address: LID 0x0001, QPN 0x000153, PSN 0x7f7bbf, GID ::
local address: LID 0x0001, QPN 0x000154, PSN 0xd9549e, GID ::
local address: LID 0x0001, QPN 0x000155, PSN 0xbea420, GID ::
local address: LID 0x0001, QPN 0x000156, PSN 0xd1d56a, GID ::
local address: LID 0x0001, QPN 0x000157, PSN 0x5f94d8, GID ::
local address: LID 0x0001, QPN 0x000158, PSN 0x61e66e, GID ::
local address: LID 0x0001, QPN 0x000159, PSN 0x13e8ca, GID ::
local address: LID 0x0001, QPN 0x00015a, PSN 0x97dcfc, GID ::
remote address: LID 0x0002, QPN 0x000037, PSN 0xe973be, GID ::
remote address: LID 0x0002, QPN 0x000038, PSN 0xc8d907, GID ::
remote address: LID 0x0002, QPN 0x000039, PSN 0xefc00f, GID ::
remote address: LID 0x0002, QPN 0x00003a, PSN 0x266c7b, GID ::
remote address: LID 0x0002, QPN 0x00003b, PSN 0x1edba9, GID ::
remote address: LID 0x0002, QPN 0x00003c, PSN 0x6b3f21, GID ::
remote address: LID 0x0002, QPN 0x00003d, PSN 0x40d536, GID ::
remote address: LID 0x0002, QPN 0x00003e, PSN 0x4014c1, GID ::
remote address: LID 0x0002, QPN 0x00003f, PSN 0x32a29b, GID ::
remote address: LID 0x0002, QPN 0x000040, PSN 0xda884a, GID ::
remote address: LID 0x0002, QPN 0x000041, PSN 0x3e2c5c, GID ::
remote address: LID 0x0002, QPN 0x000042, PSN 0xd4bef6, GID ::
remote address: LID 0x0002, QPN 0x000043, PSN 0x80cd74, GID ::
remote address: LID 0x0002, QPN 0x000044, PSN 0x3328da, GID ::
remote address: LID 0x0002, QPN 0x000045, PSN 0x6a38c6, GID ::
remote address: LID 0x0002, QPN 0x000046, PSN 0x845348, GID ::
8192000 bytes in 0.01 seconds = 8790.88 Mbit/sec
1000 iters in 0.01 seconds = 7.45 usec/iter

[root@jr2 ~]# ibv_srq_pingpong
local address: LID 0x0002, QPN 0x000037, PSN 0xe973be, GID ::
local address: LID 0x0002, QPN 0x000038, PSN 0xc8d907, GID ::
local address: LID 0x0002, QPN 0x000039, PSN 0xefc00f, GID ::
local address: LID 0x0002, QPN 0x00003a, PSN 0x266c7b, GID ::
local address: LID 0x0002, QPN 0x00003b, PSN 0x1edba9, GID ::
local address: LID 0x0002, QPN 0x00003c, PSN 0x6b3f21, GID ::
local address: LID 0x0002, QPN 0x00003d, PSN 0x40d536, GID ::
local address: LID 0x0002, QPN 0x00003e, PSN 0x4014c1, GID ::
local address: LID 0x0002, QPN 0x00003f, PSN 0x32a29b, GID ::
local address: LID 0x0002, QPN 0x000040, PSN 0xda884a, GID ::
local address: LID 0x0002, QPN 0x000041, PSN 0x3e2c5c, GID ::
local address: LID 0x0002, QPN 0x000042, PSN 0xd4bef6, GID ::
local address: LID 0x0002, QPN 0x000043, PSN 0x80cd74, GID ::
local address: LID 0x0002, QPN 0x000044, PSN 0x3328da, GID ::
local address: LID 0x0002, QPN 0x000045, PSN 0x6a38c6, GID ::
local address: LID 0x0002, QPN 0x000046, PSN 0x845348, GID ::
remote address: LID 0x0001, QPN 0x00014b, PSN 0x475c62, GID ::
remote address: LID 0x0001, QPN 0x00014c, PSN 0xf83edb, GID ::
remote address: LID 0x0001, QPN 0x00014d, PSN 0x0a9353, GID ::
remote address: LID 0x0001, QPN 0x00014e, PSN 0x11786f, GID ::
remote address: LID 0x0001, QPN 0x00014f, PSN 0xe02a8d, GID ::
remote address: LID 0x0001, QPN 0x000150, PSN 0xcaae35, GID ::
remote address: LID 0x0001, QPN 0x000151, PSN 0x1f18ba, GID ::
remote address: LID 0x0001, QPN 0x000152, PSN 0x3b6bf5, GID ::
remote address: LID 0x0001, QPN 0x000153, PSN 0x7f7bbf, GID ::
remote address: LID 0x0001, QPN 0x000154, PSN 0xd9549e, GID ::
remote address: LID 0x0001, QPN 0x000155, PSN 0xbea420, GID ::
remote address: LID 0x0001, QPN 0x000156, PSN 0xd1d56a, GID ::
remote address: LID 0x0001, QPN 0x000157, PSN 0x5f94d8, GID ::
remote address: LID 0x0001, QPN 0x000158, PSN 0x61e66e, GID ::
remote address: LID 0x0001, QPN 0x000159, PSN 0x13e8ca, GID ::
remote address: LID 0x0001, QPN 0x00015a, PSN 0x97dcfc, GID ::
8192000 bytes in 0.01 seconds = 8676.82 Mbit/sec
1000 iters in 0.01 seconds = 7.55 usec/iter
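The two pingpong summary lines are related by simple arithmetic, so they can be cross-checked. A small Python sketch, assuming the counters are computed the way the libibverbs example programs appear to compute them (bytes = size * iters * 2, one send in each direction per iteration, and Mbit/sec = bytes * 8 / elapsed microseconds). The 4096-byte message size is derived from the reported numbers, not read from the report; the function names are hypothetical.

```python
# Assumption: ibv_srq_pingpong reports bytes = size * iters * 2 and
# Mbit/sec = bytes * 8 / elapsed_usec (1 Mbit/s is 1 bit per microsecond).

TOTAL_BYTES = 8_192_000  # "8192000 bytes in 0.01 seconds"
ITERS = 1000             # "1000 iters in 0.01 seconds"


def elapsed_usec(total_bytes: int, mbit_per_sec: float) -> float:
    """Recover the elapsed time in microseconds from the reported rate."""
    return total_bytes * 8 / mbit_per_sec


def message_size(total_bytes: int, iters: int) -> int:
    """Bytes per message, assuming one send in each direction per iteration."""
    return total_bytes // (2 * iters)


print(message_size(TOTAL_BYTES, ITERS))  # prints 4096
for host, rate in (("jr1", 8790.88), ("jr2", 8676.82)):
    usec = elapsed_usec(TOTAL_BYTES, rate)
    print(f"{host}: {usec / ITERS:.2f} usec/iter")
# prints "jr1: 7.45 usec/iter" and "jr2: 7.55 usec/iter"
```

The recomputed latencies match the reported 7.45 and 7.55 usec/iter, and the two runs differ by only about 1.3% in throughput, so raw verbs traffic between jr1 and jr2 looked healthy when these runs were taken.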


The volumes disconnect and reconnect frequently. Excerpt from the client log:

[2010-07-02 03:51:48] W [xlator.c:656:validate_xlator_volume_options]
jr2-home11-2: option 'transport.remote-port' is deprecated, preferred is
'remote-port', continuing with correction
[2010-07-02 03:51:48] W [xlator.c:656:validate_xlator_volume_options]
jr1-home11-2: option 'transport.remote-port' is deprecated, preferred is
'remote-port', continuing with correction
[2010-07-02 03:51:48] W [glusterfsd.c:548:_log_if_option_is_invalid]
jr2-home11-2: option 'transport.socket.lowlat' is not recognized
[2010-07-02 03:51:48] W [glusterfsd.c:548:_log_if_option_is_invalid]
jr1-home11-2: option 'transport.socket.lowlat' is not recognized
[2010-07-02 03:51:48] N [glusterfsd.c:1408:main] glusterfs: Successfully
started
[2010-07-02 03:51:48] E [ib-verbs.c:1287:ib_verbs_send_completion_proc]
transport/ib-verbs: send work request on `qib0' returned error wc.status
= 12, wc.vendor_err = 0, post->buf = 0x2aaaacaa1000, wc.byte_len = 0,
post->reused = 1
[2010-07-02 03:51:48] E [ib-verbs.c:1299:ib_verbs_send_completion_proc]
ib-verbs: connection between client and server not working. check by
running 'ibv_srq_pingpong'. also make sure subnet manager is running
(eg: 'opensm'), or check if ib-verbs port is valid (or active) by
running 'ibv_devinfo'. contact Gluster Support Team if the problem
persists.
[2010-07-02 03:51:48] E [ib-verbs.c:1287:ib_verbs_send_completion_proc]
transport/ib-verbs: send work request on `qib0' returned error wc.status
= 12, wc.vendor_err = 0, post->buf = 0x2aaaaca1f000, wc.byte_len = 0,
post->reused = 1
[2010-07-02 03:51:48] E [ib-verbs.c:1299:ib_verbs_send_completion_proc]
ib-verbs: connection between client and server not working. check by
running 'ibv_srq_pingpong'. also make sure subnet manager is running
(eg: 'opensm'), or check if ib-verbs port is valid (or active) by
running 'ibv_devinfo'. contact Gluster Support Team if the problem
persists.
[2010-07-02 03:51:48] E [ib-verbs.c:2071:ib_verbs_event_handler]
transport/ib-verbs: jr2-home11-2: pollin received on tcp socket (peer:
10.100.1.2:6998) after handshake is complete
[2010-07-02 03:51:48] E [saved-frames.c:165:saved_frames_unwind]
jr2-home11-2: forced unwinding frame type(2) op(SETVOLUME)
[2010-07-02 03:51:48] E [ib-verbs.c:2071:ib_verbs_event_handler]
transport/ib-verbs: jr2-home11-2: pollin received on tcp socket (peer:
10.100.1.2:6998) after handshake is complete
[2010-07-02 03:51:48] N [fuse-bridge.c:2950:fuse_init] glusterfs-fuse:
FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
[2010-07-02 03:51:48] E [saved-frames.c:165:saved_frames_unwind]
jr2-home11-2: forced unwinding frame type(2) op(SETVOLUME)
[2010-07-02 03:51:48] N [client-protocol.c:6246:client_setvolume_cbk]
jr1-home11-2: Connected to 10.100.1.1:6998, attached to remote volume
'threads-home11-2'.
[2010-07-02 03:51:48] N [afr.c:2632:notify] mirror: Subvolume
'jr1-home11-2' came back up; going online.
[2010-07-02 03:51:48] N [client-protocol.c:6246:client_setvolume_cbk]
jr1-home11-2: Connected to 10.100.1.1:6998, attached to remote volume
'threads-home11-2'.
[2010-07-02 03:51:48] N [afr.c:2632:notify] mirror: Subvolume
'jr1-home11-2' came back up; going online.
[2010-07-02 03:51:59] N [client-protocol.c:6246:client_setvolume_cbk]
jr2-home11-2: Connected to 10.100.1.2:6998, attached to remote volume
'threads-home11-2'.
[2010-07-02 03:51:59] N [client-protocol.c:6246:client_setvolume_cbk]
jr2-home11-2: Connected to 10.100.1.2:6998, attached to remote volume
'threads-home11-2'
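The numeric wc.status in the ib-verbs errors is easier to read once decoded. A hedged Python sketch (function and variable names are hypothetical) that pulls every wc.status value and the option warnings out of a client log, tolerating the line wrapping seen in this excerpt. The status names are copied in order from enum ibv_wc_status in libibverbs' <infiniband/verbs.h>; under that numbering, wc.status = 12 is IBV_WC_RETRY_EXC_ERR, which libibverbs describes as "transport retry counter exceeded". The option warnings are a separate matter: the deprecated key is corrected automatically and the unrecognized one is only logged as a warning.

```python
import re
from collections import Counter

# Order follows enum ibv_wc_status in libibverbs <infiniband/verbs.h>.
WC_STATUS = [
    "IBV_WC_SUCCESS", "IBV_WC_LOC_LEN_ERR", "IBV_WC_LOC_QP_OP_ERR",
    "IBV_WC_LOC_EEC_OP_ERR", "IBV_WC_LOC_PROT_ERR", "IBV_WC_WR_FLUSH_ERR",
    "IBV_WC_MW_BIND_ERR", "IBV_WC_BAD_RESP_ERR", "IBV_WC_LOC_ACCESS_ERR",
    "IBV_WC_REM_INV_REQ_ERR", "IBV_WC_REM_ACCESS_ERR", "IBV_WC_REM_OP_ERR",
    "IBV_WC_RETRY_EXC_ERR", "IBV_WC_RNR_RETRY_EXC_ERR",
    "IBV_WC_LOC_RDD_VIOL_ERR", "IBV_WC_REM_INV_RD_REQ_ERR",
    "IBV_WC_REM_ABORT_ERR", "IBV_WC_INV_EECN_ERR", "IBV_WC_INV_EEC_STATE_ERR",
    "IBV_WC_FATAL_ERR", "IBV_WC_RESP_TIMEOUT_ERR", "IBV_WC_GENERAL_ERR",
]


def wc_status_name(code: int) -> str:
    """Translate a numeric completion status into its enum name."""
    return WC_STATUS[code] if 0 <= code < len(WC_STATUS) else f"unknown({code})"


def triage(log: str) -> dict:
    """Summarise completion errors and option warnings in a client log."""
    # \s* and \s+ let a match span the wrapped line breaks in the log.
    codes = Counter(int(c) for c in re.findall(r"wc\.status\s*=\s*(\d+)", log))
    deprecated = set(re.findall(
        r"option '([^']+)' is deprecated, preferred is\s+'([^']+)'", log))
    unrecognized = set(re.findall(r"option '([^']+)' is not recognized", log))
    return {
        "wc_status": {wc_status_name(c): n for c, n in codes.items()},
        "deprecated": sorted(deprecated),
        "unrecognized": sorted(unrecognized),
    }


# Lines excerpted from the client log in this report.
SAMPLE = """\
jr2-home11-2: option 'transport.remote-port' is deprecated, preferred is
'remote-port', continuing with correction
jr1-home11-2: option 'transport.remote-port' is deprecated, preferred is
'remote-port', continuing with correction
jr2-home11-2: option 'transport.socket.lowlat' is not recognized
transport/ib-verbs: send work request on `qib0' returned error wc.status
= 12, wc.vendor_err = 0, post->buf = 0x2aaaacaa1000, wc.byte_len = 0,
transport/ib-verbs: send work request on `qib0' returned error wc.status
= 12, wc.vendor_err = 0, post->buf = 0x2aaaaca1f000, wc.byte_len = 0,
"""

print(triage(SAMPLE))
```

Run over this excerpt, it reports two IBV_WC_RETRY_EXC_ERR completions, the renamed remote-port option, and the unrecognized lowlat option. A retry-exceeded status alone does not pinpoint a cause; in this bug the problem went away only after the operating system and IB stack were reinstalled.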

Comment 1 Raghavendra G 2010-11-10 03:01:59 UTC
The issue was resolved by reinstalling the operating system and the IB software stack. Closing the bug.