Bug 762808 (GLUSTER-1076) - Connection issue between client and server on gluster 3.0.4
Summary: Connection issue between client and server on gluster 3.0.4
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1076
Product: GlusterFS
Classification: Community
Component: core
Version: 3.0.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-07-15 06:06 UTC by Sachidananda Urs
Modified: 2015-12-01 16:45 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: DNR
CRM:
Verified Versions:



Description Sachidananda Urs 2010-07-15 06:06:11 UTC
[2010-06-29 11:01:46] E [ib-verbs.c:1299:ib_verbs_send_completion_proc]
ib-verbs: connection between client and server not working. check by
running 'ibv_srq_pingpong'. also make sure subnet manager is running
(eg: 'opensm'), or check if ib-verbs port is valid (or active) by
running 'ibv_devinfo'. contact Gluster Support Team if the problem
persists.

Details from the server
[root@jr1 ~]# /etc/init.d/opensmd status
opensm (pid 5292) is running...

[root@jr1 ~]# tail /var/log/opensm.log
Jul 14 05:03:49 206203 [47341940] 0x02 -> SUBNET UP
Jul 14 05:03:59 209427 [47341940] 0x02 -> SUBNET UP
Jul 14 05:04:09 212182 [47341940] 0x02 -> SUBNET UP

[root@jr1 ~]# ibv_devinfo
hca_id: qib0
transport: InfiniBand (0)
fw_ver: 0.0.0
node_guid: 0011:7500:00ff:5898
sys_image_guid: 0011:7500:00ff:5898
vendor_id: 0x1175
vendor_part_id: 29216
hw_ver: 0x2
board_id: InfiniPath_QLE7240
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 1
port_lmc: 0x00
link_layer: IB

[root@jr2 ~]# ibv_devinfo
hca_id: qib0
transport: InfiniBand (0)
fw_ver: 0.0.0
node_guid: 0011:7500:00ff:58a8
sys_image_guid: 0011:7500:00ff:58a8
vendor_id: 0x1175
vendor_part_id: 29216
hw_ver: 0x2
board_id: InfiniPath_QLE7240
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 2
port_lmc: 0x00
link_layer: IB
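
The port check suggested by the error message can also be scripted. The following is only a rough sketch: it assumes the plain "key: value" layout of the ibv_devinfo output pasted above, and the helper name port_states is made up for illustration (it is not part of any Gluster or OFED tool).

```python
def port_states(devinfo_text):
    """Map (hca_id, port number) -> state string from ibv_devinfo text."""
    states = {}
    hca = None
    port = None
    for line in devinfo_text.splitlines():
        # Split on the first colon only, so GUIDs such as
        # 0011:7500:00ff:5898 stay intact in the value.
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "hca_id":
            hca = value
        elif key == "port":
            port = int(value)
        elif key == "state":
            states[(hca, port)] = value
    return states


# Excerpt of the jr1 output shown above.
sample = """hca_id: qib0
transport: InfiniBand (0)
node_guid: 0011:7500:00ff:5898
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
port_lid: 1
"""

states = port_states(sample)
print(states)  # {('qib0', 1): 'PORT_ACTIVE (4)'}
all_active = all(s.startswith("PORT_ACTIVE") for s in states.values())
print(all_active)  # True
```

On both jr1 and jr2 the only port reports PORT_ACTIVE, so the port state does not explain the error.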


[root@jr1 ~]# ibv_srq_pingpong 10.100.1.2
local address: LID 0x0001, QPN 0x00014b, PSN 0x475c62, GID ::
local address: LID 0x0001, QPN 0x00014c, PSN 0xf83edb, GID ::
local address: LID 0x0001, QPN 0x00014d, PSN 0x0a9353, GID ::
local address: LID 0x0001, QPN 0x00014e, PSN 0x11786f, GID ::
local address: LID 0x0001, QPN 0x00014f, PSN 0xe02a8d, GID ::
local address: LID 0x0001, QPN 0x000150, PSN 0xcaae35, GID ::
local address: LID 0x0001, QPN 0x000151, PSN 0x1f18ba, GID ::
local address: LID 0x0001, QPN 0x000152, PSN 0x3b6bf5, GID ::
local address: LID 0x0001, QPN 0x000153, PSN 0x7f7bbf, GID ::
local address: LID 0x0001, QPN 0x000154, PSN 0xd9549e, GID ::
local address: LID 0x0001, QPN 0x000155, PSN 0xbea420, GID ::
local address: LID 0x0001, QPN 0x000156, PSN 0xd1d56a, GID ::
local address: LID 0x0001, QPN 0x000157, PSN 0x5f94d8, GID ::
local address: LID 0x0001, QPN 0x000158, PSN 0x61e66e, GID ::
local address: LID 0x0001, QPN 0x000159, PSN 0x13e8ca, GID ::
local address: LID 0x0001, QPN 0x00015a, PSN 0x97dcfc, GID ::
remote address: LID 0x0002, QPN 0x000037, PSN 0xe973be, GID ::
remote address: LID 0x0002, QPN 0x000038, PSN 0xc8d907, GID ::
remote address: LID 0x0002, QPN 0x000039, PSN 0xefc00f, GID ::
remote address: LID 0x0002, QPN 0x00003a, PSN 0x266c7b, GID ::
remote address: LID 0x0002, QPN 0x00003b, PSN 0x1edba9, GID ::
remote address: LID 0x0002, QPN 0x00003c, PSN 0x6b3f21, GID ::
remote address: LID 0x0002, QPN 0x00003d, PSN 0x40d536, GID ::
remote address: LID 0x0002, QPN 0x00003e, PSN 0x4014c1, GID ::
remote address: LID 0x0002, QPN 0x00003f, PSN 0x32a29b, GID ::
remote address: LID 0x0002, QPN 0x000040, PSN 0xda884a, GID ::
remote address: LID 0x0002, QPN 0x000041, PSN 0x3e2c5c, GID ::
remote address: LID 0x0002, QPN 0x000042, PSN 0xd4bef6, GID ::
remote address: LID 0x0002, QPN 0x000043, PSN 0x80cd74, GID ::
remote address: LID 0x0002, QPN 0x000044, PSN 0x3328da, GID ::
remote address: LID 0x0002, QPN 0x000045, PSN 0x6a38c6, GID ::
remote address: LID 0x0002, QPN 0x000046, PSN 0x845348, GID ::
8192000 bytes in 0.01 seconds = 8790.88 Mbit/sec
1000 iters in 0.01 seconds = 7.45 usec/iter

[root@jr2 ~]# ibv_srq_pingpong
local address: LID 0x0002, QPN 0x000037, PSN 0xe973be, GID ::
local address: LID 0x0002, QPN 0x000038, PSN 0xc8d907, GID ::
local address: LID 0x0002, QPN 0x000039, PSN 0xefc00f, GID ::
local address: LID 0x0002, QPN 0x00003a, PSN 0x266c7b, GID ::
local address: LID 0x0002, QPN 0x00003b, PSN 0x1edba9, GID ::
local address: LID 0x0002, QPN 0x00003c, PSN 0x6b3f21, GID ::
local address: LID 0x0002, QPN 0x00003d, PSN 0x40d536, GID ::
local address: LID 0x0002, QPN 0x00003e, PSN 0x4014c1, GID ::
local address: LID 0x0002, QPN 0x00003f, PSN 0x32a29b, GID ::
local address: LID 0x0002, QPN 0x000040, PSN 0xda884a, GID ::
local address: LID 0x0002, QPN 0x000041, PSN 0x3e2c5c, GID ::
local address: LID 0x0002, QPN 0x000042, PSN 0xd4bef6, GID ::
local address: LID 0x0002, QPN 0x000043, PSN 0x80cd74, GID ::
local address: LID 0x0002, QPN 0x000044, PSN 0x3328da, GID ::
local address: LID 0x0002, QPN 0x000045, PSN 0x6a38c6, GID ::
local address: LID 0x0002, QPN 0x000046, PSN 0x845348, GID ::
remote address: LID 0x0001, QPN 0x00014b, PSN 0x475c62, GID ::
remote address: LID 0x0001, QPN 0x00014c, PSN 0xf83edb, GID ::
remote address: LID 0x0001, QPN 0x00014d, PSN 0x0a9353, GID ::
remote address: LID 0x0001, QPN 0x00014e, PSN 0x11786f, GID ::
remote address: LID 0x0001, QPN 0x00014f, PSN 0xe02a8d, GID ::
remote address: LID 0x0001, QPN 0x000150, PSN 0xcaae35, GID ::
remote address: LID 0x0001, QPN 0x000151, PSN 0x1f18ba, GID ::
remote address: LID 0x0001, QPN 0x000152, PSN 0x3b6bf5, GID ::
remote address: LID 0x0001, QPN 0x000153, PSN 0x7f7bbf, GID ::
remote address: LID 0x0001, QPN 0x000154, PSN 0xd9549e, GID ::
remote address: LID 0x0001, QPN 0x000155, PSN 0xbea420, GID ::
remote address: LID 0x0001, QPN 0x000156, PSN 0xd1d56a, GID ::
remote address: LID 0x0001, QPN 0x000157, PSN 0x5f94d8, GID ::
remote address: LID 0x0001, QPN 0x000158, PSN 0x61e66e, GID ::
remote address: LID 0x0001, QPN 0x000159, PSN 0x13e8ca, GID ::
remote address: LID 0x0001, QPN 0x00015a, PSN 0x97dcfc, GID ::
8192000 bytes in 0.01 seconds = 8676.82 Mbit/sec
1000 iters in 0.01 seconds = 7.55 usec/iter


The volumes frequently disconnect and then reconnect. Client log:

[2010-07-02 03:51:48] W [xlator.c:656:validate_xlator_volume_options]
jr2-home11-2: option 'transport.remote-port' is deprecated, preferred is
'remote-port', continuing with correction
[2010-07-02 03:51:48] W [xlator.c:656:validate_xlator_volume_options]
jr1-home11-2: option 'transport.remote-port' is deprecated, preferred is
'remote-port', continuing with correction
[2010-07-02 03:51:48] W [glusterfsd.c:548:_log_if_option_is_invalid]
jr2-home11-2: option 'transport.socket.lowlat' is not recognized
[2010-07-02 03:51:48] W [glusterfsd.c:548:_log_if_option_is_invalid]
jr1-home11-2: option 'transport.socket.lowlat' is not recognized
[2010-07-02 03:51:48] N [glusterfsd.c:1408:main] glusterfs: Successfully
started
[2010-07-02 03:51:48] E [ib-verbs.c:1287:ib_verbs_send_completion_proc]
transport/ib-verbs: send work request on `qib0' returned error wc.status
= 12, wc.vendor_err = 0, post->buf = 0x2aaaacaa1000, wc.byte_len = 0,
post->reused = 1
[2010-07-02 03:51:48] E [ib-verbs.c:1299:ib_verbs_send_completion_proc]
ib-verbs: connection between client and server not working. check by
running 'ibv_srq_pingpong'. also make sure subnet manager is running
(eg: 'opensm'), or check if ib-verbs port is valid (or active) by
running 'ibv_devinfo'. contact Gluster Support Team if the problem
persists.
[2010-07-02 03:51:48] E [ib-verbs.c:1287:ib_verbs_send_completion_proc]
transport/ib-verbs: send work request on `qib0' returned error wc.status
= 12, wc.vendor_err = 0, post->buf = 0x2aaaaca1f000, wc.byte_len = 0,
post->reused = 1
[2010-07-02 03:51:48] E [ib-verbs.c:1299:ib_verbs_send_completion_proc]
ib-verbs: connection between client and server not working. check by
running 'ibv_srq_pingpong'. also make sure subnet manager is running
(eg: 'opensm'), or check if ib-verbs port is valid (or active) by
running 'ibv_devinfo'. contact Gluster Support Team if the problem
persists.
[2010-07-02 03:51:48] E [ib-verbs.c:2071:ib_verbs_event_handler]
transport/ib-verbs: jr2-home11-2: pollin received on tcp socket (peer:
10.100.1.2:6998) after handshake is complete
[2010-07-02 03:51:48] E [saved-frames.c:165:saved_frames_unwind]
jr2-home11-2: forced unwinding frame type(2) op(SETVOLUME)
[2010-07-02 03:51:48] E [ib-verbs.c:2071:ib_verbs_event_handler]
transport/ib-verbs: jr2-home11-2: pollin received on tcp socket (peer:
10.100.1.2:6998) after handshake is complete
[2010-07-02 03:51:48] N [fuse-bridge.c:2950:fuse_init] glusterfs-fuse:
FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
[2010-07-02 03:51:48] E [saved-frames.c:165:saved_frames_unwind]
jr2-home11-2: forced unwinding frame type(2) op(SETVOLUME)
[2010-07-02 03:51:48] N [client-protocol.c:6246:client_setvolume_cbk]
jr1-home11-2: Connected to 10.100.1.1:6998, attached to remote volume
'threads-home11-2'.
[2010-07-02 03:51:48] N [afr.c:2632:notify] mirror: Subvolume
'jr1-home11-2' came back up; going online.
[2010-07-02 03:51:48] N [client-protocol.c:6246:client_setvolume_cbk]
jr1-home11-2: Connected to 10.100.1.1:6998, attached to remote volume
'threads-home11-2'.
[2010-07-02 03:51:48] N [afr.c:2632:notify] mirror: Subvolume
'jr1-home11-2' came back up; going online.
[2010-07-02 03:51:59] N [client-protocol.c:6246:client_setvolume_cbk]
jr2-home11-2: Connected to 10.100.1.2:6998, attached to remote volume
'threads-home11-2'.
[2010-07-02 03:51:59] N [client-protocol.c:6246:client_setvolume_cbk]
jr2-home11-2: Connected to 10.100.1.2:6998, attached to remote volume
'threads-home11-2'
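
For reference, the numeric wc.status in the log can be mapped to its libibverbs name. This is a minimal sketch, not part of GlusterFS: the table reproduces the first entries of enum ibv_wc_status from <infiniband/verbs.h>, and the function name decode_wc_statuses is hypothetical.

```python
import re

# First entries of enum ibv_wc_status from libibverbs <infiniband/verbs.h>.
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
    6: "IBV_WC_MW_BIND_ERR",
    7: "IBV_WC_BAD_RESP_ERR",
    8: "IBV_WC_LOC_ACCESS_ERR",
    9: "IBV_WC_REM_INV_REQ_ERR",
    10: "IBV_WC_REM_ACCESS_ERR",
    11: "IBV_WC_REM_OP_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# \s* also matches the line break that the log wraps in before "= 12".
WC_STATUS_RE = re.compile(r"wc\.status\s*=\s*(\d+)")


def decode_wc_statuses(log_text):
    """Return a (code, name) pair for every wc.status found in log_text."""
    results = []
    for match in WC_STATUS_RE.finditer(log_text):
        code = int(match.group(1))
        results.append((code, IBV_WC_STATUS.get(code, "unknown")))
    return results


# One of the error entries from the client log above.
sample = (
    "transport/ib-verbs: send work request on `qib0' returned error wc.status\n"
    "= 12, wc.vendor_err = 0, post->buf = 0x2aaaacaa1000, wc.byte_len = 0,\n"
    "post->reused = 1\n"
)
print(decode_wc_statuses(sample))  # [(12, 'IBV_WC_RETRY_EXC_ERR')]
```

Status 12 is IBV_WC_RETRY_EXC_ERR: the transport retry counter was exceeded, meaning the sender received no acknowledgement from the remote side after retrying.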

Comment 1 Raghavendra G 2010-11-10 03:01:59 UTC
The issue was resolved by reinstalling the operating system and the IB software stack. Closing the bug.

