Bug 852585

Summary: [glusterfs-3.2.5qa2] - iozone fails in volume with tcp,rdma transport type
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vidya Sakar <vinaraya>
Component: rdma
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED CURRENTRELEASE
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.0
CC: aavati, gluster-bugs, rhs-bugs, rwheeler, sdharane, vagarwal, vbellur, vbhat, vijay
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: GLUSTER-3758
Environment:
Last Closed: 2015-02-13 10:20:57 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 765490
Bug Blocks:

Description Vidya Sakar 2012-08-29 01:31:40 UTC
+++ This bug was initially created as a clone of Bug #765490 +++

Created a replicate volume with the tcp,rdma transport type, mounted it via NFS, and started running iozone. It failed with the following error:

5 2947000  2748949  5306340  1512364  1565562 3413030  5184636
            2048     512 1419165 1542232  3127245  5044577 5238385 2230685 3419824  2691246  5212953  1647536  1841829 3292669  5119743
            2048    1024 1407538 1558461  3302797  5059433 5277002 2178635 2756890  2482062  4643695  1802788  2231264 2880782  4676561
            2048    2048fsync: Input/output error

iozone: interrupted 

exiting iozone

The glusterd logs show that glusterd is loaded with the following volfile:

  1: volume management
  2:     type mgmt/glusterd
  3:     option working-directory /etc/glusterd
  4:     option transport-type socket,rdma
  5:     option transport.socket.keepalive-time 10
  6:     option transport.socket.keepalive-interval 2
  7: end-volume
  8:

I'm not sure whether 'socket,rdma' is a valid transport type. I also see the following errors in the logs:

[2011-10-24 23:59:02.28709] I [glusterd-handler.c:2781:glusterd_op_commit_send_resp] 0-glusterd: Responded to commit, ret: 0
[2011-10-24 23:59:02.29857] I [glusterd-handler.c:2690:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: 0a25570c-4342-4187-a57b-ce61dd4dd73e
[2011-10-24 23:59:02.29903] I [glusterd-handler.c:2668:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
[2011-10-24 23:59:02.30441] E [rdma.c:4468:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:1017) after handshake is complete
[2011-10-24 23:59:02.593025] E [rdma.c:4468:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.24:1020) after handshake is complete
[2011-10-25 00:10:18.940995] I [glusterd-handler.c:448:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: 0a25570c-4342-4187-a57b-ce61dd4dd73e
[2011-10-25 00:10:18.941078] I [glusterd-utils.c:243:glusterd_lock] 0-glusterd: Cluster lock held by 0a25570c-4342-4187-a57b-ce61dd4dd73e
[2011-10-25 00:10:18.941139] I [glusterd-handler.c:2648:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
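
The error lines above can be picked out of a longer log mechanically. The following is a minimal sketch, assuming the line layout seen in this excerpt (`[timestamp] LEVEL [file:line:function] domain: message`); the names `LINE_RE`, `PEER_RE`, and `error_peers` are hypothetical helpers for illustration, not part of glusterfs.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: [timestamp] LEVEL [file:line:func] domain: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)
# Peer address as it appears in the rdma error messages: "(peer: host:port)"
PEER_RE = re.compile(r"\(peer: ([\d.]+):(\d+)\)")

SAMPLE = """\
[2011-10-24 23:59:02.29903] I [glusterd-handler.c:2668:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
[2011-10-24 23:59:02.30441] E [rdma.c:4468:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:1017) after handshake is complete
[2011-10-24 23:59:02.593025] E [rdma.c:4468:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.24:1020) after handshake is complete
"""


def error_peers(log_text):
    """Count E-level log lines per peer host, ignoring lines without a peer."""
    peers = Counter()
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw)
        if not m or m.group("level") != "E":
            continue
        p = PEER_RE.search(m.group("msg"))
        if p:
            peers[p.group(1)] += 1
    return peers


if __name__ == "__main__":
    for host, count in sorted(error_peers(SAMPLE).items()):
        print(host, count)
```

Run against the full glusterd log, this would show which peers trigger the "pollin received on tcp socket" error and how often.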


I see the following entries in the brick logs:

[2011-10-24 23:57:39.934419] I [server-handshake.c:542:server_setvolume] 0-hosdu-server: accepted client from 10.1.10.24:1019 (version: 3.2.5qa1)
[2011-10-24 23:57:43.190129] E [socket.c:1395:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1598180427) received from 10.1.10.21:1019
[2011-10-24 23:57:43.190184] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.21:1019
[2011-10-24 23:57:46.193808] E [socket.c:1395:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1598180427) received from 10.1.10.21:1022
[2011-10-24 23:57:46.193856] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.21:1022
[2011-10-24 23:57:55.996217] E [socket.c:1395:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1598180427) received from 10.1.10.21:1022
[2011-10-24 23:57:55.996286] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.21:1022
[2011-10-24 23:58:05.606191] E [socket.c:1395:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1598180427) received from 10.1.10.21:1022
[2011-10-24 23:58:05.606234] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.21:1022
[2011-10-24 23:58:14.617234] E [socket.c:1395:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1598180427) received from 10.1.10.21:1022
[2011-10-24 23:58:14.617278] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.21:1022
[2011-10-24 23:58:23.628203] E [socket.c:1395:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1598180427) received from 10.1.10.21:1022
[2011-10-24 23:58:23.628248] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.21:1022
[2011-10-24 23:58:32.639200] E [socket.c:1395:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1598180427) received from 10.1.10.21:1022
[2011-10-24 23:58:32.639242] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.21:1022
[2011-10-24 23:58:41.650281] E [socket.c:1395:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1598180427) received from 10.1.10.21:1022
[2011-10-24 23:58:41.650326] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.21:1022
[2011-10-24 23:58:50.661278] E [socket.c:1395:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1598180427) received from 10.1.10.21:1022
[2011-10-24 23:58:50.661396] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.21:1022
[2011-10-24 23:58:59.672278] E [socket.c:1395:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1598180427) received from 10.1.10.21:1022
[2011-10-24 23:58:59.672325] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.21:1022
[2011-10-24 23:59:00.959274] I [glusterfsd-mgmt.c:62:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2011-10-24 23:59:01.352547] E [rdma.c:4468:rdma_event_handler] 0-rpc-transport/rdma: rdma.hosdu-server: pollin received on tcp socket (peer: 10.1.10.24:1019) after handshake is complete
[2011-10-24 23:59:01.353181] I [server.c:438:server_rpc_notify] 0-hosdu-server: disconnected connection from 10.1.10.24:1019
[2011-10-24 23:59:01.353292] I [server-helpers.c:783:server_connection_destroy] 0-hosdu-server: destroyed connection of client4-5543-2011/10/24-23:57:35:247380-hosdu-client-1
[2011-10-24 23:59:02.30027] E [server.c:609:reconfigure] 0-hosdu-server: Reconfigure not found for transport
[2011-10-24 23:59:05.518636] I [server-handshake.c:542:server_setvolume] 0-hosdu-server: accepted client from 10.1.10.21:1020 (version: 3.2.5qa1)
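
A note on the "wrong MSG-TYPE (1598180427)" value in the entries above: in ONC RPC the message type is a 32-bit big-endian integer that should be 0 (CALL) or 1 (REPLY). The sketch below assumes the logged number is that raw 32-bit field as read from the wire and prints its four bytes; `decode_msg_type` is a hypothetical helper, not a glusterfs function.

```python
import struct


def decode_msg_type(value: int) -> bytes:
    """Return the four bytes of a 32-bit value in network (big-endian) order."""
    return struct.pack(">I", value)


if __name__ == "__main__":
    raw = decode_msg_type(1598180427)
    print(raw)  # b'_BLK'
```

The bytes are printable ASCII, which suggests that the socket listener received text data where it expected an RPC record, rather than a corrupted RPC header. Which sender produced that data is not established by the logs in this report.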

--- Additional comment from raghavendra on 2011-11-16 22:44:59 EST ---

On RHEL5, iozone on NFS completes without any issues with the rdma transport between the NFS server and the gluster server.

# uname -a
Linux client13 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:54:20 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

This was between client13 and client12.

@MS,
Can you please confirm whether iozone works fine on NFS mounts on SSA?

regards,
Raghavendra.

--- Additional comment from amarts on 2012-02-27 05:36:03 EST ---

This is a priority for the immediate future (before the 3.3.0 GA release). Will bump the priority up once we take up RDMA-related tasks.

Comment 3 Amar Tumballi 2012-12-06 05:21:56 UTC
RDMA-related issues have been reduced in priority for now. We will be working on them in parallel, but with medium priority.