Bug 764025 - (GLUSTER-2293) crash in rpc connect
Status: CLOSED WORKSFORME
Product: GlusterFS
Classification: Community
Component: protocol
Version: 3.1.1
Hardware: All Linux
Priority: low, Severity: high
Assigned To: Raghavendra G
Reported: 2011-01-14 00:51 EST by Amar Tumballi
Modified: 2013-12-18 19:05 EST (History)
Doc Type: Bug Fix
Description Amar Tumballi 2011-01-14 00:51:30 EST
I have two servers running SLES11 SP1 x86_64, and I compiled the latest glusterfs version, 3.1.1, on them.
The firewall is disabled on both nodes, and they are on the same network.

I put both hostnames in the hosts file so that each node can resolve the other's hostname correctly:
192.168.8.104   virt-zabbix-02
192.168.8.105   virt-zabbix-03

This is my config on both nodes, in "/etc/glusterfs/glusterd.vol":
volume management
   type mgmt/glusterd
   option working-directory /etc/glusterd
   option transport-type socket,rdma
   option transport.socket.keepalive-time 10
   option transport.socket.keepalive-interval 2
end-volume

virt-zabbix-02# gluster peer status
No peers present

log:
[2011-01-13 19:53:31.576554] I [glusterd-handler.c:674:glusterd_handle_cli_list_friends] glusterd: Received cli list req

This works, but when I try to add the other node to the cluster, "glusterfsd" dies on "virt-zabbix-02", where I typed the command, and a core dump file is generated:
virt-zabbix-02# gluster peer probe virt-zabbix-03

log virt-zabbix-02:
[2011-01-13 19:54:29.284735] I [glusterd-handler.c:563:glusterd_handle_cli_probe] glusterd: Received CLI probe req virt-zabbix-03 24007
[2011-01-13 19:54:29.285110] I [glusterd-handler.c:398:glusterd_friend_find] glusterd: Unable to find hostname: virt-zabbix-03
[2011-01-13 19:54:29.285136] I [glusterd-handler.c:2618:glusterd_probe_begin] glusterd: Unable to find peerinfo for host: virt-zabbix-03 (24007)
[2011-01-13 19:54:29.287625] W [rpc-transport.c:849:rpc_transport_load] rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2011-01-13 19:54:29.288496] I [glusterd-handler.c:2600:glusterd_friend_add] glusterd: connect returned 0
[2011-01-13 19:54:29.293369] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend virt-zabbix-03 found.. state: 0
[2011-01-13 19:54:29.302062] I [glusterd3_1-mops.c:80:glusterd3_1_probe_cbk] glusterd: Received probe resp from uuid: 255540da-4b86-46f2-963c-3214e2c5e28a, host: virt-zabbix-03
[2011-01-13 19:54:29.302097] I [glusterd-handler.c:386:glusterd_friend_find] glusterd: Unable to find peer by uuid
[2011-01-13 19:54:29.302111] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend virt-zabbix-03 found.. state: 0
pending frames:

patchset: v3.1.1
signal received: 11
time of crash: 2011-01-13 19:54:29
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.1
/lib64/libc.so.6(+0x329e0)[0x7f1cbbb589e0]
/usr/lib64/libgfrpc.so.0(rpc_transport_connect+0xc)[0x7f1cbc4c506c]
/usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3d8)[0x7f1cbc4ca878]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd_submit_request+0x15e)[0x7f1cba4203be]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd3_1_friend_add+0x11b)[0x7f1cba424f3b]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(+0x27b17)[0x7f1cba40db17]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x175)[0x7f1cba40d675]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd3_1_probe_cbk+0x495)[0x7f1cba4281f5]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4)[0x7f1cbc4c9a94]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xc8)[0x7f1cbc4c9cd8]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2e)[0x7f1cbc4c4f2e]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x7f1cba1def9f]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0x114)[0x7f1cba1df0d4]
/usr/lib64/libglusterfs.so.0(+0x3a384)[0x7f1cbc70b384]
/usr/sbin/glusterd(main+0x23c)[0x4055dc]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f1cbbb44bc6]
/usr/sbin/glusterd[0x4032c9]
---------
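
For anyone reading the backtrace above frame by frame, here is a minimal sketch of a parser for it. It assumes the frames follow the `path(symbol+0xoffset)[0xaddress]` layout visible in the dump; the names `FRAME_RE` and `parse_frame` are made up for this illustration and are not part of glusterfs.

```python
import re

# Matches frames as printed in the crash dump above, for example:
#   /usr/lib64/libgfrpc.so.0(rpc_transport_connect+0xc)[0x7f1cbc4c506c]
#   /usr/sbin/glusterd[0x4032c9]
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)"                                   # binary or shared object
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"  # optional (symbol+offset)
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"                         # absolute address
)


def parse_frame(line):
    """Return a dict describing one backtrace frame, or None if it does not match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    frame = match.groupdict()
    # An empty symbol, as in "libc.so.6(+0x329e0)", means no name was printed.
    frame["symbol"] = frame["symbol"] or None
    return frame


# The first three frames of the dump from virt-zabbix-02.
SAMPLE = """\
/lib64/libc.so.6(+0x329e0)[0x7f1cbbb589e0]
/usr/lib64/libgfrpc.so.0(rpc_transport_connect+0xc)[0x7f1cbc4c506c]
/usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3d8)[0x7f1cbc4ca878]
"""

frames = [parse_frame(line) for line in SAMPLE.splitlines()]
named = [f for f in frames if f is not None and f["symbol"]]
if named:
    # Print the innermost frame that carries a function name.
    print(named[0]["module"], named[0]["symbol"], named[0]["offset"])
```

The unnamed libc frame at the top is skipped, so the first named frame is the one in `rpc_transport_connect`, which matches the bug title.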

log virt-zabbix-03:
[2011-01-13 19:54:29.296723] I [glusterd-handler.c:2387:glusterd_handle_probe_query] glusterd: Received probe from uuid: a9b660c5-456d-4e96-9bdd-d23c917ae941
[2011-01-13 19:54:29.296802] I [glusterd-handler.c:386:glusterd_friend_find] glusterd: Unable to find peer by uuid
[2011-01-13 19:54:29.297224] I [glusterd-handler.c:398:glusterd_friend_find] glusterd: Unable to find hostname: 192.168.8.104
[2011-01-13 19:54:29.297278] I [glusterd-handler.c:2401:glusterd_handle_probe_query] glusterd: Unable to find peerinfo for host: 192.168.8.104 (24007)
[2011-01-13 19:54:29.300119] W [rpc-transport.c:849:rpc_transport_load] rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2011-01-13 19:54:29.304856] I [glusterd-handler.c:2600:glusterd_friend_add] glusterd: connect returned 0
[2011-01-13 19:54:29.304994] I [glusterd-handler.c:2422:glusterd_handle_probe_query] glusterd: Responded to virt-zabbix-03, op_ret: 0, op_errno: 0, ret: 0
[2011-01-13 19:54:35.314773] E [socket.c:1656:socket_connect_finish] management: connection to 192.168.8.104:24007 failed (Connection refused)
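
The "Connection refused" in the last line fits the picture of glusterd on 192.168.8.104 having crashed a few seconds earlier, so nothing was listening on port 24007 any more. A minimal sketch for checking that from the other node, assuming plain TCP and Python's standard `socket` module; the helper name `port_accepts_connections` is made up for this illustration:

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

On virt-zabbix-03 one could call `port_accepts_connections("192.168.8.104", 24007)` before and after restarting the daemon on the other node. A `False` result only says the port is unreachable, not why.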


So I start "glusterfsd" on virt-zabbix-02 again. A few seconds later, glusterfsd dies on the other node, virt-zabbix-03, and a core dump file is generated there as well.

log virt-zabbix-02:
[2011-01-13 19:57:08.911495] I [glusterd-handler.c:2387:glusterd_handle_probe_query] glusterd: Received probe from uuid: 255540da-4b86-46f2-963c-3214e2c5e28a
[2011-01-13 19:57:08.911559] I [glusterd-handler.c:386:glusterd_friend_find] glusterd: Unable to find peer by uuid
[2011-01-13 19:57:08.911643] I [glusterd-utils.c:2140:glusterd_friend_find_by_hostname] glusterd: Friend 192.168.8.105 found.. state: 0
[2011-01-13 19:57:08.911715] I [glusterd-handler.c:2422:glusterd_handle_probe_query] glusterd: Responded to 192.168.8.104, op_ret: 0, op_errno: 0, ret: 0
[2011-01-13 19:57:11.956152] E [socket.c:1656:socket_connect_finish] management: connection to 192.168.8.105:24007 failed (Connection refused)


log virt-zabbix-03:
[2011-01-13 19:57:08.913897] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend 192.168.8.104 found.. state: 0
[2011-01-13 19:57:08.915052] I [glusterd3_1-mops.c:80:glusterd3_1_probe_cbk] glusterd: Received probe resp from uuid: a9b660c5-456d-4e96-9bdd-d23c917ae941, host: 192.168.8.104
[2011-01-13 19:57:08.915085] I [glusterd-handler.c:386:glusterd_friend_find] glusterd: Unable to find peer by uuid
[2011-01-13 19:57:08.915100] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend 192.168.8.104 found.. state: 0
pending frames:

patchset: v3.1.1
signal received: 11
time of crash: 2011-01-13 19:57:08
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.1
/lib64/libc.so.6(+0x329e0)[0x7fe84e6ee9e0]
/usr/lib64/libgfrpc.so.0(rpc_transport_connect+0xc)[0x7fe84f05b06c]
/usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3d8)[0x7fe84f060878]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd_submit_request+0x15e)[0x7fe84cfb63be]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd3_1_friend_add+0x11b)[0x7fe84cfbaf3b]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(+0x27b17)[0x7fe84cfa3b17]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x175)[0x7fe84cfa3675]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd3_1_probe_cbk+0x495)[0x7fe84cfbe1f5]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4)[0x7fe84f05fa94]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xc8)[0x7fe84f05fcd8]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2e)[0x7fe84f05af2e]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x7fe84cd74f9f]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0x114)[0x7fe84cd750d4]
/usr/lib64/libglusterfs.so.0(+0x3a384)[0x7fe84f2a1384]
/usr/sbin/glusterd(main+0x23c)[0x4055dc]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7fe84e6dabc6]
/usr/sbin/glusterd[0x4032c9]
---------


Starting glusterfsd on virt-zabbix-03 again makes glusterfsd on virt-zabbix-02 die, and so on.
So I made sure the daemon was stopped on both hosts.
The peer files generated on the two nodes differ: one is named after the hostname, the other after the IP address:
virt-zabbix-02:#  cat /etc/glusterd/peers/virt-zabbix-03
uuid=
state=0
hostname1=virt-zabbix-03

virt-zabbix-03:# cat /etc/glusterd/peers/192.168.8.104
uuid=
state=0
hostname1=192.168.8.104


I see that the uuid is empty in both files, so I fill it in with the UUID from the other node's "/etc/glusterd/glusterd.info" file:
virt-zabbix-02:/ # cat /etc/glusterd/glusterd.info
UUID=a9b660c5-456d-4e96-9bdd-d23c917ae941
virt-zabbix-03:/ # cat etc/glusterd/glusterd.info
UUID=255540da-4b86-46f2-963c-3214e2c5e28a

virt-zabbix-02:/ # cat /etc/glusterd/peers/virt-zabbix-03
uuid=255540da-4b86-46f2-963c-3214e2c5e28a
state=0
hostname1=virt-zabbix-03

virt-zabbix-03:/ # cat /etc/glusterd/peers/192.168.8.104
uuid=a9b660c5-456d-4e96-9bdd-d23c917ae941
state=0
hostname1=192.168.8.104
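
The manual check of the peer files can be sketched in a few lines. This assumes only the `key=value` layout shown in the files above; the function names and the two checks are made up for this illustration and are not glusterd's own validation logic.

```python
def parse_peer_file(text):
    """Parse key=value lines like those in /etc/glusterd/peers/* shown above."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def peer_problems(fields):
    """Return a list of human-readable problems found in one peer entry."""
    problems = []
    if not fields.get("uuid"):
        problems.append("uuid is empty")
    if not fields.get("hostname1"):
        problems.append("hostname1 is empty")
    return problems


# The peer file as it looked on virt-zabbix-02 before the manual edit.
before = "uuid=\nstate=0\nhostname1=virt-zabbix-03\n"
print(peer_problems(parse_peer_file(before)))
```

Run against the files before the edit, this flags the empty uuid on both nodes. After the edit it reports nothing, although, as the rest of the report shows, the peers still did not reach a connected state.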


Now I start "glusterfsd" on both nodes again. Both daemons keep running, and I can run the command:
virt-zabbix-02:/ # gluster peer status
Number of Peers: 1

Hostname: virt-zabbix-03
Uuid: 255540da-4b86-46f2-963c-3214e2c5e28a
State: Establishing Connection (Connected)

I'd like to create my first test volume:
gluster volume create mytest transport tcp virt-zabbix-02:/gfs1 virt-zabbix-03:/gfs1
Creation of volume mytest has been unsuccessful
Host virt-zabbix-03 not connected

log virt-zabbix-02:
[2011-01-13 20:11:10.706931] I [glusterd-handler.c:674:glusterd_handle_cli_list_friends] glusterd: Received cli list req
[2011-01-13 20:12:20.950199] I [glusterd-handler.c:785:glusterd_handle_create_volume] glusterd: Received create volume req
[2011-01-13 20:12:20.950907] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend virt-zabbix-03 found.. state: 0
[2011-01-13 20:12:20.950935] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Establishing Connection
[2011-01-13 20:12:20.950950] E [glusterd-utils.c:2324:glusterd_new_brick_validate] glusterd: Host virt-zabbix-03 not connected
[2011-01-13 20:12:20.951005] E [glusterd-handler.c:906:glusterd_handle_create_volume] glusterd: Unlock on opinfo failed

no logfiles on virt-zabbix-03

Not connected? Strange! Here is the status again:
virt-zabbix-02:/ # gluster peer status
Number of Peers: 1

Hostname: virt-zabbix-03
Uuid: 255540da-4b86-46f2-963c-3214e2c5e28a
State: Establishing Connection (Connected)

log virt-zabbix-02:
[2011-01-13 20:13:24.601901] I [glusterd-handler.c:674:glusterd_handle_cli_list_friends] glusterd: Received cli list req


So I restart glusterfsd on virt-zabbix-03, and the daemon on virt-zabbix-02 dies again.

Does anyone have any idea what's going wrong?

kind regards
Comment 1 Amar Tumballi 2011-01-20 03:12:33 EST
More info of this bug in the mailing list: 

http://gluster.org/pipermail/gluster-users/2011-January/006394.html
Comment 2 Raghavendra G 2011-01-27 04:22:02 EST
After the reporter was asked to rebuild glusterfs with debugging enabled, the crash no longer occurs.

<Comments from Markus>

I added the compiler flags to the SPEC file and compiled again, removed the old RPMs and deleted
the whole /etc/gluster* dirs, config files and log files.
Then I installed the new RPMs, started glusterd and ran the mgmt commands. The strange thing:
NOW IT WORKS!!!
No segfault, no core dump.

That means one or both of the CFLAGS resolve the segfault.
I'll go on to compile v3.1.2 with the CFLAGS too and test it.
</Comments>

Most likely, libraries from different versions of glusterfs got mixed up, causing the crash. Hence I am marking this bug as resolved.

regards,
Raghavendra
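
One way to look for the kind of library mix-up suspected above is to check where the dynamic linker resolves the gluster libraries from. The sketch below parses text in the usual `name => /path (0xaddress)` format printed by `ldd` and collects the directories that `libglusterfs` and `libgfrpc` come from. The function names are made up for this illustration, and the sample text is invented rather than captured from the reporter's machines.

```python
import os
import re

# One resolved dependency line as printed by ldd, for example:
#   libgfrpc.so.0 => /usr/lib64/libgfrpc.so.0 (0x00007f0000100000)
LDD_RE = re.compile(r"^\s*(?P<name>\S+)\s+=>\s+(?P<path>/\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")

# Library name prefixes taken from the backtraces in this report.
GLUSTER_PREFIXES = ("libglusterfs", "libgfrpc")


def resolved_libraries(ldd_output):
    """Map each library name to the path it was resolved to."""
    libs = {}
    for line in ldd_output.splitlines():
        match = LDD_RE.match(line)
        if match:
            libs[match.group("name")] = match.group("path")
    return libs


def gluster_lib_dirs(libs):
    """Return the set of directories the gluster libraries were loaded from."""
    return {
        os.path.dirname(path)
        for name, path in libs.items()
        if name.startswith(GLUSTER_PREFIXES)
    }


# Invented example of a mixed installation (not from the reporter's system).
SAMPLE_LDD = (
    "\tlibglusterfs.so.0 => /usr/local/lib/libglusterfs.so.0 (0x00007f0000000000)\n"
    "\tlibgfrpc.so.0 => /usr/lib64/libgfrpc.so.0 (0x00007f0000100000)\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000200000)\n"
)

dirs = gluster_lib_dirs(resolved_libraries(SAMPLE_LDD))
if len(dirs) > 1:
    print("gluster libraries come from several directories:", sorted(dirs))
```

More than one directory in the result would hint at two installations on the same host. Note that `ldd` only lists link-time dependencies, so modules such as `glusterd.so` and `socket.so`, which appear in the backtrace under `/usr/lib64/glusterfs/3.1.1/`, would have to be checked separately.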
