Bug 820673

Summary: glusterd crash when unable to resolve hostname
Product: [Community] GlusterFS
Reporter: Joe Julian <joe>
Component: glusterd
Assignee: Kaushal <kaushal>
Status: CLOSED UPSTREAM
Severity: medium
Priority: low
Version: 3.2.6
CC: amarts, gluster-bugs, mikeneiderhauser, vbellur
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2012-06-05 09:13:55 UTC
Type: Bug

Description Joe Julian 2012-05-10 16:20:56 UTC
This crash was reported in the IRC channel this morning.

[2012-05-10 11:13:20.290505] I [glusterfsd.c:1493:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.2.6
[2012-05-10 11:13:20.292919] I [glusterd.c:550:init] 0-management: Using /etc/glusterd as working directory
[2012-05-10 11:13:20.294797] C [rdma.c:3934:rdma_init] 0-rpc-transport/rdma: Failed to get IB devices
[2012-05-10 11:13:20.294901] E [rdma.c:4813:init] 0-rdma.management: Failed to initialize IB Device
[2012-05-10 11:13:20.294931] E [rpc-transport.c:742:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2012-05-10 11:13:20.294960] W [rpcsvc.c:1288:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2012-05-10 11:13:20.295121] I [glusterd.c:88:glusterd_uuid_init] 0-glusterd: retrieved UUID: 9cab7843-ff14-4c64-bdb9-d7f0588d4041
[2012-05-10 11:13:20.322226] E [common-utils.c:125:gf_resolve_ip6] 0-resolver: getaddrinfo failed (No address associated with hostname)
[2012-05-10 11:13:20.322298] E [name.c:253:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host BTGlusterPC4
pending frames:
 
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-05-10 11:13:20
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.6
[0xb773e400]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x37)[0xb6073637]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_peer_rpc_notify+0x201)[0xb6059dd1]
/usr/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x106)[0xb76be306]
/usr/lib/libgfrpc.so.0(rpc_clnt_start+0x22)[0xb76bee72]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_rpc_create+0x100)[0xb606d9f0]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_friend_add+0x350)[0xb606e0d0]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_store_retrieve_peers+0x428)[0xb60ab6e8]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_restore+0xa0)[0xb60abc30]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(init+0x15cc)[0xb6059a5c]
/usr/lib/libglusterfs.so.0(xlator_init+0x55)[0xb76e0135]
/usr/lib/libglusterfs.so.0(glusterfs_graph_init+0x37)[0xb770e927]
/usr/lib/libglusterfs.so.0(glusterfs_graph_activate+0xa8)[0xb770f258]
/usr/sbin/glusterd(glusterfs_process_volfp+0x126)[0x804d536]
/usr/sbin/glusterd(glusterfs_volumes_init+0x184)[0x804d764]
/usr/sbin/glusterd(main+0x2ea)[0x804ab7a]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb74f04d3]
/usr/sbin/glusterd[0x804ae5d]
---------

Comment 1 Joe Julian 2012-05-21 22:41:29 UTC
This should not crash; it should fail gracefully.

The problem arises when NetworkManager is used to manage the network. A static, upstart-managed network configuration should not produce this problem.

Comment 2 Amar Tumballi 2012-05-28 10:21:50 UTC
Kaushal, check whether this is still a valid bug on master/release-3.3.

Comment 3 Kaushal 2012-06-05 06:42:35 UTC
This doesn't happen on 3.3. When hostname resolution fails, the peer is marked as disconnected and glusterd continues operating. A snippet of the glusterd log file is below:

.
.
[2012-06-05 12:02:59.120357] D [rpc-transport.c:248:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/local/lib/glusterfs/3.3git/rpc-transport/socket.so
[2012-06-05 12:03:00.132951] E [common-utils.c:125:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2012-06-05 12:03:00.133021] E [name.c:245:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host arch3
[2012-06-05 12:03:00.133049] D [glusterd-handler.c:2886:glusterd_peer_rpc_notify] 0-management: got RPC_CLNT_DISCONNECT 3
.
.
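
The behaviour described above (a failed lookup marks the peer as disconnected instead of aborting startup) can be sketched roughly as follows. This is only an illustration in Python, not glusterd's actual C code: the names `Peer`, `resolve_peer` and `restore_peers` are hypothetical, and the default port 24007 is assumed here for the management port. Python's `socket.getaddrinfo` wraps the same `getaddrinfo` call that appears in the logs and raises `socket.gaierror` when the lookup fails.

```python
import socket
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical stand-in for a stored peer entry (not a glusterd type)."""
    hostname: str
    port: int = 24007          # assumed management port, for illustration only
    state: str = "unknown"     # "unknown", "resolved" or "disconnected"
    last_error: str = ""


def resolve_peer(peer, resolve=socket.getaddrinfo):
    """Try to resolve one peer; record the failure instead of raising."""
    try:
        resolve(peer.hostname, peer.port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Resolution failed: mark the peer as disconnected and carry on.
        peer.state = "disconnected"
        peer.last_error = f"DNS resolution failed on host {peer.hostname}: {exc}"
        return False
    peer.state = "resolved"
    peer.last_error = ""
    return True


def restore_peers(peers, resolve=socket.getaddrinfo):
    """Process every stored peer; return the ones that could not be resolved."""
    failed = []
    for peer in peers:
        if not resolve_peer(peer, resolve):
            failed.append(peer)   # keep going, one bad hostname is not fatal
    return failed


if __name__ == "__main__":
    # "arch3" is the hostname from the log above; whether it resolves
    # depends on the local resolver configuration.
    for p in restore_peers([Peer("localhost"), Peer("arch3")]):
        print(p.last_error)
```

The `resolve` parameter exists only so the lookup can be swapped out, for example to exercise the failure path without touching the network.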

Comment 4 Kaushal 2012-06-05 09:13:55 UTC
Closing, as this is fixed upstream in release-3.3.