This crash was reported on the IRC channel this morning:

[2012-05-10 11:13:20.290505] I [glusterfsd.c:1493:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.2.6
[2012-05-10 11:13:20.292919] I [glusterd.c:550:init] 0-management: Using /etc/glusterd as working directory
[2012-05-10 11:13:20.294797] C [rdma.c:3934:rdma_init] 0-rpc-transport/rdma: Failed to get IB devices
[2012-05-10 11:13:20.294901] E [rdma.c:4813:init] 0-rdma.management: Failed to initialize IB Device
[2012-05-10 11:13:20.294931] E [rpc-transport.c:742:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2012-05-10 11:13:20.294960] W [rpcsvc.c:1288:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2012-05-10 11:13:20.295121] I [glusterd.c:88:glusterd_uuid_init] 0-glusterd: retrieved UUID: 9cab7843-ff14-4c64-bdb9-d7f0588d4041
[2012-05-10 11:13:20.322226] E [common-utils.c:125:gf_resolve_ip6] 0-resolver: getaddrinfo failed (No address associated with hostname)
[2012-05-10 11:13:20.322298] E [name.c:253:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host BTGlusterPC4
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-05-10 11:13:20
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.6
[0xb773e400]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x37)[0xb6073637]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_peer_rpc_notify+0x201)[0xb6059dd1]
/usr/lib/libgfrpc.so.0(rpc_clnt_reconnect+0x106)[0xb76be306]
/usr/lib/libgfrpc.so.0(rpc_clnt_start+0x22)[0xb76bee72]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_rpc_create+0x100)[0xb606d9f0]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_friend_add+0x350)[0xb606e0d0]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_store_retrieve_peers+0x428)[0xb60ab6e8]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(glusterd_restore+0xa0)[0xb60abc30]
/usr/lib/glusterfs/3.2.6/xlator/mgmt/glusterd.so(init+0x15cc)[0xb6059a5c]
/usr/lib/libglusterfs.so.0(xlator_init+0x55)[0xb76e0135]
/usr/lib/libglusterfs.so.0(glusterfs_graph_init+0x37)[0xb770e927]
/usr/lib/libglusterfs.so.0(glusterfs_graph_activate+0xa8)[0xb770f258]
/usr/sbin/glusterd(glusterfs_process_volfp+0x126)[0x804d536]
/usr/sbin/glusterd(glusterfs_volumes_init+0x184)[0x804d764]
/usr/sbin/glusterd(main+0x2ea)[0x804ab7a]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb74f04d3]
/usr/sbin/glusterd[0x804ae5d]
---------
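glusterd should not crash here; it should fail gracefully. The problem appears when NetworkManager manages the network: as the log shows, the peer hostname (BTGlusterPC4) cannot be resolved while glusterd is restoring its peers at startup. A static network configuration managed by upstart should not trigger this problem.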
Kaushal, please check whether this is still a valid bug on master/release-3.3.
This doesn't happen on 3.3. When hostname resolution fails, the peer is marked as disconnected and glusterd continues operating. A snippet of the glusterd log file:

...
[2012-06-05 12:02:59.120357] D [rpc-transport.c:248:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/local/lib/glusterfs/3.3git/rpc-transport/socket.so
[2012-06-05 12:03:00.132951] E [common-utils.c:125:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2012-06-05 12:03:00.133021] E [name.c:245:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host arch3
[2012-06-05 12:03:00.133049] D [glusterd-handler.c:2886:glusterd_peer_rpc_notify] 0-management: got RPC_CLNT_DISCONNECT 3
...
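glusterd's actual 3.3 code is not part of this report. As a minimal sketch of the graceful behaviour described above, assuming hypothetical names (`struct peer`, `peer_resolve`, `restore_peers`, `PEER_DISCONNECTED`, `PEER_RESOLVED`) that are not glusterd's real API: a getaddrinfo() failure is logged, the peer is marked disconnected, and restoring the remaining peers carries on instead of crashing.

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical peer states for illustration; not glusterd's real types. */
enum peer_state { PEER_DISCONNECTED, PEER_RESOLVED };

/* Hypothetical peer record: just a hostname and a connection state. */
struct peer {
    const char *hostname;
    enum peer_state state;
};

/*
 * Try to resolve a peer's hostname. On failure, log the resolver error,
 * mark the peer disconnected and return -1 so the caller can continue,
 * rather than going on to use an address that was never filled in.
 */
static int peer_resolve(struct peer *p)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(p->hostname, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "DNS resolution failed on host %s: %s\n",
                p->hostname, gai_strerror(rc));
        p->state = PEER_DISCONNECTED;
        return -1;
    }

    freeaddrinfo(res);               /* only the success matters here */
    p->state = PEER_RESOLVED;
    return 0;
}

/* Restore all peers; one unresolvable peer must not abort the loop. */
static int restore_peers(struct peer *peers, size_t n)
{
    int resolved = 0;

    for (size_t i = 0; i < n; i++) {
        if (peer_resolve(&peers[i]) == 0)
            resolved++;
    }
    return resolved;
}
```

For example, restoring a list that holds `127.0.0.1` and a name under the reserved `.invalid` domain should return 1: the numeric address resolves without DNS, while the second peer is marked disconnected and the loop still finishes.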
Closing, as this is fixed upstream in release-3.3.