Created attachment 373
It crashed while trying to restore a brick named ":". Below are the contents of the file:

raghu@booradley:/etc/glusterd/vols/local/bricks$ cat /etc/glusterd/vols/local/bricks/:
hostname=
path=
listen-port=0
hostname=
path=
listen-port=0

I've attached .cmd_log_history. Below is the backtrace:

(gdb) bt
#0  0xb7d65490 in strncpy () from /lib/libc.so.6
#1  0xb6945fc6 in glusterd_store_retrieve_bricks (volinfo=0x8084c68) at ../../../../../xlators/mgmt/glusterd/src/glusterd-store.c:961
#2  0xb6946760 in glusterd_store_retrieve_volume (volname=0x807aebb "local") at ../../../../../xlators/mgmt/glusterd/src/glusterd-store.c:1108
#3  0xb6946a13 in glusterd_store_retrieve_volumes (this=0x8076808) at ../../../../../xlators/mgmt/glusterd/src/glusterd-store.c:1153
#4  0xb6947dfd in glusterd_restore () at ../../../../../xlators/mgmt/glusterd/src/glusterd-store.c:1536
#5  0xb690f705 in init (this=0x8076808) at ../../../../../xlators/mgmt/glusterd/src/glusterd.c:404
#6  0xb7e994fa in __xlator_init (xl=0x8076808) at ../../../libglusterfs/src/xlator.c:875
#7  0xb7e9960a in xlator_init (xl=0x8076808) at ../../../libglusterfs/src/xlator.c:903
#8  0xb7ec67b9 in glusterfs_graph_init (graph=0x80725e0) at ../../../libglusterfs/src/graph.c:328
#9  0xb7ec6cb3 in glusterfs_graph_activate (graph=0x80725e0, ctx=0x8071008) at ../../../libglusterfs/src/graph.c:491
#10 0x0804d07f in glusterfs_process_volfp (ctx=0x8071008, fp=0x80723c8) at ../../../glusterfsd/src/glusterfsd.c:1316
#11 0x0804d1ab in glusterfs_volumes_init (ctx=0x8071008) at ../../../glusterfsd/src/glusterfsd.c:1362
#12 0x0804d2ad in main (argc=2, argv=0xbfab3464) at ../../../glusterfsd/src/glusterfsd.c:1407
(gdb) f 1
#1  0xb6945fc6 in glusterd_store_retrieve_bricks (volinfo=0x8084c68) at ../../../../../xlators/mgmt/glusterd/src/glusterd-store.c:961
961             strncpy (brickinfo->hostname, value, 1024);
(gdb) p value
$16 = 0x0
(gdb) p key
$17 = 0x8086718 "hostname"

raghu@booradley:~/work/gluster.org/git/current/glusterfs.git/build$ cat /etc/hosts
#
# hosts         This file describes a number of hostname-to-address
#               mappings for the TCP/IP subsystem. It is mostly
#               used at boot time, when no name servers are running.
#               On small systems, this file can be used instead of a
#               "named" name server. Just add the names, addresses
#               and any aliases to this file...
#
# By the way, Arnt Gulbrandsen <agulbra.no> says that 127.0.0.1
# should NEVER be named with the name of the machine. It causes problems
# for some (stupid) programs, irc and reputedly talk. :^)
#
# For loopbacking.
127.0.0.1 localhost
127.0.0.1 booradley
#192.168.1.13 #booradley.zillionresearch.com booradley
192.168.1.201 n1
192.168.1.202 n2
192.168.1.203 n3
192.168.1.204 n4
PATCH: http://patches.gluster.com/patch/6224 in master (mgmt/glusterd: In store-retrieve exit with error message instead of crashing.)
(In reply to comment #2)
> PATCH: http://patches.gluster.com/patch/6224 in master (mgmt/glusterd: In
> store-retrieve exit with error message instead of crashing.)

An intermediate fix that handled this crash already went in as part of the fix for 2271. I made it a little more robust: if any of the store files, or any of the entries in them, are missing, glusterd should print the error and exit.
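
For readers following the backtrace, the crash comes from calling strncpy with value = 0x0 for the "hostname" key. The kind of guard described here (report the missing entry and return an error instead of copying) might look roughly like the sketch below. This is not the code from patch 6224; the struct and the helper store_copy_value are hypothetical names made up for illustration, and treating an empty string the same as NULL is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Same size as the literal 1024 passed to strncpy at glusterd-store.c:961. */
#define BRICK_FIELD_MAX 1024

/* Hypothetical stand-in for the brick info structure; not the real glusterd type. */
struct brick_sketch {
        char hostname[BRICK_FIELD_MAX];
        char path[BRICK_FIELD_MAX];
};

/*
 * Copy a value read from a store file into a fixed-size field.
 * Returns 0 on success, or -1 if the value is missing (NULL or empty)
 * or does not fit, after printing an error naming the key.
 */
int
store_copy_value (char *dst, size_t dst_size, const char *key, const char *value)
{
        if (value == NULL || value[0] == '\0') {
                fprintf (stderr, "store entry '%s' has no value\n", key);
                return -1;
        }

        if (strlen (value) >= dst_size) {
                fprintf (stderr, "store entry '%s' is too long\n", key);
                return -1;
        }

        /* Safe now: value is non-NULL and fits, so terminate explicitly. */
        strncpy (dst, value, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return 0;
}
```

A caller restoring a brick would check the return value for each key and abort the restore on -1, so that a store file like the ":" one above produces an error message rather than a segfault.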
Probed a peer, stopped glusterd, and then removed the other peer's entry from the /etc/glusterd/peers directory. Then started glusterd again. It logs the error message.

[2011-03-11 16:35:13.165866] D [glusterd-store.c:1610:glusterd_store_retrieve_peers] 0-: Returning with 0
[2011-03-11 16:35:13.165882] D [glusterd-store.c:1640:glusterd_resolve_all_bricks] 0-: Returning with 0
[2011-03-11 16:35:13.165896] D [glusterd-store.c:1667:glusterd_restore] 0-: Returning 0
Given volfile:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option working-directory /etc/glusterd
  4:     option transport-type socket,rdma
  5:     option transport.socket.keepalive-time 10
  6:     option transport.socket.keepalive-interval 2
  7: end-volume
  8:
+------------------------------------------------------------------------------+
[2011-03-11 16:35:13.214601] I [glusterd-handler.c:2611:glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: eaae880d-fa3d-4ba9-a53d-417323598df0
[2011-03-11 16:35:13.214682] I [glusterd-handler.c:379:glusterd_friend_find] 0-glusterd: Unable to find peer by uuid
[2011-03-11 16:35:13.248038] I [glusterd-handler.c:391:glusterd_friend_find] 0-glusterd: Unable to find hostname: 192.168.1.104
[2011-03-11 16:35:13.248298] I [glusterd-handler.c:3267:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 192.168.1.104 (24007), ret: 0