Bug 763798 (GLUSTER-2066) - glusterd crashed while trying to restore volumes
Summary: glusterd crashed while trying to restore volumes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-2066
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-11-09 06:52 UTC by Raghavendra G
Modified: 2015-12-01 16:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
cmd_log_history (4.05 KB, application/octet-stream)
2010-11-09 03:53 UTC, Raghavendra G

Description Raghavendra G 2010-11-09 03:53:44 UTC
Created attachment 373

Comment 1 Raghavendra G 2010-11-09 06:52:42 UTC
It crashed while trying to restore a brick named ":". Below are the contents of the file:

raghu@booradley:/etc/glusterd/vols/local/bricks$ cat /etc/glusterd/vols/local/bricks/:
hostname=
path=
listen-port=0
hostname=
path=
listen-port=0

I've attached .cmd_log_history.

Below is the backtrace:
(gdb) bt
#0  0xb7d65490 in strncpy () from /lib/libc.so.6
#1  0xb6945fc6 in glusterd_store_retrieve_bricks (volinfo=0x8084c68)
    at ../../../../../xlators/mgmt/glusterd/src/glusterd-store.c:961
#2  0xb6946760 in glusterd_store_retrieve_volume (volname=0x807aebb "local")
    at ../../../../../xlators/mgmt/glusterd/src/glusterd-store.c:1108
#3  0xb6946a13 in glusterd_store_retrieve_volumes (this=0x8076808)
    at ../../../../../xlators/mgmt/glusterd/src/glusterd-store.c:1153
#4  0xb6947dfd in glusterd_restore () at ../../../../../xlators/mgmt/glusterd/src/glusterd-store.c:1536
#5  0xb690f705 in init (this=0x8076808) at ../../../../../xlators/mgmt/glusterd/src/glusterd.c:404
#6  0xb7e994fa in __xlator_init (xl=0x8076808) at ../../../libglusterfs/src/xlator.c:875
#7  0xb7e9960a in xlator_init (xl=0x8076808) at ../../../libglusterfs/src/xlator.c:903
#8  0xb7ec67b9 in glusterfs_graph_init (graph=0x80725e0) at ../../../libglusterfs/src/graph.c:328
#9  0xb7ec6cb3 in glusterfs_graph_activate (graph=0x80725e0, ctx=0x8071008)
    at ../../../libglusterfs/src/graph.c:491
#10 0x0804d07f in glusterfs_process_volfp (ctx=0x8071008, fp=0x80723c8)
    at ../../../glusterfsd/src/glusterfsd.c:1316
#11 0x0804d1ab in glusterfs_volumes_init (ctx=0x8071008) at ../../../glusterfsd/src/glusterfsd.c:1362
#12 0x0804d2ad in main (argc=2, argv=0xbfab3464) at ../../../glusterfsd/src/glusterfsd.c:1407
(gdb) f 1
#1  0xb6945fc6 in glusterd_store_retrieve_bricks (volinfo=0x8084c68)
    at ../../../../../xlators/mgmt/glusterd/src/glusterd-store.c:961
961                                     strncpy (brickinfo->hostname, value, 1024);
(gdb) p value
$16 = 0x0
(gdb) p key
$17 = 0x8086718 "hostname"

raghu@booradley:~/work/gluster.org/git/current/glusterfs.git/build$ cat /etc/hosts
#
# hosts         This file describes a number of hostname-to-address
#               mappings for the TCP/IP subsystem.  It is mostly
#               used at boot time, when no name servers are running.
#               On small systems, this file can be used instead of a
#               "named" name server.  Just add the names, addresses
#               and any aliases to this file...
#
# By the way, Arnt Gulbrandsen <agulbra.no> says that 127.0.0.1
# should NEVER be named with the name of the machine.  It causes problems
# for some (stupid) programs, irc and reputedly talk. :^)
#

# For loopbacking.
127.0.0.1               localhost
127.0.0.1               booradley
#192.168.1.13           #booradley.zillionresearch.com booradley
192.168.1.201           n1
192.168.1.202           n2
192.168.1.203           n3
192.168.1.204           n4

Comment 2 Anand Avati 2011-02-22 07:11:51 UTC
PATCH: http://patches.gluster.com/patch/6224 in master (mgmt/glusterd: In store-retrieve exit with error message instead of crashing.)

Comment 3 Pranith Kumar K 2011-03-11 07:48:16 UTC
(In reply to comment #2)
> PATCH: http://patches.gluster.com/patch/6224 in master (mgmt/glusterd: In
> store-retrieve exit with error message instead of crashing.)

An intermediate fix that handled this crash already went in with the fix for 2271. I made it a little more robust: if any of the entries in any of the stores, or the files themselves, are missing, glusterd should print the error and exit.
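
The "print the error and exit" behavior described here could look roughly like the sketch below. It is not the actual patch: copy_store_value is a hypothetical helper, the message text is invented, and treating an empty value the same as a missing one is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the glusterd API): copies a retrieved store
 * value into a fixed-size field. Returns 0 on success, or -1 after
 * printing an error when the value is missing, empty, or too long. */
static int copy_store_value(char *dst, size_t dst_len,
                            const char *key, const char *value)
{
    if (dst == NULL || dst_len == 0 || key == NULL)
        return -1;

    /* Guard against the NULL value seen in the backtrace. */
    if (value == NULL || value[0] == '\0') {
        fprintf(stderr, "store: missing value for key '%s'\n", key);
        return -1;
    }

    /* Refuse to truncate silently; the caller decides how to fail. */
    if (strlen(value) >= dst_len) {
        fprintf(stderr, "store: value for key '%s' is too long\n", key);
        return -1;
    }

    memcpy(dst, value, strlen(value) + 1); /* includes the terminating NUL */
    return 0;
}
```

A caller restoring a brick would check the return value for each key and abort the restore on -1 instead of continuing with a half-filled structure.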

Comment 4 Raghavendra Bhat 2011-03-11 08:07:43 UTC
Probed a peer, stopped glusterd, and then removed the other peer's entry from the /etc/glusterd/peers directory. Then started glusterd again. It logs the error message.

[2011-03-11 16:35:13.165866] D [glusterd-store.c:1610:glusterd_store_retrieve_peers] 0-: Returning with 0
[2011-03-11 16:35:13.165882] D [glusterd-store.c:1640:glusterd_resolve_all_bricks] 0-: Returning with 0
[2011-03-11 16:35:13.165896] D [glusterd-store.c:1667:glusterd_restore] 0-: Returning 0
Given volfile:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option working-directory /etc/glusterd
  4:     option transport-type socket,rdma
  5:     option transport.socket.keepalive-time 10
  6:     option transport.socket.keepalive-interval 2
  7: end-volume
  8: 

+------------------------------------------------------------------------------+
[2011-03-11 16:35:13.214601] I [glusterd-handler.c:2611:glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: eaae880d-fa3d-4ba9-a53d-417323598df0
[2011-03-11 16:35:13.214682] I [glusterd-handler.c:379:glusterd_friend_find] 0-glusterd: Unable to find peer by uuid
[2011-03-11 16:35:13.248038] I [glusterd-handler.c:391:glusterd_friend_find] 0-glusterd: Unable to find hostname: 192.168.1.104
[2011-03-11 16:35:13.248298] I [glusterd-handler.c:3267:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 192.168.1.104 (24007), ret: 0

