Bug 763652 (GLUSTER-1920)

Summary: Start and stop glusterd fails to start a previously created volume
Product: [Community] GlusterFS Reporter: Harshavardhana <fharshav>
Component: glusterd Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: urgent    
Version: 3.1.0 CC: cww, gluster-bugs, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix

Description Harshavardhana 2010-10-11 19:50:55 EDT
[2010-10-11 16:45:27.311338] I [glusterd.c:274:init] management: Using /etc/glusterd as working directory
[2010-10-11 16:45:27.312687] C [rdma.c:3817:rdma_init] rpc-transport/rdma: No IB devices found
[2010-10-11 16:45:27.312714] E [rdma.c:4744:init] rdma.management: Failed to initialize IB Device
[2010-10-11 16:45:27.312730] E [rpc-transport.c:965:rpc_transport_load] rpc-transport: 'rdma' initialization failed
[2010-10-11 16:45:27.312824] I [glusterd.c:86:glusterd_uuid_init] glusterd: retrieved UUID: 1bc70cba-23a1-4565-9ef0-309360442b44
[2010-10-11 16:45:27.312927] E [glusterd-store.c:1092:glusterd_store_retrieve_volume] : Unknown key: brick-0
[2010-10-11 16:45:27.317269] E [glusterd-utils.c:2137:glusterd_friend_find_by_hostname] : error in getaddrinfo: Name or service not known

[2010-10-11 16:45:27.317630] E [glusterd-utils.c:113:glusterd_is_local_addr] : error in getaddrinfo: Name or service not known

[2010-10-11 16:45:27.317650] E [glusterd-store.c:1516:glusterd_resolve_all_bricks] glusterd: resolve brick failed in restore


[root@platform test]# gluster volume start test
Starting volume test has been unsuccessful

[root@platform test]# ps -ef | grep glusterfs
root      7414  9724  0 16:46 pts/2    00:00:00 grep glusterfs
[root@platform test]

[root@platform test]# gluster volume info

Volume Name: test
Type: Distribute
Status: Stopped
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: :

I see that Brick1 is shown as NULL (empty). I am not sure why, since the store has an entry for it.

[root@platform test]# ls -l /etc/glusterd/vols/test/
total 24
drwxr-xr-x 2 root root 4096 2010-10-11 16:43 bricks
-rw-r--r-- 1 root root   16 2010-10-11 16:45 cksum
-rw-r--r-- 1 root root  139 2010-10-11 16:43 info
drwxr-xr-x 2 root root 4096 2010-10-11 16:39 run
-rw-r--r-- 1 root root  620 2010-10-11 16:33 test.10.1.10.202.storage.vol
-rw-r--r-- 1 root root  628 2010-10-11 16:33 test-fuse.vol
[root@platform test]# less /etc/glusterd/vols/test/info 
type=0
count=1
status=2
sub_count=0
version=1
transport-type=0
volume-id=0d30ab75-0500-4468-800c-c95b99a8b25c
brick-0=10.1.10.202:-storage
[root@platform test]# less /etc/glusterd/vols/test/cksum 
info=2332094601

P.S. This happened after the segfault.
Comment 1 Pranith Kumar K 2011-03-09 23:02:20 EST
This bug is the result of glusterd not raising an error when the brick information it retrieves from the glusterd store is corrupted. Ideally, if there are errors while retrieving the glusterd store, glusterd should exit with an error; this is done as part of 2066. The corruption itself was caused by bugs in glusterd-store, which are fixed as part of 1754. So I am closing this bug.