Bug 795634

Summary: glusterd crashed when started
Product: [Community] GlusterFS
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Component: glusterd
Assignee: Vivek Agarwal <vagarwal>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: mainline
CC: amarts, gluster-bugs, sankarshan, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-24 13:27:59 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Bug Depends On:
Bug Blocks: 817967

Description Shwetha Panduranga 2012-02-21 00:36:13 EST
Description of problem:
gluster peer probe <HOSTNAME> crashed glusterd.

Program terminated with signal 6, Aborted.
#0  0x0000003af1a32905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x0000003af1a32905 in raise () from /lib64/libc.so.6
#1  0x0000003af1a340e5 in abort () from /lib64/libc.so.6
#2  0x0000003af1a2b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003af1a2ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f7848f23a17 in glusterd_auth_get_username (volinfo=0x240d220) at glusterd-utils.c:587
#5  0x00007f7848f47fad in volgen_graph_build_clients (graph=0x7fffcd493650, volinfo=0x240d220, set_dict=0x240a0b0, param=0x0) at glusterd-volgen.c:2101
#6  0x00007f7848f4893c in client_graph_builder (graph=0x7fffcd493650, volinfo=0x240d220, set_dict=0x240a0b0, param=0x0) at glusterd-volgen.c:2390
#7  0x00007f7848f461f3 in build_graph_generic (graph=0x7fffcd493650, volinfo=0x240d220, mod_dict=0x2408e10, param=0x0, builder=0x7f7848f488e9 <client_graph_builder>)
    at glusterd-volgen.c:1253
#8  0x00007f7848f48ab5 in build_client_graph (graph=0x7fffcd493650, volinfo=0x240d220, mod_dict=0x2408e10) at glusterd-volgen.c:2436
#9  0x00007f7848f49918 in build_nfs_graph (graph=0x7fffcd493790, mod_dict=0x0) at glusterd-volgen.c:2816
#10 0x00007f7848f4a996 in glusterd_create_global_volfile (builder=0x7f7848f49470 <build_nfs_graph>, filepath=0x7fffcd493880 "/etc/glusterd/nfs/nfs-server.vol", 
    mod_dict=0x0) at glusterd-volgen.c:3154
#11 0x00007f7848f4aa56 in glusterd_create_nfs_volfile () at glusterd-volgen.c:3171
#12 0x00007f7848f2bb3c in glusterd_check_generate_start_service (create_volfile=0x7f7848f4a9d7 <glusterd_create_nfs_volfile>, 
    stop=0x7f7848f2b92e <glusterd_nfs_server_stop>, start=0x7f7848f2b596 <glusterd_nfs_server_start>) at glusterd-utils.c:2845
#13 0x00007f7848f2bc0c in glusterd_check_generate_start_nfs () at glusterd-utils.c:2884
#14 0x00007f7848f2bc81 in glusterd_nodesvcs_batch_op (volinfo=0x0, nfs_op=0x7f7848f2bbda <glusterd_check_generate_start_nfs>, 
    shd_op=0x7f7848f2bc14 <glusterd_check_generate_start_shd>) at glusterd-utils.c:2909
#15 0x00007f7848f2bf87 in glusterd_nodesvcs_handle_graph_change (volinfo=0x0) at glusterd-utils.c:2996
#16 0x00007f7848f2c544 in glusterd_restart_bricks (conf=0x24038c0) at glusterd-utils.c:3121
#17 0x00007f7848efc4ec in init (this=0x23ff160) at glusterd.c:998
#18 0x00007f784d4e827e in __xlator_init (xl=0x23ff160) at xlator.c:365
#19 0x00007f784d4e83a8 in xlator_init (xl=0x23ff160) at xlator.c:388
#20 0x00007f784d529813 in glusterfs_graph_init (graph=0x23fad90) at graph.c:300
#21 0x00007f784d529fb9 in glusterfs_graph_activate (graph=0x23fad90, ctx=0x23de010) at graph.c:490
#22 0x00000000004078c2 in glusterfs_process_volfp (ctx=0x23de010, fp=0x23fab30) at glusterfsd.c:1485
#23 0x0000000000407a1b in glusterfs_volumes_init (ctx=0x23de010) at glusterfsd.c:1537
#24 0x0000000000407c42 in main (argc=1, argv=0x7fffcd497d38) at glusterfsd.c:1597
(gdb) f 4
#4  0x00007f7848f23a17 in glusterd_auth_get_username (volinfo=0x240d220) at glusterd-utils.c:587
587	        GF_ASSERT (volinfo->auth.username);
(gdb) l
582	
583	char *
584	glusterd_auth_get_username (glusterd_volinfo_t *volinfo) {
585	
586	        GF_ASSERT (volinfo);
587	        GF_ASSERT (volinfo->auth.username);
588	
589	        return volinfo->auth.username;
590	}
(gdb) p *volinfo
$2 = {volname = "datastore", '\000' <repeats 990 times>, type = 2, brick_count = 4, vol_list = {next = 0x2404930, prev = 0x2404930}, bricks = {next = 0x24107f0, 
    prev = 0x2414890}, status = GLUSTERD_STATUS_STARTED, sub_count = 2, stripe_count = 1, replica_count = 2, dist_leaf_count = 2, port = 0, shandle = 0x24036c0, 
  rb_shandle = 0x2415c30, defrag_status = GF_DEFRAG_STATUS_NOT_STARTED, rebalance_files = 0, rebalance_data = 0, lookedup_files = 0, defrag = 0x0, defrag_cmd = 0, 
  rb_status = GF_RB_STATUS_STARTED, src_brick = 0x2415c80, dst_brick = 0x2417110, version = 16, cksum = 4085574853, transport_type = GF_TRANSPORT_TCP, 
  nfs_transport_type = GF_TRANSPORT_TCP, dict = 0x2402930, volume_id = "\311\332\177\213\023\001I\217\266\242\377\177\\\354<\254", auth = {username = 0x0, 
    password = 0x0}, logdir = 0x0, gsync_slaves = 0x2403760, decommission_in_progress = 0, xl = 0x23ff160}


Version-Release number of selected component (if applicable):
mainline

How reproducible:
often

Steps to Reproduce:
1. gluster peer probe <HOSTNAME>
Comment 1 Shwetha Panduranga 2012-02-21 00:49:59 EST
Glusterd crashed even before the peer probe operation. Hence changing the summary message.
Comment 2 Rajesh 2012-02-21 01:08:13 EST
Was the volume created using an earlier version of master? Try deleting all the volumes and creating a fresh volume.
The problem here is that glusterd tries to retrieve the username/password combination from the per-volume "info" file, and volumes created with versions before this patch went in don't have them.
Comment 3 Vijay Bellur 2012-02-21 01:19:11 EST
(In reply to comment #2)
> was the volume created using an earlier version of the master? try deleting all
> the volumes and creating a fresh volume.
> The problem here is glusterd looks to regain the username/password combo from
> the per-volume "info" file. volumes created with versions before this patch
> went in dont have them.

This can't be acceptable behavior. Ignore existing volumes for now and set username/password only for new volumes. For existing volumes, let us provide an upgrade path.
Comment 4 Rajesh 2012-02-21 01:31:58 EST
This has collateral effects on the NFS servers, self-heal daemons, and other operations too. What this bug is asking for is n-1 patch-level (instead of version-level) compatibility, and even the alpha has not been released yet.
Comment 5 Vijay Bellur 2012-02-21 01:41:23 EST
(In reply to comment #4)
> this has collateral effects on the nfs servers, self-heal daemons and other
> operations too. what the bug is asking is n-1 patch-level (instead of
> version-level) compatibility. Even the alpha is not yet released.

How do you handle upgrades from volumes created in 3.2? If that works, then we are fine. Or do we always fail if the info file doesn't have the username and password keys?
Comment 6 Anand Avati 2012-02-21 06:19:07 EST
CHANGE: http://review.gluster.com/2779 (glusterd/auth: 3.2.x compatibility) merged in master by Vijay Bellur (vijay@gluster.com)
Comment 7 Shwetha Panduranga 2012-02-23 04:45:03 EST
Verified on mainline. Working fine.