Bug 764855 (GLUSTER-3123)

Summary: [a7cdaf3de307c96cb55219a0743962ee1e1fc955]: glusterd crashed when started
Product: [Community] GlusterFS
Reporter: Raghavendra Bhat <rabhat>
Component: glusterd
Assignee: Raghavendra Bhat <rabhat>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: mainline
CC: gluster-bugs, lakshmipathi
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix

Description Raghavendra Bhat 2011-07-04 03:51:23 EDT
glusterd crashed on startup with the backtrace below. Operations performed:

1) With an earlier git head, created a stripe replicate volume:

(gluster volume create dsr replica 3 stripe 2 10.1.12.172:/export/dsr1 10.1.12.172:/export/dsr2 10.1.12.170:/export/dsr1 10.1.12.170:/export/dsr2 10.1.12.173:/export/dsr1 10.1.12.173:/export/dsr2)

2) After some operations and testing, did a git pull. When glusterd was started again, it crashed on one of the machines.



#0  gf_print_trace (signum=58) at common-utils.c:347
#1  <signal handler called>
#2  0x00002aaaabfe49ce in client_graph_builder (graph=0x7fffffff9e70, 
    volinfo=0x641330, set_dict=0x63bf40, param=0x0) at glusterd-volgen.c:1610
#3  0x00002aaaabfe33f3 in build_graph_generic (graph=0x7fffffff9e70, 
    volinfo=0x641330, mod_dict=0x63aaf0, param=0x0, 
    builder=0x2aaaabfe444a <client_graph_builder>) at glusterd-volgen.c:1113
#4  0x00002aaaabfe4d8f in build_client_graph (graph=0x7fffffff9e70, 
    volinfo=0x641330, mod_dict=0x63aaf0) at glusterd-volgen.c:1711
#5  0x00002aaaabfe56fe in build_nfs_graph (graph=0x7fffffffaf80, mod_dict=0x0)
    at glusterd-volgen.c:1973
#6  0x00002aaaabfe65d2 in glusterd_create_nfs_volfile ()
    at glusterd-volgen.c:2288
#7  0x00002aaaabfccc18 in glusterd_check_generate_start_nfs ()
    at glusterd-utils.c:2284
#8  0x00002aaaabfcd2ba in glusterd_restart_bricks (conf=0x6379e0)
    at glusterd-utils.c:2413
#9  0x00002aaaabf8bd2b in init (this=0x6341f0) at glusterd.c:731
#10 0x00002aaaaaac8fe2 in __xlator_init (xl=0x6341f0) at xlator.c:1369
#11 0x00002aaaaaac9103 in xlator_init (xl=0x6341f0) at xlator.c:1392

#12 0x00002aaaaab01e0c in glusterfs_graph_init (graph=0x62fed0) at graph.c:328
#13 0x00002aaaaab02476 in glusterfs_graph_activate (graph=0x62fed0, 
    ctx=0x62e010) at graph.c:501
#14 0x0000000000406e9a in glusterfs_process_volfp (ctx=0x62e010, fp=0x62fbe0)
    at glusterfsd.c:1423
#15 0x0000000000406fd2 in glusterfs_volumes_init (ctx=0x62e010)
    at glusterfsd.c:1475
#16 0x00000000004070f7 in main (argc=2, argv=0x7fffffffe438)
    at glusterfsd.c:1523

(gdb) f 2
#2  0x00002aaaabfe49ce in client_graph_builder (graph=0x7fffffff9e70, 
    volinfo=0x641330, set_dict=0x63bf40, param=0x0) at glusterd-volgen.c:1610
1610                            if (i % sub_count == 0) {
(gdb) p i
$17 = 0
(gdb) p sub_count
$18 = 0
(gdb) 


This is the /etc/glusterd/vols/dsr/info file on the machine where glusterd crashed. It records a stripe count of 0.


type=3
count=6
status=1
sub_count=6
stripe_count=0
version=8
transport-type=0
volume-id=a7cfdba5-292d-46e0-ad2f-458798b77253
brick-0=10.1.12.172:-export-dsr1
brick-1=10.1.12.172:-export-dsr2
brick-2=10.1.12.170:-export-dsr1
brick-3=10.1.12.170:-export-dsr2
brick-4=10.1.12.173:-export-dsr1
brick-5=10.1.12.173:-export-dsr2


This is the /etc/glusterd/vols/dsr/info file from another machine in the cluster.


type=3
count=6
status=1
sub_count=6
stripe_count=2
version=15
transport-type=0
volume-id=a7cfdba5-292d-46e0-ad2f-458798b77253
brick-0=10.1.12.172:-export-dsr1
brick-1=10.1.12.172:-export-dsr2
brick-2=10.1.12.170:-export-dsr1
brick-3=10.1.12.170:-export-dsr2
brick-4=10.1.12.173:-export-dsr1
brick-5=10.1.12.173:-export-dsr2

Here the stripe count is 2.
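
The difference between the two info files can be checked mechanically. The following is a minimal sketch, not glusterd code: info_get_int and stripe_replicate_counts_usable are hypothetical helpers written for illustration, and the rule that a stripe replicate volume needs a non-zero stripe_count that divides sub_count evenly is an assumption drawn from the values shown above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of glusterd): look up `key` in
 * newline-separated key=value text and return its integer value,
 * or `fallback` if the key is absent. */
long info_get_int(const char *text, const char *key, long fallback)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Compare only at the start of a line, so that "count" does
         * not match "sub_count" or "stripe_count". */
        if (strncmp(line, key, klen) == 0 && line[klen] == '=')
            return strtol(line + klen + 1, NULL, 10);
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return fallback;
}

/* Assumed sanity rule for a stripe replicate volume such as dsr:
 * both counts are positive and stripe_count divides sub_count.
 * Returns 1 if the stored counts are usable, 0 otherwise. */
int stripe_replicate_counts_usable(const char *text)
{
    long sub = info_get_int(text, "sub_count", 0);
    long stripe = info_get_int(text, "stripe_count", 0);

    return sub > 0 && stripe > 0 && sub % stripe == 0;
}
```

Fed the first info file (stripe_count=0) the check fails; fed the second (stripe_count=2) it passes.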

In the glusterd_store_retrieve_volume function we do this:

        if (volinfo->stripe_count)
                volinfo->replica_count = (volinfo->sub_count /
                                          volinfo->stripe_count);


Because volinfo->stripe_count is zero here, the check fails and volinfo->replica_count is never set, so it stays at 0.
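
To make the effect concrete, here is a small sketch that applies the same guarded division to the values from the two info files. The struct volinfo_counts and the function derive_replica_count are simplified stand-ins invented for this illustration; they are not the real glusterd types.

```c
/* Simplified stand-in for the relevant volinfo fields (assumption:
 * the real glusterd structure has many more members). */
struct volinfo_counts {
    int sub_count;
    int stripe_count;
    int replica_count;
};

/* Mirrors the guarded division quoted above: replica_count is derived
 * only when stripe_count is non-zero; otherwise it is left untouched. */
void derive_replica_count(struct volinfo_counts *v)
{
    if (v->stripe_count)
        v->replica_count = v->sub_count / v->stripe_count;
}
```

With sub_count=6 and stripe_count=2 this yields replica_count=3, matching the "replica 3 stripe 2" volume that was created. With stripe_count=0 the assignment is skipped and a zero-initialized replica_count remains 0.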

In client_graph_builder in volgen we do this:

        case GF_CLUSTER_TYPE_STRIPE_REPLICATE:
                /* Replicate after the clients, then stripe */
                sub_count = volinfo->replica_count;
                cluster_args = replicate_args;
                break;


This makes sub_count zero, so evaluating i % sub_count is a modulo by zero, and that is where glusterd crashes.

We need to check volinfo->replica_count for 0, and return if it is zero, before assigning it to sub_count. We also need to investigate why the stripe count was stored as zero.
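
The proposed check could look roughly like the sketch below. It is not the actual patch: struct volinfo_replica and count_replica_groups are hypothetical names, and the loop only imitates the i % sub_count test from the backtrace in order to show that the guard runs before any modulo is evaluated.

```c
/* Simplified stand-in for volinfo (assumption: only the field used
 * here is shown). */
struct volinfo_replica {
    int replica_count;
};

/* Walk `brick_count` bricks and count how often a new replica group
 * starts (i % sub_count == 0). Returns -1 without touching the modulo
 * if replica_count is not positive. */
int count_replica_groups(const struct volinfo_replica *v, int brick_count)
{
    int sub_count;
    int groups = 0;
    int i;

    /* Guard first: a zero divisor must never reach the loop. */
    if (v->replica_count <= 0)
        return -1;

    sub_count = v->replica_count;
    for (i = 0; i < brick_count; i++) {
        if (i % sub_count == 0)
            groups++;
    }
    return groups;
}
```

With replica_count=0 the function returns -1 instead of crashing; with replica_count=3 and 6 bricks it reports 2 groups.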
Comment 1 Amar Tumballi 2011-07-05 21:41:44 EDT
Raghavendra Bhat has sent a patch for this, which should fix the crash. I am keeping this bug open so I can find the root cause of why the stripe count was written as '0' in the first place.
Comment 2 Anand Avati 2011-07-12 05:38:01 EDT
PATCH: http://patches.gluster.com/patch/7743 in master (glusterd: check replica_count for 0 before using it for volume creation in stripe replicate volume)
Comment 3 Raghavendra Bhat 2011-07-25 00:22:15 EDT
This is now fixed: we check volinfo->replica_count for zero before assigning it to sub_count and using it in the division operation.