Bug 764983 (GLUSTER-3251)

Summary: [73eca3be5c5ccc71bbad934338c1ef58ed37c483]: crash due to assert in afr
Product: [Community] GlusterFS Reporter: Raghavendra Bhat <rabhat>
Component: replicate Assignee: Raghavendra Bhat <rabhat>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainline CC: gluster-bugs, jdarcy, pkarampu, vbhat
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: master Category: ---

Description Raghavendra Bhat 2011-07-26 08:53:04 UTC
The glusterfs client crashed in afr_inode_set_read_ctx while untarring a kernel tree and running rm -rf in parallel.

This is the backtrace.



Core was generated by `/usr/local/sbin/glusterfs --volfile-id=mirror --volfile-server=bigbang /mnt/cli'.
Program terminated with signal 6, Aborted.
#0  0x00007f4364d18a75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	../nptl/sysdeps/unix/sysv/linux/raise.c: Transport endpoint is not connected.
	in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0  0x00007f4364d18a75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f4364d1c5c0 in *__GI_abort () at abort.c:92
#2  0x00007f4364d11941 in *__GI___assert_fail (assertion=0x7f436220f621 "read_child >= 0", file=<value optimized out>, line=379, 
    function=0x7f4362211750 "afr_inode_set_read_ctx") at assert.c:81
#3  0x00007f43621f8eee in afr_inode_set_read_ctx (this=0x224c8b0, inode=0x7f435fe094d4, read_child=-1, fresh_children=0x22e7810)
    at ../../../../../xlators/cluster/afr/src/afr-common.c:379
#4  0x00007f43621f9290 in afr_set_read_ctx_from_policy (this=0x224c8b0, inode=0x7f435fe094d4, fresh_children=0x22e7810, prev_read_child=1, 
    config_read_child=-1) at ../../../../../xlators/cluster/afr/src/afr-common.c:498
#5  0x00007f43621b27ba in afr_create_wind_cbk (frame=0x7f4363f8a804, cookie=0x1, this=0x224c8b0, op_ret=-1, op_errno=2, fd=0x7f435fa26044, 
    inode=0x7f435fe094d4, buf=0x7fff932298b0, preparent=0x7fff93229840, postparent=0x7fff932297d0)
    at ../../../../../xlators/cluster/afr/src/afr-dir-write.c:197
#6  0x00007f436243a1ed in client3_1_create_cbk (req=0x7f43603804cc, iov=0x7f436038050c, count=1, myframe=0x7f43641f3608)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1724
#7  0x00007f43656b79b0 in rpc_clnt_handle_reply (clnt=0x2263020, pollin=0x7f435006c2f0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:744
#8  0x00007f43656b7d0f in rpc_clnt_notify (trans=0x22631d0, mydata=0x2263050, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f435006c2f0)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:857
#9  0x00007f43656b400b in rpc_transport_notify (this=0x22631d0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f435006c2f0)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:930
#10 0x00007f4363072184 in socket_event_poll_in (this=0x22631d0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1647
#11 0x00007f4363072708 in socket_event_handler (fd=9, idx=2, data=0x22631d0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1762
#12 0x00007f4365911976 in event_dispatch_epoll_handler (event_pool=0x223dca0, events=0x2242980, i=0)
    at ../../../libglusterfs/src/event.c:794
#13 0x00007f4365911b99 in event_dispatch_epoll (event_pool=0x223dca0) at ../../../libglusterfs/src/event.c:856
#14 0x00007f4365911f24 in event_dispatch (event_pool=0x223dca0) at ../../../libglusterfs/src/event.c:956
#15 0x0000000000407c59 in main (argc=4, argv=0x7fff93229e58) at ../../../glusterfsd/src/glusterfsd.c:1550
(gdb)  f 3
#3  0x00007f43621f8eee in afr_inode_set_read_ctx (this=0x224c8b0, inode=0x7f435fe094d4, read_child=-1, fresh_children=0x22e7810)
    at ../../../../../xlators/cluster/afr/src/afr-common.c:379
379	        GF_ASSERT (read_child >= 0);
(gdb) p read_child
$1 = -1
(gdb) f 4
#4  0x00007f43621f9290 in afr_set_read_ctx_from_policy (this=0x224c8b0, inode=0x7f435fe094d4, fresh_children=0x22e7810, prev_read_child=1, 
    config_read_child=-1) at ../../../../../xlators/cluster/afr/src/afr-common.c:498
498	        afr_inode_set_read_ctx (this, inode, read_child, fresh_children);
(gdb) l
493	        read_child = afr_select_read_child_from_policy (fresh_children,
494	                                                        priv->child_count,
495	                                                        prev_read_child,
496	                                                        config_read_child,
497	                                                        NULL);
498	        afr_inode_set_read_ctx (this, inode, read_child, fresh_children);
499	}
500	
501	/* afr_next_call_child ()
502	 * This is a common function used by all the read-type fops
(gdb) l afr_select_read_child_from_policy
447	 */
448	int
449	afr_select_read_child_from_policy (int32_t *success_children, int32_t child_count,
450	                                   int32_t prev_read_child,
451	                                   int32_t config_read_child, int32_t *sources)
452	{
453	        int32_t                  read_child   = -1;
454	        int                      i            = 0;
455	
456	        GF_ASSERT (success_children);
(gdb) 
457	
458	        read_child = prev_read_child;
459	        if (afr_is_read_child (success_children, sources, child_count,
460	                               read_child))
461	                goto out;
462	
463	        read_child = config_read_child;
464	        if (afr_is_read_child (success_children, sources, child_count,
465	                               read_child))
466	                goto out;
(gdb) 
468	        for (i = 0; i < child_count; i++) {
469	                read_child = success_children[i];
470	                if (read_child < 0)
471	                        break;
472	                if (afr_is_read_child (success_children, sources, child_count,
473	                                       read_child))
474	                        goto out;
475	        }
476	        read_child = -1;
(gdb) 
477	
478	out:
479	        return read_child;
480	}


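The listing shows that afr_select_read_child_from_policy can return -1 for read_child (line 476) when none of the candidates qualifies. afr_set_read_ctx_from_policy passes that value straight to afr_inode_set_read_ctx (line 498), where GF_ASSERT (read_child >= 0) aborts the process. The return value should have been checked before calling afr_inode_set_read_ctx.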
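
A minimal sketch of the missing check, for illustration only. It is not the merged patch, and it does not use the real AFR types. All function names below are hypothetical stand-ins, and is_read_child is an assumed simplification of afr_is_read_child (it ignores sources and only checks membership in the success list). It assumes the failing create left no usable entry in fresh_children.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed simplification of afr_is_read_child(): a child qualifies if it is
 * in range and appears in success_children before the -1 terminator. */
static int
is_read_child (const int32_t *success_children, int32_t child_count,
               int32_t child)
{
        int32_t i = 0;

        if (child < 0 || child >= child_count)
                return 0;
        for (i = 0; i < child_count; i++) {
                if (success_children[i] < 0)
                        break;
                if (success_children[i] == child)
                        return 1;
        }
        return 0;
}

/* Same selection order as the listing: previous read child, then the
 * configured one, then the first successful child; -1 if none qualifies. */
static int32_t
select_read_child (const int32_t *success_children, int32_t child_count,
                   int32_t prev_read_child, int32_t config_read_child)
{
        int32_t i = 0;

        if (is_read_child (success_children, child_count, prev_read_child))
                return prev_read_child;
        if (is_read_child (success_children, child_count, config_read_child))
                return config_read_child;
        for (i = 0; i < child_count; i++) {
                if (success_children[i] < 0)
                        break;
                if (is_read_child (success_children, child_count,
                                   success_children[i]))
                        return success_children[i];
        }
        return -1;
}

/* Hypothetical stand-in for afr_inode_set_read_ctx(): keeps the assertion
 * that fired in frame #3 and stores the read child in *ctx. */
static void
set_read_ctx (int32_t *ctx, int32_t read_child)
{
        assert (read_child >= 0);
        *ctx = read_child;
}

/* Guarded caller: returns -1 and leaves *ctx untouched when no read child
 * is available, instead of tripping the assertion. */
static int
set_read_ctx_from_policy (int32_t *ctx, const int32_t *success_children,
                          int32_t child_count, int32_t prev_read_child,
                          int32_t config_read_child)
{
        int32_t read_child = select_read_child (success_children, child_count,
                                                prev_read_child,
                                                config_read_child);
        if (read_child < 0)
                return -1;
        set_read_ctx (ctx, read_child);
        return 0;
}
```

With a success list that holds only -1 entries and the arguments from frame #4 (prev_read_child=1, config_read_child=-1), the guarded caller returns -1 without reaching the assertion. How the caller should then report the error in the real create callback is outside the scope of this sketch.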

Comment 1 Pranith Kumar K 2011-08-01 03:03:25 UTC
Johnny already sent a patch for this.

Comment 2 Anand Avati 2011-08-13 13:27:11 UTC
CHANGE: http://review.gluster.com/233 (Change-Id: I447fb6a93cdd77de322cd5ded30673411c4cf79e) merged in master by Vijay Bellur (vijay)

Comment 3 Pranith Kumar K 2011-08-14 11:31:30 UTC
*** Bug 3398 has been marked as a duplicate of this bug. ***

Comment 4 Raghavendra Bhat 2011-08-25 05:30:00 UTC
Checked with the glusterfs-3.3 QA release. The crash no longer occurs.