Bug 765369 (GLUSTER-3637)

Summary: [glusterfs-3.3.0qa11]: nfs server crashed due to frame->local being NULL in afr
Product: [Community] GlusterFS Reporter: Raghavendra Bhat <rabhat>
Component: replicateAssignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: low    
Version: pre-releaseCC: gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Raghavendra Bhat 2011-09-27 04:43:58 UTC
gluster nfs server crashed while sanity scripts was running.

Setup:
2 replica volume.

1 fuse client and 1 nfs client

Both the clients were executing sanity script and volume set operation was happening parallely. Killed one of the bricks after few minutes of tests running, slept for 15-20 minutes and restarted the volume.

gluster nfs server crashed with the following backtrace.


Core was generated by `/usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002aaaaad0f191 in afr_changelog_post_op_cbk (frame=0x2b97ee4a0228, cookie=0x2b97ee1ee9d4, this=0x134934c0, op_ret=0, op_errno=22,
    xattr=0x134ac210) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:350
350                     call_count = --local->call_count;
(gdb) bt
#0  0x00002aaaaad0f191 in afr_changelog_post_op_cbk (frame=0x2b97ee4a0228, cookie=0x2b97ee1ee9d4, this=0x134934c0, op_ret=0, op_errno=22, 
    xattr=0x134ac210) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:350
#1  0x00002aaaaaac5192 in client3_1_xattrop_cbk (req=0x2aaaad72b4f8, iov=0x2aaaad72b538, count=1, myframe=0x2b97ee1ee9d4)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1425
#2  0x00002b97ed55c25e in rpc_clnt_handle_reply (clnt=0x134a6ec0, pollin=0x134f1610) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789
#3  0x00002b97ed55c586 in rpc_clnt_notify (trans=0x134a71f0, mydata=0x134a6ef0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x134f1610)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:902
#4  0x00002b97ed5589f3 in rpc_transport_notify (this=0x134a71f0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x134f1610)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#5  0x00002aaaad4daea7 in socket_event_poll_in (this=0x134a71f0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1675
#6  0x00002aaaad4db3e9 in socket_event_handler (fd=6, idx=3, data=0x134a71f0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1790
#7  0x00002b97ed30484c in event_dispatch_epoll_handler (event_pool=0x1348b960, events=0x134a9e30, i=0)
    at ../../../libglusterfs/src/event.c:794
#8  0x00002b97ed304a51 in event_dispatch_epoll (event_pool=0x1348b960) at ../../../libglusterfs/src/event.c:856
#9  0x00002b97ed304dab in event_dispatch (event_pool=0x1348b960) at ../../../libglusterfs/src/event.c:956
#10 0x000000000040784d in main (argc=7, argv=0x7fffbc72d2a8) at ../../../glusterfsd/src/glusterfsd.c:1592
(gdb) f 0
#0  0x00002aaaaad0f191 in afr_changelog_post_op_cbk (frame=0x2b97ee4a0228, cookie=0x2b97ee1ee9d4, this=0x134934c0, op_ret=0, op_errno=22, 
    xattr=0x134ac210) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:350
350                     call_count = --local->call_count;
(gdb) p local
$1 = (afr_local_t *) 0x0
(gdb) l
345
346             child_index = (long) cookie;
347
348             LOCK (&frame->lock);
349             {
350                     call_count = --local->call_count;
351             }
352             UNLOCK (&frame->lock);
353
354             if (call_count == 0) {
(gdb) p *frame
$2 = {root = 0x2b97ee4a01a0, parent = 0x0, next = 0x0, prev = 0x0, local = 0x0, this = 0x134934c0, ret = 0, ref_count = 0, lock = 0, 
  cookie = 0x0, complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}, wind_from = 0x0, 
  wind_to = 0x0, unwind_from = 0x0, unwind_to = 0x0}
(gdb) info thr
  4 Thread 13575  0x0000003eb660e838 in do_sigwait () from /lib64/libpthread.so.0
  3 Thread 13576  0x0000003eb660aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 13577  0x0000003eb5e9a541 in nanosleep () from /lib64/libc.so.6
* 1 Thread 13574  0x00002aaaaad0f191 in afr_changelog_post_op_cbk (frame=0x2b97ee4a0228, cookie=0x2b97ee1ee9d4, this=0x134934c0, op_ret=0, 
    op_errno=22, xattr=0x134ac210) at ../../../../../xlators/cluster/afr/src/afr-transaction.c:350
(gdb) t 3
[Switching to thread 3 (Thread 13576)]#0  0x0000003eb660aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003eb660aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b97ed314b61 in syncenv_task (env=0x1348c490) at ../../../libglusterfs/src/syncop.c:194
#2  0x00002b97ed314ca0 in syncenv_processor (thdata=0x1348c490) at ../../../libglusterfs/src/syncop.c:232
#3  0x0000003eb660673d in start_thread () from /lib64/libpthread.so.0
#4  0x0000003eb5ed44bd in clone () from /lib64/libc.so.6
(gdb) f 1
#1  0x00002b97ed314b61 in syncenv_task (env=0x1348c490) at ../../../libglusterfs/src/syncop.c:194
194                             pthread_cond_wait (&env->cond, &env->mutex);
(gdb) l
189             struct synctask  *task = NULL;
190
191             pthread_mutex_lock (&env->mutex);
192             {
193                     while (list_empty (&env->runq))
194                             pthread_cond_wait (&env->cond, &env->mutex);
195
196                     task = list_entry (env->runq.next, struct synctask, all_tasks);
197
198                     list_del_init (&task->all_tasks);
(gdb)

Comment 1 Anand Avati 2011-09-30 02:43:14 UTC
CHANGE: http://review.gluster.com/536 (Change-Id: Ica845035781f47de990e9dcfefdeb37bed99d515) merged in master by Vijay Bellur (vijay)