Description of problem:
Core file created while running tests/basic/afr-heal-quota.t

Version-Release number of selected component (if applicable):
master

How reproducible:
Only once. Not able to recreate.

127:root@vm4 regression$ gdb -ex 'set sysroot ./' -ex 'core-file build/install/cores/glusterfs-7968.core' ./build/install/sbin/glusterfsd
(gdb) bt
#0  0x00007fa6bebe9470 in pthread_mutex_lock () from ./lib64/libpthread.so.0
#1  0x00007fa6b1688b56 in wb_add_head_for_retry (head=0x7fa69c083700) at /home/jenkins/root/workspace/centos6-regression/xlators/performance/write-behind/src/write-behind.c:814
#2  0x00007fa6b16891fd in wb_fulfill_short_write (head=0x7fa69c083700, size=0) at /home/jenkins/root/workspace/centos6-regression/xlators/performance/write-behind/src/write-behind.c:1031
#3  0x00007fa6b1689296 in wb_fulfill_cbk (frame=0x7fa6a4014a30, cookie=0x7fa6a4005ea0, this=0x7fa6ac00e960, op_ret=106496, op_errno=0, prebuf=0x7fa6a401c3e0, postbuf=0x7fa6a401c450, xdata=0x7fa6a400f760) at /home/jenkins/root/workspace/centos6-regression/xlators/performance/write-behind/src/write-behind.c:1052
#4  0x00007fa6b1914100 in dht_writev_cbk (frame=0x7fa6a4005ea0, cookie=0x7fa6ac00b920, this=0x7fa6ac00d2e0, op_ret=106496, op_errno=0, prebuf=0x7fa6a401c3e0, postbuf=0x7fa6a401c450, xdata=0x7fa6a400f760) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-inode-write.c:106
#5  0x00007fa6b1b760d5 in afr_writev_unwind (frame=0x7fa6a400b410, this=0x7fa6ac00b920) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-inode-write.c:247
#6  0x00007fa6b1b765af in afr_writev_wind_cbk (frame=0x7fa6a4005260, cookie=0x1, this=0x7fa6ac00b920, op_ret=106496, op_errno=0, prebuf=0x7fa6ab6fe860, postbuf=0x7fa6ab6fe7f0, xdata=0x7fa6a400f760) at /home/jenkins/root/workspace/centos6-regression/xlators/cluster/afr/src/afr-inode-write.c:394
#7  0x00007fa6b1e14a33 in client3_3_writev_cbk (req=0x7fa6a4001820, iov=0x7fa6a4001860, count=1, myframe=0x7fa6a400bda0) at /home/jenkins/root/workspace/centos6-regression/xlators/protocol/client/src/client-rpc-fops.c:858
#8  0x00007fa6bf6d470e in rpc_clnt_handle_reply (clnt=0x7fa6ac049d90, pollin=0x7fa6a4016e10) at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:793
#9  0x00007fa6bf6d4cd8 in rpc_clnt_notify (trans=0x7fa6ac049f90, mydata=0x7fa6ac049dc0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fa6a4016e10) at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-clnt.c:986
#10 0x00007fa6bf6d0d6c in rpc_transport_notify (this=0x7fa6ac049f90, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fa6a4016e10) at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-lib/src/rpc-transport.c:538
#11 0x00007fa6b42872d2 in socket_event_poll_in (this=0x7fa6ac049f90) at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2273
#12 0x00007fa6b428787f in socket_event_handler (fd=10, idx=3, data=0x7fa6ac049f90, poll_in=1, poll_out=0, poll_err=0) at /home/jenkins/root/workspace/centos6-regression/rpc/rpc-transport/socket/src/socket.c:2409
#13 0x00007fa6bf97f13a in event_dispatch_epoll_handler (event_pool=0x202eef0, event=0x7fa6ab6fee70) at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:572
#14 0x00007fa6bf97f566 in event_dispatch_epoll_worker (data=0x7fa6ac049ad0) at /home/jenkins/root/workspace/centos6-regression/libglusterfs/src/event-epoll.c:675
#15 0x00007fa6bebe7aa1 in start_thread () from ./lib64/libpthread.so.0
#16 0x00007fa6be550aad in clone () from ./lib64/libc.so.6

(gdb) f 1
#1  0x00007fa6b1688b56 in wb_add_head_for_retry (head=0x7fa69c083700) at /home/jenkins/root/workspace/centos6-regression/xlators/performance/write-behind/src/write-behind.c:814
warning: Source file is more recent than executable.
814             LOCK (&head->wb_inode->lock);
(gdb) l
809     wb_add_head_for_retry (wb_request_t *head)
810     {
811             if (!head)
812                     goto out;
813
814             LOCK (&head->wb_inode->lock);
815             {
816                     __wb_add_head_for_retry (head);
817             }
818             UNLOCK (&head->wb_inode->lock);
(gdb) p head->wb_inode->lock
$1 = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}

Everything seems to be zero in the lock structure.
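For context, the shape of this crash is consistent with a use-after-free of the request: the lock is reachable only through head->wb_inode, so if head was already freed and its memory reclaimed or zeroed, both the pointer chase and the lock call operate on stale memory. The following is a minimal, hypothetical sketch of that shape (request_t/inode_t and add_head_for_retry are simplified stand-ins, not the real write-behind structures or code path); whether it crashes or appears to work is undefined behaviour.

#include <pthread.h>
#include <stdlib.h>

typedef struct {
        pthread_mutex_t lock;
} inode_t;

typedef struct {
        inode_t *inode;
} request_t;

/* Same shape as wb_add_head_for_retry(): the lock is reached only
 * through the request, so a stale request pointer makes both the
 * dereference and the lock call use-after-free. */
static void
add_head_for_retry (request_t *req)
{
        if (!req)
                return;

        pthread_mutex_lock (&req->inode->lock);
        /* ... re-queue req on the inode's todo list ... */
        pthread_mutex_unlock (&req->inode->lock);
}

int
main (void)
{
        inode_t   *inode = calloc (1, sizeof (*inode));
        request_t *req   = calloc (1, sizeof (*req));

        pthread_mutex_init (&inode->lock, NULL);
        req->inode = inode;

        free (req);                 /* request completes and is destroyed ...  */
        add_head_for_retry (req);   /* ... but a stale pointer is still queued */

        return 0;
}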
Forgot to add the link to the run: https://build.gluster.org/job/centos6-regression/3707/consoleFull
This has not been seen in a long time, but it would be good to verify it against the latest master and close it.
This looks to be a dup of bz 1528558. In particular, the commit message of the patch explains exactly how a corrupted/freed request can end up in the todo list:

COMMIT: https://review.gluster.org/19064 committed in master by "Raghavendra G" <rgowdapp> with a commit message:

performance/write-behind: fix bug while handling short writes

The variable "fulfilled" in wb_fulfill_short_write is not reset to 0 while handling every member of the list. This has some interesting consequences:

* If we break from the loop while processing the last member of the list head->winds, req is reset to head as the list is a circular one. However, head is already fulfilled and can potentially be freed. So, we end up adding a freed request to the wb_inode->todo list. This is the RCA for the crash tracked by the bug associated with this patch (note that the "holder" we saw in the todo list was freed).

* If we break from the loop while processing any member of the list head->winds other than the last one, req is set to the next member of the list, skipping the current request even though it is not entirely synced. This can lead to data corruption.

The fix is very simple: make sure "fulfilled" reflects whether the current request is fulfilled and does not carry the history of previous requests in the list.

Change-Id: Ia3d6988175a51c9e08efdb521a7b7938b01f93c8
BUG: 1528558
Signed-off-by: Raghavendra G <rgowdapp>
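To make the description above concrete, here is a simplified, self-contained sketch of just the flag handling the patch fixes. It is not the actual wb_fulfill_short_write() code; short_req_t, fully_synced() and requeue_for_retry() are hypothetical stand-ins. The buggy variant lets "fulfilled" carry the result of an earlier request across iterations, so a later short write is silently skipped; the fixed variant recomputes the flag for every request, which is what the commit message calls for.

#include <stdio.h>

typedef struct short_req {
        int               written;   /* bytes acknowledged so far       */
        int               total;     /* bytes the application asked for */
        struct short_req *next;
} short_req_t;

static int
fully_synced (short_req_t *req)
{
        return req->written >= req->total;
}

static void
requeue_for_retry (short_req_t *req)
{
        printf ("retrying request of %d bytes (%d written)\n",
                req->total, req->written);
}

/* Buggy shape: "fulfilled" is never reset, so once one request in the
 * list is complete, incomplete requests after it are silently skipped. */
static void
process_buggy (short_req_t *list)
{
        int fulfilled = 0;

        for (short_req_t *req = list; req; req = req->next) {
                if (fully_synced (req))
                        fulfilled = 1;        /* sticks for the rest of the loop */
                if (!fulfilled)
                        requeue_for_retry (req);
        }
}

/* Fixed shape: the flag reflects only the current request. */
static void
process_fixed (short_req_t *list)
{
        for (short_req_t *req = list; req; req = req->next) {
                int fulfilled = fully_synced (req);   /* reset per request */

                if (!fulfilled)
                        requeue_for_retry (req);
        }
}

int
main (void)
{
        /* First request complete, second one short-written. */
        short_req_t second = { .written = 10, .total = 100, .next = NULL };
        short_req_t first  = { .written = 50, .total = 50,  .next = &second };

        printf ("buggy:\n");
        process_buggy (&first);   /* prints nothing: the short write is skipped */

        printf ("fixed:\n");
        process_fixed (&first);   /* retries the 100-byte request               */

        return 0;
}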
Given that the failure has not been seen for a long time and there's a patch that seems to match the problem, I'm closing this bug as a duplicate of bug #1528558.

*** This bug has been marked as a duplicate of bug 1528558 ***