Description of problem:
Client crashes when IO is high and graph is changed (gluster volume set/unset).

Version-Release number of selected component (if applicable):
Update 2: glusterfs 3.3.0rhs built on Sep 10 2012 00:49:11

Steps to Reproduce:
1. Do some IO intensive work on the client.
2. Change the graph multiple times.

Additional info:
#0  0x00007f86176e9214 in client3_1_flush_cbk (req=0x7f8616400c50, iov=0x7f8616400c90, count=<value optimized out>, myframe=0x7f861aeeff94) at client3_1-fops.c:865
#1  0x00007f861be810c5 in rpc_clnt_handle_reply (clnt=0x15b14b0, pollin=0x1edc4c0) at rpc-clnt.c:788
#2  0x00007f861be818c0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x15b14e0, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:907
#3  0x00007f861be7d018 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#4  0x00007f8618744954 in socket_event_poll_in (this=0x15c0f10) at socket.c:1677
#5  0x00007f8618744a37 in socket_event_handler (fd=<value optimized out>, idx=5, data=0x15c0f10, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#6  0x00007f861c0c7d84 in event_dispatch_epoll_handler (event_pool=0x1524e00) at event.c:785
#7  event_dispatch_epoll (event_pool=0x1524e00) at event.c:847
#8  0x00000000004073ca in main (argc=<value optimized out>, argv=0x7fffeae4d888) at glusterfsd.c:1689
Created attachment 611372: Client log file
Created attachment 611373: Core file
The below patch should fix it...

amar@ganaka:~/work/glusterfs$ git diff
diff --git a/xlators/performance/write-behind/src/write-behind.c b/xlators/performance/write-behind/src/write-behind.c
index ad1e5f0..59cbd00 100644
--- a/xlators/performance/write-behind/src/write-behind.c
+++ b/xlators/performance/write-behind/src/write-behind.c
@@ -2604,7 +2604,8 @@ wb_flush_helper (call_frame_t *frame, xlator_t *this, fd_t *fd, dict_t *xdata)
                 wb_request_unref (local->request);
         }

-        if (conf->flush_behind) {
+        int flag = conf->flush_behind;
+        if (flag) {
                 flush_frame = copy_frame (frame);
                 if (flush_frame == NULL) {
                         op_errno = ENOMEM;
@@ -2628,7 +2629,7 @@ wb_flush_helper (call_frame_t *frame, xlator_t *this, fd_t *fd, dict_t *xdata)
                 STACK_DESTROY (process_frame->root);
         }

-        if (conf->flush_behind) {
+        if (flag) {
                 STACK_UNWIND_STRICT (flush, frame, op_ret, op_errno, NULL);
         }

--------------

Also, if the proposed patch @ http://review.gluster.org/3947 goes in, then this race won't exist.
patch accepted upstream
Crash still happens and is related to bug: https://bugzilla.redhat.com/show_bug.cgi?id=885008 Will continue testing once bug 885008 is fixed.
Marking as MODIFIED, as there is no work left for this particular bug. It will stay in MODIFIED until the bug blocking verification is fixed.
Verified with load on multiple clients and series of graph changes in a loop on the servers. No crashes seen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html