Bug 855787

Summary: glusterfs: client crash while testing statedump
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Sachidananda Urs <sac>
Component: glusterfs
Assignee: Raghavendra Bhat <rabhat>
Status: CLOSED ERRATA
QA Contact: Sudhir D <sdharane>
Severity: unspecified
Docs Contact:
Priority: high
Version: 2.0
CC: amarts, rhs-bugs, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0qa8
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-23 22:33:22 UTC
Type: Bug
Bug Depends On: 885008    
Bug Blocks:    
Attachments:
Description        Flags
Client log file    none
Core file          none

Description Sachidananda Urs 2012-09-10 09:43:07 UTC
Description of problem:
The client crashes when the graph is changed (gluster volume set/unset) while it is under heavy I/O.


Version-Release number of selected component (if applicable):
Update 2:
glusterfs 3.3.0rhs built on Sep 10 2012 00:49:11


Steps to Reproduce:
1. Run an I/O-intensive workload on the client.
2. Change the graph multiple times while the workload is running.


Additional info:


#0  0x00007f86176e9214 in client3_1_flush_cbk (req=0x7f8616400c50, iov=0x7f8616400c90, count=<value optimized out>, 
    myframe=0x7f861aeeff94) at client3_1-fops.c:865
#1  0x00007f861be810c5 in rpc_clnt_handle_reply (clnt=0x15b14b0, pollin=0x1edc4c0) at rpc-clnt.c:788
#2  0x00007f861be818c0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x15b14e0, 
    event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:907
#3  0x00007f861be7d018 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, 
    data=<value optimized out>) at rpc-transport.c:489
#4  0x00007f8618744954 in socket_event_poll_in (this=0x15c0f10) at socket.c:1677
#5  0x00007f8618744a37 in socket_event_handler (fd=<value optimized out>, idx=5, data=0x15c0f10, poll_in=1, 
    poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#6  0x00007f861c0c7d84 in event_dispatch_epoll_handler (event_pool=0x1524e00) at event.c:785
#7  event_dispatch_epoll (event_pool=0x1524e00) at event.c:847
#8  0x00000000004073ca in main (argc=<value optimized out>, argv=0x7fffeae4d888) at glusterfsd.c:1689

Comment 1 Sachidananda Urs 2012-09-10 09:44:58 UTC
Created attachment 611372 [details]
Client log file

Comment 2 Sachidananda Urs 2012-09-10 09:45:45 UTC
Created attachment 611373 [details]
Core file

Comment 4 Amar Tumballi 2012-09-17 05:24:20 UTC
The patch below should fix it:

amar@ganaka:~/work/glusterfs$ git diff
diff --git a/xlators/performance/write-behind/src/write-behind.c b/xlators/performance/write-behind/src/write-behind.c
index ad1e5f0..59cbd00 100644
--- a/xlators/performance/write-behind/src/write-behind.c
+++ b/xlators/performance/write-behind/src/write-behind.c
@@ -2604,7 +2604,8 @@ wb_flush_helper (call_frame_t *frame, xlator_t *this, fd_t *fd, dict_t *xdata)
                 wb_request_unref (local->request);
         }
 
-        if (conf->flush_behind) {
+        int flag = conf->flush_behind;
+        if (flag) {
                 flush_frame = copy_frame (frame);
                 if (flush_frame == NULL) {
                         op_errno = ENOMEM;
@@ -2628,7 +2629,7 @@ wb_flush_helper (call_frame_t *frame, xlator_t *this, fd_t *fd, dict_t *xdata)
                 STACK_DESTROY (process_frame->root);
         }
 
-        if (conf->flush_behind) {
+        if (flag) {
                 STACK_UNWIND_STRICT (flush, frame, op_ret, op_errno, NULL);
         }
 
--------------
Also, if the proposed patch at http://review.gluster.org/3947 goes in, this race will no longer exist.
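
For readers following the patch, here is a minimal standalone sketch of the idea behind it: read the option once into a local and use that value for both decisions, so a reconfigure that lands in between cannot make the two checks disagree. This is a simplified model and not the write-behind code itself. `struct conf`, `toggle_flush_behind`, `flush_racy` and `flush_snapshot` are hypothetical names, the "unwind count" is an assumed stand-in for how many times the caller's frame would be unwound, and the graph change is simulated with a callback instead of a second thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the translator's configuration. */
struct conf {
    int flush_behind;
};

/* Hypothetical hook simulating a graph change between the two checks. */
typedef void (*reconfigure_fn)(struct conf *);

static void toggle_flush_behind(struct conf *c)
{
    c->flush_behind = !c->flush_behind;
}

/*
 * Reads the option twice. Returns how many times the caller's frame
 * would be unwound in this model: 1 is correct, 0 means the caller
 * never gets a reply, 2 means a double unwind.
 */
static int flush_racy(struct conf *c, reconfigure_fn between)
{
    int unwinds = 0;

    assert(c != NULL);
    if (!c->flush_behind)
        unwinds++;          /* foreground flush: its callback unwinds */
    if (between != NULL)
        between(c);         /* option changes here */
    if (c->flush_behind)
        unwinds++;          /* background flush: unwind right away */
    return unwinds;
}

/* Same flow, but both checks use one snapshot of the option. */
static int flush_snapshot(struct conf *c, reconfigure_fn between)
{
    int unwinds = 0;
    int flag;

    assert(c != NULL);
    flag = c->flush_behind; /* single read, as in the patch */
    if (!flag)
        unwinds++;
    if (between != NULL)
        between(c);
    if (flag)
        unwinds++;
    return unwinds;
}
```

Calling `flush_racy` with `toggle_flush_behind` as the hook gives 2 or 0 unwinds depending on the starting value of the option, while `flush_snapshot` always gives exactly 1 in this model.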

Comment 5 Amar Tumballi 2012-10-06 15:01:59 UTC
Patch accepted upstream.

Comment 6 Sachidananda Urs 2012-12-18 07:07:17 UTC
The crash still happens and is related to bug 885008: https://bugzilla.redhat.com/show_bug.cgi?id=885008

Will continue testing once bug 885008 is fixed.

Comment 7 Amar Tumballi 2012-12-20 07:50:03 UTC
Marking as MODIFIED since there is no further work for this particular bug. It will stay in MODIFIED until the bug blocking verification is fixed.

Comment 8 Sachidananda Urs 2013-03-03 13:45:51 UTC
Verified with load on multiple clients and a series of graph changes run in a loop on the servers. No crashes seen.

Comment 9 Scott Haines 2013-09-23 22:33:22 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html