855787 – glusterfs: client crash while testing statedump

Summary: glusterfs: client crash while testing statedump

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterfs
Sub Component:
Version:	2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Raghavendra Bhat
QA Contact:	Sudhir D
Docs Contact:
URL:
Whiteboard:
Depends On:	885008
Blocks:

Reported:	2012-09-10 09:43 UTC by Sachidananda Urs
Modified:	2013-09-23 22:33 UTC
CC List:	3 users
Fixed In Version:	glusterfs-3.4.0qa8
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-09-23 22:33:22 UTC
Embargoed:
Dependent Products:

Attachments
Client log file (93.20 KB, application/octet-stream) 2012-09-10 09:44 UTC, Sachidananda Urs
Core file (3.49 MB, application/x-bzip) 2012-09-10 09:45 UTC, Sachidananda Urs

Description Sachidananda Urs 2012-09-10 09:43:07 UTC

Description of problem:
The client crashes when the graph is changed (gluster volume set/unset) while it is under heavy I/O.


Version-Release number of selected component (if applicable):
Update 2:
glusterfs 3.3.0rhs built on Sep 10 2012 00:49:11


Steps to Reproduce:
1. Run an I/O-intensive workload on the client.
2. Change the graph multiple times (gluster volume set/unset) while the I/O is running.


Additional info:


#0  0x00007f86176e9214 in client3_1_flush_cbk (req=0x7f8616400c50, iov=0x7f8616400c90, count=<value optimized out>, 
    myframe=0x7f861aeeff94) at client3_1-fops.c:865
#1  0x00007f861be810c5 in rpc_clnt_handle_reply (clnt=0x15b14b0, pollin=0x1edc4c0) at rpc-clnt.c:788
#2  0x00007f861be818c0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x15b14e0, 
    event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:907
#3  0x00007f861be7d018 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, 
    data=<value optimized out>) at rpc-transport.c:489
#4  0x00007f8618744954 in socket_event_poll_in (this=0x15c0f10) at socket.c:1677
#5  0x00007f8618744a37 in socket_event_handler (fd=<value optimized out>, idx=5, data=0x15c0f10, poll_in=1, 
    poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#6  0x00007f861c0c7d84 in event_dispatch_epoll_handler (event_pool=0x1524e00) at event.c:785
#7  event_dispatch_epoll (event_pool=0x1524e00) at event.c:847
#8  0x00000000004073ca in main (argc=<value optimized out>, argv=0x7fffeae4d888) at glusterfsd.c:1689

Comment 1 Sachidananda Urs 2012-09-10 09:44:58 UTC

Created attachment 611372
Client log file

Comment 2 Sachidananda Urs 2012-09-10 09:45:45 UTC

Created attachment 611373
Core file

Comment 4 Amar Tumballi 2012-09-17 05:24:20 UTC

The patch below should fix it:

amar@ganaka:~/work/glusterfs$ git diff
diff --git a/xlators/performance/write-behind/src/write-behind.c b/xlators/performance/write-behind/src/write-behind.c
index ad1e5f0..59cbd00 100644
--- a/xlators/performance/write-behind/src/write-behind.c
+++ b/xlators/performance/write-behind/src/write-behind.c
@@ -2604,7 +2604,8 @@ wb_flush_helper (call_frame_t *frame, xlator_t *this, fd_t *fd, dict_t *xdata)
                 wb_request_unref (local->request);
         }
 
-        if (conf->flush_behind) {
+        int flag = conf->flush_behind;
+        if (flag) {
                 flush_frame = copy_frame (frame);
                 if (flush_frame == NULL) {
                         op_errno = ENOMEM;
@@ -2628,7 +2629,7 @@ wb_flush_helper (call_frame_t *frame, xlator_t *this, fd_t *fd, dict_t *xdata)
                 STACK_DESTROY (process_frame->root);
         }
 
-        if (conf->flush_behind) {
+        if (flag) {
                 STACK_UNWIND_STRICT (flush, frame, op_ret, op_errno, NULL);
         }
 
--------------
Also, if the proposed patch at http://review.gluster.org/3947 goes in, this race will no longer exist.
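
To illustrate the race the patch above addresses: wb_flush_helper checks conf->flush_behind twice, and a volume set/unset between the two checks can make them disagree. The sketch below is a single-threaded model, not the actual write-behind code. The struct wb_conf_model, both functions, and the reply counting are hypothetical, and the model assumes that the first check chooses between a background and a foreground flush while the second check decides whether to reply to the caller early.

```c
/* Hypothetical stand-in for the write-behind configuration (not the real struct). */
struct wb_conf_model {
    int flush_behind;
};

/*
 * Models the pre-patch pattern: the option is read twice, and a reconfigure
 * (simulated by the assignment in the middle) can land between the reads.
 * Returns how many times the caller would be replied to; 1 is the only
 * correct value.
 */
int flush_two_reads(struct wb_conf_model *conf, int reconfigured_value)
{
    int replies = 0;
    int background = conf->flush_behind;     /* first read: pick flush mode */

    conf->flush_behind = reconfigured_value; /* simulated volume set/unset */

    if (conf->flush_behind)                  /* second read: early reply */
        replies++;
    if (!background)                         /* foreground flush replies from its callback */
        replies++;
    return replies;
}

/* Models the patched pattern: one read, reused for both decisions. */
int flush_one_read(struct wb_conf_model *conf, int reconfigured_value)
{
    int replies = 0;
    int flag = conf->flush_behind;           /* single snapshot of the option */

    conf->flush_behind = reconfigured_value; /* the change no longer affects this call */

    if (flag)                                /* early reply for a background flush */
        replies++;
    if (!flag)                               /* foreground flush replies from its callback */
        replies++;
    return replies;
}
```

In this model, if the option flips from off to on mid-call, flush_two_reads returns 2 (the caller is replied to twice); if it flips from on to off, it returns 0 (the caller is never answered). flush_one_read returns 1 in both cases, which is the behavior the local flag in the patch is meant to guarantee.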

Comment 5 Amar Tumballi 2012-10-06 15:01:59 UTC

Patch accepted upstream.

Comment 6 Sachidananda Urs 2012-12-18 07:07:17 UTC

The crash still happens and is related to bug 885008: https://bugzilla.redhat.com/show_bug.cgi?id=885008

Will continue testing once bug 885008 is fixed.

Comment 7 Amar Tumballi 2012-12-20 07:50:03 UTC

Marking as MODIFIED, as there is no work left for this particular bug. It will stay in MODIFIED until the bug blocking verification is fixed.

Comment 8 Sachidananda Urs 2013-03-03 13:45:51 UTC

Verified with load on multiple clients and a series of graph changes in a loop on the servers. No crashes seen.

Comment 9 Scott Haines 2013-09-23 22:33:22 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
