Bug 1398331 - With compound fops on, client process crashes when a replica is brought down while IO is in progress
Summary: With compound fops on, client process crashes when a replica is brought down while IO is in progress
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Krutika Dhananjay
QA Contact: SATHEESARAN
URL:
Whiteboard:
Duplicates: 1398333
Depends On: 1398226 1398333 1398499 1404982
Blocks: Gluster-HC-2 1351528
 
Reported: 2016-11-24 13:17 UTC by Krutika Dhananjay
Modified: 2017-03-23 05:51 UTC
CC: 8 users

Fixed In Version: glusterfs-3.8.4-6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1398226
Environment:
Last Closed: 2017-03-23 05:51:16 UTC
Embargoed:




Links
System: Red Hat Product Errata
ID: RHSA-2017:0486
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 09:18:45 UTC

Description Krutika Dhananjay 2016-11-24 13:17:06 UTC
+++ This bug was initially created as a clone of Bug #1398226 +++

Description of problem:

With compound fops enabled, the client process crashes when a replica brick is brought down while IO is in progress. The backtrace below shows the failed compound RPC (req->rpc_status == -1, op_errno = 107, i.e. ENOTCONN) being unwound on disconnect, with afr_pre_op_writev_cbk then dereferencing a NULL compound_args_cbk_t:
(gdb) bt
#0  0x00007f976ed9169d in afr_pre_op_writev_cbk (frame=0x7f97601255dc, cookie=0x0, this=0x7f976800f860, op_ret=-1, op_errno=107, data=0x0, xdata=0x0)
    at afr-transaction.c:1252
#1  0x00007f976f022d04 in client3_3_compound_cbk (req=0x7f976017bc9c, iov=0x7f976fa53770, count=1, myframe=0x7f97600c11ec) at client-rpc-fops.c:3213
#2  0x00007f977c3764cb in saved_frames_unwind (saved_frames=0x7f9764000bd0) at rpc-clnt.c:369
#3  0x00007f977c376563 in saved_frames_destroy (frames=0x7f9764000bd0) at rpc-clnt.c:386
#4  0x00007f977c376a7c in rpc_clnt_connection_cleanup (conn=0x7f9768060790) at rpc-clnt.c:555
#5  0x00007f977c377523 in rpc_clnt_notify (trans=0x7f9768060bc0, mydata=0x7f9768060790, event=RPC_TRANSPORT_DISCONNECT, data=0x7f9768060bc0) at rpc-clnt.c:901
#6  0x00007f977c373a27 in rpc_transport_notify (this=0x7f9768060bc0, event=RPC_TRANSPORT_DISCONNECT, data=0x7f9768060bc0) at rpc-transport.c:537
#7  0x00007f977151788d in socket_event_poll_err (this=0x7f9768060bc0) at socket.c:1177
#8  0x00007f977151c23d in socket_event_handler (fd=14, idx=3, data=0x7f9768060bc0, poll_in=1, poll_out=0, poll_err=24) at socket.c:2402
#9  0x00007f977c61c323 in event_dispatch_epoll_handler (event_pool=0x1488010, event=0x7f976fa53f20) at event-epoll.c:571
#10 0x00007f977c61c702 in event_dispatch_epoll_worker (data=0x14cb7c0) at event-epoll.c:674
#11 0x00007f977b6025ca in start_thread () from /lib64/libpthread.so.0
#12 0x00007f977aedc0ed in clone () from /lib64/libc.so.6
(gdb) f 1
#1  0x00007f976f022d04 in client3_3_compound_cbk (req=0x7f976017bc9c, iov=0x7f976fa53770, count=1, myframe=0x7f97600c11ec) at client-rpc-fops.c:3213
3213	        CLIENT_STACK_UNWIND (compound, frame, rsp.op_ret,
(gdb) p req->rpc_status
$3 = -1
(gdb) f 0
#0  0x00007f976ed9169d in afr_pre_op_writev_cbk (frame=0x7f97601255dc, cookie=0x0, this=0x7f976800f860, op_ret=-1, op_errno=107, data=0x0, xdata=0x0)
    at afr-transaction.c:1252
1252	        write_args_cbk = &args_cbk->rsp_list[1];
(gdb) p args_cbk
$4 = (compound_args_cbk_t *) 0x0
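
The pattern behind the crash: when the transport disconnects, rpc_clnt_connection_cleanup unwinds every saved frame with rpc_status == -1, and client3_3_compound_cbk invokes the AFR callback with op_ret == -1 and data == NULL. afr_pre_op_writev_cbk then indexes into args_cbk->rsp_list without a NULL check. A minimal sketch of the failure, using simplified stand-in types (the real callback in afr-transaction.c takes more arguments):

/* Simplified stand-ins for the gluster types involved. */
typedef struct {
    int op_ret;
    int op_errno;
} default_args_cbk_t;

typedef struct {
    default_args_cbk_t *rsp_list;   /* one response per fop in the compound */
    int                 fop_length;
} compound_args_cbk_t;

static int
afr_pre_op_writev_cbk_sketch (int op_ret, int op_errno, void *data)
{
    compound_args_cbk_t *args_cbk = data;

    /* On RPC failure data is NULL, so this line dereferences a NULL
     * pointer -- exactly the state captured in frame 0 above. */
    default_args_cbk_t *write_args_cbk = &args_cbk->rsp_list[1];

    return write_args_cbk->op_ret;
}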



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Worker Ant on 2016-11-24 08:07:09 EST ---

REVIEW: http://review.gluster.org/15924 (cluster/afr: Handle rpc errors, xdr failures etc with proper NULL checks) posted (#1) for review on master by Krutika Dhananjay (kdhananj)
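
Per the patch title, the fix is a guard on the error path before the response list is touched. A hedged sketch, reusing the simplified types above (the actual change in review 15924 may be structured differently):

static int
afr_pre_op_writev_cbk_fixed_sketch (int op_ret, int op_errno, void *data)
{
    compound_args_cbk_t *args_cbk = data;

    /* NULL-check the compound response before use: on RPC/XDR failure
     * op_ret is -1 and data is NULL, so fail the transaction with
     * op_errno (here 107/ENOTCONN) instead of crashing. */
    if (op_ret < 0 || !args_cbk || args_cbk->fop_length < 2)
        return -op_errno;

    return args_cbk->rsp_list[1].op_ret;
}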

Comment 2 Pranith Kumar K 2016-11-24 13:19:26 UTC
*** Bug 1398333 has been marked as a duplicate of this bug. ***

Comment 7 SATHEESARAN 2016-12-22 03:52:36 UTC
This bug could not be verified earlier because bug 1404982 (https://bugzilla.redhat.com/show_bug.cgi?id=1404982) blocked verification.

That bug causes VMs to pause when a brick is brought down in a replica 3 volume.

Comment 8 SATHEESARAN 2017-01-13 07:49:39 UTC
Tested with RHGS 3.2.0 interim build (glusterfs-3.8.4-11.el7rhgs) with the following steps (a setup sketch follows the observation below):

1. Created a replica 3 volume and enabled compound fops
2. Fuse-mounted the volume
3. Created a VM image file on the volume
4. Created a VM with this disk image and started OS installation
5. While the OS installation was in progress, killed one of the bricks of the volume

Observation: the client does not crash; everything works as expected.
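
For reference, a sketch of the setup in steps 1-2 and 5; host names, brick paths, and mount point are placeholders, and cluster.use-compound-fops is assumed to be the volume option that enables compound fops:

# Create a replica 3 volume and enable compound fops
gluster volume create repvol replica 3 \
    host1:/bricks/b1 host2:/bricks/b2 host3:/bricks/b3
gluster volume set repvol cluster.use-compound-fops on
gluster volume start repvol

# Fuse-mount the volume on the client
mount -t glusterfs host1:/repvol /mnt/repvol

# While IO is in progress, kill one brick process
# (find the brick PID via `gluster volume status repvol`)
kill -9 <brick-pid>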

Comment 10 errata-xmlrpc 2017-03-23 05:51:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

