Description of problem:
=======================
While running rm -rf on the master volume of a geo-replication setup, a large number of crashes was observed on the slave side with the following backtrace:

(gdb) bt
#0  0x00007f570287bc00 in dht_rmdir_do (frame=frame@entry=0x7f570e68b1d0, this=this@entry=0x7f56fc00e5e0) at dht-common.c:7944
#1  0x00007f570287c4ab in dht_rmdir_cached_lookup_cbk (frame=frame@entry=0x7f570e68a06c, cookie=<optimized out>, this=0x7f56fc00e5e0, op_ret=0, op_errno=<optimized out>, inode=<optimized out>, stbuf=stbuf@entry=0x7f56f0021410, xattr=0x7f570de29e88, parent=0x7f56f0021480) at dht-common.c:8137
#2  0x00007f5702b13056 in afr_lookup_done (frame=frame@entry=0x7f570e68ac04, this=this@entry=0x7f56fc00d670) at afr-common.c:2167
#3  0x00007f5702b13a04 in afr_lookup_metadata_heal_check (frame=frame@entry=0x7f570e68ac04, this=0x7f56fc00d670, this@entry=0x8072f15dde9c0700) at afr-common.c:2410
#4  0x00007f5702b14331 in afr_lookup_entry_heal (frame=frame@entry=0x7f570e68ac04, this=0x8072f15dde9c0700, this@entry=0x7f56fc00d670) at afr-common.c:2501
#5  0x00007f5702b1469d in afr_lookup_cbk (frame=frame@entry=0x7f570e68ac04, cookie=<optimized out>, this=0x7f56fc00d670, op_ret=<optimized out>, op_errno=<optimized out>, inode=inode@entry=0x7f56fa6af20c, buf=buf@entry=0x7f56fb48e940, xdata=0x7f570de2a138, postparent=postparent@entry=0x7f56fb48e9b0) at afr-common.c:2549
#6  0x00007f5702d515dd in client3_3_lookup_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f570e68b0fc) at client-rpc-fops.c:2945
#7  0x00007f571097a860 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f56fc090970, pollin=pollin@entry=0x7f56f0004e20) at rpc-clnt.c:794
#8  0x00007f571097ab4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f56fc0909a0, event=<optimized out>, data=0x7f56f0004e20) at rpc-clnt.c:987
#9  0x00007f57109769f3 in rpc_transport_notify (this=this@entry=0x7f56fc0a0690, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f56f0004e20) at rpc-transport.c:538
#10 0x00007f570523b314 in socket_event_poll_in (this=this@entry=0x7f56fc0a0690) at socket.c:2272
#11 0x00007f570523d7c5 in socket_event_handler (fd=<optimized out>, idx=1, data=0x7f56fc0a0690, poll_in=1, poll_out=0, poll_err=0) at socket.c:2402
#12 0x00007f5710c0a770 in event_dispatch_epoll_handler (event=0x7f56fb48ee80, event_pool=0x7f571156de10) at event-epoll.c:571
#13 event_dispatch_epoll_worker (data=0x7f56fc07bbf0) at event-epoll.c:674
#14 0x00007f570fa11dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f570f35673d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-server-3.8.4-18.2.el7rhgs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Set up geo-replication between a master and a slave volume
2. Create data on the master
3. Perform rm -rf on the master mount (a command-line sketch of these steps follows this report)

Actual results:
===============
Multiple fs processes crashed on the slave.
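For reference, a minimal command-line sketch of the reproduction steps above. The volume names (mastervol, slavevol), hostnames (masterhost, slavehost), brick layout, and mount points are placeholders, and passwordless root SSH from the master node to the slave node is assumed to already be in place; the geo-replication commands themselves are the standard gluster CLI.

# On a master node: generate the pem keys and create/start the geo-rep session
gluster system:: execute gsec_create
gluster volume geo-replication mastervol slavehost::slavevol create push-pem
gluster volume geo-replication mastervol slavehost::slavevol start
gluster volume geo-replication mastervol slavehost::slavevol status

# Mount the master volume and create some data
mount -t glusterfs masterhost:/mastervol /mnt/mastervol
mkdir -p /mnt/mastervol/dir{1..100}
for i in $(seq 1 100); do touch /mnt/mastervol/dir$i/file{1..10}; done

# Once the data has synced to the slave, remove everything from the master mount
rm -rf /mnt/mastervol/*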
Verified the same case with the build glusterfs-3.8.4-18.4.el7rhgs.x86_64. No core was observed on the slave, and the sync completed for all fops, including rmdir. Moving this bug to verified state.
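A sketch of the kind of checks behind this verification, using the same placeholder names as above; where core files land depends on the kernel.core_pattern / abrt configuration of the slave nodes, so that path is an assumption:

# On a master node: confirm the session is active with no remaining sync backlog
gluster volume geo-replication mastervol slavehost::slavevol status detail

# On the slave nodes: confirm no new core files were generated during the test
cat /proc/sys/kernel/core_pattern
ls -l /var/spool/abrt/ 2>/dev/null

# On a slave mount: confirm the rm -rf was propagated (the mount should be empty)
mount -t glusterfs slavehost:/slavevol /mnt/slavevol
ls -A /mnt/slavevol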
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1418