Bug 1403714

Summary: [Ganesha + Multi-Volume/Single-Mount] - Ganesha crashes during inode_destroy
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Ambarish <asoman>
Component: nfs-ganesha Assignee: Jiffin <jthottan>
Status: CLOSED ERRATA QA Contact: Ambarish <asoman>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rhgs-3.2 CC: amukherj, asoman, bturner, dang, ffilz, jthottan, kkeithle, mbenjamin, pkarampu, rcyriac, rgowdapp, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: nfs-ganesha-2.4.1-4 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1400780 Environment:
Last Closed: 2017-03-23 06:27:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1351528, 1400780, 1401160    

Comment 2 Jiffin 2016-12-12 09:51:27 UTC
copying from https://bugzilla.redhat.com/show_bug.cgi?id=1400780#c9

From the backtraces of core 1 in bz#1400780 and core 2 in bz#1401160, it is clear that the issue is hit only when ganesha is trying to remove an entry from its LRU list. By default the LRU limit for ganesha's MD_CACHE is 25000, and in the gfapi layer it is 131072. We suspect the crash occurred because of a race between removal of the entry in ganesha and in the gluster layer.
I tried to reproduce a similar issue with 3 volumes (two 1x2 and one 1x1) and the number of clients varying from 4 to 7. I also tried lowering the LRU limit to 20 for ganesha and 100 for gluster, but never hit the crash with ongoing I/O (dd and a Linux untar run from different clients). In my setup the I/O ran continuously for at least 4 hours, then errored out with "no space left on the device".

But during cleanup (rm -rf on the same directories from different mounts) I consistently got a crash with a similar backtrace during LRU cleanup. The crashes are more easily reproduced with a lower LRU limit. When I increased the LRU limit to 150000 in ganesha, the crash was not seen (it may still crash eventually).
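
To illustrate why a lower LRU limit would make the crash easier to hit, here is a minimal Python sketch of a generic LRU list that counts evictions. It is only a toy model and an assumption on my part: it does not reproduce ganesha's MD_CACHE or the gfapi inode table, and the names LRUCache and count_evictions are hypothetical. The point is that a small limit forces an eviction on almost every new entry, so the removal path where the race is suspected runs far more often.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU list keyed by inode number (hypothetical, not ganesha code)."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = OrderedDict()
        self.evictions = 0

    def touch(self, inode):
        # A hit only moves the entry to the most-recently-used end.
        if inode in self.entries:
            self.entries.move_to_end(inode)
            return
        # A miss inserts the entry and evicts the oldest one if over the limit.
        self.entries[inode] = True
        if len(self.entries) > self.limit:
            self.entries.popitem(last=False)
            self.evictions += 1


def count_evictions(limit, n_files):
    """Touch n_files distinct inodes once each and return the eviction count."""
    cache = LRUCache(limit)
    for inode in range(n_files):
        cache.touch(inode)
    return cache.evictions


if __name__ == "__main__":
    for limit in (20, 25000, 150000):
        print(limit, count_evictions(limit, 100000))
```

With 100000 distinct entries, a limit of 20 evicts on nearly every insert, while a limit of 150000 never evicts. That matches the observation above that the crash was not seen with the higher limit, although it says nothing about whether the race itself is gone.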

Comment 4 Atin Mukherjee 2016-12-14 12:56:28 UTC
Devel ack is provided as the crash is consistently reproducible.

Comment 10 Ambarish 2017-01-20 07:52:52 UTC
The reported issue was not reproducible on Ganesha 2.4.1-6, Gluster 3.8.4-12 in two attempts.

Will reopen if hit again during regressions.

Comment 12 errata-xmlrpc 2017-03-23 06:27:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html