1403714 – [Ganesha + Multi-Volume/Single-Mount] - Ganesha crashes during inode_destroy

Summary: [Ganesha + Multi-Volume/Single-Mount] - Ganesha crashes during inode_destroy

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	nfs-ganesha
Sub Component:
Version:	rhgs-3.2
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	RHGS 3.2.0
Assignee:	Jiffin
QA Contact:	Ambarish
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1351528 1400780 1401160

Reported:	2016-12-12 09:17 UTC by Ambarish
Modified:	2017-03-28 06:52 UTC
CC List:	15 users
Fixed In Version:	nfs-ganesha-2.4.1-4
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1400780
Environment:
Last Closed:	2017-03-23 06:27:19 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1353561	0	unspecified	CLOSED	Multiple bricks could crash after TCP port probing	2021-02-22 00:41:40 UTC
Red Hat Product Errata	RHEA-2017:0493	0	normal	SHIPPED_LIVE	Red Hat Gluster Storage 3.2.0 nfs-ganesha bug fix and enhancement update	2017-03-23 09:19:13 UTC

Internal Links: 1353561

Comment 2 Jiffin 2016-12-12 09:51:27 UTC

copying from https://bugzilla.redhat.com/show_bug.cgi?id=1400780#c9

From the backtraces of core 1 in bz#1400780 and core 2 in bz#1401160, it is clear that the issue is hit only when ganesha is trying to remove an entry from its LRU list. By default the LRU limit for ganesha's MD_CACHE is 25000, and in the gfapi layer it is 131072. We suspect the crash occurred because of a race between removal of the entry in the ganesha layer and in the gluster layer.
I tried to reproduce a similar issue with 3 volumes (two 1x2 and one 1x1), with the number of clients varying from 4 to 7. I also tried lowering the LRU limit to 20 for ganesha and 100 for gluster, but never hit the crash with ongoing I/O (dd and a Linux untar run from different clients). In my setup the I/O ran continuously for at least 4 hours and then errored out with "no space left on the device".

But during cleanup (rm -rf on the same directories from different mounts) I consistently got a crash with a similar backtrace during LRU cleanup. The crashes are more easily reproduced with a lower LRU limit. When I increased the LRU limit to 150000 in ganesha, the crash was not seen (maybe it would crash eventually).
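
As a rough illustration of why a lower LRU limit makes the removal path easier to exercise, here is a minimal Python sketch. It is a toy model only, not nfs-ganesha or gfapi code: the names `ToyLRU` and `count_evictions` and the file count of 1000 are assumptions made for the example, while the limits 20 and 150000 are the values tried above. It does not model the suspected race itself, only how often an entry is removed from the LRU list.

```python
from collections import OrderedDict


class ToyLRU:
    """Toy LRU cache that counts evictions (hypothetical, not ganesha code)."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = OrderedDict()
        self.evictions = 0

    def touch(self, key):
        # Move an existing entry to the most-recently-used end, or add it.
        if key in self.entries:
            self.entries.move_to_end(key)
        else:
            self.entries[key] = True
        # Evict from the least-recently-used end once over the limit.
        while len(self.entries) > self.limit:
            self.entries.popitem(last=False)
            self.evictions += 1


def count_evictions(limit, n_files):
    """Touch n_files distinct entries once and return the eviction count."""
    cache = ToyLRU(limit)
    for i in range(n_files):
        cache.touch(f"file-{i}")
    return cache.evictions


if __name__ == "__main__":
    # Walking 1000 distinct entries (an assumed workload size).
    print(count_evictions(20, 1000))      # 980 removals from the LRU list
    print(count_evictions(150000, 1000))  # 0 removals from the LRU list
```

With a limit of 20, nearly every new entry forces a removal, so any bug on the removal path has many more chances to trigger. With a limit of 150000 the same walk never removes anything, which would be consistent with the crash not being seen at that setting.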

Comment 4 Atin Mukherjee 2016-12-14 12:56:28 UTC

Devel ack is provided as the crash is consistently reproducible.

Comment 10 Ambarish 2017-01-20 07:52:52 UTC

The reported issue was not reproducible on Ganesha 2.4.1-6, Gluster 3.8.4-12 in two tries.

Will reopen if hit again during regressions.

Comment 12 errata-xmlrpc 2017-03-23 06:27:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html
