Bug 1573089

Summary:	[GSS] NFS-Ganesha server becoming unresponsive
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	nravinas
Component:	nfs-ganesha	Assignee:	Kaleb KEITHLEY <kkeithle>
Status:	CLOSED NOTABUG	QA Contact:	Manisha Saini <msaini>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	rhgs-3.3	CC:	abhishku, amukherj, dang, ffilz, jthottan, kkeithle, mbenjamin, nravinas, pasik, rhs-bugs, sankarshan, storage-qa-internal
Target Milestone:	---	Keywords:	ZStream
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-11-19 10:50:14 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Comment 17 Kaleb KEITHLEY 2018-05-17 14:39:19 UTC

These 
  dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
messages make me suspect that the box is low on available memory and is thrashing.

When this happens again, please collect the resident (RSS) and virtual (VSZ) sizes of the ganesha.nfsd process, along with the overall memory consumption on the machine.
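
One possible way to gather those numbers is sketched below in Python. It is an illustration only, not a tool referenced in this bug. It assumes a Linux /proc filesystem, that the process name reported in /proc/<pid>/comm is ganesha.nfsd, and that the kernel exposes MemAvailable in /proc/meminfo. The function names (parse_kb_fields, find_pids, report) are hypothetical.

```python
import os


def parse_kb_fields(text, wanted):
    """Return {name: kB} for 'Name:   123 kB' lines whose name is in `wanted`."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in wanted:
            parts = rest.split()
            if parts and parts[0].isdigit():
                values[name] = int(parts[0])
    return values


def find_pids(comm_name, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/comm matches comm_name."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == comm_name:
                    pids.append(int(entry))
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return sorted(pids)


def report(comm_name="ganesha.nfsd", proc_root="/proc"):
    """Build a short text report: per-process RSS/virtual size, then system memory."""
    lines = []
    for pid in find_pids(comm_name, proc_root):
        with open(os.path.join(proc_root, str(pid), "status")) as f:
            proc = parse_kb_fields(f.read(), {"VmRSS", "VmSize"})
        lines.append(
            "pid %d: VmRSS=%s kB VmSize=%s kB"
            % (pid, proc.get("VmRSS"), proc.get("VmSize"))
        )
    with open(os.path.join(proc_root, "meminfo")) as f:
        mem = parse_kb_fields(
            f.read(), {"MemTotal", "MemAvailable", "SwapTotal", "SwapFree"}
        )
    for name in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree"):
        lines.append("%s=%s kB" % (name, mem.get(name)))
    return "\n".join(lines)
```

Calling print(report()) on the server while the warnings are being logged would give one snapshot; a low MemAvailable together with shrinking SwapFree would be consistent with the thrashing theory.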

Thanks

Comment 26 Daniel Gryniewicz 2018-06-07 13:38:59 UTC

NFSv4 runs over TCP, so jumbo frames should make no difference at all unless one of the machines (or one of the switches between them) has a misconfigured MTU.  If all the MTUs are the same (1500 is the default for non-jumbo Ethernet), then turning on jumbo frames won't make a difference, because TCP splits any data larger than the MTU across multiple frames.

The most likely scenarios are:

1) One machine has a lower MTU than the other, and when (rare) packets are sent that exceed that machine's MTU, they are silently dropped.  TCP will resend them over and over forever, and they will continue to be dropped.

2) A switch in the network has a lower MTU than the machines, and does not properly fragment large frames.  In this case, it's the switch that drops the frames, and again, TCP will resend forever.

Configuring jumbo frames would fix this as a side effect: doing so forces the administrators to go through the entire network and set every MTU to the same new, larger value.
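
A rough Python sketch of how such a mismatch could be looked for follows. It is not something used in this bug. It assumes Linux hosts that expose interface MTUs under /sys/class/net, and that the MTU of each hop (including switches) has been collected by hand into a dictionary. The function names and the hop names in the usage note are hypothetical.

```python
import os

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest IPv4 ICMP echo payload that fits in one unfragmented frame."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def local_mtus(sys_net="/sys/class/net"):
    """Return {interface: mtu} for the local machine, read from sysfs."""
    mtus = {}
    for iface in sorted(os.listdir(sys_net)):
        try:
            with open(os.path.join(sys_net, iface, "mtu")) as f:
                mtus[iface] = int(f.read().strip())
        except (OSError, ValueError):
            # Not a network interface entry, or unreadable; skip it.
            continue
    return mtus


def undersized_hops(path_mtus):
    """Given {hop: mtu} along one path, return the hops below the largest MTU."""
    if not path_mtus:
        return {}
    largest = max(path_mtus.values())
    return {hop: mtu for hop, mtu in path_mtus.items() if mtu < largest}
```

For example, undersized_hops({"client": 9000, "switch": 1500, "server": 9000}) singles out the switch, which matches scenario 2. max_ping_payload(1500) gives the payload size to use with a don't-fragment ping (on Linux, ping -M do -s <size>) when probing whether full-size frames cross the path intact.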

Comment 28 Yaniv Kaul 2018-10-25 11:43:53 UTC

What's the next step here?

Comment 32 Red Hat Bugzilla 2023-09-15 00:07:53 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.