Bug 1573089 - [GSS] NFS-Ganesha server becoming unresponsive
Summary: [GSS] NFS-Ganesha server becoming unresponsive
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Kaleb KEITHLEY
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-30 07:49 UTC by nravinas
Modified: 2023-09-15 00:07 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 10:50:14 UTC
Embargoed:


Attachments

Comment 17 Kaleb KEITHLEY 2018-05-17 14:39:19 UTC
These messages:

  dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat

make me suspect that the box is low on available memory and is thrashing.

When this happens again, please collect the resident and virtual memory size of the ganesha.nfsd process and the overall memory consumption on the machine.
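
For reference, here is a minimal sketch of how those numbers could be captured (this assumes a Linux host with the usual /proc layout and Python 3; the script itself is only illustrative and not part of any RHGS tooling):

#!/usr/bin/env python3
# Report ganesha.nfsd resident/virtual size and overall memory from /proc.
import glob

def ganesha_memory():
    # Scan /proc/<pid>/status for the process whose Name is ganesha.nfsd.
    for path in glob.glob("/proc/[0-9]*/status"):
        fields = {}
        try:
            with open(path) as f:
                for line in f:
                    key, _, value = line.partition(":")
                    fields[key] = value.strip()
        except OSError:
            continue  # process exited while we were scanning
        if fields.get("Name") == "ganesha.nfsd":
            return fields.get("VmRSS"), fields.get("VmSize")
    return None, None

def system_memory():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, value = line.partition(":")
            info[key] = value.strip()
    return info.get("MemTotal"), info.get("MemAvailable")

if __name__ == "__main__":
    rss, vsz = ganesha_memory()
    total, avail = system_memory()
    print("ganesha.nfsd: VmRSS=%s VmSize=%s" % (rss, vsz))
    print("system: MemTotal=%s MemAvailable=%s" % (total, avail))

Running something like this periodically (e.g. from cron) while the problem is reproducing would show whether ganesha.nfsd is growing and whether the machine as a whole is running out of memory.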

Thanks

Comment 26 Daniel Gryniewicz 2018-06-07 13:38:59 UTC
NFSv4 runs over TCP, so jumbo frames should make no difference at all unless one of the machines (or one of the switches between them) has a badly configured MTU.  If all the MTUs are the same (1500 is the default for non-jumbo Ethernet), then turning on jumbo frames won't make a difference, since TCP segments its stream across multiple frames whenever the data is larger than the MTU.

The most likely scenarios are:

1) One machine has a lower MTU than the other, and when the occasional packet larger than that machine's MTU is sent, it is silently dropped.  TCP will retransmit it over and over forever, and it will keep being dropped.

2) A switch in the network has a lower MTU than the machines, and does not properly fragment large frames.  In this case, it's the switch that drops the frames, and again, TCP will resend forever.

Configuring jumbo frames would fix this because doing so forces you to go through the entire network and set every MTU to the same new, larger value.
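
A quick way to check for a mismatch like this, sketched here only as an illustration (Linux-only, Python 3; the hostname is a placeholder and the IP_MTU socket options are Linux-specific), is to ask the kernel for its current path-MTU estimate towards the other machine:

#!/usr/bin/env python3
# Print the kernel's current path-MTU estimate towards a host.
import socket

# Linux-specific IP-level options; fall back to the numeric values from
# <linux/in.h> in case this Python build does not expose the constants.
IP_MTU = getattr(socket, "IP_MTU", 14)
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)

def path_mtu(host, port=2049):  # 2049 is the NFS port
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket only selects a route; no packets are sent.
        s.connect((host, port))
        # Set DF so the kernel never fragments locally.
        s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
        # Initially this is the route/interface MTU; it drops to a smaller
        # value if path-MTU discovery has learned one for this destination.
        return s.getsockopt(socket.IPPROTO_IP, IP_MTU)
    finally:
        s.close()

if __name__ == "__main__":
    # Placeholder address -- substitute the NFS server or client in question.
    print("path MTU estimate:", path_mtu("nfs-server.example.com"))

Comparing the value reported on both ends (and against the interface MTUs) would show whether one side believes the path can carry larger frames than the other can actually accept.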

Comment 28 Yaniv Kaul 2018-10-25 11:43:53 UTC
What's the next step here?

Comment 32 Red Hat Bugzilla 2023-09-15 00:07:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

