Bug 738046

Summary:	NFS Performance Degradation in RHEL 5
Product:	Red Hat Enterprise Linux 5
Component:	kernel
Version:	5.7
Hardware:	x86_64
OS:	Linux
Status:	CLOSED NOTABUG
Severity:	urgent
Priority:	high
Reporter:	Chris Mitchell <cmitchel>
Assignee:	Jeff Layton <jlayton>
QA Contact:	Red Hat Kernel QE team <kernel-qe>
CC:	cww, harshula, jlayton, jwest, rwheeler, steved
Target Milestone:	rc
Doc Type:	Bug Fix
Last Closed:	2011-09-27 17:22:33 UTC

Comment 2 Jeff Layton 2011-09-13 18:29:52 UTC

So am I correct that when they use the older kernel (-194 based), that they do not see these roughly 30s delays when writing?

What might be helpful is to test something around 5.6.z and see if you have the same performance issues with it. That would help narrow down whether this is a regression and when it might have crept in.

Clearly, that won't have the fix for the original kswapd deadlock, so this would just be to help us determine when the performance issue might have started for you.

Comment 3 Jeff Layton 2011-09-13 18:44:09 UTC

If the client is writing with 30s gaps in between, then it sounds like the VM subsystem is not attempting to write the data. The NFS client really only issues writes under three conditions:

1) a data-integrity flush -- fsync() or close(). The client needs to flush out data to ensure that it's written out to the server before it releases the inode.

2) memory pressure flush -- the VM is asking us to write out dirty pages so it can free up memory

3) kupdated -- the data has hit the expire interval so kupdated is going to try and write it out. This generally happens every 30s...

The catch here is that if you are not doing fsyncs and you have gobs of memory, then there's little need for the kernel to flush out dirty pages. That leaves kupdated, and since it only flushes data that's older than 30s, that may help explain why you only see I/O on that interval.
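
As a rough illustration of condition 1 above, here is a minimal Python sketch of a writer that forces a data-integrity flush at regular intervals instead of waiting for the periodic flusher. It is not taken from the reporter's workload: the function names, chunk size, and fsync interval are all hypothetical. The helper that reads /proc/sys/vm/dirty_expire_centisecs assumes a Linux client that exposes that tunable.

```python
import os


def centisecs_to_seconds(raw):
    """Convert the text of a /proc/sys/vm/*_centisecs file to seconds."""
    return int(raw.strip()) / 100.0


def read_dirty_expire_seconds(path="/proc/sys/vm/dirty_expire_centisecs"):
    """Return the age (in seconds) at which dirty data becomes eligible
    for periodic writeback. Assumes a Linux system with this tunable."""
    with open(path) as f:
        return centisecs_to_seconds(f.read())


def write_with_periodic_fsync(path, total_bytes,
                              chunk_size=1 << 20, fsync_every=8 << 20):
    """Write total_bytes of zeros to path, calling fsync() after every
    fsync_every bytes so the data is pushed out without waiting for the
    periodic flusher. Returns the number of fsync() calls made.
    All parameters are illustrative, not tuned values."""
    chunk = b"\0" * chunk_size
    written = 0
    since_sync = 0
    syncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            want = min(chunk_size, total_bytes - written)
            done = os.write(fd, chunk[:want])  # may write fewer bytes than asked
            written += done
            since_sync += done
            if since_sync >= fsync_every:
                os.fsync(fd)  # data-integrity flush (condition 1)
                syncs += 1
                since_sync = 0
        if since_sync:
            os.fsync(fd)  # flush the unsynced tail before close
            syncs += 1
    finally:
        os.close(fd)
    return syncs
```

Pointing the path at a file on the NFS mount and comparing the client's write pattern with and without the periodic fsync() would be one way to check whether the 30s bursts are simply the expire interval at work.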

Comment 4 Harshula Jayasuriya 2011-09-13 19:06:05 UTC

(In reply to comment #2)
> So am I correct that when they use the older kernel (-194 based), that they do
> not see these roughly 30s delays when writing?

Yep. Just the kswapd hangs.

Comment 5 Jeff Layton 2011-09-13 19:21:46 UTC

What would be most helpful here is a self-contained testcase that demonstrates the problem. Could someone at EMC provide something along those lines?

In the absence of that, could you bisect down the problem, starting with pre-built kernels? Our support folks ought to be able to provide you some to test.

Comment 6 Harshula Jayasuriya 2011-09-16 02:58:47 UTC

More data points ...

NFS:
kernel -238 (plus patches from BZ 516490): smooth NFS write I/O
kernel -250: bursty NFS write I/O (~35 second intervals)

Local f/s:
kernel -238 (plus patches from BZ 516490): relatively smooth local f/s I/O
kernel -250: the one iostat capture we have shows hardly any I/O

Comment 7 Jeff Layton 2011-09-16 11:14:35 UTC

Perhaps it's the patches for bug 441730 then? If you have a reproducer, it would be nice to try to bisect that down as well...

Comment 8 Jeff Layton 2011-09-16 14:49:19 UTC

Ok, so you've tested a -238 kernel with the patchset for bug 516490. The thing to do now, I think, would be to add on the patchset for bug 441730 and test with that. If that does not show the problem, then I'd be more inclined to think this is a problem at the VM layer. Otherwise, we can dive more deeply into the patchset for 441730 and see if we can discern the cause.

Comment 9 Harshula Jayasuriya 2011-09-16 15:41:23 UTC

Kernel -245 is queued up next. Depending on how that goes, might do -250 without the patchset from bug 441730.

Comment 11 Jeff Layton 2011-09-27 17:22:33 UTC

Based on the last set of comments, which indicate that this may be a hardware issue, I'm going to go ahead and close this as NOTABUG. If it turns out to be a bug after all, then please reopen it.