Bug 396081

Summary: Since "Patch2037: linux-2.6.9-vm-balance.patch" my NFS performance is poor
Product: Red Hat Enterprise Linux 4
Reporter: Marcus Alves Grando <marcus>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: urgent
Version: 4.0
CC: ddomingo, jbaron, jlayton, k.georgiou, lwoodman, mikel, mmayer, sev, sputhenp, staubach, steved
Target Milestone: rc
Keywords: ZStream
Hardware: All
OS: Linux
Whiteboard: GSSApproved
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 19:22:00 UTC
Bug Blocks: 391231, 438476, 438477
Attachments:
  tested 4.5-zstream patch to fix this problem (flags: none)
  Tested 4.6 zstream patch that fixes this problem (flags: none)

Description Marcus Alves Grando 2007-11-22 20:50:23 UTC
Since Patch2041 was applied in 2.6.9-55.0.(something > 2), my NFS performance
has been bad. While trying to track it down, I see this:

# ps ax -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | grep blk_congestion_wait
13392 D    trrlmtpd_tcp     blk_congestion_wait
13400 D    trrlmtpd_tcp     blk_congestion_wait
13401 D    trrlmtpd_tcp     blk_congestion_wait
13402 D    trrlmtpd_tcp     blk_congestion_wait
13403 D    trrlmtpd_tcp     blk_congestion_wait
13413 D    trrlmtpd_tcp     blk_congestion_wait
13414 D    trrlmtpd_tcp     blk_congestion_wait
13416 D    trrlmtpd_tcp     blk_congestion_wait
13424 D    trrlmtpd_tcp     blk_congestion_wait
13435 D    trrlmtpd_tcp     blk_congestion_wait
13436 D    trrlmtpd_tcp     blk_congestion_wait
13441 D    trrlmtpd_tcp     blk_congestion_wait
13455 D    trrlmtpd_tcp     blk_congestion_wait

And when many processes try to read/write over NFS, the load on this server climbs past 150.
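A quick way to quantify this (standard tools, nothing specific to this kernel;
the awk filter just counts processes in uninterruptible D state):

# ps ax -o stat=,comm= | awk '$1 ~ /^D/' | wc -l
# cat /proc/loadavg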

Please release a new kernel without this patch. I know that 2.6.9-68 already
reverts it, and that is how I found the problem.

Thanks

Comment 1 Marcus Alves Grando 2007-11-22 21:02:10 UTC
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=205722

Comment 2 Marcus Alves Grando 2007-11-23 01:42:25 UTC
--- kernel-2.6.9-55.0.2.spec	2007-11-22 23:16:38.000000000 -0200
+++ kernel-2.6.9-55.0.12.spec	2007-11-22 23:37:06.000000000 -0200
@@ -30,7 +30,7 @@
 # that the kernel isn't the stock distribution kernel, for example by
 # adding some text to the end of the version number.
 #
-%define release 55.0.2.EL
+%define release 55.0.12.EL
 %define sublevel 9
 %define kversion 2.6.%{sublevel}
 %define rpmversion 2.6.%{sublevel}
@@ -332,6 +332,8 @@
 Patch282: linux-2.6.9-x86-acpi-skip-timer-nvidia.patch
 Patch283: linux-2.6.9-x8664-e820.patch
 Patch284: linux-2.6.9-x86-nvidia-hpet.patch
+Patch285: linux-2.6.9-x86_64-zero-extend-regs.patch
+Patch286: linux-2.6.9-x8664-mtrr-updates.patch
 
 # 300 - 399   ppc(64)
 Patch300: linux-2.6.2-ppc64-build.patch
@@ -977,6 +979,7 @@
 Patch2034: linux-2.6.9-busy-inodes.patch
 Patch2035: linux-2.6.9-vm-deadlock.patch
 Patch2036: linux-2.6.9-vm-swap-io-error.patch
+Patch2037: linux-2.6.9-vm-balance.patch
 
 # IDE bits.
 Patch2100: linux-2.6.9-ide-csb6-raid.patch
@@ -1146,6 +1149,7 @@
 Patch2577: linux-2.6.9-fs-prevent-inode-overflow.patch
 Patch2578: linux-2.6.9-fs-lustre-support.patch
 Patch2579: linux-2.6.9-fs-inode-accounting.patch
+Patch2580: linux-2.6.9-fs-leak.patch
 
 # Device Mapper patches
 Patch2600: linux-2.6.13-dm-swap-error.patch
@@ -1302,6 +1306,7 @@
 Patch4080: linux-2.6.9-core-dump.patch
 Patch4081: linux-2.6.9-compat.patch
 Patch4082: linux-2.6.9-bluetooth.patch
+Patch4083: linux-2.6.9-pdeath-signal-suid.patch
 
 # ALSA fixes.
 Patch4100: linux-2.6.9-alsa-vx222-newid.patch
@@ -1391,7 +1396,10 @@
 Patch5070: linux-2.6.9-CVE-2006-4538-ia64-corrupt-elf.patch
 Patch5071: linux-2.6.9-CVE-2006-5823-cramfs-zlib-inflate.patch
 Patch5072: linux-2.6.9-CVE-2006-6106-capi-size-check.patch
-
+Patch5073: linux-2.6.9-CVE-2007-2878-vfat-put_dirent32-dos.patch
+Patch5074: linux-2.6.9-CVE-2007-3739-stack-grow-limit.patch
+Patch5075: linux-2.6.9-CVE-2006-6921-dos-with-wedged-processes.patch
+Patch5076: linux-2.6.9-CVE-2007-4571-alsa-procfs.patch

Comment 3 Marcus Alves Grando 2007-11-23 11:29:03 UTC
OK, ignore the part about 2.6.9-68 fixing the problem. 2.6.9-68 does not fix
the problem. Every kernel since 2.6.9-55.0.2 is affected.

Regards

Comment 4 Jeff Layton 2007-11-23 12:29:27 UTC
To clarify, -55.0.2 is the last kernel that performs adequately for you?

Comment 5 Marcus Alves Grando 2007-11-23 12:50:08 UTC
(In reply to comment #4)
> To clarify, -55.0.2 is the last kernel that performs adequately for you?

Yes.

Comment 6 Larry Woodman 2007-11-26 16:01:43 UTC
Marcus, the subject of this BZ states that patch 2037
(linux-2.6.9-vm-balance.patch) caused the performance degradation, yet the
first comment says the problem occurs after patch 2041
(linux-2.6.9-vm-large-file-latency.patch) was applied.  Was it patch 2041
that caused the problem?

Thanks, Larry Woodman


Comment 7 Marcus Alves Grando 2007-11-27 13:56:13 UTC
(In reply to comment #6)
> Marcus, the subject of this BZ states that patch 2037
> (linux-2.6.9-vm-balance.patch) caused the performance degradation, yet the
> first comment says the problem occurs after patch 2041
> (linux-2.6.9-vm-large-file-latency.patch) was applied.  Was it patch 2041
> that caused the problem?
> 
> Thanks, Larry Woodman
> 

OK, more precisely:

When I use 55.0.2, my processes that use NFS work fine; when I change to 55.0.6
or newer, my processes block in blk_congestion_wait and the load increases fast.

Looking at the diffs between 55.0.2 and 55.0.6, I suspect patch2037 is the
problem.

More info of system:

# uname -a
Linux calomba.hst 2.6.9-55.0.6.ELsmp #1 SMP Tue Sep 4 21:36:00 EDT 2007 i686
i686 i386 GNU/Linux
# cat /proc/meminfo 
MemTotal:     16629324 kB
MemFree:       7858100 kB
Buffers:         40444 kB
Cached:         761656 kB
SwapCached:          0 kB
Active:        7891644 kB
Inactive:       663700 kB
HighTotal:    15855336 kB
HighFree:      7415488 kB
LowTotal:       773988 kB
LowFree:        442612 kB
SwapTotal:     2040244 kB
SwapFree:      2040244 kB
Dirty:           17836 kB
Writeback:           0 kB
Mapped:        7766056 kB
Slab:           139968 kB
CommitLimit:  10354904 kB
Committed_AS: 12755712 kB
PageTables:      50704 kB
VmallocTotal:   106488 kB
VmallocUsed:      3336 kB
VmallocChunk:   102632 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB
# cat /proc/loadavg 
235.43 100.06 38.13 1/753 10833

I can recompile 55.0.6 without patch2037 to see if the problem goes away, if you need.

Regards

Comment 8 Larry Woodman 2007-11-27 15:29:29 UTC
OK, thanks for the data Marcus.  Trying without patch2037 would be helpful, but
let me say that linux-2.6.9-vm-balance.patch has two pieces: 1.) it adds a new
tunable, /proc/sys/vm/pagecache, which forces the system to reclaim unmapped
pagecache memory, but only if you lower /proc/sys/vm/pagecache from the default
of 100%; 2.) it increases the default value of /proc/sys/vm/min_free_kbytes by
a factor of 4 to make the system reclaim memory earlier and therefore prevent
OOM kills from happening.

Can you simply "cat /proc/sys/vm/min_free_kbytes", take a quarter of the result
(the pre-patch default), "echo" that back into /proc/sys/vm/min_free_kbytes,
and see if the problem goes away?
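For example, a sketch with illustrative numbers (the actual values depend on
the system):

# cat /proc/sys/vm/min_free_kbytes
3796
# echo 949 > /proc/sys/vm/min_free_kbytes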

Finally, what hardware are you using, x86 or x86_64?


Larry Woodman



Comment 9 Marcus Alves Grando 2007-11-27 16:35:02 UTC
(In reply to comment #8)
> OK, thanks for the data Marcus.  Trying without patch2037 would be helpful, but
> let me say that linux-2.6.9-vm-balance.patch has two pieces: 1.) it adds a new
> tunable, /proc/sys/vm/pagecache, which forces the system to reclaim unmapped
> pagecache memory, but only if you lower /proc/sys/vm/pagecache from the default
> of 100%; 2.) it increases the default value of /proc/sys/vm/min_free_kbytes by
> a factor of 4 to make the system reclaim memory earlier and therefore prevent
> OOM kills from happening.
> 
> Can you simply "cat /proc/sys/vm/min_free_kbytes", take a quarter of the result
> (the pre-patch default), "echo" that back into /proc/sys/vm/min_free_kbytes,
> and see if the problem goes away?

OK. I tested this and don't see any difference (the load increases just the
same). The default value on my 55.0.2 is 949, and I set that via sysctl. It
doesn't help.
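For the record, the sysctl form of that would be something like:

# sysctl -w vm.min_free_kbytes=949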

> 
> Finally, what hardware are you using, x86 or x86_64?

It's x86.

Another test?

Regards

Comment 10 Larry Woodman 2007-11-27 16:48:58 UTC
Yes, if you can rebuild the kernel, please make this change to
linux-2.6.9-vm-balance.patch on line 253:

-+#ifdef CONFIG_HIGHMEM
++#if 0


This removes a 32-bit-only (HIGHMEM) restriction on NFS writeback:


+#ifdef CONFIG_HIGHMEM
+                       /*
+                        * Until NFS gets proper congestion control,
+                        * we disallow HIGHMEM so the writeback logic
+                        * limits the amount of dirty memory.  Otherwise,
+                        * writing large files results in OOM when the
+                        * lowmem is scarce.
+                        */
+                       mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL);
+#endif


Can you rebuild and try this out for me?

Thanks, Larry
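For reference, a sketch of the one-line edit and rebuild (the sed invocation,
spec file name, and target arch are illustrative; adjust for your build setup):

# sed -i '253s/#ifdef CONFIG_HIGHMEM/#if 0/' linux-2.6.9-vm-balance.patch
# rpmbuild -bb --target=i686 kernel-2.6.9.spec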


Comment 11 Marcus Alves Grando 2007-11-27 19:02:33 UTC
(In reply to comment #10)
> Yes, if you can rebuild the kernel, please make this change to
> linux-2.6.9-vm-balance.patch on line 253:
> 
> -+#ifdef CONFIG_HIGHMEM
> ++#if 0
> 
> 
> This removes a 32-bit-only (HIGHMEM) restriction on NFS writeback:
> 
> 
> +#ifdef CONFIG_HIGHMEM
> +                       /*
> +                        * Until NFS gets proper congestion control,
> +                        * we disallow HIGHMEM so the writeback logic
> +                        * limits the amount of dirty memory.  Otherwise,
> +                        * writing large files results in OOM when the
> +                        * lowmem is scarce.
> +                        */
> +                       mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL);
> +#endif
> 
> 
> Can you rebuild and try this out for me?
> 
> Thanks, Larry
> 

OK. I'll recompile the kernel and test now.

Thanks

Comment 12 Marcus Alves Grando 2007-11-27 20:48:39 UTC
Bingo. Without mapping_set_gfp_mask(), NFS works fine again.

I'll test with 2.6.9-67 now.

Thanks

Comment 13 Marcus Alves Grando 2007-11-27 22:47:22 UTC
(In reply to comment #12)
> Bingo. Without mapping_set_gfp_mask(), NFS works fine again.
> 
> I'll test with 2.6.9-67 now.

Works fine on 2.6.9-67 when I disable mapping_set_gfp_mask().

Regards


Comment 14 Marcus Alves Grando 2007-11-30 12:57:13 UTC
Larry,

Two questions:

Is that the proper fix for this? And what is the schedule for an
erratum/new kernel release?

Regards

Comment 15 Larry Woodman 2007-11-30 14:47:05 UTC
Marcus, I will fix this problem in an errata kernel by removing the
mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL) call under default conditions.
I can't simply remove it in all cases, because it was added to prevent OOM
killing when the system is under a lot of lowmem pressure and NFS is writing
very actively.

The schedule is early January 2008.

Larry


Comment 17 RHEL Program Management 2008-01-08 17:07:31 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 20 Vivek Goyal 2008-01-30 19:35:17 UTC
Committed in 68.9. RPMS are available at http://people.redhat.com/vgoyal/rhel4/.

Comment 23 Larry Woodman 2008-03-25 20:57:16 UTC
Created attachment 299069 [details]
tested 4.5-zstream patch to fix this problem

4.5-zstream patch that adds /proc/sys/vm/nfs-writeback-lowmem-only to
conditionalize whether NFS writebacks are restricted to lowmem.
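
For reference, toggling the new tunable would look like this (semantics per the
release note in comment 29 below; 0 is the default):

# echo 1 > /proc/sys/vm/nfs-writeback-lowmem-only   (restrict NFS writeback to lowmem, the old behavior)
# echo 0 > /proc/sys/vm/nfs-writeback-lowmem-only   (default: writeback may use highmem)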

Comment 24 Larry Woodman 2008-03-25 20:58:35 UTC
Created attachment 299070 [details]
Tested 4.6 zstream patch that fixes this problem.

4.6-zstream patch that adds /proc/sys/vm/nfs-writeback-lowmem-only to
conditionalize whether NFS writebacks are restricted to lowmem.

Comment 25 Jeff Layton 2008-04-03 14:13:38 UTC
*** Bug 429925 has been marked as a duplicate of this bug. ***

Comment 26 Sev Binello 2008-04-03 14:26:16 UTC
We have been using a test version of the kernel, i.e. 2.6.9-68.11, and we
haven't seen the problem. Based on the comments here, is it correct to infer
that this fix is now released and present in 2.6.9-68.9, and that we no longer
need the test version?

Comment 27 Jeff Layton 2008-04-03 14:46:36 UTC
No, both -68.9 and -68.11 are development builds. To my knowledge, the fix for
this is not released yet.

Comment 29 Don Domingo 2008-05-14 04:29:45 UTC
added to RHEL4.7 release notes under "Kernel-Related Updates":

<quote>
You can now restrict NFS writes to low memory. To do so, set
/proc/sys/vm/nfs-writeback-lowmem-only to 1 (this is set to 0 by default).

Previous releases did not include this capability. This caused NFS read
performance degradation in some cases, particularly when the system encountered
high volumes of NFS read/write requests.
</quote>

please advise if any revisions are in order. thanks!

Comment 30 Marcus Alves Grando 2008-05-16 18:38:09 UTC
Looks good to me.

Do you expect to release a new kernel soon? Or is there a date set for
announcing RHEL 4.7?

Regards

Comment 31 Don Domingo 2008-06-02 23:16:12 UTC
Hi,

the RHEL4.7 release notes deadline is on June 17, 2008 (Tuesday). they will
undergo a final proofread before being dropped to translation, at which point no
further additions or revisions will be entertained.

a mockup of the RHEL4.7 release notes can be viewed here:
http://intranet.corp.redhat.com/ic/intranet/RHEL4u7relnotesmockup.html

please use the aforementioned link to verify whether your bugzilla is already
in the release notes (if it needs to be). each item in the release notes
contains a link to its original bug; as such, you can search through the
release notes by bug number.

Cheers,
Don

Comment 33 errata-xmlrpc 2008-07-24 19:22:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html