Bug 396081
Summary: | Since "Patch2037: linux-2.6.9-vm-balance.patch" my NFS performance is poorly | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | Marcus Alves Grando <marcus>
Component: | kernel | Assignee: | Larry Woodman <lwoodman>
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner>
Severity: | high | Docs Contact: |
Priority: | urgent | |
Version: | 4.0 | CC: | ddomingo, jbaron, jlayton, k.georgiou, lwoodman, mikel, mmayer, sev, sputhenp, staubach, steved
Target Milestone: | rc | Keywords: | ZStream
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | GSSApproved | |
Fixed In Version: | RHSA-2008-0665 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2008-07-24 19:22:00 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 391231, 438476, 438477 | |
Attachments: | | |
Description
Marcus Alves Grando
2007-11-22 20:50:23 UTC
--- kernel-2.6.9-55.0.2.spec 2007-11-22 23:16:38.000000000 -0200
+++ kernel-2.6.9-55.0.12.spec 2007-11-22 23:37:06.000000000 -0200
@@ -30,7 +30,7 @@
 # that the kernel isn't the stock distribution kernel, for example by
 # adding some text to the end of the version number.
 #
-%define release 55.0.2.EL
+%define release 55.0.12.EL
 %define sublevel 9
 %define kversion 2.6.%{sublevel}
 %define rpmversion 2.6.%{sublevel}
@@ -332,6 +332,8 @@
 Patch282: linux-2.6.9-x86-acpi-skip-timer-nvidia.patch
 Patch283: linux-2.6.9-x8664-e820.patch
 Patch284: linux-2.6.9-x86-nvidia-hpet.patch
+Patch285: linux-2.6.9-x86_64-zero-extend-regs.patch
+Patch286: linux-2.6.9-x8664-mtrr-updates.patch
 
 # 300 - 399 ppc(64)
 Patch300: linux-2.6.2-ppc64-build.patch
@@ -977,6 +979,7 @@
 Patch2034: linux-2.6.9-busy-inodes.patch
 Patch2035: linux-2.6.9-vm-deadlock.patch
 Patch2036: linux-2.6.9-vm-swap-io-error.patch
+Patch2037: linux-2.6.9-vm-balance.patch
 
 # IDE bits.
 Patch2100: linux-2.6.9-ide-csb6-raid.patch
@@ -1146,6 +1149,7 @@
 Patch2577: linux-2.6.9-fs-prevent-inode-overflow.patch
 Patch2578: linux-2.6.9-fs-lustre-support.patch
 Patch2579: linux-2.6.9-fs-inode-accounting.patch
+Patch2580: linux-2.6.9-fs-leak.patch
 
 # Device Mapper patches
 Patch2600: linux-2.6.13-dm-swap-error.patch
@@ -1302,6 +1306,7 @@
 Patch4080: linux-2.6.9-core-dump.patch
 Patch4081: linux-2.6.9-compat.patch
 Patch4082: linux-2.6.9-bluetooth.patch
+Patch4083: linux-2.6.9-pdeath-signal-suid.patch
 
 # ALSA fixes.
 Patch4100: linux-2.6.9-alsa-vx222-newid.patch
@@ -1391,7 +1396,10 @@
 Patch5070: linux-2.6.9-CVE-2006-4538-ia64-corrupt-elf.patch
 Patch5071: linux-2.6.9-CVE-2006-5823-cramfs-zlib-inflate.patch
 Patch5072: linux-2.6.9-CVE-2006-6106-capi-size-check.patch
-
+Patch5073: linux-2.6.9-CVE-2007-2878-vfat-put_dirent32-dos.patch
+Patch5074: linux-2.6.9-CVE-2007-3739-stack-grow-limit.patch
+Patch5075: linux-2.6.9-CVE-2006-6921-dos-with-wedged-processes.patch
+Patch5076: linux-2.6.9-CVE-2007-4571-alsa-procfs.patch

Ok.
Ignore the part saying that 2.6.9-68 fixes the problem. 2.6.9-68 does not fix the problem. No kernel since 2.6.9-55.0.2 works.

Regards

To clarify, -55.0.2 is the last kernel that performs adequately for you?

(In reply to comment #4)
> To clarify, -55.0.2 is the last kernel that performs adequately for you?

Yes.

Marcus, the subject of this BZ states that patch 2037 (linux-2.6.9-vm-balance.patch) caused the performance degradation, yet the first comment says the problem occurs after patch 2041 (linux-2.6.9-vm-large-file-latency.patch) was applied. Was it patch 2041 that caused the problem?

Thanks, Larry Woodman

(In reply to comment #6)
> Marcus, the subject of this BZ states that patch
> 2037 (linux-2.6.9-vm-balance.patch) caused the performance degradation, yet the
> first comment says the problem occurs after patch
> 2041 (linux-2.6.9-vm-large-file-latency.patch) was applied. Was it patch 2041
> that caused the problem?
>
> Thanks, Larry Woodman
>

Ok, more precisely. When I use 55.0.2, my processes that use NFS work fine. When I change to 55.0.6 or newer, my processes block in blk_congestion_wait and the load increases fast. Looking at the diffs between 55.0.2 and 55.0.6, I suspect that patch2037 is the problem.
More system info:

# uname -a
Linux calomba.hst 2.6.9-55.0.6.ELsmp #1 SMP Tue Sep 4 21:36:00 EDT 2007 i686 i686 i386 GNU/Linux
# cat /proc/meminfo
MemTotal: 16629324 kB
MemFree: 7858100 kB
Buffers: 40444 kB
Cached: 761656 kB
SwapCached: 0 kB
Active: 7891644 kB
Inactive: 663700 kB
HighTotal: 15855336 kB
HighFree: 7415488 kB
LowTotal: 773988 kB
LowFree: 442612 kB
SwapTotal: 2040244 kB
SwapFree: 2040244 kB
Dirty: 17836 kB
Writeback: 0 kB
Mapped: 7766056 kB
Slab: 139968 kB
CommitLimit: 10354904 kB
Committed_AS: 12755712 kB
PageTables: 50704 kB
VmallocTotal: 106488 kB
VmallocUsed: 3336 kB
VmallocChunk: 102632 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
# cat /proc/loadavg
235.43 100.06 38.13 1/753 10833

I can recompile 55.0.6 without patch2037 to see if the problem is gone, if you need.

Regards

OK, thanks for the data Marcus. Trying without patch2037 would be helpful, but let me say that linux-2.6.9-vm-balance.patch has two pieces:

1.) adds a new tunable "/proc/sys/vm/pagecache" which forces the system to reclaim unmapped pagecache memory, but only if you lower /proc/sys/vm/pagecache from the default of 100%.
2.) increases the default value of /proc/sys/vm/min_free_kbytes by a factor of 4 to make the system reclaim memory earlier and therefore prevent OOM kills from happening.

Can you simply "cat /proc/sys/vm/min_free_kbytes", take the result and "echo result > /proc/sys/vm/min_free_kbytes" and see if the problem goes away???

Finally, what hardware are you using, x86 or x86_64 ???

Larry Woodman

(In reply to comment #8)
> OK, thanks for the data Marcus. Trying without patch2037 would be helpful, but
> let me say that linux-2.6.9-vm-balance.patch has two pieces:
>
> 1.) adds a new tunable "/proc/sys/vm/pagecache" which forces the system to
> reclaim unmapped pagecache memory, but only if you lower
> /proc/sys/vm/pagecache from the default of 100%.
> 2.) increases the default value of /proc/sys/vm/min_free_kbytes by a
> factor of 4 to make the system reclaim memory earlier and therefore prevent OOM
> kills from happening.
>
> Can you simply "cat /proc/sys/vm/min_free_kbytes", take the result and "echo
> result > /proc/sys/vm/min_free_kbytes" and see if the problem goes away???

Ok. I tested this and don't see any difference (the load increases just the same). The default value on my 55.0.2 is 949 and I set this via sysctl. It doesn't help.

>
> Finally, what hardware are you using, x86 or x86_64 ???

That's x86. Another test?

Regards

Yes, if you can rebuild the kernel please make this change to linux-2.6.9-vm-balance.patch on line 253:

-+#ifdef CONFIG_HIGHMEM
++#if 0

This removes a 32-bit test for NFS:

+#ifdef CONFIG_HIGHMEM
+        /*
+         * Until NFS gets proper congestion control,
+         * we disallow HIGHMEM so the writeback logic
+         * limits the amount of dirty memory. Otherwise,
+         * writing large files results in OOM when the
+         * lowmem is scarce.
+         */
+        mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL);
+#endif

Can you rebuild and try this out for me???

Thanks, Larry

(In reply to comment #10)
> Yes, if you can rebuild the kernel please make this change to
> linux-2.6.9-vm-balance.patch on line 253:
>
> -+#ifdef CONFIG_HIGHMEM
> ++#if 0
>
>
> This removes a 32-bit test for NFS:
>
>
> +#ifdef CONFIG_HIGHMEM
> +        /*
> +         * Until NFS gets proper congestion control,
> +         * we disallow HIGHMEM so the writeback logic
> +         * limits the amount of dirty memory. Otherwise,
> +         * writing large files results in OOM when the
> +         * lowmem is scarce.
> +         */
> +        mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL);
> +#endif
>
>
> Can you rebuild and try this out for me???
>
> Thanks, Larry
>

Ok. I'll recompile the kernel and test now.

Thanks

Bingo. Without mapping_set_gfp_mask() NFS works fine again.

I'll test with 2.6.9-67 now.

Thanks

(In reply to comment #12)
> Bingo. Without mapping_set_gfp_mask() NFS works fine again.
>
> I'll test with 2.6.9-67 now.
Works fine on 2.6.9-67 when I disable mapping_set_gfp_mask().

Regards

Larry,

Two questions: Is that a proper fix for this? And what is the schedule for an errata or new kernel release?

Regards

Marcus, I will fix that problem in an errata kernel by removing the mapping_set_gfp_mask(&inode->i_data, GFP_KERNEL) call under default conditions. I can't simply remove it in all cases because it was added to prevent OOM kills when the system is under a lot of lowmem pressure and NFS is very actively writing. The schedule is early January 2008.

Larry

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Committed in 68.9. RPMS are available at http://people.redhat.com/vgoyal/rhel4/.

Created attachment 299069 [details]
tested 4.5-zstream patch to fix this problem
4.5-zstream patch that adds /proc/sys/vm/nfs-writeback-lowmem-only to
conditionalize whether NFS writebacks are restricted to lowmem.
Created attachment 299070 [details]
Tested 4.6 zstream patch that fixes this problem.
4.6-zstream patch that adds /proc/sys/vm/nfs-writeback-lowmem-only to
conditionalize whether NFS writebacks are restricted to lowmem.
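
On a kernel that carries one of the patches above, the new tunable could be inspected and toggled roughly as follows. This is a minimal sketch, not part of either attachment: the path comes from the patch description, the 0/1 meaning from the release-note text quoted later in this report, and the function names and the optional path argument are hypothetical helpers added for illustration.

```shell
# Sketch only. The tunable path is taken from the patch description;
# everything else here (names, optional path argument) is illustrative.
TUNABLE_DEFAULT=/proc/sys/vm/nfs-writeback-lowmem-only

# Print the current setting, or "unsupported" when the running kernel
# does not expose the tunable (for example, an unpatched kernel).
nfs_lowmem_status() {
    tunable="${1:-$TUNABLE_DEFAULT}"
    if [ -r "$tunable" ]; then
        cat "$tunable"
    else
        echo "unsupported"
    fi
}

# Write 0 or 1 to the tunable. On a real system this needs root.
# Returns 2 for an invalid value and 1 when the file is not writable.
nfs_lowmem_set() {
    value="$1"
    tunable="${2:-$TUNABLE_DEFAULT}"
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 2 ;;
    esac
    if [ ! -w "$tunable" ]; then
        echo "cannot write $tunable" >&2
        return 1
    fi
    echo "$value" > "$tunable"
}
```

For example, `nfs_lowmem_status` prints the current value, and `nfs_lowmem_set 1` (run as root) would restrict NFS writebacks to lowmem again.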
*** Bug 429925 has been marked as a duplicate of this bug. ***

We have been using a test version of the kernel, i.e. 2.6.9-68.11, and we haven't seen the problem. Based on the comments here, is it correct to infer that this fix is now released and present in 2.6.9-68.9, and that we no longer need the test version?

No, both -68.9 and -68.11 are development builds. To my knowledge, the fix for this is not released yet.

added to RHEL4.7 release notes under "Kernel-Related Updates":

<quote>
You can now restrict NFS writes to low memory. To do so, set /proc/sys/vm/nfs-writeback-lowmem-only to 1 (this is set to 0 by default).

Previous releases did not include this capability. This caused NFS read performance degradation in some cases, particularly when the system encountered high volumes of NFS read/write requests.
</quote>

please advise if any revisions are in order. thanks!

Looks good to me.

Do you expect to release a new kernel soon? Or is there a date set for announcing RH 4.7?

Regards

Hi,

the RHEL4.7 release notes deadline is on June 17, 2008 (Tuesday). they will undergo a final proofread before being dropped to translation, at which point no further additions or revisions will be entertained.

a mockup of the RHEL4.7 release notes can be viewed here: http://intranet.corp.redhat.com/ic/intranet/RHEL4u7relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the release notes (if it needs to be). each item in the release notes contains a link to its original bug; as such, you can search through the release notes by bug number.

Cheers, Don

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html
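
For anyone checking whether a 32-bit machine resembles the reporter's system, the share of RAM that counts as lowmem can be read from /proc/meminfo. The following is a minimal sketch that assumes the MemTotal and LowTotal fields shown in the meminfo dump earlier in this report; the function name lowmem_percent and its optional file argument are illustrative, and kernels that do not report LowTotal yield "n/a".

```shell
# Sketch only: print LowTotal as an integer percentage of MemTotal.
# The optional argument is a meminfo-style file (default: /proc/meminfo).
lowmem_percent() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "LowTotal:" { low = $2 }
        END {
            # LowTotal is only present on kernels that split low/high memory.
            if (total == 0 || low == "") { print "n/a"; exit }
            printf "%d\n", low * 100 / total
        }
    ' "$meminfo"
}
```

Fed the figures posted in this report (MemTotal 16629324 kB, LowTotal 773988 kB), it prints 4, so lowmem is under 5% of RAM on that system. That may help explain why confining NFS page cache allocations to lowmem was so costly there.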