Bug 163555 - RHES4 NFS client cannot create files larger than physical mem size without oom-killer attacking.
Summary: RHES4 NFS client cannot create files larger than physical mem size without oom-killer attacking.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Steve Dickson
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-07-18 20:32 UTC by Alexander N. Spitzer
Modified: 2018-10-19 20:43 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-04-23 10:57:50 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
messages file with oom output (189.74 KB, text/plain)
2006-10-05 13:31 UTC, Walt Kopy

Description Alexander N. Spitzer 2005-07-18 20:32:37 UTC
Description of problem:
Linux ES4 as an NFS client cannot create files larger than physical memory size
without the oom-killer attacking.

Version-Release number of selected component (if applicable):
Red Hat ES 4 Update 1 (also happens in ES4 Update 0 and FC3)

How reproducible:
100%

Steps to Reproduce:
1. Mount an NFS filesystem with the following options (a full command example follows these steps):
rw,fg,tcp,nointr,hard,vers=3,wsize=32768,rsize=32768,timeo=600,retrans=2

2. dd if=/dev/zero of=/nfsmounted/fs bs=8192 count=1250000

3. tail /var/log/messages

When the file gets to around the size of physical memory on the machine, the
oom-killer starts killing processes.
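
For reference, a minimal reproduction sketch putting the steps above together; the
server name, export path (nfsserver:/export) and mount point are placeholders, not
values taken from this report:

mount -t nfs -o rw,fg,tcp,nointr,hard,vers=3,wsize=32768,rsize=32768,timeo=600,retrans=2 \
    nfsserver:/export /nfsmounted        # placeholder server:/export and mount point
dd if=/dev/zero of=/nfsmounted/fs bs=8192 count=1250000   # writes ~10 GB, larger than RAM
tail -f /var/log/messages                                  # watch for oom-killer messages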
  
Actual results:
oom-killer starts randomly killing processes.

Expected results:
The file is written successfully to the NFS mount.

Additional info:
If the NFS server that you are testing with is Linux, export the filesystem
as "async".

Although this test uses "dd", this behavior has actually blocked us from using
Oracle 9i. When having Oracle create a 20 GB dbf file, the same thing happens.

Comment 2 D Byrne 2005-09-09 22:36:45 UTC
I appear to be having the same problem, with nfs options
vers=3,rw,rsize=8192,wsize=8192, also reproducible 100%.

Comment 3 Alexander N. Spitzer 2005-09-11 13:18:56 UTC
This bug can be fixed by doing the following:

echo 100 > /proc/sys/vm/lower_zone_protection
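
To make the same setting persist across reboots, the standard sysctl mechanism can be
used (a sketch; vm.lower_zone_protection is the sysctl key corresponding to the /proc
path above on 2.6 kernels of this era):

sysctl -w vm.lower_zone_protection=100                      # apply now, same as the echo
echo "vm.lower_zone_protection = 100" >> /etc/sysctl.conf   # re-applied at boot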





Comment 6 Larry Woodman 2006-03-21 19:49:51 UTC
Setting /proc/sys/vm/lower_zone_protection to 100 is a perfectly valid way to
fix the OOM killing of processes when the majority of file systems are NFS
mounted.  The lower_zone_protection parameter raises the free-page threshold
by 100, which starts page reclamation earlier and prevents NFS from getting
so far behind the kernel's memory demands.  This will eliminate most of the
OOM kills when using NFS under a heavy memory load.

Larry Woodman
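
A quick way to confirm the setting and watch lowmem while the dd test runs (a sketch;
LowFree/HighFree are standard /proc/meminfo fields on i386 highmem kernels, and the
5-second interval is arbitrary):

cat /proc/sys/vm/lower_zone_protection                      # should report 100 after tuning
watch -n 5 'grep -E "LowFree|HighFree" /proc/meminfo'       # free lowmem vs highmem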


Comment 7 Steve Dickson 2006-04-23 10:57:50 UTC
Closing as NOTABUG since the fix is to adjust the value of lower_zone_protection
as noted in comment #6.

Comment 8 Alexander N. Spitzer 2006-04-23 22:17:55 UTC
I would still mark this as a bug, since the default installation will crash under
normal conditions.

I suggest adding something like the following to the standard RH nfs init script
(/etc/init.d/nfs) :

echo "Raising Lower Zone Protection"
echo 100 > /proc/sys/vm/lower_zone_protection

Comment 9 Walt Kopy 2006-10-04 14:59:29 UTC
We have experienced the same problem on 64-bit machines where 
lower_zone_protection does not apply. This appears to be a continuing
and reproducible problem. Should a new case be created or can this one
be reopened?

Comment 10 Larry Woodman 2006-10-04 17:42:20 UTC
Walt, can you attach the show_mem() output that gets written to the
/var/log/messages file and console from a 64-bit machine?

Thanks, Larry Woodman


Comment 11 Walt Kopy 2006-10-05 13:32:00 UTC
Created attachment 137817 [details]
messages file with oom output

Comment 12 Walt Kopy 2006-10-05 13:40:39 UTC
Larry, I have attached the excerpt from the messages file you requested. I do
not have console output. The machine has 2 Opteron CPUs with 10 GB of RAM.
At the moment the OOM can be recreated reliably, so I can describe the environment
in more detail if necessary and provide additional information/test runs.

Do you know of any vm tunables that could minimize the problem? I am finding
that reducing the value of dirty_ratio is helpful.
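
For reference, lowering the writeback thresholds looks like the following (values are
illustrative only, not ones validated here; dirty_background_ratio is included as an
assumption alongside the dirty_ratio mentioned above):

sysctl -w vm.dirty_ratio=10              # processes start synchronous writeback sooner
sysctl -w vm.dirty_background_ratio=5    # pdflush starts background writeback earlier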
 
I believe this is the same problem reported in bugzilla 176650. 

