Bug 163555 - RHES4 NFS client cannot create files larger than physical mem size without oom-killer attacking.
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Steve Dickson
QA Contact: Ben Levenson
Reported: 2005-07-18 16:32 EDT by Alexander N. Spitzer
Modified: 2010-10-21 23:09 EDT
CC List: 6 users

Doc Type: Bug Fix
Last Closed: 2006-04-23 06:57:50 EDT

Attachments
messages file with oom output (189.74 KB, text/plain)
2006-10-05 09:31 EDT, Walt Kopy
Description Alexander N. Spitzer 2005-07-18 16:32:37 EDT
Description of problem:
Linux ES4 as an NFS client cannot create files larger than physical memory size
without the oom-killer attacking.

Version-Release number of selected component (if applicable):
Red Hat ES 4 Update 1 (also happens in ES 4 Update 0 and FC3)

How reproducible:
100%

Steps to Reproduce:
1. mount the NFS filesystem with the following options:
rw,fg,tcp,nointr,hard,vers=3,wsize=32768,rsize=32768,timeo=600,retrans=2

2. dd if=/dev/zero of=/nfsmounted/fs bs=8192 count=1250000

3. tail /var/log/messages

When the file gets to around the size of physical memory on the machine, the
oom-killer starts killing processes.
  
Actual results:
oom-killer starts randomly killing processes.

Expected results:
The file is successfully written to the NFS mount.

Additional info:
If the NFS server that you are testing with is Linux, export the filesystem
as "async"

Although this test uses "dd", this behavior has actually blocked us from using
Oracle 9i: when having Oracle create a 20 GB dbf file, the same thing happens.
Comment 2 D Byrne 2005-09-09 18:36:45 EDT
I appear to be having the same problem, with nfs options
vers=3,rw,rsize=8192,wsize=8192, also reproducible 100%.
Comment 3 Alexander N. Spitzer 2005-09-11 09:18:56 EDT
This bug can be worked around by doing the following:

echo 100 > /proc/sys/vm/lower_zone_protection



Comment 6 Larry Woodman 2006-03-21 14:49:51 EST
Setting /proc/sys/vm/lower_zone_protection to 100 is a perfectly valid way to
fix the OOM-killing of processes when the majority of file systems are NFS
mounted.  The lower_zone_protection parameter increases the free page threshold
by 100, thereby starting page reclamation earlier and preventing NFS
from getting so far behind the kernel's memory demands.  This will eliminate
most of the OOM kills when using NFS under a heavy memory load.

Larry Woodman
Comment 7 Steve Dickson 2006-04-23 06:57:50 EDT
Closing as NOTABUG since the fix is to adjust the value of lower_zone_protection
as noted in comment #6.
Comment 8 Alexander N. Spitzer 2006-04-23 18:17:55 EDT
I would still mark this as a bug, as the default installation will crash under
normal conditions.

I suggest adding something like the following to the standard RH nfs init script
(/etc/init.d/nfs) :

echo "Raising Lower Zone Protection"
echo 100 > /proc/sys/vm/lower_zone_protection
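A hedged sketch of that suggestion: the path and value come from the comments above, but the script guards against kernels that no longer expose the tunable (lower_zone_protection exists on 2.6.9-era RHEL4 kernels; later kernels replaced it with lowmem_reserve_ratio), so it is a safe no-op elsewhere.

```shell
# Workaround from comments #3 and #8, guarded so it only acts when the
# RHEL4-era tunable actually exists (needs root to be writable).
if [ -w /proc/sys/vm/lower_zone_protection ]; then
    echo "Raising Lower Zone Protection"
    echo 100 > /proc/sys/vm/lower_zone_protection
else
    echo "lower_zone_protection not present on this kernel; skipping"
fi
```

On kernels that do expose the tunable, the persistent equivalent would be a `vm.lower_zone_protection = 100` line in /etc/sysctl.conf, applied with `sysctl -p`.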
Comment 9 Walt Kopy 2006-10-04 10:59:29 EDT
We have experienced the same problem on 64-bit machines where 
lower_zone_protection does not apply. This appears to be a continuing
and reproducible problem. Should a new case be created or can this one
be reopened?
Comment 10 Larry Woodman 2006-10-04 13:42:20 EDT
Walt, can you attach the show_mem() output that gets written to the
/var/log/messages file and console from a 64-bit machine?

Thanks, Larry Woodman
Comment 11 Walt Kopy 2006-10-05 09:32:00 EDT
Created attachment 137817 [details]
messages file with oom output
Comment 12 Walt Kopy 2006-10-05 09:40:39 EDT
Larry, I have attached the excerpt from the messages file you requested. I do
not have console output. The machine has two Opteron CPUs with 10 GB of RAM.
At the moment oom can be recreated reliably so I can describe the environment
in more detail if necessary and provide additional information/test runs.

Do you know of any vm tunables that could minimize the problem? I am finding
that reducing the value of dirty_ratio is helpful.
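For reference, the writeback tunables mentioned here can be inspected and lowered as sketched below; the example values are illustrative only, not settings verified against this workload.

```shell
# Inspect the current dirty-page thresholds (percent of RAM; reads are safe).
cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/dirty_background_ratio

# Lowering them (as root) makes writeback start earlier, which comment #12
# found helpful.  Example values only -- tune for the actual workload:
#   echo 20 > /proc/sys/vm/dirty_ratio
#   echo 5  > /proc/sys/vm/dirty_background_ratio
```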
 
I believe this is the same problem reported in bugzilla 176650. 
