Description of problem:
Linux ES4 as an NFS client cannot create files larger than physical memory size without the oom-killer attacking.

Version-Release number of selected component (if applicable):
Red Hat ES 4 Update 1 (also happens in ES4 Update 0 and FC3)

How reproducible:
100%

Steps to Reproduce:
1. Mount an NFS filesystem with the following options: rw,fg,tcp,nointr,hard,vers=3,wsize=32768,rsize=32768,timeo=600,retrans=2
2. dd if=/dev/zero of=/nfsmounted/fs bs=8192 count=1250000
3. tail /var/log/messages

When the file gets to around the size of physical memory on the machine, the oom-killer starts killing processes.

Actual results:
The oom-killer starts randomly killing processes.

Expected results:
The file would be successfully written to the NFS mount.

Additional info:
If the NFS server you are testing with is Linux, export the filesystem as "async". Although this test uses "dd", this behavior has actually blocked us from using Oracle 9i: when having Oracle create a 20 GB dbf file, the same thing happens.
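For reference, the reproduction steps above can be collected into a small script. SERVER and MOUNTPOINT are hypothetical placeholders, and the "echo" prefixes make this a dry run, since mounting requires root and a live NFS server:

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps above. SERVER and MOUNTPOINT
# are placeholders; remove the "echo" prefixes to actually run the test.
SERVER="nfsserver:/export"
MOUNTPOINT="/nfsmounted"
OPTS="rw,fg,tcp,nointr,hard,vers=3,wsize=32768,rsize=32768,timeo=600,retrans=2"
COUNT=1250000   # 1,250,000 x 8 KiB blocks = ~10 GB, larger than RAM here

echo mount -t nfs -o "$OPTS" "$SERVER" "$MOUNTPOINT"
echo dd if=/dev/zero of="$MOUNTPOINT/fs" bs=8192 count="$COUNT"
echo tail /var/log/messages
```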
I appear to be having the same problem, with nfs options vers=3,rw,rsize=8192,wsize=8192, also reproducible 100%.
This bug can be fixed by doing the following:

echo 100 > /proc/sys/vm/lower_zone_protection
Setting /proc/sys/vm/lower_zone_protection to 100 is a perfectly valid way to fix the OOM-killing of processes when the majority of file systems are NFS mounted. The lower_zone_protection parameter raises the free-page threshold by 100, so page reclamation starts earlier, which prevents NFS from getting so far behind the kernel's memory demands. This will eliminate most of the OOM kills when using NFS under a heavy memory load.

Larry Woodman
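As a sketch, the workaround above can be applied at runtime like this. Note that lower_zone_protection exists only on 2.6 kernels of this era (e.g. RHEL4); later kernels replaced it with lowmem_reserve_ratio, so the snippet checks for the file first:

```shell
# Sketch of the runtime workaround described above. The tunable is
# absent on newer kernels, so check for it before writing.
tunable=/proc/sys/vm/lower_zone_protection
if [ -w "$tunable" ]; then
    echo 100 > "$tunable"
    status="applied: $(cat "$tunable")"
else
    status="unavailable on this kernel"
fi
echo "$status"
```

Writing the tunable requires root; the setting does not survive a reboot, which is why the later comments discuss making it persistent.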
Closing as NOTABUG since the fix is to adjust the value in lower_zone_protection as noted in comment #6
I would still mark this as a bug, since a default installation will crash under normal conditions. I suggest adding something like the following to the standard RH nfs init script (/etc/init.d/nfs):

echo "Raising Lower Zone Protection"
echo 100 > /proc/sys/vm/lower_zone_protection
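An alternative to patching the init script would be persisting the setting through sysctl.conf so it is reapplied at every boot. This is only a sketch, demonstrated against a temporary file; in practice the target would be /etc/sysctl.conf, applied with "sysctl -p":

```shell
# Hypothetical persistent variant of the init-script suggestion above.
# Shown against a scratch file; the real target is /etc/sysctl.conf.
conf=$(mktemp)
line="vm.lower_zone_protection = 100"
# Append the line only if it is not already present.
grep -qF "$line" "$conf" || echo "$line" >> "$conf"
```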
We have experienced the same problem on 64-bit machines, where lower_zone_protection does not apply. This appears to be a continuing and reproducible problem. Should a new case be created, or can this one be reopened?
Walt, can you attach the show_mem() output that gets written to the /var/log/messages file and console from a 64-bit machine?

Thanks, Larry Woodman
Created attachment 137817 [details] messages file with oom output
Larry, I have attached the excerpt from the messages file you requested. I do not have console output. The machine has 2 Opteron CPUs with 10 GB of RAM. At the moment the OOM can be recreated reliably, so I can describe the environment in more detail and provide additional information/test runs if necessary. Do you know of any VM tunables that could minimize the problem? I am finding that reducing the value of dirty_ratio is helpful. I believe this is the same problem reported in bug 176650.
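The dirty_ratio mitigation mentioned above can be sketched as follows. Lowering it makes processes start synchronous writeback sooner, so fewer dirty NFS pages accumulate; the value 10 is illustrative, not taken from this report, and writing the tunable needs root:

```shell
# Sketch of the dirty_ratio mitigation (illustrative value, not from
# this report). Read the current value first; writing requires root.
tunable=/proc/sys/vm/dirty_ratio
old=$(cat "$tunable" 2>/dev/null || echo unknown)
if [ -w "$tunable" ]; then
    echo 10 > "$tunable"
fi
echo "dirty_ratio was: $old"
```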