Bug 751580

Summary: Copying large data causes system to not respond.
Product: Red Hat Enterprise Linux 5
Reporter: olaf
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Priority: unspecified    
Version: 5.7   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2014-06-02 13:15:50 UTC

Description olaf 2011-11-05 21:49:50 UTC
Description of problem:

Copying large data causes system to not respond.
This is caused by memory being used by IO buffers.
I see that almost all memory is in "buffers", "writeback" and "dirty".
The memory marked as "writeback" amounts to several GB!
When some other application needs to do a sync, it hangs until the process writing data to disk finishes.
It also sometimes causes low-memory problems; I was hit by the OOM killer once.
It is crazy to put e.g. 5GB into writeback when the target device is capable of only ~30 MB/s writes.
I tried to limit the memory being used by lowering /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio.
BTW, it looks like lowering the value below "5" makes no sense, as for values 1-4 the amount of memory used is the same as for "5".
I was able to limit it further using dirty_bytes and dirty_background_bytes.
It helps with the memory available, but the sync still waits until the process doing the large copy finishes.
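
To watch the counters mentioned above, the relevant fields can be pulled out of /proc/meminfo. The following is a minimal Python sketch and not part of the original report: the function names are hypothetical, and SAMPLE contains made-up numbers used only for illustration. On a live system the text would instead come from reading /proc/meminfo.

```python
# Sketch: extract Buffers, Dirty and Writeback from /proc/meminfo-style text.
# SAMPLE holds invented numbers for illustration, not data from this report.
SAMPLE = """\
MemTotal:     12300000 kB
MemFree:        150000 kB
Buffers:       4200000 kB
Dirty:         1800000 kB
Writeback:     5100000 kB
HugePages_Total:     0
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its integer value.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def writeback_summary(text, keys=("Buffers", "Dirty", "Writeback")):
    """Pick out the counters discussed in this report; missing keys become 0."""
    info = parse_meminfo(text)
    return {key: info.get(key, 0) for key in keys}


if __name__ == "__main__":
    for key, kb in writeback_summary(SAMPLE).items():
        print(f"{key}: {kb / 1024:.0f} MiB")
```

Running this periodically while the large copy is in progress would show how much memory sits in each state at a given moment.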

Version-Release number of selected component (if applicable):
kernel-2.6.18-274.7.1.el5

How reproducible:
Always

Steps to Reproduce:
1. Run "dd if=/dev/zero of=/dev/deviceX bs=10K count=50M"
2. Run "sync"
3. The "sync" finishes only after the "dd" command finishes, which takes a long time
  
Actual results:
The system appears to hang.
The kernel buffers writes using an unreasonably large amount of memory. The amount depends on the total memory size.
Other processes hang until the process doing the large write finishes.
As many processes do "sync" from time to time, it causes the system as a whole to not respond.

Expected results:
The system is responsive.
The memory used for write buffers (dirty/writeback) should depend on target device throughput. I suppose only a per-device buffer would make this possible. If the system has one device capable of 100 MB/s writes and another capable of 5 MB/s, it shouldn't buffer the same amount of data for both.
It should be possible to do "sync" even while a large amount of data is being written to a device. In the worst case the kernel could block new writes, finish writing the data already marked as "dirty/writeback", perform the sync, and then unblock writes.
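
The relationship between backlog size, device throughput and waiting time can be made concrete with a little arithmetic. The Python sketch below is an illustration only and does not describe how the kernel actually sizes its buffers. The function names are hypothetical, and the 2-second flush target is an assumed value chosen for the example. The 5 GB, 30 MB/s, 100 MB/s and 5 MB/s figures come from the report.

```python
MB = 10**6  # decimal megabyte, matching the MB/s figures in the report
GB = 10**9


def drain_seconds(pending_bytes, write_bytes_per_s):
    """Time needed to flush pending_bytes at a steady write rate."""
    if write_bytes_per_s <= 0:
        raise ValueError("write rate must be positive")
    return pending_bytes / write_bytes_per_s


def dirty_budget_bytes(write_bytes_per_s, max_wait_s):
    """Largest backlog that can still be flushed within max_wait_s."""
    return int(write_bytes_per_s * max_wait_s)


if __name__ == "__main__":
    # The reporter's example: 5 GB queued for a device writing ~30 MB/s.
    print(f"5 GB at 30 MB/s: {drain_seconds(5 * GB, 30 * MB):.0f} s")

    # Assumed target: any backlog should be flushable within 2 seconds.
    for rate in (100 * MB, 5 * MB):
        budget = dirty_budget_bytes(rate, max_wait_s=2)
        print(f"{rate // MB} MB/s device, 2 s target: {budget // MB} MB")
```

A budget computed this way could serve as a starting value for dirty_bytes, with the caveat that dirty_bytes is a system-wide setting and not a per-device one.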

Additional info:
The system has 12GB of RAM. At the beginning of the actions described above, about 10G was free.
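
As a rough illustration of why the amount of buffered data scales with memory size, the Python sketch below estimates a ratio-based dirty limit from the free memory figure given above. It is a deliberate simplification and not the kernel's actual calculation, which is more involved. The function name is hypothetical, "10G" is assumed to mean 10 GiB, and the floor of 5 only models the reporter's observation that values 1-4 behave like 5. The ratio of 40 in the example is an arbitrary illustrative value, not a claim about this system's setting.

```python
GIB = 1024**3
MIN_EFFECTIVE_RATIO = 5  # assumption modelling the reporter's observation


def estimate_dirty_limit(dirtyable_bytes, dirty_ratio):
    """Approximate a dirty limit as a percentage of dirtyable memory.

    Ratios below MIN_EFFECTIVE_RATIO are treated as MIN_EFFECTIVE_RATIO.
    """
    effective = max(dirty_ratio, MIN_EFFECTIVE_RATIO)
    return dirtyable_bytes * effective // 100


if __name__ == "__main__":
    free = 10 * GIB  # "about 10G was free"
    for ratio in (1, 5, 40):
        limit = estimate_dirty_limit(free, ratio)
        print(f"dirty_ratio={ratio}: about {limit / GIB:.2f} GiB")
```

Under these assumptions even the lowest effective ratio allows roughly half a GiB of dirty data, which takes many seconds to flush to a ~30 MB/s device.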

Comment 1 olaf 2011-11-05 22:18:18 UTC
Maybe it is related to:
https://lkml.org/lkml/2010/4/4/86
https://lkml.org/lkml/2010/8/1/40

Comment 2 RHEL Program Management 2014-03-07 12:46:06 UTC
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 3 RHEL Program Management 2014-06-02 13:15:50 UTC
Thank you for submitting this request for inclusion in Red Hat Enterprise Linux 5. We've carefully evaluated the request, but are unable to include it in RHEL5 stream. If the issue is critical for your business, please provide additional business justification through the appropriate support channels (https://access.redhat.com/site/support).