Bug 865755
Summary: | ext3/4: Writing large file on large memory system with slow disks makes machine unusable | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Michael Weiser <m.weiser> |
Component: | kernel | Assignee: | Carlos Maiolino <cmaiolin> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 5.8 | CC: | cmaiolin, esandeen, kernel.shubham, lczerner, lutz.willek, rwheeler, vincent |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2013-03-20 18:50:52 UTC | Type: | Bug |
Description
Michael Weiser
2012-10-12 10:48:36 UTC
Hi,

Can you let me know which values you tested in /proc/sys/vm/dirty_ratio?

I was able to reproduce the behaviour you're seeing. It looks like the system gets busy because of the amount of data being written back to disk per flush, which keeps the I/O subsystem saturated. That isn't unexpected on systems with a huge amount of RAM. RHEL6 has a better I/O mechanism which mitigates the problem, which is why you are not facing it on RHEL6.

A change in dirty_ratio should mitigate the problem, leaving the machine usable while also writing to the root filesystem. I reproduced this behaviour on a machine with 260GiB of RAM and dirty_ratio set to the default value (40%), and fixed it by setting dirty_ratio to 10 with:

    # echo 10 > /proc/sys/vm/dirty_ratio

This should fix the lockups you're seeing.

--Carlos

I'm pretty sure (as sure as I can be after five months) that I tested everything from 5 to 95 in increments of 10 without any change. I can't test any more since the machines have gone into production in the meantime.

However, our customer has decided not to pursue the issue. Seemingly, it doesn't pose as big a problem in production as initially anticipated. You can close this bug.

OK, I'm closing this bug. However, feel free to reopen it if it becomes an issue again.

--Carlos
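To put the dirty_ratio values discussed above in perspective, here is a rough sketch of how much dirty page cache each setting could allow on the 260GiB test machine. The helper name `dirty_limit_gib` is made up for illustration, and the calculation assumes the ratio applies to total RAM; the kernel actually applies it to a somewhat smaller "dirtyable" memory figure, so treat the numbers as upper-bound estimates rather than exact thresholds.

```shell
#!/bin/sh
# Hypothetical helper: approximate ceiling on dirty page cache, in GiB.
# Assumption: vm.dirty_ratio is taken as a percentage of total RAM.
# The kernel uses a smaller "dirtyable memory" base, so this overestimates.
dirty_limit_gib() {
    ram_gib=$1      # installed RAM in GiB
    ratio_pct=$2    # value of /proc/sys/vm/dirty_ratio
    echo $(( ram_gib * ratio_pct / 100 ))
}

# Compare the default mentioned in this bug (40) with the suggested value (10).
echo "dirty_ratio=40: about $(dirty_limit_gib 260 40) GiB may be dirty before writers are throttled"
echo "dirty_ratio=10: about $(dirty_limit_gib 260 10) GiB may be dirty before writers are throttled"
```

Under these assumptions the default allows on the order of 104 GiB of unwritten data to accumulate, versus about 26 GiB with the lower setting, which is still a large backlog for a slow disk. The value written with echo does not survive a reboot; it can also be set with `sysctl -w vm.dirty_ratio=10` and made persistent through /etc/sysctl.conf.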