Red Hat Bugzilla – Bug 464792
Disk IO slows down when memory usage is high with vm.overcommit_memory=2
Last modified: 2008-12-13 12:02:42 EST
Created attachment 318083 [details]
lspci -vvv output
Description of problem:
I don't know that I am going to be able to give you enough information to track this down, but I'll describe what I am seeing so as to get the issue on record.
I have a dual core Xeon with 2 GB of real memory and 10 GB of swap, configured not to use the OOM killer (vm.overcommit_memory = 2 and vm.overcommit_ratio = 50). I boot with the nomodeset (to avoid a display problem) and elevator=deadline (to try to get better interactive response) kernel parameters.
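For reference, a minimal sketch of how this configuration can be applied and inspected at runtime; the device name sda is an assumption, substitute your own block device:

```shell
# Apply the overcommit settings described above (requires root).
sysctl -w vm.overcommit_memory=2     # strict accounting, no OOM killer
sysctl -w vm.overcommit_ratio=50     # allow swap + 50% of RAM to be committed

# Verify they took effect.
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio

# The active I/O elevator is shown in brackets; elevator=deadline on the
# kernel command line sets it at boot, or it can be changed here per-device.
cat /sys/block/sda/queue/scheduler
```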
Occasionally when I do yum updates, yum's memory usage balloons to over 2 GB (this is supposedly a bug in the rpm libs), and when this happens I often see problems. When I am remote I can get normal ping responses from the machine, but opening new ssh or smtp sessions will hang. When I am local the mouse will move extremely slowly and things will eventually appear to hang. In at least one case I was able to kill off yum and things seemed to return to normal.
The last kernel I saw this happen on was 2.6.27-0.354.rc7.git3.fc10.x86_64 though I am now using 2.6.27-0.372.rc8.fc10.x86_64.
Version-Release number of selected component (if applicable):
It isn't easy to reproduce.
Steps to Reproduce:
I had this happen again with kernel 2.6.27-0.372.rc8.fc10.x86_64. Updating lots of kde components seems to be a good way to trigger the yum/rpmlibs bug which triggers the apparent kernel bug.
In this case my console X session slowly ground to a halt and I needed to reboot the system, as things were dead before I could kill the yum process.
Created attachment 318096 [details]
I forgot to add some relevant information about file system and block devices.
All my filesystems are ext3. Everything (including swap) is using dm-crypt on top of md raid 1.
I have now seen this with the 382 kernel. Pings still got a response, but the ssh sessions I had open (one running yum and another just sitting with a shell prompt) did not respond.
I have not seen this happen with my home system which uses filesystems on raid devices, but not encrypted devices (except for temporarily mounted usb drives).
I will have a good chance to get this to happen again Monday morning when I get access to the machine again. Most likely trying the yum update again will get the lockup to happen again. If you have something that I can do to collect more information about the problem please let me know. I'll be checking this bug report before trying any updates.
This was not happening with f9 and I did see this fairly soon after changing to Rawhide in September. So it is likely a regression in the 2.6.26 or 2.6.27 kernels. I didn't see an applicable bug on the kernel regressions list though.
Sounds like lack of low memory for I/O.
Is there something that would be helpful for me to do to help diagnose this?
(In reply to comment #8)
> I have a dual core xeon with 2GB of real memory and 10Gb of real memory
You mean 2GB real and 10GB of swap?
Yes, 2GB real (ECC) memory and 10GB swap (on an encrypted mirrored partition).
I'm not seeing how this is a kernel bug. You have an app that's eating all your memory (and then swap). As I understand it, overcommit=2 will only allow 50% (1GB in your case) of real memory to be 'committed' at any time, so the allocation that follows once that has been met will hit swap, which is obviously going to impact disk IO throughput.
That it didn't show up in F9 doesn't necessarily mean it's a kernel regression. Yum/rpm may have developed a leak in Rawhide, for example.
Disagree: in no overcommit mode we shouldn't be thrashing that badly and hanging - that means either the overcommit isn't working or the VM has gone to crap.
The underlying bugs are long-standing kernel VM bugs (plural, to be more accurate). Also, overcommit=2 will ensure swap backing exists but will not use it unless we are out of physical RAM; it ensures we have room to swap stuff out nicely. It should constrain the scale of the resulting mess and is worth trying.
There are two key bugs in the general area I know about:
1. The ext3 journal thread has no ioprio set, so the CFQ scheduler everyone seems to like to default to gets confused and tries to punish it as an I/O hog; ditto the writeout threads. Arjan posted fixes for this many months ago but they are still not upstream. They make my I/O far more stable and way faster under load. If we don't apply those fixes we shouldn't ship CFQ, as it's broken out of the box.
2. Under load the VM writeout threads start randomly writing back chunks of memory, which ruins the nice ordered writeout from the fs and trashes your disk performance. There is no easy fix, and it seems some of the VM people have entirely lost the plot here. And to maximise the damage, #1 kicks in the writeback threads, making the effect of #2 even worse.
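As a hedged illustration of the kind of mitigation #1 describes (this is a runtime workaround sketch, not the upstream fix Arjan posted; the kjournald thread name and the choice of the real-time I/O class are assumptions about a typical ext3 system):

```shell
# Give the ext3 journal threads a real-time I/O class so CFQ stops
# treating them as ordinary I/O hogs (requires root).
for pid in $(pgrep kjournald); do
    ionice -c1 -p "$pid"            # -c1 = real-time I/O scheduling class
    ionice -p "$pid"                # print the class to confirm the change
done
```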
On a box without an IOMMU, paging to try and get memory under the 4GB boundary makes the entire thing even worse. I wouldn't be surprised if all the extra allocation and poking by the encryption code makes the mess bigger again.
However, testing with overcommit=2 and deadline scheduling ought to have avoided problem #1 for the most part. The lack of an IOMMU will cause thrashing of some sort with a 32-bit-capable disk controller, but even under high allocations and 2GB of memory usage on a 10GB box it shouldn't be becoming unusable.
Is the kernel in use 32 or 64bit ?
Are the disks on the PATA port or the AHCI SATA ports ?
I am running a x86_64 kernel.
I didn't build the system myself, so am not 100% sure what the disks were plugged in to. Is there a command I can run to get a definitive answer here or do I need to open the box up and look at things?
I am not sure that I used up all of my allocation when this occurred. Swap + 50% of real memory is 11GB. I was seeing a difference when yum was using about 2GB of virtual memory as opposed to about 0.5GB. If I were running so close to the maximum that an extra 1.5GB puts me over, I would expect to be seeing problems from processes not being able to run on a regular basis.
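The commit-limit arithmetic above can be checked directly; a small sketch using this machine's figures (2 GB RAM, 10 GB swap, ratio 50 — adjust for other boxes; on a live system the kernel reports the same numbers as CommitLimit and Committed_AS in /proc/meminfo):

```shell
# Reproduce the overcommit_memory=2 commit limit:
#   CommitLimit = swap + RAM * overcommit_ratio / 100
ram_gb=2
swap_gb=10
overcommit_ratio=50

commit_limit_gb=$(( swap_gb + ram_gb * overcommit_ratio / 100 ))
echo "CommitLimit: ${commit_limit_gb} GB"   # prints: CommitLimit: 11 GB

# On a running system, compare against what the kernel actually enforces:
#   grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```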
The dmesg boot log will tell you which controller has which disk
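A hedged sketch of the inspection described above; the grep patterns are assumptions about the usual driver probe messages and may need widening on other hardware:

```shell
# Driver probe messages name the controller each disk was attached to.
dmesg | grep -i -e ahci -e 'ata[0-9]' -e fusion -e 'scsi [0-9]'

# Cross-check against the storage controllers the PCI bus reports,
# including which kernel driver is bound to each.
lspci -k | grep -i -A 2 -e sata -e sas -e ide -e raid
```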
Created attachment 321182 [details]
It looks like the devices are on a SCSI bus and hence are attached to the AHCI controller, but since I am not an expert at reading the dmesg output, I've attached it.
(In reply to comment #11)
> There are two key bugs in the general area I know about:
> 1. the ext3 journal thread has no ioprio set so the CFQ scheduler everyone
> seems to like to default to gets confused and tries to punish it as an io hog,
> ditto the writeout threads. Arjan posted fixes for this many months ago but
> they are still not upstream. Makes my I/O far more stable and way faster under
> load. If we don't apply those fixes we shouldn't ship CFQ as its broken out of
> the box
AKPM NAKed that approach. See:
Yeah Andrew whines about it and proposes other stuff that doesn't work. Not my problem, my kernels all have that fixed and they flatten Red Hat ones on usability.
Anyway, this system has a SAS MPT Fusion controller for the disks. Fusion is 64-bit, so this workload is showing a real kernel problem and it's not 32-bit unbalancing; something is busted.
I am not sure if this is another aspect of the same problem or a different one.
Recently I have been having stalls while transferring data to an encrypted USB device. However, the system recovers from these when the transfer finishes. Occasionally the X display will freeze up (the cursor doesn't track the mouse and nothing else changes on the screen) for times on the order of a minute, and then things happen suggesting that at least some input was captured during the freeze. After some pauses the X server will restart. I don't know if it is crashing or if something decides it isn't running any more and restarts it.
Created attachment 321555 [details]
After the latest xorg restart I took a look in /var/log/messages and found that there was relevant information. There was plenty of swap space available. There were some references to low and high memory, but I don't know enough to tell if there was a shortage of a particular kind of memory causing the problem.
I haven't seen the machine lock up unrecoverably in a long time, but when copying 100s of megabytes of data to my encrypted USB drive I regularly see processes stall, though not all at the same time. And it isn't unusual for X to restart. When I've checked swap usage it's virtually nothing.
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.
More information and reason for this action is here:
Recently things seem to be improving, but since the problem didn't always manifest itself when doing a lot of file i/o I am not sure it is really fixed.
I am currently running 220.127.116.11-137.fc10.x86_64.
I think it has been long enough that we can consider this fixed. The worst I have seen in the last week was subsecond delays in music being played and that might just be normal expected disk contention. I haven't had any minute+ periods without response nor have I seen X crashes due to failure to allocate memory nor have I had a yum update lock up the system.