Bug 204881
| Summary: | md RAID1 writes eat up 100% CPU, high wa% | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Trevor Cordes <trevor> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED NOTABUG | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | CC: | ade.rixon, wtogami |
| Version: | 5 | Doc Type: | Bug Fix |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | All | OS: | Linux |
| Last Closed: | 2007-04-12 14:47:09 UTC | | |
Description

Trevor Cordes 2006-09-01 00:53:54 UTC

Changing to proper owner, kernel-maint.

---

A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in Bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4 to FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem, please check that you only have one version of device-mapper and lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.

---

Bug still present as of 2.6.18-1.2200.fc5.

I didn't see this until I booted kernel-2.6.18-1.2239.fc5, at which point the system started a resync of my 40GB RAID1 partition that eventually ground to a halt somewhere around 50%, locking everything up. Reverting to 2200, the problem didn't recur.

---

Can you try just a dd to a single disk (i.e., don't start up the RAID set) and see if the same effect occurs? If it does, it points at a problem with your I/O controller; if it goes away, it's definitely an md problem. After this, also try two dd's in parallel operating against each of the disks, and see if that also induces the same effect.

---

OK, good idea.
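The suggested dd experiment can be sketched roughly as below. This is a safe stand-in that writes to scratch files; on the real system you would point dd at raw partitions you can afford to overwrite (the reporter used a swapped-off swap partition such as /dev/hdc5) while watching wa% with `vmstat 1`. Paths and sizes here are illustrative only.

```shell
#!/bin/sh
# Sketch of the suggested test, using scratch files instead of
# raw partitions (device targets like /dev/hdc5 are assumptions;
# substitute partitions you can safely overwrite).

# Single-stream sequential write:
dd if=/dev/zero of=/tmp/ddtest1 bs=1M count=64 2>/dev/null

# Two parallel streams (one per physical disk on the real hardware),
# to see whether contention between the disks induces the stall:
dd if=/dev/zero of=/tmp/ddtest1 bs=1M count=64 2>/dev/null &
dd if=/dev/zero of=/tmp/ddtest2 bs=1M count=64 2>/dev/null &
wait

ls -l /tmp/ddtest1 /tmp/ddtest2
```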
I tried it to a single disk (I swapoffed the swap space and just dd'd to that), and it does show the same behaviour. So I guess I was barking up the wrong tree a bit: the problem isn't md, the problem is lower than that. I had been pretty sure it was md because my RAID6 array on the same disks/controllers does not exhibit this problem, just the RAID1 arrays (and now no-RAID).

I don't really get it. Why am I seeing semi-PIO behaviour on a relatively modern system? This happens (confirmed) on 3 other systems ranging from P100s to Celeron 1.7s. My main test system is a P4 2.4 on an E7201 board (ICH5, I think).

```
# hdparm /dev/hdc5

/dev/hdc5:
 multcount    = 16 (on)
 IO_support   =  3 (32-bit w/sync)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 36483/255/63, sectors = 5124672, start = 580798008
```

As you can see, I've been sure to use DMA, unmask-irq, etc. My test system is unusual in that it has 4 PCI I/O controller cards, but the other test systems are just using the standard onboard Intel southbridge. All have DMA on.

So I suppose the bug should be revised, and perhaps it's not a bug? But should Linux really go to near-100% I/O wait to write out huge files, at the expense of interactive performance?

---

I suppose it's a latency/throughput tradeoff. You may be able to tune it with the tunables in /proc/sys/vm/ so that it doesn't write stuff out to disk so regularly, but I'm not even sure that's going to be the magic bullet you seek.

---

I'd close this bug. It's really a NOTABUG, and brain-deadedness / invalid expectations on my part. Sorry.

---

Anyone seeing this bug, check out kernel bug: http://bugzilla.kernel.org/show_bug.cgi?id=7372

And try changing all your /sys/block/sda/queue/nr_requests (replace sda with each of your hard drives) to 16. The default is 128. 128 definitely causes starvation for my issue; 16 made the symptoms completely disappear on a lightly loaded server. Still a bit glitchy when heavily loaded, but much more bearable.
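The nr_requests workaround above can be wrapped in a small helper. This is a sketch assuming a POSIX shell; the sysfs root is passed as a parameter (the helper name `set_nr_requests` is my own) so it can be exercised against a test tree, and on a real system you would run it as root against /sys.

```shell
#!/bin/sh
# set_nr_requests SYSFS_ROOT VALUE
# Writes VALUE into queue/nr_requests for every block device found
# under SYSFS_ROOT/block (sda, hdc, ...).
set_nr_requests() {
    root=$1
    val=$2
    for q in "$root"/block/*/queue/nr_requests; do
        [ -e "$q" ] && echo "$val" > "$q"
    done
}

# On the affected server (as root):
#   set_nr_requests /sys 16    # kernel default is 128

# The /proc/sys/vm tunables mentioned earlier can be lowered in the
# same spirit, to make the VM write dirty pages out less greedily, e.g.:
#   sysctl -w vm.dirty_ratio=10
#   sysctl -w vm.dirty_background_ratio=5
```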
See my comments in the kernel bug above.