Bug 221619
Summary: | FC6 reliably hangs with heavy xfs filesystem activity on a LUKS volume | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | David Keaton <dmk> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock> |
Severity: | urgent | Docs Contact: | |
Priority: | medium | ||
Version: | 6 | CC: | esandeen, wtogami |
Target Milestone: | --- | Keywords: | Reopened |
Hardware: | i386 | ||
OS: | Linux | ||
Fixed In Version: | 2.6.20-1.2933.fc6 | Doc Type: | Bug Fix |
Last Closed: | 2007-03-21 19:16:04 UTC | Type: | --- |
Description
David Keaton
2007-01-05 17:36:54 UTC
If there's anything you can do to capture the system state when this happens, that would be good: either a crashdump, or sysrq-type info (m for memory, t for thread traces...). Also, can you say whether this is specific to xfs? It might be worth testing on ext3 as well, just for comparison. If it looks xfs-specific, maybe I can get the SGI guys to take a look. Thanks, -Eric

There does not appear to be any way to capture additional state. The system just halts and refuses to accept input. A power cycle is necessary to do anything with it. I tried reiserfs a while back, and that fails in a different way: it goes into hours and hours of constant disk activity, as if it is in an infinite loop. Something fundamental is failing, and the way each filesystem reacts to that is a different problem.

Hmm, so the SysRq key doesn't work either? Pity... I need more info to see what's going on. Any messages on the console? Have you tried it on ext3? It's Red Hat's favorite filesystem, after all. :)

Nothing gets written to the screen. The screen doesn't change, so it looks the same as it does when X is running, but there is no response to any attempt at input and no disk activity. I never tried it on ext3; when not freshly created, that filesystem is too slow to meet my needs.

OK; I guess my only other suggestions (other than finding me some time to reproduce it!) would be to try reproducing it on a text console rather than X, so that you can see any messages, or perhaps to set up a serial console to capture messages, and to try the SysRq key from either of those to gather system state or initiate a crashdump... Thanks, -Eric

Without X, there would probably have to be something else with a large working set in memory, so reproducing the problem may take some doing. I don't have the same system configuration available anymore, since I gave up on FC6 and went back to FC5.

Reporter cannot test fixes because he has gone back to FC5. The current FC6 kernel is 2.6.20-1.2933.fc6...
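The SysRq suggestions above only help if the magic SysRq key is enabled in the running kernel. A minimal sketch, assuming it has been turned off through sysctl on the affected install (check `/etc/sysctl.conf` for a `kernel.sysrq = 0` line); this is a general kernel setting, not something confirmed in this report:

```
# /etc/sysctl.conf
# 1 enables every SysRq function; 0 disables them all.
kernel.sysrq = 1
```

Running `sysctl -p` as root applies the file without a reboot. When the hang occurs, Alt+SysRq+m dumps memory information and Alt+SysRq+t dumps the state of every task to the console. Because X hides that output, it is only visible on a text console, or on a second machine if the kernel is booted with a serial console (for example `console=tty0 console=ttyS0,115200`).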
Reporter will test FC6 if a fix becomes available.

It is possible that this bug is related to bug 221621. In that case, a likely explanation is that the system hangs in certain situations when there is a queue of pages to be flushed to disk. This can happen when there is a large amount of disk activity, or when a large proportion of main memory is in use.

First you need to test kernel 2.6.20-1.2933.fc6, which is going into testing tomorrow. There is no way to know in advance whether the 6000+ changes that went into that kernel fixed the problem unless you test.

I will not always be able to do this when a fix has not been attempted, especially on such short notice, so please plan ahead. I have reinstalled FC6 and updated to the latest kernel, which yum claims is 2.6.20-1.2925.fc6. Please advise how to get the 2933 version.

2933 is in updates-testing:

    yum --enablerepo=updates-testing install kernel

You might have better luck using RPM to install it manually, though.

Thanks. Updating with yum worked fine. This bug had been very easy to reproduce. Now, after 1 1/2 days of testing, I cannot make it fail. It appears to be fixed as of 2.6.20-1.2933.fc6. Good job.