Bug 221619 - FC6 reliably hangs with heavy xfs filesystem activity on a LUKS volume
Product: Fedora
Classification: Fedora
Component: kernel
Hardware: i386 Linux
Priority: medium
Severity: urgent
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Keywords: Reopened
Reported: 2007-01-05 12:36 EST by David Keaton
Modified: 2007-11-30 17:11 EST
Fixed In Version: 2.6.20-1.2933.fc6
Doc Type: Bug Fix
Last Closed: 2007-03-21 15:16:04 EDT

Description David Keaton 2007-01-05 12:36:54 EST
Description of problem:
The whole system freezes and requires a power cycle when there is heavy disk
activity on an xfs filesystem that is on an encrypted LUKS volume.

Version-Release number of selected component (if applicable):

How reproducible:
Force heavy filesystem activity on xfs over LUKS.

Steps to Reproduce:
1. Create and mount an xfs filesystem on an encrypted LUKS volume.
2. Rsync about 50GB of large and small files to the filesystem over the network
on a machine with 512MB RAM.
3. Log in and out several times both on the console and via ssh while the rsync
is running.
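A rough sketch of the reproduction steps as shell commands, for anyone retrying the test. The device, mapping name, mount point and rsync source below are hypothetical placeholders, not values from the report. The commands default to a dry run because luksFormat destroys any existing data on the device. Step 3 (repeated console and ssh logins) is left as a manual step.

```shell
#!/bin/sh
# Sketch only: DEVICE, MAPPER_NAME, MOUNT_POINT and SOURCE are assumed
# placeholders and must be replaced with real values before use.
DEVICE="${DEVICE:-/dev/sdX1}"
MAPPER_NAME="${MAPPER_NAME:-luks_test}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/luks_test}"
SOURCE="${SOURCE:-user@remotehost:/data/}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Step 1: create the LUKS volume, open it, and put xfs on it.
    # luksFormat and luksOpen prompt interactively for a passphrase.
    run cryptsetup luksFormat "$DEVICE"
    run cryptsetup luksOpen "$DEVICE" "$MAPPER_NAME"
    run mkfs.xfs "/dev/mapper/$MAPPER_NAME"
    run mkdir -p "$MOUNT_POINT"
    run mount "/dev/mapper/$MAPPER_NAME" "$MOUNT_POINT"
    # Step 2: pull a large tree of mixed file sizes over the network.
    run rsync -a "$SOURCE" "$MOUNT_POINT/"
}
```

Calling reproduce prints the commands. Running with DRY_RUN=0 as root executes them.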
Actual results:
Eventually the system hangs every time.

Expected results:
Slow response due to heavy disk activity but eventually the rsync finishes and
the system returns to normal.

Additional info:

Comment 1 Eric Sandeen 2007-01-27 00:36:28 EST
If there's anything you can do to capture the system state when this happens,
that would be good - either a crashdump, or sysrq-type info (m for memory, t for
thread traces...)

Also can you say whether this is specific to xfs?  Might be worth testing on
ext3 as well, just for comparison.  If it looks xfs specific, maybe I can get
the sgi guys to take a look.
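For reference, the sysrq requests mentioned above (m for memory, t for thread traces) can also be issued from a root shell while the machine still responds. This is a minimal sketch, not something from the thread. The two path variables exist only so the function can be pointed at scratch files for testing. Enabling sysrq in advance is what allows the Alt+SysRq key combinations to be tried once the system stops accepting normal input.

```shell
#!/bin/sh
# Standard procfs locations; overridable only for testing (assumption).
SYSRQ_ENABLE="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Enable the sysrq key, then request a memory report (m) and a task
# dump (t). The kernel writes the results to its log buffer.
sysrq_dump() {
    echo 1 > "$SYSRQ_ENABLE" || return 1
    for key in m t; do
        echo "$key" > "$SYSRQ_TRIGGER" || return 1
    done
}
```

As root, sysrq_dump followed by dmesg > sysrq-state.txt saves the output to a file (the file name is arbitrary).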

Comment 2 David Keaton 2007-01-27 00:53:48 EST
There does not appear to be any way to capture additional state.  The system
just halts and refuses to accept input.  A power cycle is necessary to do
anything with it.

I tried reiserfs a while back and that fails in a different way.  It just goes
into hours and hours of constant disk activity as if it is in an infinite loop.
There is something fundamental failing, and the way the filesystems react to
that is a different problem.

Comment 3 Eric Sandeen 2007-01-27 00:56:46 EST
Hmm, so the sysrq key doesn't work either?  Pity... need more info to see what's
going on.  Any messages on the console?

Have you tried it on ext3?  It's Red Hat's favorite filesystem after all. :)

Comment 4 David Keaton 2007-01-27 01:02:37 EST
Nothing gets written to the screen.  The screen doesn't change, so it looks the
same as it does when X windows is running, but there is no response to any
attempt at input and no disk activity.

Never tried it on ext3; that filesystem is too slow when not freshly created to
meet my needs.

Comment 5 Eric Sandeen 2007-01-27 01:09:26 EST
Ok; I guess my only other suggestions (other than finding me some time to
reproduce it!) would be to try reproducing it on a text console rather than X,
so that you can see any messages, or perhaps set up a serial console to capture
messages, and try the sysrq key from either of those to gather system state or
initiate a crashdump...
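A serial console is normally set up by adding console= parameters to the kernel line in the boot loader configuration. The sketch below only checks whether the running kernel was booted that way. The port (ttyS0) and speed (115200) in the comments are assumptions for illustration, and the CMDLINE variable is overridable purely for testing.

```shell
#!/bin/sh
# Example kernel-line addition in /boot/grub/grub.conf (assumed port/speed):
#     console=ttyS0,115200n8 console=tty0
# Kernel messages go to every console listed; the last one listed
# becomes /dev/console. On the capturing machine, a terminal program
# such as "screen /dev/ttyS0 115200" can record the output.
CMDLINE="${CMDLINE:-/proc/cmdline}"

# Succeeds if the booted kernel command line names a serial console.
has_serial_console() {
    grep -q 'console=ttyS[0-9]' "$CMDLINE"
}
```

For example: has_serial_console && echo "serial console active".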

Comment 6 David Keaton 2007-01-27 01:17:41 EST
Without X, there would probably have to be something else with a large working
set in memory, so that may take some doing to reproduce the problem.

I don't have the same system configuration available anymore since I gave up on
FC6 and went back to FC5.

Comment 7 Chuck Ebbert 2007-03-19 16:59:07 EDT
Reporter cannot test fixes because he has gone back to FC5.
Current FC6 kernel is 2.6.20-1.2933.fc6...

Comment 8 David Keaton 2007-03-19 17:20:20 EDT
Reporter will test FC6 if a fix becomes available.

Comment 9 David Keaton 2007-03-19 17:24:03 EDT
It is possible that this bug is related to bug 221621.  In that case, a likely
explanation is that the system hangs in certain situations when there is a queue
of pages to be flushed to disk.  This can happen when there is a large amount of
disk activity, or when a large proportion of main memory is in use.
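One way to look for evidence of the page-flushing explanation above is to watch the Dirty and Writeback counters in /proc/meminfo while the rsync runs. This is a sketch under that assumption, not a method used in the thread. The MEMINFO variable is overridable only for testing.

```shell
#!/bin/sh
MEMINFO="${MEMINFO:-/proc/meminfo}"

# Print the amount of memory waiting to be written back (Dirty) and
# currently being written back (Writeback).
dirty_snapshot() {
    grep -E '^(Dirty|Writeback):' "$MEMINFO"
}

# Print a timestamped snapshot every INTERVAL seconds (default 5)
# until interrupted.
watch_dirty() {
    interval="${1:-5}"
    while :; do
        date '+%Y-%m-%d %H:%M:%S'
        dirty_snapshot
        sleep "$interval"
    done
}
```

Running watch_dirty in an ssh session from another machine keeps the last sample visible on the remote terminal if the system hangs.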
Comment 10 Chuck Ebbert 2007-03-19 17:38:23 EDT
First you need to test kernel 2.6.20-1.2933.fc6, which is going into testing
tomorrow. There is no way to know in advance if the 6000+ changes that
went into that kernel fixed the problem unless you test.

Comment 11 David Keaton 2007-03-19 20:50:01 EDT
I will not always be able to do this when a fix has not been attempted,
especially on such short notice, so please plan ahead.

I have reinstalled FC6 and updated to the latest kernel, which yum claims is
2.6.20-1.2925.fc6.  Please advise how to get the 2933 version.

Comment 12 Chuck Ebbert 2007-03-20 09:34:13 EDT
2933 is in updates-testing:

yum --enablerepo=updates-testing install kernel

You might have better luck using RPM to install it manually, though.
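After installing and rebooting, it is worth confirming that the test kernel is the one actually running. The helper below is a small sketch for that check. The package file name in the comment follows the usual naming pattern but is an assumption, as is the architecture placeholder.

```shell
#!/bin/sh
# Manual install alternative (assumed file name, substitute the real one):
#     rpm -ivh kernel-2.6.20-1.2933.fc6.<arch>.rpm
# Using -i rather than -U keeps the previous kernel installed as a fallback.
EXPECTED="${EXPECTED:-2.6.20-1.2933.fc6}"

# Succeeds if the given release string (default: the running kernel)
# starts with the expected version.
kernel_matches() {
    running="${1:-$(uname -r)}"
    case "$running" in
        "$EXPECTED"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: kernel_matches || echo "not running $EXPECTED yet; reboot into the new kernel".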
Comment 13 David Keaton 2007-03-21 15:16:04 EDT
Thanks.  Updating with yum worked fine.

This bug had been very easy to reproduce.  Now with 1 1/2 days of testing, I
cannot make it fail.  It appears to be fixed as of 2.6.20-1.2933.fc6.  Good job.
