221619 – FC6 reliably hangs with heavy xfs filesystem activity on a LUKS volume

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	6
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	urgent
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-01-05 17:36 UTC by David Keaton
Modified:	2007-11-30 22:11 UTC
CC List:	2 users
Fixed In Version:	2.6.20-1.2933.fc6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-03-21 19:16:04 UTC
Type:	---
Embargoed:
Dependent Products:

Description David Keaton 2007-01-05 17:36:54 UTC

Description of problem:
The whole system freezes and requires a power cycle when there is heavy disk
activity on an xfs filesystem that is on an encrypted LUKS volume.

Version-Release number of selected component (if applicable):
2.6.18-1.2868.fc6

How reproducible:
Every time, once heavy filesystem activity is forced on xfs over LUKS.

Steps to Reproduce:
1. Create and mount an xfs filesystem on an encrypted LUKS volume.
2. Rsync about 50GB of large and small files to the filesystem over the network
on a machine with 512MB RAM.
3. Log in and out several times both on the console and via ssh while the rsync
is running.
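
The setup in steps 1 and 2 might be scripted roughly as follows. This is only a
sketch: the device /dev/sdX1, the mapping name cryptvol, the mount point
/mnt/crypt, and the rsync source user@host:/data/ are placeholders that do not
come from the report, and the report does not say whether rsync pulled or
pushed the data. The script only prints the commands unless DRY_RUN=0 is set,
because luksFormat destroys any data on the device. Step 3 (logging in and out)
still has to be done by hand.

```shell
#!/bin/sh
# All four values below are hypothetical placeholders; adjust before use.
DEV="${DEV:-/dev/sdX1}"          # block device to encrypt (contents are destroyed)
NAME="${NAME:-cryptvol}"         # device-mapper name for the opened LUKS volume
MNT="${MNT:-/mnt/crypt}"         # mount point for the xfs filesystem
SRC="${SRC:-user@host:/data/}"   # rsync source holding about 50GB of mixed files

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Steps 1 and 2 of the report: xfs on LUKS, then a large rsync onto it.
reproduce() {
    run cryptsetup luksFormat "$DEV" &&
    run cryptsetup luksOpen "$DEV" "$NAME" &&
    run mkfs.xfs "/dev/mapper/$NAME" &&
    run mkdir -p "$MNT" &&
    run mount "/dev/mapper/$NAME" "$MNT" &&
    run rsync -a "$SRC" "$MNT/"
}

reproduce
```

Running it as root with DRY_RUN=0 and real values for DEV and SRC would execute
the commands instead of printing them.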
  
Actual results:
Eventually the system hangs every time.

Expected results:
Slow response due to heavy disk activity but eventually the rsync finishes and
the system returns to normal.

Additional info:

Comment 1 Eric Sandeen 2007-01-27 05:36:28 UTC

If there's anything you can do to capture the system state when this happens,
that would be good - either a crashdump, or sysrq-type info (m for memory, t for
thread traces...)
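
As a rough illustration of the sysrq requests mentioned here, the same m and t
dumps can be asked for from a shell through procfs while the machine still
responds. The function name dump_state is made up for this sketch, and the two
path variables exist only so the function can be tried against ordinary files.
On a machine that is already frozen no shell will run, so the keyboard
combination (Alt+SysRq+m, Alt+SysRq+t) is the only route.

```shell
#!/bin/sh
# Standard procfs locations; overridable only for experimentation.
SYSRQ_ENABLE="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# dump_state: enable all sysrq functions, then request the memory (m)
# and task (t) dumps. The kernel writes the results to its log.
dump_state() {
    echo 1 > "$SYSRQ_ENABLE" || return 1
    for key in m t; do
        echo "$key" > "$SYSRQ_TRIGGER" || return 1
    done
}
```

Run as root, "dump_state && dmesg | tail -n 200" would show the resulting
kernel messages.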

Also can you say whether this is specific to xfs?  Might be worth testing on
ext3 as well, just for comparison.  If it looks xfs-specific, maybe I can get
the sgi guys to take a look.

Thanks,
-Eric

Comment 2 David Keaton 2007-01-27 05:53:48 UTC

There does not appear to be any way to capture additional state.  The system
just halts and refuses to accept input.  A power cycle is necessary to do
anything with it.

I tried reiserfs a while back and that fails in a different way.  It just goes
into hours and hours of constant disk activity as if it is in an infinite loop.
There is something fundamental failing, and the way the filesystems react to
that is a different problem.

Comment 3 Eric Sandeen 2007-01-27 05:56:46 UTC

Hmm, so the sysrq key doesn't work either?  Pity... need more info to see what's
going on.  Any messages on the console?

Have you tried it on ext3?  It's Red Hat's favorite filesystem after all. :)

Comment 4 David Keaton 2007-01-27 06:02:37 UTC

Nothing gets written to the screen.  The screen doesn't change, so it looks the
same as it does when X windows is running, but there is no response to any
attempt at input and no disk activity.

Never tried it on ext3; that filesystem is too slow when not freshly created to
meet my needs.

Comment 5 Eric Sandeen 2007-01-27 06:09:26 UTC

Ok; I guess my only other suggestions (other than finding me some time to
reproduce it!) would be to try reproducing it on a text console rather than X,
so that you can see any messages, or perhaps set up a serial console to capture
messages, and try the sysrq key from either of those to gather system state or
initiate a crashdump...
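
For the serial console suggestion, the usual approach is to add something like
"console=ttyS0,115200 console=tty0" to the kernel line in the boot loader
configuration; the port ttyS0 and the speed 115200 are assumptions, not values
from this report. A minimal sketch for checking whether the running kernel was
booted that way (the function name has_serial_console is made up here):

```shell
#!/bin/sh
# Kernel command line of the running system; overridable for experimentation.
CMDLINE="${CMDLINE:-/proc/cmdline}"

# Succeeds if the command line contains a console=ttyS<N> argument.
has_serial_console() {
    grep -Eq '(^| )console=ttyS[0-9]+' "$CMDLINE"
}
```

With that in place, "has_serial_console && echo serial console active" gives a
quick confirmation before starting a reproduction run.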

Thanks,
-Eric

Comment 6 David Keaton 2007-01-27 06:17:41 UTC

Without X, there would probably have to be something else with a large working
set in memory, so that may take some doing to reproduce the problem.

I don't have the same system configuration available anymore since I gave up on
FC6 and went back to FC5.

Comment 7 Chuck Ebbert 2007-03-19 20:59:07 UTC

Reporter cannot test fixes because he has gone back to FC5.
Current FC6 kernel is 2.6.20-1.2933.fc6...

Comment 8 David Keaton 2007-03-19 21:20:20 UTC

Reporter will test FC6 if a fix becomes available.

Comment 9 David Keaton 2007-03-19 21:24:03 UTC

It is possible that this bug is related to bug 221621.  In that case, a likely
explanation is that the system hangs in certain situations when there is a queue
of pages to be flushed to disk.  This can happen when there is a large amount of
disk activity, or when a large proportion of main memory is in use.
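
One way to watch the backlog of pages waiting to be flushed while trying to
reproduce the hang is to sample the Dirty and Writeback counters in
/proc/meminfo. This is only a sketch: the function name writeback_backlog is
made up, and the numbers it prints would not by themselves confirm or rule out
the explanation above.

```shell
#!/bin/sh
# Standard procfs location; overridable for experimentation.
MEMINFO="${MEMINFO:-/proc/meminfo}"

# Print the Dirty and Writeback lines (field name, value, unit).
writeback_backlog() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' "$MEMINFO"
}
```

Something like "while sleep 5; do date; writeback_backlog; done" on a text or
serial console would leave the last sample visible if the system freezes.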

Comment 10 Chuck Ebbert 2007-03-19 21:38:23 UTC

First you need to test kernel 2.6.20-1.2933.fc6, which is going into testing
tomorrow. There is no way to know in advance if the 6000+ changes that
went into that kernel fixed the problem unless you test.

Comment 11 David Keaton 2007-03-20 00:50:01 UTC

I will not always be able to do this when a fix has not been attempted,
especially on such short notice, so please plan ahead.

I have reinstalled FC6 and updated to the latest kernel, which yum claims is
2.6.20-1.2925.fc6.  Please advise how to get the 2933 version.

Comment 12 Chuck Ebbert 2007-03-20 13:34:13 UTC

2933 is in updates-testing:

yum --enablerepo=updates-testing install kernel

You might have better luck using RPM to install it manually, though.
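
After installing, the machine has to be rebooted into the new kernel before any
test is meaningful. A minimal sketch for confirming which kernel is actually
running, assuming "uname -r" reports the release string in the same form as the
package version (the function name is_fixed_kernel is made up, and the prefix
match is an assumption to allow for variant suffixes):

```shell
#!/bin/sh
# Kernel build named in this bug as the one to test.
FIXED_RELEASE="2.6.20-1.2933.fc6"

# Succeeds if the given release (default: the running kernel) is that build.
# This checks for this exact build only, not for "this build or newer".
is_fixed_kernel() {
    release="${1:-$(uname -r)}"
    case "$release" in
        "$FIXED_RELEASE"*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_fixed_kernel; then
    echo "running $FIXED_RELEASE"
else
    echo "running $(uname -r), not $FIXED_RELEASE"
fi
```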

Comment 13 David Keaton 2007-03-21 19:16:04 UTC

Thanks.  Updating with yum worked fine.

This bug had been very easy to reproduce.  Now with 1 1/2 days of testing, I
cannot make it fail.  It appears to be fixed as of 2.6.20-1.2933.fc6.  Good job.
