436087 – panic in shrink

Summary: panic in shrink

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	8
Hardware:	All
OS:	Linux
Priority:	low
Severity:	urgent
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-03-05 11:32 UTC by JW
Modified:	2008-03-07 03:37 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-03-07 02:40:34 UTC
Type:	---
Embargoed:
Dependent Products:

Description JW 2008-03-05 11:32:49 UTC

Description of problem:
Kernel panics during modest disk activity

Version-Release number of selected component (if applicable):
kernel-2.6.23.1-49

How reproducible:
Always

Steps to Reproduce:
1. Create an ext3 journaled filesystem.
2. Create lots of files on it.
3. Wait for the kernel to crash.
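
Step 2 above can be sketched as a small script. This is only an illustrative load generator, not the reporter's actual workload: the function name `create_many_files`, the file count, the file size, and the directory layout are all assumptions, and nothing in this report confirms that such a load triggers the crash.

```python
import os


def create_many_files(directory, count=100_000, size=1024, per_subdir=1000):
    """Create `count` small files under `directory`, spread over subdirectories.

    All parameters are illustrative assumptions; the report only says
    "create lots of files".
    """
    payload = b"x" * size
    created = 0
    for i in range(count):
        # Start a new subdirectory every `per_subdir` files so that no
        # single directory grows without bound.
        subdir = os.path.join(directory, f"d{i // per_subdir:05d}")
        if i % per_subdir == 0:
            os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"f{i:08d}"), "wb") as fh:
            fh.write(payload)
        created += 1
    return created
```

To use it, call `create_many_files()` with a directory on the ext3 filesystem under test. Creating many files populates the dentry and inode caches, which are the caches being shrunk in both traces below.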
  
Actual results:
kernel crashes in at least two possible ways

Expected results:
kernel must not crash! kernel must be stress tested before being released!!

Additional info:
Instance 1:
  d_kill+0xe/0x32
  prune_one_dentry+0x30/0xb9
  prune_dcache+0xe6/0x125
  shrink_dcache_memory+0x1c/0x34
  shrink_slab+0xd5/0x134
  kswapd+0x2a7/0x40a
  autoremove_wake_function+0x0/0x35
  kswapd+0x38/0x409
  kthread+0x38/0x5e
  kthread+0x0/0x5e
  kernel_thread_helper+0x7/0x10
  EIP: list_del+0x26/0x5d

Instance 2:
  journal_try_to_free_buffer+0x5c/0x137 [jbd]
  free_buffer_head+0x18/0x2c
  ext3_realease_page+0x0/0x7b [ext3]
  try_to_release_page+0x30/0x42
  __invalidate_mapping_pages+0x79/0xec
  invalidate_mapping_pages+0xf/0x11
  shrink_icache_memory
  shrink_slab
  kswapd
  autoremove_wake_function
  kswapd
  kswapd
  EIP: journal_grab_journal_head+0xf/0x3e
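
For readers unfamiliar with the frame notation used above: in `symbol+0xOFFSET/0xSIZE [module]`, OFFSET is the position inside the function and SIZE is the function's length in bytes, both in hexadecimal; the bracketed name, when present, is the module that owns the symbol. A minimal sketch of a parser for these lines follows. The names `Frame` and `parse_frame` are hypothetical helpers written for this illustration and are not part of any kernel tooling.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as "d_kill+0xe/0x32", "shrink_slab",
# "journal_try_to_free_buffer+0x5c/0x137 [jbd]" and "EIP: list_del+0x26/0x5d".
FRAME_RE = re.compile(
    r"^\s*(?:EIP:\s*)?(?P<symbol>[A-Za-z_][\w.]*)"
    r"(?:\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+))?"
    r"(?:\s+\[(?P<module>[^\]]+)\])?\s*$"
)


class Frame(NamedTuple):
    symbol: str
    offset: Optional[int]  # bytes into the function, if given
    size: Optional[int]    # total function length in bytes, if given
    module: Optional[str]  # owning module, if given


def parse_frame(line):
    """Return a Frame for a trace line, or None if the line is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    off, size = m.group("offset"), m.group("size")
    return Frame(
        m.group("symbol"),
        int(off, 16) if off else None,
        int(size, 16) if size else None,
        m.group("module"),
    )
```

A parser like this recovers only what the abbreviated traces contain. It cannot supply the registers, the first line of the oops, or the taint state that are requested later in this thread.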

Comment 1 Dave Jones 2008-03-05 17:42:18 UTC

Hmm, so you can reproduce this on demand?
If so, can you try the 2.6.24-based kernel from updates-testing?

Adding Eric to Cc as maybe he's seen something similar in ext3 development.

Comment 2 Eric Sandeen 2008-03-05 18:00:18 UTC

No, I don't *think* I've seen this.  It looks vaguely like bug #428329, but
then, there's not a lot to go on.  Could you include the full oopses rather than
the heavily edited versions?

Also, testing on a debug kernel (yum install kernel-debug) might yield clues.

Comment 3 JW 2008-03-05 22:14:36 UTC

I have gone back to kernel-2.6.23.1-21.fc7 because I have had it running on
other hardware for several months now. Pity that it has disappeared from nearly
every mirror, though.  Why do good rpms in the updates repository get deleted and
replaced by inferior ones (more patches != better patches)?

Comment 4 Eric Sandeen 2008-03-05 22:24:51 UTC

Hm, I'll be sure to suggest to the kernel maintainers that they discontinue
their irrational, single-minded quest for more and more patches.... 

But anyway, you can usually find older rpms on koji.fedoraproject.org, for
example http://koji.fedoraproject.org/packages/kernel/2.6.23.1/21.fc7/

If you'd like to see the problem resolved so that you don't have to stick with
fc7 kernels, posting the full oops output, or reproducing the problem on the
debug kernel as I suggested, would be a great help.  Otherwise I'm not sure we
can hope for a resolution with the limited information provided.

Thanks,
-Eric

Comment 5 Chuck Ebbert 2008-03-05 23:46:29 UTC

Please post the complete oops messages.

Comment 6 JW 2008-03-07 02:27:03 UTC

I have gone back to the FC7 kernel (kernel-2.6.23.1-21), which is nice and stable.
No problems whatsoever over the last couple of days (the FC8 kernel crashed at
least once every day).

I cannot vouch for the current FC7 kernel update (kernel-2.6.23.15-80) because the
numbering has gone from 2.6.23.1 to 2.6.23.15, which doesn't make a lot of sense.

Somebody should make 2.6.23.1-21 available again because it is good.

Comment 7 Eric Sandeen 2008-03-07 02:40:34 UTC

Without more information, we cannot proceed on this bug.

If you can provide the requested data (full oops output, preferably from a debug
kernel), please re-open.

Comment 8 JW 2008-03-07 03:09:06 UTC

If you are comfortable ignoring the stack traces that I have provided, then go
ahead and close this bug and pretend that the FC8 kernel is fantastic. That is
your choice, not mine.

It certainly is a clever way to eliminate kernel flaws.

Comment 9 Eric Sandeen 2008-03-07 03:37:04 UTC

It is clearly a bug (well, or perhaps bad memory or whatnot; so far we really
cannot tell), and I'd very much like to fix it if possible.  I've not seen it
reported elsewhere, and you seem to be somewhat uniquely able to hit it.
However, the information you have provided is not enough to go on.  "Something
went wrong down this path" doesn't tell me what actually failed.  Maybe it was a
null pointer.  Maybe it was a bad pointer.  All I can do is guess.

There is a reason that the kernel provides copious amounts of information on an
oops; it is so an attempt can be made at debugging.  You have provided perhaps
20% of the oops info, and seem to be unwilling or unable to run this one more
time to gather the information requested.

I have no testcase; I have no oops output.  I have a stacktrace, and that is
all.  I can't see registers, I don't know what modules were loaded, I don't know
if your kernel was tainted, etc.  I don't even know what the actual first
line of the oops was.  If you don't want me to close this bug, I need just a bit
more info from you, as the reporter.  I cannot magically infer the missing
information.

I have a hunch that it may be related to another bug I've seen, which in turn
may be related to suspend problems.  But, there is simply not enough here to be
able to tell.

If you can provide the requested info by running the problematic kernel once
more, preferably the debug variant, I am more than willing to spend time digging
into this bug.
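
Two of the items listed above as missing, the taint state and the loaded modules, can also be read from a running system. The following is a minimal sketch, assuming a Linux system where `/proc/sys/kernel/tainted` and `/proc/modules` are readable; the function names are hypothetical, and this output would supplement the full oops text, not replace it.

```python
def read_taint(path="/proc/sys/kernel/tainted"):
    """Return the kernel taint value as an integer (0 means not tainted)."""
    with open(path) as fh:
        return int(fh.read().strip())


def loaded_modules(path="/proc/modules"):
    """Return the names of loaded modules (first field of each line)."""
    with open(path) as fh:
        return [line.split()[0] for line in fh if line.strip()]


def summarize(taint, modules):
    """Format the two values as a short text block for a bug comment."""
    state = "not tainted" if taint == 0 else f"tainted (value {taint})"
    return f"Kernel: {state}\nModules: {' '.join(modules)}"
```

The path parameters default to the usual procfs locations but can be pointed at saved copies of those files, for example ones captured before a reboot.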