Bug 177357

Summary: kswapd caused kernel oops when shrinking dcache
Product: Red Hat Enterprise Linux 4 Reporter: Jim King <jrk>
Component: kernel Assignee: Larry Woodman <lwoodman>
Status: CLOSED NOTABUG QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 4.0 CC: aaron, aviro, Colin.Simpson, daniel, hollowec, hoover, i-kitayama, jbaron, joshuadfranklin, jwest, wanderso
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-06-07 04:47:43 UTC Type: ---
Attachments:
Description                        Flags
console log from january crash     none
console log from december crash    none

Description Jim King 2006-01-10 00:35:06 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.8) Gecko/20050427 Camino/0.8.4

Description of problem:
About once a month, we get a kernel oops in kswapd0. Every stack trace shows the problem somewhere in the dcache shrinking path. Console logs from two separate events are included as attachments.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-22.0.1.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Let RHEL4 fileserver run for over a month

Additional info:

Comment 1 Jim King 2006-01-10 00:38:06 UTC
Created attachment 122978
console log from january crash

Comment 2 Jim King 2006-01-10 00:38:36 UTC
Created attachment 122979
console log from december crash

Comment 4 Jason Baron 2006-01-17 03:09:36 UTC
There has been some discussion of this issue on LKML; please see:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113745093326567&w=2

Comment 5 Bill Hoover 2006-04-19 13:43:37 UTC
I have also just had a recent rash of these. Has there been any
progress at all? Is there anything in the updated kernel in U3 that is expected
to address this issue?

Comment 6 Jason Baron 2006-04-19 14:09:21 UTC
Hi Bill,

Yes, we are aware that this is a hot issue, and we are actively pursuing a fix.
No, U3 does not fix this issue. I'm marking this as a duplicate of bug #173843.

Thanks,

-Jason

Comment 7 Jason Baron 2006-04-19 14:10:31 UTC

*** This bug has been marked as a duplicate of 173843 ***

Comment 8 Daniel J Blueman 2007-01-25 11:35:19 UTC
The bug this has been marked as a duplicate of is a separate issue. There has
been no unmounting in the case I've been seeing (matching other reports here),
so shrink_dcache_parent() is not being called from kill_super() on unmount.

I'm running post-U4 2.6.9-42.0.3, which is later than the "[fix] committed in
stream U4 build 35.1", above, so the fix for the bug this is marked as a
duplicate of is present. I am not seeing the 'busy-inodes on unmount' message
either, yet I'm still seeing the crash.

I'm hitting this kswapd->prune_dcache bug quite frequently - around once a week,
on a few machines under constant load and high memory pressure.

Configuration is stock RHEL4 U4 plus the latest errata kernel 2.6.9-42.0.3,
dual-SMP, x86-64, 4GB memory.

Crash signature is:

__down_read_trylock+18      prune_dcache+568
shrink_dcache_memory+20     shrink_slab+188
balance_pgdat+538           kswapd+252
autoremove_wake_function+0  autoremove_wake_function+0
child_rip+8                 kswapd+0
child_rip+0

RIP: _spin_lock_irqsave+40

Can this be reopened please, to preserve the reports from other users?

Let me know if getting a crash-dump and making it available to someone would help.
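
For anyone comparing traces across the duplicate reports, here is a minimal Python sketch that splits the crash signature above into (symbol, offset) pairs and checks whether a trace follows the kswapd -> shrink_slab -> prune_dcache path. It is only an illustration: the helper names (parse_frames, matches_prune_dcache_path) are hypothetical, it assumes frames are listed innermost first when read left to right and top to bottom, and it assumes the offsets are decimal.

```python
import re

# Crash signature copied from this comment. Assumption: frames are listed
# innermost first when read left to right, top to bottom.
SIGNATURE = """\
__down_read_trylock+18      prune_dcache+568
shrink_dcache_memory+20     shrink_slab+188
balance_pgdat+538           kswapd+252
autoremove_wake_function+0  autoremove_wake_function+0
child_rip+8                 kswapd+0
child_rip+0
"""

# Matches "symbol+offset". Assumption: offsets are decimal digits.
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+(\d+)")


def parse_frames(text):
    """Return (symbol, offset) pairs in the order they appear in text."""
    return [(sym, int(off)) for sym, off in FRAME_RE.findall(text)]


def matches_prune_dcache_path(frames):
    """True if prune_dcache, shrink_slab and kswapd appear in that order."""
    wanted = ["prune_dcache", "shrink_slab", "kswapd"]
    pos = 0
    for sym, _ in frames:
        if pos < len(wanted) and sym == wanted[pos]:
            pos += 1
    return pos == len(wanted)


if __name__ == "__main__":
    frames = parse_frames(SIGNATURE)
    for sym, off in frames:
        print(f"{sym}+{off}")
    print("prune_dcache path:", matches_prune_dcache_path(frames))
```

Pasting the call trace from another report into SIGNATURE gives a quick first check of whether it is the same kswapd path, before anyone compares full dumps.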

Comment 9 Daniel J Blueman 2007-01-25 11:41:21 UTC
Bug 224134 is a duplicate of this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=224134


Comment 10 Jason Baron 2007-02-05 21:58:11 UTC
*** Bug 224134 has been marked as a duplicate of this bug. ***

Comment 11 Need Real Name 2007-03-01 13:58:56 UTC
I too am seeing this on machines running the 42.0.2 kernel. Any progress on this
in terms of patches to test?

Comment 12 Daniel J Blueman 2007-03-01 14:06:28 UTC
It would be a useful data point to establish whether the newer kernels suffer
from the same issue:

http://people.redhat.com/~jbaron/rhel4/RPMS.kernel/

Comment 17 RHEL Program Management 2007-05-09 10:59:07 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 18 Aaron Straus 2007-06-19 18:45:23 UTC
*** Bug 177122 has been marked as a duplicate of this bug. ***

Comment 19 RHEL Program Management 2007-09-07 19:46:19 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 21 Chris Hollowell 2007-10-11 20:27:10 UTC
Would it be possible to get a test patch posted here before release?

Comment 23 Larry Woodman 2007-11-02 17:13:09 UTC
Can someone who can reproduce this problem see if it still happens with
RHEL4-U6? If it does crash, please get a dump. I have never been able to
reproduce this problem internally.

Larry Woodman