Bug 224134 - kswapd->prune_dcache crash on 2.6.9-42.0.3.EL
Status: CLOSED DUPLICATE of bug 177357
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock

Reported: 2007-01-24 06:44 EST by Colin Simpson
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2007-02-05 16:57:26 EST

Description Colin Simpson 2007-01-24 06:44:56 EST
Description of problem:

Kernel panic on 2.6.9-42.0.3.EL, possibly related to high load. My only reason
for suspecting this is that two systems running the same internal app have had
similar kernel panics. The first system:
Jan 17 09:39:03 wheelhouse kernel: Unable to handle kernel paging request at
virtual address 0005005a
Jan 17 09:39:03 wheelhouse kernel:  printing eip:
Jan 17 09:39:03 wheelhouse kernel: c018b280
Jan 17 09:39:03 wheelhouse kernel: *pde = 0fd3e067
Jan 17 09:39:03 wheelhouse kernel: Oops: 0000 [#1]
Jan 17 09:39:03 wheelhouse kernel: Modules linked in: nfs nfsd exportfs nfs_acl parport_pc lp parport autofs4 lockd sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button battery ac nvidia(U) i2c_core md5 ipv6 joydev uhci_hcd ehci_hcd hw_random snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore e1000 ext3 jbd ata_piix libata sd_mod scsi_mod
Jan 17 09:39:03 wheelhouse kernel: CPU:    0
Jan 17 09:39:04 wheelhouse kernel: EIP:    0060:[<c018b280>]    Tainted: P      VLI
Jan 17 09:39:04 wheelhouse kernel: EFLAGS: 00010202   (2.6.9-42.0.3.EL) 
Jan 17 09:39:04 wheelhouse kernel: EIP is at iput+0x25/0x61
Jan 17 09:39:04 wheelhouse kernel: eax: 00050046   ebx: cb2d37dc   ecx: e0d603f3
  edx: cb2d37dc
Jan 17 09:39:04 wheelhouse kernel: esi: cb2d37dc   edi: cf982040   ebp: 00000007
  esp: dfd1fee4
Jan 17 09:39:04 wheelhouse kernel: ds: 007b   es: 007b   ss: 0068
Jan 17 09:39:04 wheelhouse kernel: Process kswapd0 (pid: 46, threadinfo=dfd1f000
task=dfcfe0b0)
Jan 17 09:39:04 wheelhouse kernel: Stack: d065dac8 c0186408 00000000 00000000
000000e1 00000000 dff6e9e0 c0186ec9 
Jan 17 09:39:04 wheelhouse kernel:        c015630b 00039d00 00000000 00000066
00000000 00000906 000000d0 00000020 
Jan 17 09:39:04 wheelhouse kernel:        c036c860 00000000 c036c860 00000000
c015794b dfd1ff60 00000906 dfd1ff9c 
Jan 17 09:39:04 wheelhouse kernel: Call Trace:
Jan 17 09:39:04 wheelhouse kernel:  [<c0186408>] prune_dcache+0x501/0x70c
Jan 17 09:39:04 wheelhouse kernel:  [<c0186ec9>] shrink_dcache_memory+0x16/0x2d
Jan 17 09:39:04 wheelhouse kernel:  [<c015630b>] shrink_slab+0xf7/0x14c
Jan 17 09:39:04 wheelhouse kernel:  [<c015794b>] balance_pgdat+0x1b3/0x2cb
Jan 17 09:39:04 wheelhouse kernel:  [<c0157b1c>] kswapd+0xb9/0xbb
Jan 17 09:39:04 wheelhouse kernel:  [<c0121853>] autoremove_wake_function+0x0/0x2d
Jan 17 09:39:04 wheelhouse kernel:  [<c0318d7e>] ret_from_fork+0x6/0x14
Jan 17 09:39:04 wheelhouse kernel:  [<c0121853>] autoremove_wake_function+0x0/0x2d
Jan 17 09:39:04 wheelhouse kernel:  [<c0157a63>] kswapd+0x0/0xbb
Jan 17 09:39:04 wheelhouse kernel:  [<c01041dd>] kernel_thread_helper+0x5/0xb
Jan 17 09:39:04 wheelhouse kernel: Code: ff e9 72 fd ff ff 53 85 c0 89 c3 74 58
83 bb 9c 01 00 00 20 8b 80 d4 00 00 00 8b
 40 24 75 08 0f 0b 54 04 76 08 33 c0 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff
d2 8d 43 1c ba f0 e0 36 c0 e8 18 
Jan 17 09:39:04 wheelhouse kernel:  <0>Fatal exception: panic in 5 seconds

And the second system:
Jan 23 13:56:58 clubba kernel: Unable to handle kernel paging request at virtual
address 0040f82e
Jan 23 13:56:58 clubba kernel:  printing eip:
Jan 23 13:56:58 clubba kernel: c0171af6
Jan 23 13:56:58 clubba kernel: *pde = 00000000
Jan 23 13:56:58 clubba kernel: Oops: 0000 [#1]
Jan 23 13:56:58 clubba kernel: SMP 
Jan 23 13:56:58 clubba kernel: Modules linked in: vmnet(U) vmmon(U) nvidia(U) nfs nfsd exportfs nfs_acl parport_pc lp parport autofs4 i2c_dev i2c_core lockd sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button battery ac md5 ipv6 joydev uhci_hcd ehci_hcd hw_random snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tg3 floppy ext3 jbd ata_piix libata sd_mod scsi_mod
Jan 23 13:56:58 clubba kernel: CPU:    1
Jan 23 13:56:58 clubba kernel: EIP:    0060:[<c0171af6>]    Tainted: P      VLI
Jan 23 13:56:58 clubba kernel: EFLAGS: 00010202   (2.6.9-42.0.3.ELsmp) 
Jan 23 13:56:58 clubba kernel: EIP is at iput+0x25/0x61
Jan 23 13:56:58 clubba kernel: eax: 0040f81a   ebx: e8fcebc4   ecx: f8c01e4a  
edx: e8fcebc4
Jan 23 13:56:58 clubba kernel: esi: ebb3a344   edi: ebb3a34c   ebp: e8fcebc4  
esp: f7cfaee0
Jan 23 13:56:58 clubba kernel: ds: 007b   es: 007b   ss: 0068
Jan 23 13:56:58 clubba kernel: Process kswapd0 (pid: 53, threadinfo=f7cfa000
task=f7d1b7b0)
Jan 23 13:56:58 clubba kernel: Stack: c221e640 c016f6b6 00000000 00000062
00000000 00000080 00000000 f7ffe9a0 
Jan 23 13:56:58 clubba kernel:        c016fa7c c01497f8 0004e200 00000000
00000001 00000000 00031fc4 000000d0 
Jan 23 13:56:58 clubba kernel:        00000020 c032a380 00000001 c0328f80
00000001 c014aa93 c02d26bd 00031fc4 
Jan 23 13:56:58 clubba kernel: Call Trace:
Jan 23 13:56:58 clubba kernel:  [<c016f6b6>] prune_dcache+0x29d/0x31b
Jan 23 13:56:58 clubba kernel:  [<c016fa7c>] shrink_dcache_memory+0x16/0x2d
Jan 23 13:56:58 clubba kernel:  [<c01497f8>] shrink_slab+0xf8/0x161
Jan 23 13:56:58 clubba kernel:  [<c014aa93>] balance_pgdat+0x1e1/0x30e
Jan 23 13:56:58 clubba kernel:  [<c02d26bd>] schedule+0x86d/0x8db
Jan 23 13:56:58 clubba kernel:  [<c0120420>] prepare_to_wait+0x12/0x4c
Jan 23 13:56:58 clubba kernel:  [<c014ac8a>] kswapd+0xca/0xcc
Jan 23 13:56:58 clubba kernel:  [<c01204f5>] autoremove_wake_function+0x0/0x2d
Jan 23 13:56:58 clubba kernel:  [<c02d46e6>] ret_from_fork+0x6/0x14
Jan 23 13:56:58 clubba kernel:  [<c01204f5>] autoremove_wake_function+0x0/0x2d
Jan 23 13:56:58 clubba kernel:  [<c014abc0>] kswapd+0x0/0xcc
Jan 23 13:56:58 clubba kernel:  [<c01041f5>] kernel_thread_helper+0x5/0xb
Jan 23 13:56:58 clubba kernel: Code: ff e9 e5 fe ff ff 53 85 c0 89 c3 74 58 83
bb 3c 01 00 00 20 8b 80 a4 00 00 00 8b 40 24 75 08 
0f 0b 54 04 34 c3 2e c0 85 c0 74 0b <8b> 50 14 85 d2 74 04 89 d8 ff d2 8d 43 1c
ba 70 de 32 c0 e8 62 
Jan 23 13:56:58 clubba kernel:  <0>Fatal exception: panic in 5 seconds
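
A quick cross-check of the two oopses above, as a minimal sketch. In both
reports the faulting instruction marked <8b> 50 14 in the Code: line appears to
decode as a 32-bit x86 load from 0x14(%eax). That decoding is an assumption and
is not stated in the logs. If it is right, the faulting address should equal
eax + 0x14 in each case. The dictionary layout and the helper name below are
hypothetical; the register and address values are copied from the logs.

```python
# Values copied from the two oops reports above.
OOPSES = {
    "wheelhouse": {"fault_addr": 0x0005005A, "eax": 0x00050046},
    "clubba": {"fault_addr": 0x0040F82E, "eax": 0x0040F81A},
}

# Assumption: the bytes "8b 50 14" decode as mov 0x14(%eax),%edx on 32-bit x86.
LOAD_DISPLACEMENT = 0x14


def fault_matches_eax(oops, displacement=LOAD_DISPLACEMENT):
    """Return True if the faulting address equals eax + displacement."""
    return oops["eax"] + displacement == oops["fault_addr"]


if __name__ == "__main__":
    for host, oops in OOPSES.items():
        computed = oops["eax"] + LOAD_DISPLACEMENT
        print(host, hex(computed), fault_matches_eax(oops))
```

If the check holds for both hosts, eax held a small non-pointer value at the
time of each fault. That would be consistent with iput() following a stale or
corrupted pointer, but this interpretation is a guess rather than a diagnosis.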



Version-Release number of selected component (if applicable):
kernel 2.6.9-42.0.3

How reproducible:
Not very, as I'm not quite sure what caused it.

Comment 1 Daniel J Blueman 2007-01-25 06:43:38 EST
Multiple people have reported this in bug 177357, which was (incorrectly) marked
as a duplicate of another problem and closed.

I'm hitting this kswapd->prune_dcache bug quite frequently - around once a week,
on a few machines under constant processor-bound load and high(er) memory pressure.

Configuration is stock RHEL4 U4 plus the latest errata kernel 2.6.9-42.0.3,
dual-SMP, x86-64 (so not only i686 as above), with 4GB of memory.

Crash signature is:

__down_read_trylock+18      prune_dcache+568
shrink_dcache_memory+20     shrink_slab+188
balance_pgdat+538           kswapd+252
autoremove_wake_function+0  autoremove_wake_function+0
child_rip+8                 kswapd+0
child_rip+0

RIP: _spin_lock_irqsave+40
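
To compare this signature with the i686 call traces in the description, the
frame names can be extracted from both formats and intersected. The following
is a minimal sketch: the regular expression and the helper names are
illustrative only, and the trace excerpts are copied from this report (the i686
excerpt is from the wheelhouse oops).

```python
import re

# Excerpt of the i686 call trace from the description (wheelhouse).
I686_TRACE = """\
 [<c0186408>] prune_dcache+0x501/0x70c
 [<c0186ec9>] shrink_dcache_memory+0x16/0x2d
 [<c015630b>] shrink_slab+0xf7/0x14c
 [<c015794b>] balance_pgdat+0x1b3/0x2cb
 [<c0157b1c>] kswapd+0xb9/0xbb
"""

# Excerpt of the x86-64 crash signature from this comment.
X86_64_SIGNATURE = """\
__down_read_trylock+18      prune_dcache+568
shrink_dcache_memory+20     shrink_slab+188
balance_pgdat+538           kswapd+252
"""

# Matches "symbol+0x1f" (hex offset) or "symbol+31" (decimal offset).
FRAME = re.compile(r"([A-Za-z_]\w*)\+(?:0x[0-9a-f]+|\d+)")


def frame_names(text):
    """Return the function names of all frames found in text, in order."""
    return [match.group(1) for match in FRAME.finditer(text)]


def common_frames(trace_a, trace_b):
    """Return frames of trace_a that also appear in trace_b, in trace_a order."""
    names_b = set(frame_names(trace_b))
    return [name for name in frame_names(trace_a) if name in names_b]


if __name__ == "__main__":
    print(common_frames(I686_TRACE, X86_64_SIGNATURE))
```

A shared kswapd to prune_dcache path only suggests that the reports describe
the same problem; the faulting functions differ between the two architectures.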

Let me know if getting a crash-dump and making it available to someone would help.

[suggest changing summary to "kswapd->prune_dcache crash on 2.6.9-42.0.3.EL"]
Comment 2 Colin Simpson 2007-01-25 06:53:08 EST
Changed Summary. I'd struggle to get a crash dump as it seems to hit machines
randomly. Has anyone raised this through an RHN subscription to escalate it? I
will if no one else has.
Comment 3 Daniel J Blueman 2007-01-25 06:58:44 EST
It is crashing a number of systems I've seen, at random, which suggests a race.

I have asked for bug 177357 to be re-opened, since there are others reporting it
too there; it's probably a good idea to escalate this.
Comment 4 Jason Baron 2007-02-05 16:57:26 EST

*** This bug has been marked as a duplicate of 177357 ***
