Bug 185978 - Kernel panic in sysfs
Summary: Kernel panic in sysfs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: ia64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Prarit Bhargava
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 223796
Reported: 2006-03-20 17:19 UTC by Nick Dokos
Modified: 2007-11-30 22:07 UTC
CC List: 3 users

Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-15 16:13:50 UTC
Target Upstream Version:
Embargoed:


Attachments
Stack trace of kernel panic (3.49 KB, text/plain)
2006-03-20 17:19 UTC, Nick Dokos
Even better kernel stack trace with DEBUG on in kobject and sysfs (3.99 KB, application/octet-stream)
2007-01-19 17:01 UTC, Prarit Bhargava
Fix for this issue (872 bytes, patch)
2007-01-20 13:28 UTC, Prarit Bhargava


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2007:0791 0 normal SHIPPED_LIVE Updated kernel packages available for Red Hat Enterprise Linux 4 Update 6 2007-11-14 18:25:55 UTC

Description Nick Dokos 2006-03-20 17:19:28 UTC
Description of problem: Kernel panic in sysfs - see the attachment for the stack trace.
I have seen the problem on a 16-CPU HP rx8620 (and also on the same
machine but with only 8 cpus configured). It seems to be a race either
in sysfs generic code or in efibootmgr-related kernel code.


Version-Release number of selected component (if applicable):
RHEL4 Update 3 - kernel 2.6.9-34

How reproducible: Every time


Steps to Reproduce:
1. As root, execute the following command:

    efibootmgr -T & efibootmgr -T

putting the first instance in the background and simultaneously
executing the second.

Actual results: Panic - see attachment for stack trace


Expected results: No panic


Additional info:

Comment 1 Nick Dokos 2006-03-20 17:19:30 UTC
Created attachment 126355 [details]
Stack trace of kernel panic

Comment 2 Prarit Bhargava 2007-01-17 15:01:13 UTC
Adding dchapman and myself.  Doug, I tried this on my rx2620 and didn't see any
problems.  Do you have access to a larger box that we could try this out on?

[root@dhcp83-14 linux-2.6.9]# efibootmgr -T & efibootmgr -T
[1] 5885
BootCurrent: 0006
BootOrder: 0006,0005,0004,0000,0001,0002,0003
Boot0000* Internal Bootable DVD
Boot0001* Core LAN Gb A
Boot0002* Core LAN Gb B
Boot0003* EFI Shell [Built-in]
Boot0004* Fedora Core for ia64
Boot0005* Red Hat Enterprise Linux Server
Boot0006* Red Hat Enterprise Linux AS
BootCurrent: 0006
BootOrder: 0006,0005,0004,0000,0001,0002,0003
Boot0000* Internal Bootable DVD
Boot0001* Core LAN Gb A
Boot0002* Core LAN Gb B
Boot0003* EFI Shell [Built-in]
Boot0004* Fedora Core for ia64
Boot0005* Red Hat Enterprise Linux Server
Boot0006* Red Hat Enterprise Linux AS
[1]+  Done                    efibootmgr -T

P.

Comment 3 Doug Chapman 2007-01-17 15:37:35 UTC
On my 64p system I have in Nashua this does panic (tried it under RHEL5 so
doesn't appear to be RHEL4 specific).

We might be able to reproduce on kona but the MP seems to be dead on it. 
Perhaps olympia will see the problem.  I will test it there.

I don't seem to see this on my smaller systems either.  I see Nick hit this on
an rx8620 so likely this needs more cpus to hit it or might even be NUMA specific.


Comment 4 Doug Chapman 2007-01-17 16:26:18 UTC
Prarit,

olympia1.lab panics with this also.  I installed RHEL4U4 on it and it is
reserved in my name for I think 8 hours.

- Doug


Comment 5 Prarit Bhargava 2007-01-19 15:44:46 UTC
Doug, I've been trying olympia1.lab for most of the morning and can't seem to
get this to panic.  I am running a pre RHEL4-U5 kernel.

Did you just run efibootmgr -T & efibootmgr -T or did you do something else?

P.

Comment 6 Prarit Bhargava 2007-01-19 15:46:15 UTC
Ah, never mind.

Got the whole panic this time too :)

kernel BUG at include/linux/dcache.h:282!
efibootmgr[11659]: bugcheck! 0 [1]
Modules linked in: md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U)
nfs(U) lockd(U) nfs_acl(U) sunrpc(U) ds(U) yenta_socket(U) pcmcia_core(U)
vfat(U) fat(U) button(U) tg3(U) sr_mod(U) dm_snapshot(U) dm_zero(U) dm_mirror(U)
ext3(U) jbd(U) dm_mod(U) cciss(U) sym53c8xx(U) scsi_transport_spi(U) sd_mod(U)
scsi_mod(U)

Pid: 11659, CPU 0, comm:           efibootmgr
psr : 0000101008126030 ifs : 800000000000050d ip  : [<a0000001001bcd80>]    Not tainted
ip is at sysfs_remove_dir+0x340/0x360
unat: 0000000000000000 pfs : 000000000000050d rsc : 0000000000000003
rnat: 8000000000000000 bsps: 0000000000000000 pr  : 000000000555a959
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001001bcd80 b6  : a000000100015f80 b7  : a000000100260ba0
f6  : 1003e0000000000001200 f7  : 1003e8080808080808081
f8  : 1003e00000000000023dc f9  : 1003e000000000e580000
f10 : 1003e00000000356f424c f11 : 1003e44b831eee7285baf
r1  : a0000001009cbef0 r2  : 0000000000000010 r3  : 0000000000000010
r8  : 000000000000002a r9  : 00000000000000fd r10 : a0000001007de610
r11 : 0000000000000001 r12 : e000070ff66ffe00 r13 : e000070ff66f8000
r14 : 0000000000004000 r15 : a00000010075fbc0 r16 : a00000010075fbc8
r17 : e0000720f8657de8 r18 : a0000001009f3550 r19 : a0000001009f3550
r20 : 0000000000000004 r21 : 0000000000000000 r22 : 0000000000000000
r23 : 0000000000000000 r24 : 0000000000000000 r25 : 0000000000000004
r26 : e0000000002c8dd0 r27 : 0000000000000000 r28 : e000070ff66f8dd4
r29 : e0000000002c8dd4 r30 : e0000720f8650050 r31 : 00000000356f424c

Call Trace:
 [<a000000100016da0>] show_stack+0x80/0xa0
                                sp=e000070ff66ff970 bsp=e000070ff66f91b0
 [<a0000001000176b0>] show_regs+0x890/0x8c0
                                sp=e000070ff66ffb40 bsp=e000070ff66f9168
 [<a00000010003ecd0>] die+0x150/0x240
                                sp=e000070ff66ffb60 bsp=e000070ff66f9128
 [<a00000010003ee00>] die_if_kernel+0x40/0x60
                                sp=e000070ff66ffb60 bsp=e000070ff66f90f8
 [<a00000010003efa0>] ia64_bad_break+0x180/0x600
                                sp=e000070ff66ffb60 bsp=e000070ff66f90d0
 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260
                                sp=e000070ff66ffc30 bsp=e000070ff66f90d0
 [<a0000001001bcd80>] sysfs_remove_dir+0x340/0x360
                                sp=e000070ff66ffe00 bsp=e000070ff66f9068
 [<a00000010024b760>] kobject_del+0x40/0x80
                                sp=e000070ff66ffe00 bsp=e000070ff66f9048
 [<a00000010024b7c0>] kobject_unregister+0x20/0x60
                                sp=e000070ff66ffe00 bsp=e000070ff66f9028
 [<a00000010045b2c0>] efivar_delete+0x3c0/0x440
                                sp=e000070ff66ffe00 bsp=e000070ff66f8fb8
 [<a0000001001ba4c0>] subsys_attr_store+0x80/0xa0
                                sp=e000070ff66ffe20 bsp=e000070ff66f8f80
 [<a0000001001baad0>] sysfs_write_file+0x230/0x2e0
                                sp=e000070ff66ffe20 bsp=e000070ff66f8f30
 [<a0000001001246b0>] vfs_write+0x290/0x360
                                sp=e000070ff66ffe20 bsp=e000070ff66f8ee0
 [<a0000001001248d0>] sys_write+0x70/0xe0
                                sp=e000070ff66ffe20 bsp=e000070ff66f8e68
 [<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20
                                sp=e000070ff66ffe30 bsp=e000070ff66f8e68
 [<a000000000010640>] 0xa000000000010640
                                sp=e000070ff6700000 bsp=e000070ff66f8e68
Kernel panic - not syncing: Fatal exception
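
Read bottom-up, the trace above shows the path into the BUG: sys_write, vfs_write, sysfs_write_file, subsys_attr_store, efivar_delete, kobject_unregister, kobject_del, and finally sysfs_remove_dir. For comparing traces across runs, here is a minimal shell sketch that pulls only the function names out of a saved oops. The function name call_chain and the file name oops.txt are made up for illustration, and the sed pattern assumes the ia64 frame layout shown in this comment.

```shell
#!/bin/sh
# call_chain (hypothetical helper): print the function name of every frame
# in an oops "Call Trace:" section, one per line, innermost frame first.
# Assumes frames look like " [<address>] symbol+0xoff/0xsize".
# Lines without a symbol (sp=/bsp= lines, raw addresses) are skipped.
call_chain() {
    sed -n 's/^ *\[<[0-9a-f]*>] \([A-Za-z_][A-Za-z0-9_]*\)+0x.*/\1/p' "$@"
}
```

Usage, assuming the console output was saved to oops.txt: `call_chain oops.txt`. With no file argument the function reads standard input.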


Comment 7 Prarit Bhargava 2007-01-19 16:35:09 UTC
Lowering the severity on this -- I cannot reproduce this reliably.  I've now run
through 30000+ iterations of the "reproducer".

There is a bug though because I hit the issue once ....

P.

Comment 8 Prarit Bhargava 2007-01-19 17:01:47 UTC
Created attachment 146010 [details]
Even better kernel stack trace with DEBUG on in kobject and sysfs

Comment 9 Prarit Bhargava 2007-01-19 18:23:44 UTC
The real reproducer here is to do 

efibootmgr -t 10; efibootmgr -T & efibootmgr -T

This causes near 100% reproducibility...

P.
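
The sequence from this comment can be wrapped in a loop for repeated runs. This is only a sketch: `efibootmgr -t 10` sets the boot manager timeout and `-T` deletes it, so presumably setting it first guarantees that both racing `-T` instances have a variable to delete. The names race_once, race_loop, and the EFIBOOTMGR override are assumptions added for illustration, as is the default iteration count.

```shell
#!/bin/sh
# Hypothetical stress loop around the reproducer from this comment.
# EFIBOOTMGR can be overridden to point at a stub for dry runs.
EFIBOOTMGR="${EFIBOOTMGR:-efibootmgr}"

# One round: set a timeout, then race two timeout deletions.
race_once() {
    "$EFIBOOTMGR" -t 10 >/dev/null 2>&1
    "$EFIBOOTMGR" -T >/dev/null 2>&1 &
    "$EFIBOOTMGR" -T >/dev/null 2>&1
    wait    # reap the background instance before the next round
}

# Repeat the round N times (default 100, an arbitrary choice).
race_loop() {
    n="${1:-100}"
    i=0
    while [ "$i" -lt "$n" ]; do
        race_once
        i=$((i + 1))
    done
    echo "completed $i iterations without a panic"
}
```

Run as root, for example `race_loop 1000`, and only on a disposable test machine: on an affected kernel the expected outcome is the panic shown in comment 6, and each round rewrites the EFI Timeout variable.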

Comment 10 Prarit Bhargava 2007-01-20 13:28:27 UTC
Created attachment 146059 [details]
Fix for this issue

I've sent this to Matt Domsch for a quick review before posting it to
rhkernel-list and LKML.

P.

Comment 11 Prarit Bhargava 2007-01-22 15:32:02 UTC
Initial upstream submit here:

http://marc.theaimsgroup.com/?l=linux-kernel&m=116947959710487&w=2

P.

Comment 12 RHEL Program Management 2007-01-23 14:25:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 13 RHEL Program Management 2007-04-18 23:01:15 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 14 Jason Baron 2007-05-18 16:09:42 UTC
/people.redhat.com/~jbaron/rhel4/


Comment 17 errata-xmlrpc 2007-11-15 16:13:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html


