Description of problem:
Kernel panic in sysfs; see the attachment for the stack trace. I have seen the problem on a 16-CPU HP rx8620 (and also on the same machine with only 8 CPUs configured). It seems to be a race either in the generic sysfs code or in the efibootmgr-related kernel code.

Version-Release number of selected component (if applicable):
RHEL4 Update 3 - kernel 2.6.9-34

How reproducible:
Every time

Steps to Reproduce:
1. As root, execute the following command:
   efibootmgr -T & efibootmgr -T
   This puts the first instance in the background and runs the second at the same time.

Actual results:
Panic; see the attachment for the stack trace

Expected results:
No panic

Additional info:
Created attachment 126355
Stack trace of kernel panic
Adding dchapman and myself.

Doug, I tried this on my rx2620 and didn't see any problems. Do you have access to a larger box that we could try this out on?

[root@dhcp83-14 linux-2.6.9]# efibootmgr -T & efibootmgr -T
[1] 5885
BootCurrent: 0006
BootOrder: 0006,0005,0004,0000,0001,0002,0003
Boot0000* Internal Bootable DVD
Boot0001* Core LAN Gb A
Boot0002* Core LAN Gb B
Boot0003* EFI Shell [Built-in]
Boot0004* Fedora Core for ia64
Boot0005* Red Hat Enterprise Linux Server
Boot0006* Red Hat Enterprise Linux AS
BootCurrent: 0006
BootOrder: 0006,0005,0004,0000,0001,0002,0003
Boot0000* Internal Bootable DVD
Boot0001* Core LAN Gb A
Boot0002* Core LAN Gb B
Boot0003* EFI Shell [Built-in]
Boot0004* Fedora Core for ia64
Boot0005* Red Hat Enterprise Linux Server
Boot0006* Red Hat Enterprise Linux AS
[1]+ Done efibootmgr -T

P.
My 64p system in Nashua does panic with this (I tried it under RHEL5, so it doesn't appear to be RHEL4-specific). We might be able to reproduce it on kona, but the MP seems to be dead on it. Perhaps olympia will see the problem; I will test it there. I don't see this on my smaller systems either. Nick hit this on an rx8620, so it likely needs more CPUs to trigger, or it might even be NUMA-specific.
Prarit, olympia1.lab panics with this also. I installed RHEL4U4 on it and it is reserved in my name for I think 8 hours. - Doug
Doug, I've been trying olympia1.lab for most of the morning and can't seem to get this to panic. I am running a pre RHEL4-U5 kernel. Did you just run efibootmgr -T & efibootmgr -T or did you do something else? P.
Ah, never mind. Got the whole panic this time too :)

kernel BUG at include/linux/dcache.h:282!
efibootmgr[11659]: bugcheck! 0 [1]
Modules linked in: md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) nfs(U) lockd(U) nfs_acl(U) sunrpc(U) ds(U) yenta_socket(U) pcmcia_core(U) vfat(U) fat(U) button(U) tg3(U) sr_mod(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) cciss(U) sym53c8xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U)

Pid: 11659, CPU 0, comm: efibootmgr
psr : 0000101008126030 ifs : 800000000000050d ip : [<a0000001001bcd80>] Not tainted
ip is at sysfs_remove_dir+0x340/0x360
unat: 0000000000000000 pfs : 000000000000050d rsc : 0000000000000003
rnat: 8000000000000000 bsps: 0000000000000000 pr : 000000000555a959
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001001bcd80 b6 : a000000100015f80 b7 : a000000100260ba0
f6 : 1003e0000000000001200 f7 : 1003e8080808080808081
f8 : 1003e00000000000023dc f9 : 1003e000000000e580000
f10 : 1003e00000000356f424c f11 : 1003e44b831eee7285baf
r1 : a0000001009cbef0 r2 : 0000000000000010 r3 : 0000000000000010
r8 : 000000000000002a r9 : 00000000000000fd r10 : a0000001007de610
r11 : 0000000000000001 r12 : e000070ff66ffe00 r13 : e000070ff66f8000
r14 : 0000000000004000 r15 : a00000010075fbc0 r16 : a00000010075fbc8
r17 : e0000720f8657de8 r18 : a0000001009f3550 r19 : a0000001009f3550
r20 : 0000000000000004 r21 : 0000000000000000 r22 : 0000000000000000
r23 : 0000000000000000 r24 : 0000000000000000 r25 : 0000000000000004
r26 : e0000000002c8dd0 r27 : 0000000000000000 r28 : e000070ff66f8dd4
r29 : e0000000002c8dd4 r30 : e0000720f8650050 r31 : 00000000356f424c

Call Trace:
 [<a000000100016da0>] show_stack+0x80/0xa0 sp=e000070ff66ff970 bsp=e000070ff66f91b0
 [<a0000001000176b0>] show_regs+0x890/0x8c0 sp=e000070ff66ffb40 bsp=e000070ff66f9168
 [<a00000010003ecd0>] die+0x150/0x240 sp=e000070ff66ffb60 bsp=e000070ff66f9128
 [<a00000010003ee00>] die_if_kernel+0x40/0x60 sp=e000070ff66ffb60 bsp=e000070ff66f90f8
 [<a00000010003efa0>] ia64_bad_break+0x180/0x600 sp=e000070ff66ffb60 bsp=e000070ff66f90d0
 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260 sp=e000070ff66ffc30 bsp=e000070ff66f90d0
 [<a0000001001bcd80>] sysfs_remove_dir+0x340/0x360 sp=e000070ff66ffe00 bsp=e000070ff66f9068
 [<a00000010024b760>] kobject_del+0x40/0x80 sp=e000070ff66ffe00 bsp=e000070ff66f9048
 [<a00000010024b7c0>] kobject_unregister+0x20/0x60 sp=e000070ff66ffe00 bsp=e000070ff66f9028
 [<a00000010045b2c0>] efivar_delete+0x3c0/0x440 sp=e000070ff66ffe00 bsp=e000070ff66f8fb8
 [<a0000001001ba4c0>] subsys_attr_store+0x80/0xa0 sp=e000070ff66ffe20 bsp=e000070ff66f8f80
 [<a0000001001baad0>] sysfs_write_file+0x230/0x2e0 sp=e000070ff66ffe20 bsp=e000070ff66f8f30
 [<a0000001001246b0>] vfs_write+0x290/0x360 sp=e000070ff66ffe20 bsp=e000070ff66f8ee0
 [<a0000001001248d0>] sys_write+0x70/0xe0 sp=e000070ff66ffe20 bsp=e000070ff66f8e68
 [<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20 sp=e000070ff66ffe30 bsp=e000070ff66f8e68
 [<a000000000010640>] 0xa000000000010640 sp=e000070ff6700000 bsp=e000070ff66f8e68
Kernel panic - not syncing: Fatal exception
Lowering the severity on this -- I cannot reproduce it reliably. I've now run through 30000+ iterations of the "reproducer". There is a bug, though, because I did hit the issue once.

P.
Created attachment 146010
Even better kernel stack trace with DEBUG on in kobject and sysfs
The real reproducer here is to do

efibootmgr -t 10; efibootmgr -T & efibootmgr -T

This reproduces the panic nearly 100% of the time.

P.
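For anyone who wants to run the reproducer above more than once, here is a minimal sketch of a loop around it. The `EFIBOOTMGR` and `ITERATIONS` variables and the `race_once` / `race_loop` function names are hypothetical, added only for this sketch; the `efibootmgr -t 10` and `efibootmgr -T` invocations are the ones from this comment. On an affected kernel this is expected to panic the machine, so only run it as root on a box you can afford to lose.

```shell
#!/bin/sh
# Sketch only: loops the reproducer from this bug.
# WARNING: on an affected kernel this is expected to panic the machine.
# EFIBOOTMGR and ITERATIONS are hypothetical knobs, not part of efibootmgr.
EFIBOOTMGR="${EFIBOOTMGR:-efibootmgr}"
ITERATIONS="${ITERATIONS:-100}"

# One attempt: set the EFI boot timeout, then race two deletions of it.
race_once() {
    "$EFIBOOTMGR" -t 10 >/dev/null || return 1
    "$EFIBOOTMGR" -T >/dev/null 2>&1 &
    bg=$!
    "$EFIBOOTMGR" -T >/dev/null 2>&1
    # One of the two deletions may fail because the other won; ignore that.
    wait "$bg"
    return 0
}

# Repeat the attempt ITERATIONS times and report how far we got.
race_loop() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        race_once || { echo "setup failed at iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
    echo "completed $i iterations without a panic"
}

# Only start the loop when explicitly asked to, e.g. "sh race.sh run".
if [ "${1:-}" = "run" ]; then
    race_loop
fi
```

Resetting the timeout with `-t 10` before each pair of `-T` calls matters because both deletions need an existing variable to race on; without it the second and later iterations have nothing to delete.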
Created attachment 146059
Fix for this issue

I've sent this to Matt Domsch for a quick review before posting it to rhkernel-list and LKML.

P.
Initial upstream submission:

http://marc.theaimsgroup.com/?l=linux-kernel&m=116947959710487&w=2

P.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.
/people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html