Bug 89189 - Occasional kernel panic on ACPI power button event
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-04-19 18:03 UTC by Dan Eaton
Modified: 2007-04-18 16:53 UTC (History)
0 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2003-06-08 20:28:58 UTC
Embargoed:



Description Dan Eaton 2003-04-19 18:03:46 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.7 (X11; Linux i686; U;) Gecko/20030131

Description of problem:
Running on a dual P4 2.8GHz system with the ospm_busmgr, ospm_button, and
ospm_processor modules loaded.  In about one out of every 20 ACPI graceful
shutdown events generated via the power button, the following kernel panic occurs.

kernel BUG at slab.c:1131!
invalid operand: 0000
e100 e1000 ospm_system ospm_button ospm_busmgr usbkbd keybdev usb-storage
scsi_mod mousedev hid usbmouse input usb-uhci usbcore ext3 jbd
CPU:    0
EIP:    0010:[<c013b316>]    Not tainted
EFLAGS: 00010202
  
EIP is at kmem_cache_grow [kernel] 0x56 (2.4.18-27rlx5smp)
eax: 000001f0   ebx: 000001f0   ecx: 00000000   edx: 00000000
esi: c36b4080   edi: c36b4080   ebp: 00000000   esp: c0357e10
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c0357000)
Stack: 00000000 00000000 00000002 c3616000 00000001 c0407584 c36b4080 c36b4080
       c36b4088 c36b4080 00000000 c013bb34 c36b4080 000001f0 00000246 c0197fc8
       0000003e 00000100 00000238 00000006 ffffffff f88a1c6a 00000000 c0188f7f
Call Trace: [<c013bb34>] kmalloc [kernel] 0x1a4 (0xc0357e3c))
[<c0197fc8>] acpi_hw_low_level_write [kernel] 0x18 (0xc0357e4c))
[<f88a1c6a>] .LC11 [ospm_button] 0x68 (0xc0357e64))
[<c0188f7f>] acpi_os_allocate [kernel] 0xf (0xc0357e6c))
[<c0188f9c>] acpi_os_callocate [kernel] 0xc (0xc0357e78))
[<f88a1c6a>] .LC11 [ospm_button] 0x68 (0xc0357e84))
[<f8899cc2>] bm_osl_generate_event_Rsmp_7dc74281 [ospm_busmgr] 0x62 (0xc0357e88))
[<f88a1a17>] bn_osl_generate_event [ospm_button] 0x57 (0xc0357ea0))
[<f88a1c74>] .LC11 [ospm_button] 0x72 (0xc0357ea8))
[<f88a1c64>] .LC11 [ospm_button] 0x62 (0xc0357eac))
[<f88a166d>] bn_notify_fixed [ospm_button] 0x6d (0xc0357ebc))
[<f88a1ce0>] .rodata.str1.32 [ospm_button] 0x60 (0xc0357ed4))
[<f88a1c36>] .LC11 [ospm_button] 0x34 (0xc0357edc))
[<f88a1b50>] .rodata.str1.1 [ospm_button] 0x0 (0xc0357ee0))
[<c0193ac1>] acpi_ev_fixed_event_dispatch [kernel] 0x51 (0xc0357ee4))
[<c0193b19>] acpi_ev_fixed_event_dispatch [kernel] 0xa9 (0xc0357ef0))
[<c0193a45>] acpi_ev_fixed_event_detect [kernel] 0xa5 (0xc0357f00))
[<c01954c5>] acpi_ev_sci_handler [kernel] 0x55 (0xc0357f24))
[<c018908d>] acpi_irq [kernel] 0xd (0xc0357f40))
[<c010ab7e>] handle_IRQ_event [kernel] 0x5e (0xc0357f48))
[<c0189080>] acpi_irq [kernel] 0x0 (0xc0357f50))
[<c010ae9e>] do_IRQ [kernel] 0x10e (0xc0357f68))
[<c0106ed0>] default_idle [kernel] 0x0 (0xc0357f7c))
[<c0106ed0>] default_idle [kernel] 0x0 (0xc0357f88))
[<c0106ed0>] default_idle [kernel] 0x0 (0xc0357f90))
[<c0106ed0>] default_idle [kernel] 0x0 (0xc0357fa4))
[<c0106ef9>] default_idle [kernel] 0x29 (0xc0357fb8))
[<c0106f72>] cpu_idle [kernel] 0x32 (0xc0357fc4))
[<c0105000>] stext [kernel] 0x0 (0xc0357fd0))
  
  
Code: 0f 0b 6b 04 1c 89 26 c0 89 5c 24 08 b8 03 00 00 00 81 64 24
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
 
After analyzing the oops and studying the source code, it appears that the root
cause of this issue is that the ACPI code attempts to allocate kernel
memory from interrupt context via a blockable memory request:
acpi_os_allocate() in ./linux-2.4/drivers/acpi/os.c calls kmalloc with the
GFP_KERNEL flag, which allows the executing process to sleep if necessary to
fulfill the memory request. When an ACPI event occurs, it results in an
interrupt being generated. The ACPI code provides and registers an ISR for
handling this interrupt. In the course of servicing this interrupt, the ACPI
code requests memory for writing the event to user-space for consumption/action
by the OS. If no pre-allocated memory remains and this memory request cannot be
fulfilled, the memory allocation subsystem in the kernel attempts to block to
allocate more memory and fulfill the request. Unfortunately, when executing in
interrupt context, blocking (and consequently releasing back to the
scheduler) is a no-no. It appears that a simple (and possibly naive) fix would
be to change the type of memory allocation requested to the atomic
variety. This involves changing the flags passed to kmalloc in
acpi_os_allocate() (in linux/acpi/os.c) from GFP_KERNEL to GFP_ATOMIC.
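
The failure mode described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not kernel code: every
name in it (mock_kmalloc, mock_cache, MOCK_GFP_KERNEL, MOCK_GFP_ATOMIC) is
hypothetical, and it assumes the context check sits on the slow path where the
cache has to grow, which would be consistent with the BUG being reported from
kmem_cache_grow and with the panic appearing only occasionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's allocation flags. */
enum mock_gfp { MOCK_GFP_KERNEL, MOCK_GFP_ATOMIC };

/* Outcome of one modelled allocation. */
enum mock_result { MOCK_OK, MOCK_NOMEM, MOCK_BUG };

#define MOCK_OBJS_PER_SLAB 4

struct mock_cache {
    int free_objects;   /* objects left in already-allocated slabs */
    int free_pages;     /* pages obtainable without sleeping */
    bool in_interrupt;  /* true while "servicing an interrupt" */
};

/*
 * Model of the allocation path. The fast path hands out a free object
 * without looking at the flags; only the grow path checks the context.
 */
enum mock_result mock_kmalloc(struct mock_cache *c, enum mock_gfp flags)
{
    assert(c != NULL);

    if (c->free_objects > 0) {          /* fast path: no context check */
        c->free_objects--;
        return MOCK_OK;
    }

    /* Slow path: the cache must grow. */
    if (c->in_interrupt && flags != MOCK_GFP_ATOMIC)
        return MOCK_BUG;                /* blockable request in interrupt context */

    if (c->free_pages > 0) {            /* grow without sleeping */
        c->free_pages--;
        c->free_objects += MOCK_OBJS_PER_SLAB - 1;
        return MOCK_OK;
    }

    if (flags == MOCK_GFP_ATOMIC)
        return MOCK_NOMEM;              /* atomic request fails instead of sleeping */

    /* Process context, blocking allowed: model sleep-and-reclaim as success. */
    return MOCK_OK;
}
```

In this model a blockable request made from interrupt context succeeds as long
as a free object is available and only trips the check once the cache is
exhausted, while an atomic request never trips it but can return a failure
that the caller has to handle.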

I have tested this fix. Initiation of over 300 ACPI events resulted in no kernel
panics and each event was handled appropriately by the OS.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.18-3; kernel-smp-2.4.18-27.7.x

How reproducible:
Always

Steps to Reproduce:
1. Generate ACPI power button event (will take several attempts because the
event must occur when no free slabs are available)
2. Observe panic
3.
    

Actual Results:  Kernel panic

Expected Results:  No kernel panic

Additional info:

Tested with the stock 7.3 kernel 2.4.18-3 and the latest errata 2.4.18-27

Comment 1 Arjan van de Ven 2003-04-19 19:08:30 UTC
"Tested with the stock 7.3 kernel 2.4.18-3 and the latest errata 2.4.18-27"

those kernels absolutely ship without ACPI enabled for stability reasons, so I
don't see how this can happen.

Comment 2 Dan Eaton 2003-04-20 19:53:02 UTC
You're right.  I turned ACPI on myself.  Sorry for the waste of time.  Shoulda
been reported directly to ACPI project.  I now see why RH configures it off by
default.

Comment 3 Alan Cox 2003-06-08 20:28:58 UTC
BTW the current ACPI code is way better so if you need ACPI you might want to
give it a spin again someday


