Description of problem:
Customer reported a panic with the following oops:

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff8000b4e4>] kfree+0x5e/0x1d8
PGD 207f06c067 PUD 207e6ad067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/local_cpus
CPU 2
Modules linked in: nfs fscache nfs_acl vxodm(PFU) vxgms(PU) vxglm(PU) vxfen(PU) gab(PU) llt(PU) autofs4 dmpaa(PU) vxspec(PFU) vxio(PFU) vxdmp(PU) lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding ipv6 xfrm_nalgo crypto_api vxportal(PFU) fdd(PFU) vxfs(PU) dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sg i2c_i801 cdc_ether i2c_core usbnet bnx2 pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc scsi_transport_fc usb_storage shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 10208, comm: hald-addon-acpi Tainted: PF 2.6.18-194.11.4.el5 #1
RIP: 0010:[<ffffffff8000b4e4>] [<ffffffff8000b4e4>] kfree+0x5e/0x1d8
RSP: 0018:ffff81107dc79e18 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffffff8032ee00 RCX: 00000000000fdff0
RDX: 0000000000000000 RSI: 000001bc7e40b210 RDI: 00000000000007ef
RBP: 0000000000000246 R08: ffff81107dc78000 R09: 000000000000002d
R10: ffff810009016410 R11: 7063612f73656369 R12: ffff81107dc79ea8
R13: ffffffff8032ee00 R14: 0000000000000286 R15: 000000000000000a
FS: 00002b5546a1f2d0(0000) GS:ffff8110800963c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000207e018000 CR4: 00000000000006e0
Process hald-addon-acpi (pid: 10208, threadinfo ffff81107dc78000, task ffff81107d014040)
Stack: ffff81107dc79e58 0000000000000000 ffffffff8032ee00 0000000000000246
 ffff81107dc79ea8 00002b5546a14000 000000000ff85730 ffffffff80196479
 0000000000000000 ffff81107d014040 ffffffff8008cf9d 0000000000100100
Call Trace:
 [<ffffffff80196479>] acpi_bus_receive_event+0xfe/0x121
 [<ffffffff8008cf9d>] default_wake_function+0x0/0xe
 [<ffffffff801a0431>] acpi_system_read_event+0x53/0xe6
 [<ffffffff8000b729>] vfs_read+0xcb/0x171
 [<ffffffff80011c3b>] sys_read+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: 48 8b 00 48 83 e0 fc 48 8d 14 30 48 8b 02 25 00 40 02 00 48
RIP [<ffffffff8000b4e4>] kfree+0x5e/0x1d8
 RSP <ffff81107dc79e18>

Version-Release number of selected component (if applicable):
kernel-2.6.18-194.11.4.el5

How reproducible:
Unknown.
vmcore is available from megatron.rdu.redhat.com:/cores/20110114003429/work

crash> dis -r ffffffff8000b4e4
0xffffffff8000b486 <kfree>:      push %r14
0xffffffff8000b488 <kfree+2>:    push %r13
0xffffffff8000b48a <kfree+4>:    mov %rdi,%r13
0xffffffff8000b48d <kfree+7>:    push %r12
0xffffffff8000b48f <kfree+9>:    push %rbp
0xffffffff8000b490 <kfree+10>:   push %rbx
0xffffffff8000b491 <kfree+11>:   sub $0x10,%rsp
0xffffffff8000b495 <kfree+15>:   test %rdi,%rdi
0xffffffff8000b498 <kfree+18>:   je 0xffffffff8000b652 <kfree+460>
0xffffffff8000b49e <kfree+24>:   pushfq
0xffffffff8000b49f <kfree+25>:   pop %r14
0xffffffff8000b4a1 <kfree+27>:   cli
0xffffffff8000b4a2 <kfree+28>:   mov $0x7f0000000000,%rax
0xffffffff8000b4ac <kfree+38>:   lea (%rdi,%rax,1),%rax
0xffffffff8000b4b0 <kfree+42>:   mov %rax,%rdi
0xffffffff8000b4b3 <kfree+45>:   mov %rax,%rsi
0xffffffff8000b4b6 <kfree+48>:   mov %rax,%rcx
0xffffffff8000b4b9 <kfree+51>:   shr $0x24,%rdi
0xffffffff8000b4bd <kfree+55>:   shr $0xc,%rsi
0xffffffff8000b4c1 <kfree+59>:   shr $0x1b,%rcx
0xffffffff8000b4c5 <kfree+63>:   mov -0x7fb4dc80(,%rdi,8),%rdx
0xffffffff8000b4cd <kfree+71>:   xor %eax,%eax
0xffffffff8000b4cf <kfree+73>:   test %rdx,%rdx
0xffffffff8000b4d2 <kfree+76>:   je 0xffffffff8000b4e0 <kfree+90>
0xffffffff8000b4d4 <kfree+78>:   mov %rcx,%rax
0xffffffff8000b4d7 <kfree+81>:   and $0x1ff,%eax
0xffffffff8000b4dc <kfree+86>:   lea (%rdx,%rax,8),%rax
0xffffffff8000b4e0 <kfree+90>:   imul $0x38,%rsi,%rsi
0xffffffff8000b4e4 <kfree+94>:   mov (%rax),%rax

The argument to kfree() is saved in %r13:

crash> kmem ffffffff8032ee00
ffffffff8032ee00 (D) acpi_bus_event_list

We are trying to free the head of the acpi_bus_event_list list, which is not a dynamically allocated object. This looks like a race condition when checking/accessing the acpi_bus_event_list.
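As a cross-check of the analysis above, the arithmetic in kfree+28 through kfree+90 can be replayed in user space. This is only a sketch: the constant and shift counts are copied from the disassembly, while the struct and function names are made up for illustration. My reading that this sequence is the inlined page lookup for the pointer being freed is an assumption, not something the vmcore states.

```c
#include <stdint.h>

/* Immediate from "mov $0x7f0000000000,%rax" at kfree+28. */
#define KFREE_ADDR_BIAS 0x7f0000000000ULL

/* Hypothetical container for the three derived register values. */
struct kfree_regs {
	uint64_t rdi;	/* after shr $0x24 */
	uint64_t rsi;	/* after shr $0xc and imul $0x38 */
	uint64_t rcx;	/* after shr $0x1b */
};

/* Replay kfree+28 .. kfree+90 for a given kfree() argument. */
static struct kfree_regs replay_kfree_prologue(uint64_t arg)
{
	/* lea (%rdi,%rax,1),%rax: the sum wraps modulo 2^64 */
	uint64_t biased = arg + KFREE_ADDR_BIAS;
	struct kfree_regs r;

	r.rdi = biased >> 0x24;
	r.rsi = (biased >> 0xc) * 0x38;
	r.rcx = biased >> 0x1b;
	return r;
}
```

Feeding it the R13 value from the oops (0xffffffff8032ee00) gives rdi = 0x7ef, rsi = 0x1bc7e40b210 and rcx = 0xfdff0, which match RDI, RSI and RCX in the register dump. RDX and RAX are both zero in the dump, so the table load at kfree+63 returned NULL, the branch at kfree+76 was taken with %eax cleared, and the load at kfree+94 faulted on address 0, consistent with CR2.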
int acpi_bus_receive_event(struct acpi_bus_event *event)
{
    unsigned long flags = 0;
    struct acpi_bus_event *entry = NULL;

    DECLARE_WAITQUEUE(wait, current);

    if (!event)
        return -EINVAL;

    if (list_empty(&acpi_bus_event_list)) {
        // if the list is empty put ourselves to sleep
        set_current_state(TASK_INTERRUPTIBLE);
        add_wait_queue(&acpi_bus_event_queue, &wait);

        // check one more time before scheduling out
        if (list_empty(&acpi_bus_event_list))
            schedule();

        remove_wait_queue(&acpi_bus_event_queue, &wait);
        set_current_state(TASK_RUNNING);

        if (signal_pending(current))
            return -ERESTARTSYS;
    }

    // the above list empty checks are done without acquiring the list lock
    // so by the time we get here the list may have been emptied by another
    // thread and if so ...

    spin_lock_irqsave(&acpi_bus_event_lock, flags);
    entry = list_entry(acpi_bus_event_list.next, struct acpi_bus_event, node);
    // acpi_bus_event_list.next will point to itself (the head)
    if (entry)
        list_del(&entry->node);   // removed the head from the list
    spin_unlock_irqrestore(&acpi_bus_event_lock, flags);

    if (!entry)
        return -ENODEV;

    memcpy(event, entry, sizeof(struct acpi_bus_event));

    kfree(entry);   // and we try to free the list head

    return 0;
}

So we probably just need another list empty check that's done inside the list lock and either return early or put ourselves back to sleep.

Upstream commit that should fix this problem:

commit f0a37e008750ead1751b7d5e89d220a260a46147
Author: Chuck Ebbert <cebbert>
Date:   Tue Apr 15 14:34:47 2008 -0700

    acpi: bus: check once more for an empty list after locking it

    List could have become empty after the unlocked check that was made earlier,
    so check again inside the lock.

    Should fix https://bugzilla.redhat.com/show_bug.cgi?id=427765

    Signed-off-by: Chuck Ebbert <cebbert>
    Cc: <stable>
    Cc: Len Brown <lenb>
    Signed-off-by: Andrew Morton <akpm>
    Signed-off-by: Linus Torvalds <torvalds>

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 5b6760e..2d1955c 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -373,10 +373,11 @@ int acpi_bus_receive_event(struct acpi_bus_event *event)
 	}
 
 	spin_lock_irqsave(&acpi_bus_event_lock, flags);
-	entry =
-	    list_entry(acpi_bus_event_list.next, struct acpi_bus_event, node);
-	if (entry)
+	if (!list_empty(&acpi_bus_event_list)) {
+		entry = list_entry(acpi_bus_event_list.next,
+				   struct acpi_bus_event, node);
 		list_del(&entry->node);
+	}
 	spin_unlock_irqrestore(&acpi_bus_event_lock, flags);
 
 	if (!entry)
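To make it concrete why the old "if (entry)" test could never catch an empty list, here is a small user-space sketch. Everything in it is a stand-in and an assumption for illustration: struct event is a simplified placeholder for struct acpi_bus_event (with the list node assumed to be the first member, which is consistent with kfree() receiving the exact address of acpi_bus_event_list), the list helpers are minimal reimplementations, and a pthread mutex takes the place of the spinlock. It is not the kernel code.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct acpi_bus_event. */
struct event {
	struct list_head node;	/* assumed to be the first member */
	int type;
};

static struct list_head event_list = { &event_list, &event_list };
static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;

static int list_is_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Old logic: on an empty list this yields the head itself, never NULL. */
static struct event *first_entry_unchecked(void)
{
	return container_of(event_list.next, struct event, node);
}

/* Queue one heap-allocated event at the tail of the list. */
static int post_event(int type)
{
	struct event *ev = malloc(sizeof(*ev));

	if (!ev)
		return -ENOMEM;
	ev->type = type;
	pthread_mutex_lock(&event_lock);
	ev->node.prev = event_list.prev;
	ev->node.next = &event_list;
	event_list.prev->next = &ev->node;
	event_list.prev = &ev->node;
	pthread_mutex_unlock(&event_lock);
	return 0;
}

/* Fixed logic: the emptiness check happens while the lock is held. */
static int receive_event(struct event *out)
{
	struct event *entry = NULL;

	pthread_mutex_lock(&event_lock);
	if (!list_is_empty(&event_list)) {
		entry = container_of(event_list.next, struct event, node);
		entry->node.prev->next = entry->node.next;
		entry->node.next->prev = entry->node.prev;
	}
	pthread_mutex_unlock(&event_lock);

	if (!entry)
		return -ENODEV;
	*out = *entry;
	free(entry);	/* only ever frees something post_event() allocated */
	return 0;
}
```

With the list empty, first_entry_unchecked() returns a pointer equal to &event_list, so a NULL test on it passes and the head would be unlinked and freed. receive_event() returns -ENODEV in the same situation, which mirrors what the upstream patch does.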
Created attachment 473965 [details] Patch to check for empty list while guarded by acpi bus event list lock
in kernel-2.6.18-241.el5

You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Prior to this update, a kernel panic could occur in the kfree() function due to a race condition in the acpi_bus_receive_event() function. The acpi_bus_receive_event() function checked whether the acpi_bus_event_list list was empty without holding the list lock, so the list could be emptied between that check and the removal of an entry, in which case kfree() was called on the list head. With this update, a check was added after the lock has been taken in order to prevent the race and the calling of the kfree() function on the list head.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html