Bug 472844 - kernel panic when modprobe -r acpi_cpufreq on centrino platform with kernel newer than 2.6.18-118
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: i386 Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Prarit Bhargava
QA Contact: Martin Jenner
Depends On:
Blocks:
Reported: 2008-11-24 20:17 EST by Zhang Kexin
Modified: 2009-01-20 15:18 EST

Doc Type: Bug Fix
Last Closed: 2009-01-20 15:18:59 EST

Attachments
dmesg (30.00 KB, application/octet-stream)
2008-11-24 22:25 EST, Zhang Kexin
cpuinfo (2.32 KB, application/octet-stream)
2008-11-24 22:27 EST, Zhang Kexin
content of "cat sys/devices/system/cpu/cpu*/cpufreq/*" (916 bytes, application/octet-stream)
2008-11-24 22:30 EST, Zhang Kexin
RHEL5 fix for this issue (868 bytes, patch)
2008-12-01 09:47 EST, Prarit Bhargava

Description Zhang Kexin 2008-11-24 20:17:49 EST
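Description of problem:
Running "modprobe -r acpi_cpufreq" on a Centrino machine panics the kernel on kernels newer than 2.6.18-118.
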
Version-Release number of selected component (if applicable):
2.6.18-119.el5PAE

How reproducible:
always

Steps to Reproduce:
1. modprobe -r acpi_cpufreq

Actual results:
kernel panic

Expected results:
The module is unloaded cleanly, or modprobe reports that it cannot be unloaded because it is in use.

Additional info:
The machine is a Centrino system. On kernels 2.6.18-118 and older, speedstep-centrino is used and is compiled into the kernel, so acpi_cpufreq is not used on those kernels. If I deliberately run "modprobe acpi_cpufreq", it fails with "FATAL: Error inserting acpi_cpufreq (/lib/modules/2.6.18-119.el5PAE/kernel/arch/i386/kernel/cpu/cpufreq/acpi_cpufreq.ko): Device or resource busy".

On kernels newer than 2.6.18-118, the acpi_cpufreq driver has been updated to support most Centrino systems and the centrino driver is built as a module, so acpi_cpufreq is used as the cpufreq driver by default. See bz 449787.
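
To confirm which cpufreq driver a given kernel actually bound to each CPU, something like the following could be used. It is a minimal sketch, assuming the usual sysfs layout under /sys/devices/system/cpu and that each cpufreq directory exposes a scaling_driver file; the function name show_cpufreq_drivers and its optional root argument are made up for illustration.

```shell
#!/bin/sh
# Print the cpufreq scaling driver bound to each CPU, one "cpuN: driver" line per CPU.
# $1 (optional, hypothetical): alternate root directory; defaults to the real sysfs path.
show_cpufreq_drivers() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_driver; do
        # Skip CPUs without a cpufreq directory (or an unmatched glob).
        [ -r "$f" ] || continue
        cpu="${f#"$root"/}"   # strip the root prefix -> cpuN/cpufreq/scaling_driver
        cpu="${cpu%%/*}"      # keep only the cpuN component
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}
```

Calling show_cpufreq_drivers with no argument reads the live system; no output means no cpufreq driver is bound.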
Comment 1 Zhang Kexin 2008-11-24 20:26:32 EST
The machine I used is an RHTS machine named hp-xw4800-01.rhts.bos.redhat.com. Kernel 2.6.18-124 has the same problem.
Comment 2 Zhang Kexin 2008-11-24 20:27:22 EST
panic info:

[root@hp-xw4800-01 ~]# BUG: unable to handle kernel paging request at virtual address 076129ff
 printing eip:
c046d45c
*pde = 00000000
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:3f/0000:3f:00.0/irq
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq dm_multipath scsi_dh video backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport floppy sr_mod cdrom snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device tg3 libphy snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd sg soundcore pcspkr serio_raw dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    3
EIP:    0060:[<c046d45c>]    Not tainted VLI
EFLAGS: 00010297   (2.6.18-119.el5PAE #1) 
EIP is at free_percpu+0x12/0x36
eax: 00000000   ebx: 00000000   ecx: 00000202   edx: 00000000
esi: 076129ff   edi: 00000000   ebp: f6856000   esp: f6856f54
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 3373, ti=f6856000 task=f6ec4000 task.ti=f6856000)
Stack: f89ec380 00000020 c043d86f 69706361 7570635f 71657266 c0448000 00000000 
       080f10e0 00000081 40000003 f6ec4000 f6eed800 492b4e8a 1d173065 f6856fbc 
       00000000 000f10e0 f89ec380 00000880 f6856fa8 00000000 080f10e0 00000000 
Call Trace:
 [<c043d86f>] sys_delete_module+0x192/0x1bb
 [<c0448000>] audit_syscall_entry+0xb4/0x17d
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code: 0b 89 c8 89 da e8 01 ff ff ff 8b 03 89 7c 83 14 40 89 03 56 9d 5b 5e 5f c3 56 89 c6 53 b8 40 b5 77 c0 f7 d6 e8 c2 78 07 00 eb 14 <8b> 04 9e e8 7a ff ff ff ba 40 b5 77 c0 89 d8 e8 c6 78 07 00 83 
EIP: [<c046d45c>] free_percpu+0x12/0x36 SS:ESP 0068:f6856f54
 <0>Kernel panic - not syncing: Fatal exception
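
One way to read the registers above, offered as an assumption rather than a confirmed disassembly: the Code: bytes at the start of free_percpu appear to include "89 c6" (copy eax into esi) followed by "f7 d6" (bitwise NOT of esi), and the faulting "8b 04 9e" loads from esi + ebx*4. With ebx = 0 and esi = 076129ff, that matches the faulting address, so the value that arrived in eax would be the 32-bit complement of esi. The helper name invert32 is hypothetical.

```shell
#!/bin/sh
# Recover the value free_percpu() appears to have received, from esi at the fault.
# Assumption: the function inverts its argument (f7 d6 = not %esi) before use,
# so the original argument is the 32-bit bitwise complement of esi.
invert32() {
    printf '0x%08x\n' $(( ~$1 & 0xffffffff ))
}
```

Running invert32 0x076129ff gives 0xf89ed600, which is 0x1280 bytes above the f89ec380 value that appears on the stack. What that implies about the root cause is not established in this report.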
Comment 3 Zhang Kexin 2008-11-24 20:29:47 EST
[root@hp-xw4800-01 ~]# lsmod | grep acpi
acpi_cpufreq           14025  0
Comment 4 Zhang Kexin 2008-11-24 22:25:21 EST
Created attachment 324558 [details]
dmesg
Comment 5 Zhang Kexin 2008-11-24 22:27:05 EST
Created attachment 324559 [details]
cpuinfo
Comment 6 Zhang Kexin 2008-11-24 22:30:10 EST
Created attachment 324560 [details]
content of "cat sys/devices/system/cpu/cpu*/cpufreq/*"
Comment 7 Song, Youquan 2008-11-24 23:14:46 EST
Can you reproduce it on the x86_64 version?

It is weird, because the acpi_cpufreq reference count should not be "0".

[root@hp-xw4800-01 ~]# lsmod | grep acpi
acpi_cpufreq           14025  0
Comment 8 Zhang Kexin 2008-11-24 23:51:19 EST
Yes, the kernel panic also happens on x86_64 on the same machine.

[root@hp-xw4800-01 ~]# lsmod | grep acpi
acpi_cpufreq           14025  0
Comment 9 Zhang Kexin 2008-11-24 23:56:43 EST
However, another machine, intel-s3ea2-03.rhts.bos.redhat.com, has the same CPU family and model as hp-xw4800-01.rhts.bos.redhat.com:

cpu family      : 6
model           : 26

It does not have the problem, and

[root@hp-xw4800-01 ~]# lsmod | grep acpi
acpi_cpufreq           14025  1
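
The third column of the lsmod output is the module's use count, which is the value being compared between the two machines here. A small sketch for pulling it out of lsmod-style output follows; the function name module_use_count is hypothetical, and it assumes the three leading columns (name, size, use count) shown in this report.

```shell
#!/bin/sh
# Print the use count (third column) for one module from lsmod-style lines on stdin.
# $1: module name. Exits non-zero if the module is not listed.
module_use_count() {
    awk -v m="$1" '$1 == m { print $3; found = 1 } END { exit !found }'
}
```

Typical use would be: lsmod | module_use_count acpi_cpufreq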
Comment 10 Song, Youquan 2008-11-25 01:19:01 EST
From the dmesg, hp-xw4800-01.rhts.bos.redhat.com has some ACPI issues.
You could try updating to a newer BIOS.

The use count of 0 in "acpi_cpufreq           14025  0" suggests that the cpufreq subsystem was not set up correctly.
You can try running /etc/init.d/cpuspeed stop, then modprobe -r for cpufreq_ondemand and the other governors, then modprobe acpi-cpufreq etc.

ACPI Error (evgpe-0711): No handler or method for GPE[ 0], disabling event [20060707]
ACPI Error (evgpe-0711): No handler or method for GPE[ 1], disabling event [20060707]
ACPI Error (evgpe-0711): No handler or method for GPE[ 2], disabling event [20060707]
ACPI Error (evgpe-0711): No handler or method for GPE[ 6], disabling event [20060707]
ACPI Error (evgpe-0711): No handler or method for GPE[ 7], disabling event [20060707]
ACPI Error (evgpe-0711): No handler or method for GPE[ A], disabling event [20060707]
ACPI Error (evgpe-0711): No handler or method for GPE[ F], disabling event [20060707]
Comment 11 Zhang Kexin 2008-11-25 04:38:03 EST
Thanks.
I have filed a ticket asking for a BIOS update.
On the old BIOS, I did the following:
[root@hp-xw4800-01 ~]# /etc/init.d/cpuspeed stop
[root@hp-xw4800-01 ~]# modprobe -r cpufreq_ondemand
[root@hp-xw4800-01 ~]# modprobe -r cpufreq_powersave
[root@hp-xw4800-01 ~]# modprobe -r cpufreq_conservative
[root@hp-xw4800-01 ~]# modprobe -r acpi_cpufreq
The kernel still panics.
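
The sequence above can be collected into a small reproducer. This is only a sketch: it prints the commands by default, because on an affected kernel the final step is expected to panic the machine. The function name cpufreq_unload_sequence and the CPUFREQ_UNLOAD_RUN switch are hypothetical names introduced here.

```shell
#!/bin/sh
# Print (default) or execute the unload sequence tried in this comment.
# Set CPUFREQ_UNLOAD_RUN=1 (hypothetical switch) to really run the commands as root.
cpufreq_unload_sequence() {
    run=echo
    [ "${CPUFREQ_UNLOAD_RUN:-0}" = "1" ] && run=
    # Stop the cpuspeed service first so it no longer drives the governors.
    $run /etc/init.d/cpuspeed stop
    # Remove the governors, then the driver itself (the step that panics here).
    for mod in cpufreq_ondemand cpufreq_powersave cpufreq_conservative acpi_cpufreq; do
        $run modprobe -r "$mod"
    done
}
```

Without the switch set, calling cpufreq_unload_sequence only lists the five commands in order.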
Comment 12 Prarit Bhargava 2008-11-25 14:54:17 EST
(In reply to comment #10)
> From the dmesg, hp-xw4800-01.rhts.bos.redhat.com have some ACPI issue. 
> You can update the new BIOS.   

<snip>

> ACPI Error (evgpe-0711): No handler or method for GPE[ 0], disabling event
> [20060707]
> ACPI Error (evgpe-0711): No handler or method for GPE[ 1], disabling event
> [20060707]
> ACPI Error (evgpe-0711): No handler or method for GPE[ 2], disabling event
> [20060707]
> ACPI Error (evgpe-0711): No handler or method for GPE[ 6], disabling event
> [20060707]
> ACPI Error (evgpe-0711): No handler or method for GPE[ 7], disabling event
> [20060707]
> ACPI Error (evgpe-0711): No handler or method for GPE[ A], disabling event
> [20060707]
> ACPI Error (evgpe-0711): No handler or method for GPE[ F], disabling event
> [20060707]

This is a known issue on the HP xw series systems. A BIOS update will not currently resolve the GPE register errors.

A workaround (WAR) for the GPE error has been submitted to RHKL and will ship in 5.4.
Comment 13 Prarit Bhargava 2008-11-26 09:15:01 EST
Tracked this down to RHEL5 commit e12bfee9772aa06e05e40d08918ca838e21db7f0.

Must be some weird module ordering issue?

P.
Comment 14 Song, Youquan 2008-12-01 03:30:46 EST
What is the status of this bug?
Can you reproduce it with an upstream kernel build, such as 2.6.28-rc4?
Comment 15 Prarit Bhargava 2008-12-01 06:09:30 EST
Hi Youquan, I'm working this issue right now -- it does not happen upstream AFAICT.

P.
Comment 16 Prarit Bhargava 2008-12-01 09:47:14 EST
Created attachment 325237 [details]
RHEL5 fix for this issue
Comment 19 Don Zickus 2008-12-09 16:05:29 EST
in kernel-2.6.18-126.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 23 errata-xmlrpc 2009-01-20 15:18:59 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
