Description of problem:
This appears to be a regression caused by my patches for BZ 253416. It is so far only seen on x86_64 systems that support acpi_cpufreq. When booting the kdump kernel after a crash it panics with:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/cpufreq/cpufreq.c:81
invalid opcode: 0000 [1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-62.el5 #1
RIP: 0010:[<ffffffff802062b1>] [<ffffffff802062b1>] lock_policy_rwsem_write+0x23/0x78
RSP: 0000:ffff810008c39e60 EFLAGS: 00010246
RAX: 00000000ffffffff RBX: ffffffff804326a8 RCX: 0000000000000000
RDX: ffffffff8039a080 RSI: 0000000000000282 RDI: 0000000000000001
RBP: 0000000000000001 R08: 0000000000000001 R09: ffff810008c307a0
R10: 0000000000000000 R11: 0000000000000050 R12: ffffffff80310e40
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff8039a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006ae6f8 CR3: 0000000001001000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff810008c38000, task ffff810008c307a0)
Stack: ffffffff804326a8 ffffffff804326a8 ffffffff80310e40 ffffffff8020650f
 ffffffff803214a0 ffffffff801ac7a0 00000000ffffffed ffffffff802e74e8
 00000000ffffffed ffffffff802055df 0000000000000000 ffffffff80402ed0
Call Trace:
 [<ffffffff8020650f>] cpufreq_remove_dev+0xb/0x22
 [<ffffffff801ac7a0>] sysdev_driver_unregister+0x4d/0x97
 [<ffffffff802055df>] cpufreq_register_driver+0x130/0x194
 [<ffffffff803d5a5b>] init+0x1f9/0x2f9
 [<ffffffff8005cfb1>] child_rip+0xa/0x11
 [<ffffffff8016a19e>] acpi_ds_init_one_object+0x0/0x80
 [<ffffffff803d5862>] init+0x0/0x2f9
 [<ffffffff8005cfa7>] child_rip+0x0/0x11

Code: 0f 0b 68 2d 28 2b 80 c2 51 00 4c 63 e0 48 c7 c3 30 bc 40 80
RIP [<ffffffff802062b1>] lock_policy_rwsem_write+0x23/0x78
 RSP <ffff810008c39e60>
 <0>Kernel panic - not syncing: Fatal exception

Version-Release number of selected component (if applicable):
kernel-2.6.18-62.el5

How reproducible:

Steps to Reproduce:
1. ensure system supports cpufreq (service cpuspeed start)
2. configure kdump
3. force crash

Actual results:

Expected results:

Additional info:
Turns out this is specific to booting with maxcpus=1 (which also happens with kdump). Panic is from this bit of code:

int lock_policy_rwsem_##mode \
(int cpu) \
{ \
	int policy_cpu = per_cpu(policy_cpu, cpu); \
	BUG_ON(policy_cpu == -1); \
	down_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu)); \
	if (unlikely(!cpu_online(cpu))) { \
		up_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu)); \
		return -1; \
	} \
 \
	return 0; \
}

Since we have cpus that are not online..... I will look at if/how this was fixed upstream.

- Doug
FYI: I tested both 2.6.18-60.el5 and 2.6.18-61.el5 on dhcp83-53 and verified that the same panic as above did occur. In my initial attempts, I was booting with "rhgb", and even though I clicked "show details", the system just seemed to freeze, and no panic info was shown. After removing "rhgb", I get the same panic data shown above with -61.el5.
It turns out this panic only happens when it tries to use the "centrino" cpuspeed driver, which is only available on x86 (which explains why I never saw this on ia64). Evidently that driver doesn't initialize things properly when maxcpus=1. Looking upstream, it appears CONFIG_X86_SPEEDSTEP_CENTRINO is deprecated, but I cannot find much discussion as to why. Fedora 8 no longer uses it, which explains why I could not reproduce the panic there.
Found the fix for this upstream. I will post a RHEL5.1 patch tomorrow:

commit ec28297a562f2b022115b9eb82e4ea724d996240
Author: Venki Pallipadi <venkatesh.pallipadi>
Date: Mon Mar 26 12:03:19 2007 -0700

    [PATCH] Fix maxcpus=1 trigerring BUG() in cpufreq

    Ingo reported it on lkml in the thread
    "2.6.21-rc5: maxcpus=1 crash in cpufreq: kernel BUG at drivers/cpufreq/cpufreq.c:82!"

    This check added to remove_dev is symmetric to one in add_dev and handles
    callbacks for offline cpus cleanly.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi>
    Acked-by: Ingo Molnar <mingo>
    Signed-off-by: Linus Torvalds <torvalds>
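For anyone following along, here is a rough user-space sketch of the failure and of the kind of check the commit message describes. It is not the kernel code or the actual patch. The arrays, the function names, and the bug_hits counter are illustrative stand-ins, and it assumes the fix amounts to an early return from remove_dev for offline CPUs, which is how I read "symmetric to one in add_dev". BUG_ON() is modeled as a counter so the sketch does not abort.

```c
#include <stdbool.h>

#define NR_CPUS   4
#define NO_POLICY (-1)

/* Illustrative stand-ins for per-CPU kernel state (not the real symbols). */
static bool cpu_is_online[NR_CPUS];
static int  policy_cpu[NR_CPUS];
static int  bug_hits;            /* counts places where the kernel would BUG() */

/* Model a boot with maxcpus=N: only the first N CPUs come online, and only
 * those ever get a policy CPU assigned; the rest keep the initial -1. */
void boot_with_maxcpus(int maxcpus)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        cpu_is_online[cpu] = cpu < maxcpus;
        policy_cpu[cpu] = cpu_is_online[cpu] ? cpu : NO_POLICY;
    }
    bug_hits = 0;
}

/* Mirrors the shape of lock_policy_rwsem_write(): the policy lookup is
 * checked before the online test, so an offline CPU trips the BUG_ON. */
int lock_policy_write(int cpu)
{
    if (policy_cpu[cpu] == NO_POLICY) {
        bug_hits++;              /* kernel: BUG_ON(policy_cpu == -1) */
        return -1;
    }
    if (!cpu_is_online[cpu])
        return -1;
    return 0;
}

/* remove_dev without the check: takes the lock for every CPU it is given. */
int remove_dev_unchecked(int cpu)
{
    return lock_policy_write(cpu);
}

/* remove_dev with the assumed check: offline CPUs are skipped up front. */
int remove_dev_checked(int cpu)
{
    if (!cpu_is_online[cpu])
        return 0;
    return lock_policy_write(cpu);
}

/* Models the unregister path from the call trace, which invokes the
 * remove callback for every CPU device, online or not. */
int unregister_all(int (*remove_dev)(int))
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        remove_dev(cpu);
    return bug_hits;
}
```

With boot_with_maxcpus(1), unregister_all(remove_dev_unchecked) hits the modeled BUG_ON once per offline CPU, while unregister_all(remove_dev_checked) never reaches it. In the real kernel the first hit is already fatal.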
Created attachment 291398
patch from upstream

This fixes the panic I saw. Dave is trying it on his system also. I will post to rhkernel-list this afternoon assuming Dave's test goes OK.
(In reply to comment #5)
> Created an attachment (id=291398)
> patch from upstream
>
> This fixes the panic I saw. Dave is trying it on his system also. I will post
> to rhkernel-list this afternoon assuming Dave's test goes OK.

Applied the patch above to kernel 2.6.18-65.el5, and kdump now works OK. ACK!
patch posted to rhkernel-list for review.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I have still seen it with kernel 2.6.18-68.el5

Red Hat Enterprise Linux Server release 5.1 (Tikanga)
Kernel 2.6.18-68.el5 on an x86_64

dell-pesc430-03.rhts.boston.redhat.com login: root
Password:
Login incorrect

login: SysRq : Trigger a crashdump
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
PCI: Not using MMCONFIG.
ACPI: Getting cpuindex for acpiid 0x3
ACPI: Getting cpuindex for acpiid 0x4
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Loading scsi_mod.ko module
Loading sd_mod.ko module
Loading libata.ko module
Loading ata_piix.ko module
Loading jbd.ko module
Loading ext3.ko module
Creating Block Devices
Creating block device ram0
Creating block device ram1
Creating block device ram10
Creating block device ram11
Creating block device ram12
Creating block device ram13
Creating block device ram14
Creating block device ram15
Creating block device ram2
Creating block device ram3
Creating block device ram4
Creating block device ram5
Creating block device ram6
Creating block device ram7
Creating block device ram8
Creating block device ram9
Creating block device sda
Attempting to enter user-space to capture vmcore
Creating root device.
Checking root filesystem.
fsck 1.38 (30-Jun-2005)
fsck: WARNING: couldn't open /etc/fstab: No such file or directory
e2fsck 1.38 (30-Jun-2005)
fsck.ext3: while determining whether /dev/sda1 is mounted.
/: recovering journal
/: clean, 137938/5124480 files, 1211991/5120710 blocks
Mounting root filesystem.
Trying mount -t ext3 /dev/sda1 /sysroot
Using ext3 on root filesystem
Switching to new root and running init.
INIT: version 2.86 booting
Welcome to Red Hat Enterprise Linux Server
Press 'I' to enter interactive startup.
Setting clock (utc): Wed Jan 16 00:57:30 EST 2008 [ OK ]
Starting udev: [ OK ]
Loading default keymap (us): [ OK ]
Setting hostname localhost.localdomain: [ OK ]
No devices found
Setting up Logical Volume Management: No volume groups found [ OK ]
Remounting root filesystem in read-write mode: [ OK ]
Mounting local filesystems: [ OK ]
Enabling local filesystem quotas: [ OK ]
Enabling /etc/fstab swaps: [ OK ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Applying Intel CPU microcode update: [ OK ]
Checking for hardware changes [ OK ]
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/cpufreq/cpufreq.c:81
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pnp0/00:00/id
CPU 0
Modules linked in: acpi_cpufreq dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev floppy sg tg3 ide_cd i3000_edac uhci_hcd i2c_i801 edac_mc i2c_core ehci_hcd shpchp pcspkr cdrom serio_raw ext3 jbd ata_piix libata sd_mod scsi_mod
Pid: 2123, comm: modprobe Not tainted 2.6.18-68.el5 #1
RIP: 0010:[<ffffffff80206b5c>] [<ffffffff80206b5c>] lock_policy_rwsem_write+0x23/0x78
RSP: 0000:ffff810006b23df8 EFLAGS: 00010246
RAX: 00000000ffffffff RBX: ffffffff804346a8 RCX: 0000000000000000
RDX: ffffffff8039c080 RSI: 0000000000000292 RDI: 0000000000000001
RBP: 0000000000000001 R08: ffffffff802a32a6 R09: 0000000000000000
R10: ffffffff80420260 R11: 0000000000000000 R12: ffffffff80312e40
R13: ffff810007069dc0 R14: ffffffff882a3200 R15: ffffc200000f9710
FS: 00002aaaaaac5240(0000) GS:ffffffff8039c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000555569afe908 CR3: 0000000006a24000 CR4: 00000000000006e0
Process modprobe (pid: 2123, threadinfo ffff810006b22000, task ffff810008435860)
Stack: ffffffff804346a8 ffffffff804346a8 ffffffff80312e40 ffffffff80206dba
 ffffffff803234a0 ffffffff801ad065 00000000ffffffed ffffffff882a3168
 00000000ffffffed ffffffff80205e8a ffffffff882a3200 ffff810007069d58
Call Trace:
 [<ffffffff80206dba>] cpufreq_remove_dev+0xb/0x22
 [<ffffffff801ad065>] sysdev_driver_unregister+0x4d/0x97
 [<ffffffff80205e8a>] cpufreq_register_driver+0x130/0x194
 [<ffffffff800a3500>] sys_init_module+0x16a6/0x1857
 [<ffffffff8005b116>] system_call+0x7e/0x83

Code: 0f 0b 68 c6 3d 2b 80 c2 51 00 4c 63 e0 48 c7 c3 30 dc 40 80
RIP [<ffffffff80206b5c>] lock_policy_rwsem_write+0x23/0x78
 RSP <ffff810006b23df8>
 <0>Kernel panic - not syncing: Fatal exception
Sorry. I suppose the patch has not been included yet.
(In reply to comment #10)
> Sorry. I supposed the patch has not included yet.

Correct, Don will change the state of this BZ to "MODIFIED" when a kernel has been built with this patch.
QE ack for RHEL 5.2
The patch is in 2.6.18-71.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html