Bug 428331
| Summary: | booting with maxcpus=1 panics when starting cpufreq service | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Doug Chapman <dchapman> |
| Component: | kernel | Assignee: | Doug Chapman <dchapman> |
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 5.2 | CC: | anderson, bmaly, jplans, qcai, tao |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | GSSApproved | | |
| Fixed In Version: | RHBA-2008-0314 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2008-05-21 15:06:13 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 429516 | | |
| Attachments: | | | |
Description
Doug Chapman
2008-01-10 21:16:18 UTC
Turns out this is specific to booting with maxcpus=1 (which also happens with
kdump). Panic is from this bit of code:
int lock_policy_rwsem_##mode \
(int cpu) \
{ \
        int policy_cpu = per_cpu(policy_cpu, cpu); \
        BUG_ON(policy_cpu == -1); \
        down_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu)); \
        if (unlikely(!cpu_online(cpu))) { \
                up_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu)); \
                return -1; \
        } \
        \
        return 0; \
}
With maxcpus=1 we have CPUs that are not online, so their policy_cpu is presumably still -1 and the BUG_ON() fires.
I will look at whether and how this was fixed upstream.
- Doug
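To make the failure mode concrete, here is a minimal user-space sketch of the check in the macro above. It is not kernel code: the arrays stand in for the per-CPU variables, every `model_` name is hypothetical, and it assumes that a CPU which never comes online keeps `policy_cpu == -1`.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4 /* hypothetical number of possible CPUs */

/* User-space stand-ins for the kernel's per-CPU variables. */
int  model_policy_cpu[MODEL_NR_CPUS];
bool model_cpu_online[MODEL_NR_CPUS];

/*
 * Model a boot with maxcpus=N. ASSUMPTION: only CPUs that come online
 * get a policy owner; the others keep the initial value -1.
 */
void model_boot(int maxcpus)
{
    assert(maxcpus >= 1 && maxcpus <= MODEL_NR_CPUS);
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        model_cpu_online[cpu] = cpu < maxcpus;
        model_policy_cpu[cpu] = model_cpu_online[cpu] ? cpu : -1;
    }
}

/* True when BUG_ON(policy_cpu == -1) would fire for this CPU. */
bool model_lock_would_bug(int cpu)
{
    return model_policy_cpu[cpu] == -1;
}
```

After `model_boot(1)`, `model_lock_would_bug(0)` is false and `model_lock_would_bug(1)` is true. Note that in the macro the BUG_ON() sits before the cpu_online() test, so the offline case never reaches the graceful `return -1` path.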
FYI: I tested both 2.6.18-60.el5 and 2.6.18-61.el5 on dhcp83-53 and verified that the same panic as above did occur. In my initial attempts, I was booting with "rhgb", and even though I clicked "show details", the system just seemed to freeze, and no panic info was shown. After removing "rhgb", I get the same panic data shown above with -61.el5.
It turns out this panic only happens when it tries to use the "centrino" cpuspeed driver, which is only available on x86 (which explains why I never saw this on ia64). Evidently that driver doesn't initialize things properly when maxcpus=1. Looking upstream, it appears CONFIG_X86_SPEEDSTEP_CENTRINO is deprecated, but I cannot find much discussion as to why. Fedora 8 no longer uses it, which explains why I could not reproduce the panic there.
Found the fix for this upstream. I will post a RHEL5.1 patch tomorrow:
commit ec28297a562f2b022115b9eb82e4ea724d996240
Author: Venki Pallipadi <venkatesh.pallipadi>
Date: Mon Mar 26 12:03:19 2007 -0700
[PATCH] Fix maxcpus=1 trigerring BUG() in cpufreq
Ingo reported it on lkml in the thread
"2.6.21-rc5: maxcpus=1 crash in cpufreq: kernel BUG at
drivers/cpufreq/cpufreq.c:82!"
This check added to remove_dev is symmetric to one in add_dev and handles
callbacks for offline cpus cleanly.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi>
Acked-by: Ingo Molnar <mingo>
Signed-off-by: Linus Torvalds <torvalds>
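The commit message says an offline-CPU check was added to remove_dev, mirroring the one in add_dev. The sketch below shows the effect of such a check in user space. It is not the upstream diff: the `sketch_` names are hypothetical, the BUG is reported as a return code instead of crashing, and the exact shape of the early return is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum { SKETCH_OK = 0, SKETCH_BUG = -1 };

/* Two possible CPUs, booted with maxcpus=1: CPU 1 never came online. */
static const bool sketch_online[2]     = { true, false };
static const int  sketch_policy_cpu[2] = { 0, -1 };

/* Stands in for lock_policy_rwsem_write(); reports the BUG instead of crashing. */
int sketch_lock_write(int cpu)
{
    assert(cpu >= 0 && cpu < 2);
    return sketch_policy_cpu[cpu] == -1 ? SKETCH_BUG : SKETCH_OK;
}

/* Before the fix: the lock is attempted for every CPU, online or not. */
int sketch_remove_dev_unpatched(int cpu)
{
    return sketch_lock_write(cpu);
}

/* After the fix (assumed shape): offline CPUs return before locking. */
int sketch_remove_dev_patched(int cpu)
{
    if (!sketch_online[cpu])
        return SKETCH_OK;
    return sketch_lock_write(cpu);
}
```

For the offline CPU 1, the unpatched variant returns `SKETCH_BUG` while the patched variant returns `SKETCH_OK`; for the online CPU 0 both behave the same.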
Created attachment 291398
patch from upstream
This fixes the panic I saw. Dave is trying it on his system also. I will post
to rhkernel-list this afternoon assuming Dave's test goes OK.
(In reply to comment #5)
> Created an attachment (id=291398)
> patch from upstream
>
> This fixes the panic I saw. Dave is trying it on his system also. I will post
> to rhkernel-list this afternoon assuming Dave's test goes OK.
Applied the patch above to kernel 2.6.18-65.el5, and kdump now works OK. ACK!
Patch posted to rhkernel-list for review.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I have still seen it with kernel 2.6.18-68.el5:
Red Hat Enterprise Linux Server release 5.1 (Tikanga)
Kernel 2.6.18-68.el5 on an x86_64
dell-pesc430-03.rhts.boston.redhat.com login: root
Password:
Login incorrect
login: SysRq : Trigger a crashdump
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
PCI: Not using MMCONFIG.
ACPI: Getting cpuindex for acpiid 0x3
ACPI: Getting cpuindex for acpiid 0x4
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Loading scsi_mod.ko module
Loading sd_mod.ko module
Loading libata.ko module
Loading ata_piix.ko module
Loading jbd.ko module
Loading ext3.ko module
Creating Block Devices
Creating block device ram0
Creating block device ram1
Creating block device ram10
Creating block device ram11
Creating block device ram12
Creating block device ram13
Creating block device ram14
Creating block device ram15
Creating block device ram2
Creating block device ram3
Creating block device ram4
Creating block device ram5
Creating block device ram6
Creating block device ram7
Creating block device ram8
Creating block device ram9
Creating block device sda
Attempting to enter user-space to capture vmcore
Creating root device.
Checking root filesystem.
fsck 1.38 (30-Jun-2005)
fsck: WARNING: couldn't open /etc/fstab: No such file or directory
e2fsck 1.38 (30-Jun-2005)
fsck.ext3: while determining whether /dev/sda1 is mounted.
/: recovering journal
/: clean, 137938/5124480 files, 1211991/5120710 blocks
Mounting root filesystem.
Trying mount -t ext3 /dev/sda1 /sysroot
Using ext3 on root filesystem
Switching to new root and running init.
INIT: version 2.86 booting
Welcome to Red Hat Enterprise Linux Server
Press 'I' to enter interactive startup.
Setting clock (utc): Wed Jan 16 00:57:30 EST 2008 [ OK ]
Starting udev: [ OK ]
Loading default keymap (us): [ OK ]
Setting hostname localhost.localdomain: [ OK ]
No devices found
Setting up Logical Volume Management: No volume groups found
[ OK ]
Remounting root filesystem in read-write mode: [ OK ]
Mounting local filesystems: [ OK ]
Enabling local filesystem quotas: [ OK ]
Enabling /etc/fstab swaps: [ OK ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Applying Intel CPU microcode update: [ OK ]
Checking for hardware changes [ OK ]
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/cpufreq/cpufreq.c:81
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pnp0/00:00/id
CPU 0
Modules linked in: acpi_cpufreq dm_mirror dm_multipath dm_mod video sbs
backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp
parport joydev floppy sg tg3 ide_cd i3000_edac uhci_hcd i2c_i801 edac_mc
i2c_core ehci_hcd shpchp pcspkr cdrom serio_raw ext3 jbd ata_piix libata sd_mod
scsi_mod
Pid: 2123, comm: modprobe Not tainted 2.6.18-68.el5 #1
RIP: 0010:[<ffffffff80206b5c>] [<ffffffff80206b5c>]
lock_policy_rwsem_write+0x23/0x78
RSP: 0000:ffff810006b23df8 EFLAGS: 00010246
RAX: 00000000ffffffff RBX: ffffffff804346a8 RCX: 0000000000000000
RDX: ffffffff8039c080 RSI: 0000000000000292 RDI: 0000000000000001
RBP: 0000000000000001 R08: ffffffff802a32a6 R09: 0000000000000000
R10: ffffffff80420260 R11: 0000000000000000 R12: ffffffff80312e40
R13: ffff810007069dc0 R14: ffffffff882a3200 R15: ffffc200000f9710
FS: 00002aaaaaac5240(0000) GS:ffffffff8039c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000555569afe908 CR3: 0000000006a24000 CR4: 00000000000006e0
Process modprobe (pid: 2123, threadinfo ffff810006b22000, task ffff810008435860)
Stack: ffffffff804346a8 ffffffff804346a8 ffffffff80312e40 ffffffff80206dba
ffffffff803234a0 ffffffff801ad065 00000000ffffffed ffffffff882a3168
00000000ffffffed ffffffff80205e8a ffffffff882a3200 ffff810007069d58
Call Trace:
[<ffffffff80206dba>] cpufreq_remove_dev+0xb/0x22
[<ffffffff801ad065>] sysdev_driver_unregister+0x4d/0x97
[<ffffffff80205e8a>] cpufreq_register_driver+0x130/0x194
[<ffffffff800a3500>] sys_init_module+0x16a6/0x1857
[<ffffffff8005b116>] system_call+0x7e/0x83
Code: 0f 0b 68 c6 3d 2b 80 c2 51 00 4c 63 e0 48 c7 c3 30 dc 40 80
RIP [<ffffffff80206b5c>] lock_policy_rwsem_write+0x23/0x78
RSP <ffff810006b23df8>
<0>Kernel panic - not syncing: Fatal exception
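Two values in the oops above can be read as signed 32-bit integers. RAX is 00000000ffffffff, which is -1 in its low 32 bits and would match the BUG_ON(policy_cpu == -1) condition, assuming RAX held policy_cpu when the trap fired. The stack also contains 00000000ffffffed, which decodes to -19; on Linux that is -ENODEV, though treating it as the error code of the failed driver registration is a guess from the call trace. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the low 32 bits of a 64-bit register dump as a signed int. */
int32_t low32_as_signed(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;   /* drop the upper half of the register */
    int32_t out;

    assert(sizeof out == sizeof low);
    memcpy(&out, &low, sizeof out); /* bit-for-bit copy, no value conversion */
    return out;
}
```

Calling `low32_as_signed(0x00000000ffffffffULL)` gives -1 and `low32_as_signed(0x00000000ffffffedULL)` gives -19.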
Sorry. I suppose the patch has not been included yet.
(In reply to comment #10)
> Sorry. I suppose the patch has not been included yet.
Correct, Don will change the state of this BZ to "MODIFIED" when a kernel has been built with this patch.
QE ack for RHEL 5.2.
in 2.6.18-71.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2008-0314.html