Bug 462648 - cpuspeed segv at boot with lockup sometimes
Status: CLOSED DUPLICATE of bug 470551
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: x86_64 Linux
Priority: medium Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-09-17 15:38 EDT by Tom Mitchell
Modified: 2008-12-09 15:39 EST
Doc Type: Bug Fix
Last Closed: 2008-12-09 15:39:41 EST
Description Tom Mitchell 2008-09-17 15:38:54 EDT
Description of problem:
   cpuspeed segv at boot with a system lockup sometimes

Most commonly I see this on a reboot.  Sometimes 
I do see this on a power on boot.  

The messages are lost... the system does not boot
far enough to log the error.

I suspect that this is a BIOS state hand-off issue,
so the info below might be needed by the HP BIOS folks.

Version-Release number of selected component (if applicable):

$ uname -r
2.6.26.3-29.fc9.x86_64
$ rpm -qa | grep cpuspeed
cpuspeed-1.2.1-5.fc9.x86_64
$ cat /proc/cpuinfo
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 17
model		: 3
model name	: AMD Turion Dual-Core RM-70
stepping	: 1
cpu MHz		: 500.000
....
# dmidecode 2.9
SMBIOS 2.4 present.
18 structures occupying 721 bytes.
Table at 0x000DC010.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: Hewlett-Packard
        Version: F.07
        Release Date: 06/26/2008
        Address: 0xE65D0
        Runtime Size: 105008 bytes
        ROM Size: 1024 kB
        Characteristics:
                ISA is supported
                PCI is supported
                PC Card (PCMCIA) is supported
                PNP is supported
                APM is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                ESCD support is available
                Boot from CD is supported
                ACPI is supported
                USB legacy is supported
                AGP is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
        BIOS Revision: 15.7


....
System Information
        Manufacturer: Hewlett-Packard
        Product Name: Compaq Presario CQ50 Notebook PC
        Version: F.07
        Serial Number: xyz
        UUID: xyxxy
        Wake-up Type: Power Switch
        SKU Number: FE869UA#ABA
        Family: 103C_5335KV
....
Processor Information
        Socket Designation: Socket A
        Type: Central Processor
        Family: Opteron
        Manufacturer: AMD
        ID: 31 0F 20 00 FF FB 8B 17
        Signature: Family 17, Model 3, Stepping 1
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                CLFSH (CLFLUSH instruction supported)
                MMX (MMX technology supported)
                FXSR (Fast floating-point save and restore)
                SSE (Streaming SIMD extensions)
                SSE2 (Streaming SIMD extensions 2)
                HTT (Hyper-threading technology)
        Version: AMD Turion Dual-Core RM-70
        Voltage: 1.6 V
        External Clock: 133 MHz
        Max Speed: 2000 MHz
        Current Speed: 2000 MHz
        Status: Populated, Enabled
        Upgrade: None
        L1 Cache Handle: 0x0005
        L2 Cache Handle: 0x0006
        L3 Cache Handle: Not Provided
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified






How reproducible:
  about 20% of the time.


Steps to Reproduce:
1. Install F9 x86_64 and run yum update
2. reboot
  
Actual results:

About one in five reboots I see a segv from cpuspeed, followed
by a hang of the box that is only cleared with a power-cycle
reboot.


Expected results:
 clean boot.


Additional info:
  The only non-Fedora kernel bits are associated with WiFi.
  MADWIFI: Multimode Atheros Driver for WiFi
  madwifi-hal-0.10.5.6-r3861-20080903.tar.gz
  This driver is not active at this point, so IMO it does not matter here;
  I mention it for completeness.

Thanks,
mitch
Comment 1 Jarod Wilson 2008-09-17 16:00:22 EDT
Really need the segfault and backtrace to have any hope of doing anything about this.
Comment 2 Tom Mitchell 2008-09-17 17:04:06 EDT
Excuse the hand transcription...
The backtrace looks like:
 __blocking_notifier_call_chan
 __cpufreq-governor
 __cpu_set_policy
 cpufreq_add_dev
 ? handle_update
 sysdev_register
 cpufreq_register_driver
 :power_now_k8
 sys_init_module
 ? do_sync_read

RIP cpufreq_governor_userspace

/etc/rc5.d/cpuspeed: line 112: 1708 Segmentation fault /sbin/modprobe powernow-k8
------
If the hex addresses will help, I can try to transcribe them as well.
Comment 3 Tom Mitchell 2008-09-17 17:16:12 EDT
Running this a handful of times triggered an error that looks a bit
like the one I saw at boot...

   # /etc/rc5.d/*cpuspeed* stop; modprobe -r powernow-k8 ; /etc/rc5.d/*cpuspeed* restart


=======cut and paste from screen ====
Disabling ondemand cpu frequency scaling:                  [  OK  ]
/etc/rc5.d/S06cpuspeed: line 112:  3930 Segmentation fault      /sbin/modprobe powernow-k8 2> /dev/null

Message from syslogd@compegg at Sep 17 14:11:05 ...
 kernel:------------[ cut here ]------------

Message from syslogd@compegg at Sep 17 14:11:05 ...
 kernel:invalid opcode: 0000 [1] SMP 

Message from syslogd@compegg at Sep 17 14:11:05 ...
 kernel:Code: 36 01 00 00 31 d2 ff ce 0f 85 da 01 00 00 44 0f a3 2d 5f 56 27 00 19 c0 85 c0 ba ea ff ff ff 0f 84 c3 01 00 00 83 7f 2c 00 75 04 <0f> 0b eb fe 48 c7 c7 b0 92 3f 81 e8 f1 f4 09 00 83 3d 0b 86 44 
======
Sep 17 14:11:00 compegg cpuspeed: Disabling ondemand cpu frequency scaling governor
Sep 17 14:11:00 compegg kernel: powernow-k8: Found 1 AMD Turion Dual-Core RM-70 processors (2 cpu cores) (version 2.20.00)
Sep 17 14:11:00 compegg kernel: powernow-k8:    0 : pstate 0 (2000 MHz)
Sep 17 14:11:00 compegg kernel: powernow-k8:    1 : pstate 1 (1000 MHz)
Sep 17 14:11:00 compegg kernel: powernow-k8:    2 : pstate 2 (500 MHz)
Sep 17 14:11:00 compegg cpuspeed: Enabling ondemand cpu frequency scaling governor
Sep 17 14:11:02 compegg cpuspeed: Disabling ondemand cpu frequency scaling governor
Sep 17 14:11:02 compegg kernel: powernow-k8: Found 1 AMD Turion Dual-Core RM-70 processors (2 cpu cores) (version 2.20.00)
Sep 17 14:11:02 compegg kernel: powernow-k8:    0 : pstate 0 (2000 MHz)
Sep 17 14:11:02 compegg kernel: powernow-k8:    1 : pstate 1 (1000 MHz)
Sep 17 14:11:02 compegg kernel: powernow-k8:    2 : pstate 2 (500 MHz)
Sep 17 14:11:02 compegg cpuspeed: Enabling ondemand cpu frequency scaling governor
Sep 17 14:11:04 compegg cpuspeed: Disabling ondemand cpu frequency scaling governor
Sep 17 14:11:04 compegg kernel: powernow-k8: Found 1 AMD Turion Dual-Core RM-70 processors (2 cpu cores) (version 2.20.00)
Sep 17 14:11:04 compegg kernel: powernow-k8:    0 : pstate 0 (2000 MHz)
Sep 17 14:11:04 compegg kernel: powernow-k8:    1 : pstate 1 (1000 MHz)
Sep 17 14:11:04 compegg kernel: powernow-k8:    2 : pstate 2 (500 MHz)
Sep 17 14:11:04 compegg cpuspeed: Enabling ondemand cpu frequency scaling governor
Sep 17 14:11:05 compegg cpuspeed: Disabling ondemand cpu frequency scaling governor
Sep 17 14:11:05 compegg kernel: powernow-k8: Found 1 AMD Turion Dual-Core RM-70 processors (2 cpu cores) (version 2.20.00)
Sep 17 14:11:05 compegg kernel: powernow-k8:    0 : pstate 0 (2000 MHz)
Sep 17 14:11:05 compegg kernel: powernow-k8:    1 : pstate 1 (1000 MHz)
Sep 17 14:11:05 compegg kernel: powernow-k8:    2 : pstate 2 (500 MHz)
Sep 17 14:11:05 compegg kernel: ------------[ cut here ]------------
Sep 17 14:11:05 compegg kernel: kernel BUG at drivers/cpufreq/cpufreq_userspace.c:120!
Sep 17 14:11:05 compegg kernel: invalid opcode: 0000 [1] SMP 
Sep 17 14:11:05 compegg kernel: CPU 0 
Sep 17 14:11:05 compegg kernel: Modules linked in: powernow_k8(+) freq_table wlan_wep ipt_MASQUERADE iptable_nat nf_nat bridge bnep rfcomm l2cap bluetooth ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi fuse sunrpc ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 loop dm_multipath kvm ath5k mac80211 cfg80211 wlan_scan_sta nvidia(P) ath_rate_sample i2c_core snd_hda_intel pcspkr ath_pci serio_raw joydev snd_seq_dummy wlan snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss sr_mod ath_hal(P) snd_mixer_oss cdrom shpchp snd_pcm forcedeth snd_timer sg snd_page_alloc snd_hwdep snd soundcore pata_amd battery ac video output wmi usb_storage dm_snapshot dm_zero dm_mirror dm_log dm_mod pata_acpi ata_generic ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
Sep 17 14:11:05 compegg kernel: Pid: 3930, comm: modprobe Tainted: P          2.6.26.3-29.fc9.x86_64 #1
Sep 17 14:11:05 compegg kernel: RIP: 0010:[<ffffffff811fbf4e>]  [<ffffffff811fbf4e>] cpufreq_governor_userspace+0x4d/0x217
Sep 17 14:11:05 compegg kernel: RSP: 0018:ffff81008a43fb68  EFLAGS: 00010246
Sep 17 14:11:05 compegg kernel: RAX: 00000000ffffffff RBX: ffff8100a6408600 RCX: 0000000000000000
Sep 17 14:11:05 compegg kernel: RDX: 00000000ffffffea RSI: 0000000000000000 RDI: ffff8100a6408600
Sep 17 14:11:05 compegg kernel: RBP: ffff81008a43fb98 R08: 0000000000000001 R09: 0000000000000000
Sep 17 14:11:05 compegg kernel: R10: ffff81008a43fb28 R11: ffff81008a43fb68 R12: ffff8100a6408600
Sep 17 14:11:05 compegg kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Sep 17 14:11:05 compegg kernel: FS:  00007fc030c6b6f0(0000) GS:ffffffff81417000(0000) knlGS:0000000000000000
Sep 17 14:11:05 compegg kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 17 14:11:05 compegg kernel: CR2: 00007fcd351a7fd0 CR3: 0000000092574000 CR4: 00000000000006e0
Sep 17 14:11:05 compegg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 17 14:11:05 compegg kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 17 14:11:05 compegg kernel: Process modprobe (pid: 3930, threadinfo ffff81008a43e000, task ffff81009c1d8000)
Sep 17 14:11:05 compegg kernel: Stack:  ffff81008a43fbb8 ffffffff8104da7b ffff8100a6408600 ffff8100a6408600
Sep 17 14:11:05 compegg kernel: ffff8100a6408600 0000000000000001 ffff81008a43fbc8 ffffffff811fa350
Sep 17 14:11:05 compegg kernel: ffff81008a43fc18 ffff81008a43fc18 ffff8100a6408600 0000000000000000
Sep 17 14:11:05 compegg kernel: Call Trace:
Sep 17 14:11:05 compegg kernel: [<ffffffff8104da7b>] ? __blocking_notifier_call_chain+0x58/0x6a
Sep 17 14:11:05 compegg kernel: [<ffffffff811fa350>] __cpufreq_governor+0x9b/0xd9
Sep 17 14:11:05 compegg kernel: [<ffffffff811fa559>] __cpufreq_set_policy+0x195/0x211
Sep 17 14:11:05 compegg kernel: [<ffffffff811fb90b>] cpufreq_add_dev+0x46a/0x58c
Sep 17 14:11:05 compegg kernel: [<ffffffff811fbbd4>] ? handle_update+0x0/0x33
Sep 17 14:11:05 compegg kernel: [<ffffffff811b75e9>] sysdev_driver_register+0xc3/0x122
Sep 17 14:11:05 compegg kernel: [<ffffffff811fa691>] cpufreq_register_driver+0xbc/0x18e
Sep 17 14:11:05 compegg kernel: [<ffffffffa0b8ed5c>] :powernow_k8:powernowk8_init+0x8a/0x93
Sep 17 14:11:05 compegg kernel: [<ffffffff81059dd4>] sys_init_module+0x199c/0x1af8
Sep 17 14:11:05 compegg kernel: [<ffffffff810ac1c4>] ? do_sync_read+0xe7/0x12d
Sep 17 14:11:05 compegg kernel: [<ffffffff8100c291>] tracesys+0xd0/0xd5
Sep 17 14:11:05 compegg kernel:
Sep 17 14:11:05 compegg kernel:
Sep 17 14:11:05 compegg kernel: Code: 36 01 00 00 31 d2 ff ce 0f 85 da 01 00 00 44 0f a3 2d 5f 56 27 00 19 c0 85 c0 ba ea ff ff ff 0f 84 c3 01 00 00 83 7f 2c 00 75 04 <0f> 0b eb fe 48 c7 c7 b0 92 3f 81 e8 f1 f4 09 00 83 3d 0b 86 44 
Sep 17 14:11:05 compegg kernel: RIP  [<ffffffff811fbf4e>] cpufreq_governor_userspace+0x4d/0x217
Sep 17 14:11:05 compegg kernel: RSP <ffff81008a43fb68>
Sep 17 14:11:05 compegg kernel: ---[ end trace 23f2f3ca5b0dc5c2 ]---
Sep 17 14:11:30 compegg kerneloops: Submitted 1 kernel oopses to www.kerneloops.org
[root@compegg ~]#
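
The stop / unload / restart cycle used in this comment took a handful of manual runs to hit the BUG. A minimal sketch of automating it follows. The function names, the default limit of 50 attempts, and the use of dmesg to detect the hit are assumptions for illustration, not part of the original report. The kernel ring buffer may already hold an earlier BUG line, so it may be worth clearing it (dmesg -c) before starting.

```shell
#!/bin/sh
# Sketch only: repeat a reload cycle until the kernel log shows a BUG.
# Hypothetical helper names; run as root on a test machine.

# reload_cycle: the command sequence from this comment.
reload_cycle() {
    /etc/rc5.d/*cpuspeed* stop
    modprobe -r powernow-k8
    /etc/rc5.d/*cpuspeed* restart
}

# bug_logged: succeeds once the kernel ring buffer mentions a BUG.
bug_logged() {
    dmesg | grep -q 'kernel BUG at'
}

# repeat_until CYCLE CHECK [MAX]
# Runs CYCLE up to MAX times (default 50) and stops as soon as CHECK
# succeeds. Prints the attempt number on a hit; returns 1 otherwise.
repeat_until() {
    cycle=$1
    check=$2
    max=${3:-50}
    i=1
    while [ "$i" -le "$max" ]; do
        "$cycle" >/dev/null 2>&1
        if "$check"; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage (as root):
#   repeat_until reload_cycle bug_logged 50
```

The loop stops at the first hit on purpose: after the BUG, modprobe has been killed mid-load, so further cycles on the same boot are unlikely to give clean results.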
Comment 4 Jarod Wilson 2008-09-17 17:18:13 EDT
Okay, yeah, I sorta figured this was actually a kernel-side bug, not a cpuspeed
userspace issue (the powernow-k8 kernel driver is going boom), so I'm
reassigning it over to the kernel. Not something I've seen before, but maybe
davej recognizes it...
Comment 5 Tom Mitchell 2008-10-16 15:55:16 EDT
In an attempt to get an error with more active logging tools
I moved /etc/rc5.d/S06cpuspeed to /etc/rc5.d/S99cpuspeed.
In 6 days I have not seen the error.  If this is not just
a matter of luck, there is a race condition.

I also note that the start script tests for ACPI as a fallback. On this point alone, does it make sense for cpuspeed to be S06 when acpid is S26?

I also see /boot/config-2.6.26.5-45.fc9.x86_64:CONFIG_X86_POWERNOW_K8_ACPI=y which is used in arch/x86/kernel/cpu/cpufreq/powernow-k8.c implying that acpid should be run well in advance of cpuspeed (?).
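
The ordering question above can be checked from the S-link names in a run-level directory. The following is a small sketch; the helper names start_priority and starts_before are made up for illustration, and it assumes the usual S<two digits><name> link naming seen in /etc/rc5.d.

```shell
#!/bin/sh
# Sketch only: compare SysV start priorities from S-link names.
# Hypothetical helpers, assuming links named S<NN><service>.

# start_priority DIR NAME
# Prints the two-digit start priority of NAME in DIR, e.g. 06.
start_priority() {
    dir=$1
    name=$2
    for link in "$dir"/S[0-9][0-9]"$name"; do
        # Skip the unexpanded pattern when nothing matched.
        [ -e "$link" ] || [ -L "$link" ] || continue
        base=${link##*/}
        prio=${base#S}
        echo "${prio%"$name"}"
        return 0
    done
    return 1
}

# starts_before DIR A B
# Succeeds if A has a lower start priority than B in DIR.
# Returns 2 if either service has no S-link there.
starts_before() {
    a=$(start_priority "$1" "$2") || return 2
    b=$(start_priority "$1" "$3") || return 2
    [ "$a" -lt "$b" ]
}

# Usage:
#   starts_before /etc/rc5.d cpuspeed acpid && echo "cpuspeed starts first"
```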

Anyhow, I can see no good reason for cpuspeed to start early, and I know of no reason to slow down the CPU during any of the startup scripts, so it makes sense to me to apply a patch something like this to /etc/init.d/cpuspeed (the diff below compares new against old, so the 99 line is the proposed value):

$ diff -c cpuspeed.new cpuspeed
*** cpuspeed.new	2008-10-16 12:30:30.000000000 -0700
--- cpuspeed	2008-10-16 12:29:37.000000000 -0700
***************
*** 1,7 ****
  #!/bin/sh
  # Startup script for cpuspeed
  #
! # chkconfig: 12345 99 99
  # description: Run dynamic CPU speed daemon and/or load appropriate
  #              cpu frequency scaling kernel modules and/or governors
  
--- 1,7 ----
  #!/bin/sh
  # Startup script for cpuspeed
  #
! # chkconfig: 12345 06 99
  # description: Run dynamic CPU speed daemon and/or load appropriate
  #              cpu frequency scaling kernel modules and/or governors
  

If this is applied, a %pre and %post may be needed to test whether cpuspeed is enabled at the old S06 level -- perhaps a simple %post "test -L; mv S06cpuspeed S99cpuspeed" for all the run levels...
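
A rough sketch of what that rename step could look like as a shell function is shown here. The name move_priority and its arguments are hypothetical, and this is not taken from the cpuspeed package; a real scriptlet might prefer to let chkconfig rebuild the links instead.

```shell
#!/bin/sh
# Sketch only: rename start links from one priority to another.
# Hypothetical helper; ROOT is the directory holding rc1.d .. rc5.d.

# move_priority ROOT NAME OLD NEW
# For run levels 1-5, renames ROOT/rcN.d/S<OLD><NAME> to S<NEW><NAME>.
move_priority() {
    root=$1
    name=$2
    old=$3
    new=$4
    for level in 1 2 3 4 5; do
        src="$root/rc$level.d/S$old$name"
        dst="$root/rc$level.d/S$new$name"
        # Only touch symlinks, and never overwrite an existing target.
        if [ -L "$src" ] && [ ! -e "$dst" ] && [ ! -L "$dst" ]; then
            mv "$src" "$dst"
        fi
    done
}

# Usage (as root):
#   move_priority /etc cpuspeed 06 99
```

Run levels where the service is switched off have no S06 link, so they are left alone.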

Thanks,
mitch

PS: Was this bug seen by DJ, as implied in the comment of 2008-09-17 17:18:13?
Comment 6 Tom Mitchell 2008-12-01 18:11:14 EST
After moving the init startup script to 99, I see automated kerneloops
bug reports flow to www.kerneloops.org.  BTW: I still see this issue with 2.6.27.5-41.fc9.x86_64.
Comment 7 Jarod Wilson 2008-12-08 16:12:00 EST
I think I meant to kick this over to the kernel component, but forgot. Initscript order definitely shouldn't matter here; it's a kernel issue. Possibly one recently resolved by some AMD-specific commits that just went into 2.6.28...
Comment 8 Dave Jones 2008-12-09 15:39:41 EST
There's an F10 bug which is the same thing that I've been scratching my head over for a while.  I'll dupe this against that one, because when it gets fixed in F10 it'll also get fixed in F9 if it's still open for updates.

Right now, I'm still at a loss to explain what's actually happening though.

I've asked some of the AMD guys who wrote the powernow-k8 driver to take a look at the other bug. Nothing from them yet, so it's probably as mystifying to them as it is to me.

*** This bug has been marked as a duplicate of bug 470551 ***
