Description of problem:
My lovely AMD64 X2 doesn't scale ondemand as expected. It remains locked at the lowest performance setting at all times, which is rather unfortunate.

Version-Release number of selected component (if applicable):
cpuspeed-1.2.1-1.40.fc6
kernel-2.6.17-1.2617.2.1.fc6

How reproducible:
100%

Steps to Reproduce:
1. Boot lovely AMD64 X2 setup

Actual results:
Both cores set at 1GHz at all times

Expected results:
Correct scaling through to 2.2GHz per core under load

Additional info:
Created attachment 135611 [details] Output from /proc/cpuinfo under heavy load (7.11, 6.88, 5.35)
Created attachment 135612 [details] dmesg output
This posting to LKML indicates that this is a kernel issue, so I'm moving the bug into DaveJ territory.

http://www.ussg.iu.edu/hypermail/linux/kernel/0609.1/0380.html

I looked over the git changelogs for cpufreq.c, which I gather would be the vector for the wrong return, and I found:

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3bcb09a35641f2840bd59d8f82154f830dca282c

The interesting part of the change goes:

+ err = -EBUSY;
+ if (__find_governor(governor->name) == NULL) {
+         err = 0;
+         list_add(&governor->governor_list, &cpufreq_governor_list);

Since we always seem to return -EBUSY according to the posting, that would mean

+ if (__find_governor(governor->name) == NULL) {

is the cause of all this havoc. It does not seem that we ever satisfy that test; otherwise err would have been set to 0 and all would be well.
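The check discussed above can be modelled outside the kernel. This is a simplified Python sketch, not the kernel code: it assumes, as the quoted diff suggests, that registration returns -EBUSY only when a governor with the same name is already on the list. The names `register_governor` and `governors` are hypothetical stand-ins.

```python
import errno


def register_governor(governors, name):
    """Model of the quoted check: add `name` unless it is already registered.

    Returns 0 on success and -EBUSY if a governor of that name exists,
    mirroring the err = -EBUSY / err = 0 pattern in the diff.
    """
    err = -errno.EBUSY
    if name not in governors:  # stands in for __find_governor(...) == NULL
        err = 0
        governors.append(name)
    return err


governors = []
first = register_governor(governors, "ondemand")   # name not yet on the list
second = register_governor(governors, "ondemand")  # same name registered again
```

Under this model, getting -EBUSY on every call would mean the name is found on the list each time, for example because something registers the same governor twice. Whether that is what happens in the kernel in question is not established here.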
Those messages about being unable to turn on the fan are somewhat disturbing. Maybe the ACPI maintainers have some clues what's going on here?
Don't get me wrong, the fan is running, but ACPI has been complaining like that for ages. I have a bug open for that issue, mainly because it's spewing that message all over my active VC, making it rather unusable.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199812
I'm seeing something similar. Sadly, it doesn't seem to be the exact same thing. I am running on an Intel Centrino. When I first boot up, things work fine and the CPU scales as you would expect. Some time later (usually not too long), this stops working and it sticks at the lowest setting. I've tried running things like:

/usr/bin/cpufreq-set --max 1.4Ghz
/usr/bin/cpufreq-set -f 1.4GHz

to no avail; it still continues to sit at the 600 MHz setting. Just to add a little more fun, every once in a while it will start scaling correctly again for a bit and then stop again.

It looks like this bug was filed around the beginning of September, and that's about when I started seeing this (maybe even a week or so earlier), so it seems that something changed in that timeframe that caused this to happen.
Ryan, is there anything in dmesg when it does this limiting? (Try booting with cpufreq.debug=7 too for more info.)
So, some interesting progress on this from my end. After seeing the fan messages in David's output, I decided to check my fan. I noticed it didn't seem to be running. I played around a bit and didn't seem to be able to force it on. I also noticed the computer was running at about 54 degrees Celsius. I opened up my laptop, cleaned out the fan (which was full of crap) and rebooted. The fan spun right up.

Since doing this, my computer seems to be scaling with no problem. I forced the CPU speed up to 1.4 GHz and it has stayed there since (in the past, it would eventually reset itself to 600 MHz). I looked at the temperature now and it is at 45 degrees Celsius. So it SEEMS like my machine may have been overheating and the computer was trying to compensate by lowering the CPU level.

Is there a hard limit somewhere in the kernel (again, probably something that would have been added in late August/early September) for temperature? It seems like this may have been causing my issues. I will let you know if this stops working, but I've been running for about 2 1/2 hours without issue.
After 8 hours, things still seem to be working just fine. Seems like it was the heat, for me at least.
In case it matters in terms of limiting the variables, this is a Shuttle SN95G5V3 (the motherboard is a Shuttle FN95) and the CPU is a 4400+ AMD64 X2 with 1 MB of cache per core (I'm unsure of AMD's current naming scheme).

This, unlike Ryan's issue, is not a heat issue. I checked by manually recompiling the kernel to factor out the CPU scaling element and running the CPU at maximum load for days (encoding Ogg Theora files of my DVD collection). The fan speed scales fine under those circumstances according to the emitted heat, and no crash or other indication of overheating occurred.
David,

When ondemand stops working, does /sys/..../cpufreq/cpuinfo_max_freq drop down to the lowest freq? I guess it does, and if so, it is probably happening due to a call from the thermal code. If it is indeed happening that way, can you add some print messages to acpi_thermal_cpufreq_decrease() and acpi_thermal_cpufreq_notifier() in drivers/acpi/processor_thermal.c and check when those are getting called?

Thanks,
Venki
stops working?? It doesn't appear to start working; it's locked hard on 1GHz. Output of cpufreq-info (reduced to one CPU for readability):

analyzing CPU 1:
  driver: powernow-k8
  CPUs which need to switch frequency at the same time: 0 1
  hardware limits: 1000 MHz - 2.20 GHz
  available frequency steps: 2.20 GHz, 2.00 GHz, 1.80 GHz, 1000 MHz
  available cpufreq governors: ondemand, userspace, performance
  current policy: frequency should be within 1000 MHz and 1000 MHz.
                  The governor "userspace" may decide which speed to use
                  within this range.
  current CPU frequency is 1000 MHz (asserted by call to hardware).

acpitool claims that the fan is on, so I don't get why I get all those ACPI messages about not being able to turn the fan on. It seems to me that ACPI is getting a tad confused here.

It's very non-responsive to changing the governor to performance (or any other, for that matter): not even error output or messages in dmesg.

I'll try to add the print messages and report back.
Oh. I thought you were trying to use the 'ondemand' governor and that failed. It looks like you were just saying that the CPU frequency doesn't scale on demand with the load. And the above messages say you were using the 'userspace' governor too.

My guess is that the problem is not about how the governor is behaving, but about how the kernel finds out what freqs the CPU can run at, and whether there is something in the kernel (like thermal) that is limiting this frequency.

If you are recompiling the kernel, you should also make sure you have the CPU_FREQ_DEBUG config option enabled and boot with cpufreq.debug=7. That will give a lot more messages related to cpufreq, which can give some hint about the problem.
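One way to check from userspace whether something is limiting the frequency is to compare the hardware limit with the current policy limit. The following is a minimal Python sketch, assuming the standard cpufreq sysfs files `cpuinfo_max_freq` (hardware limit) and `scaling_max_freq` (current policy limit), both in kHz. The helper names `read_khz` and `policy_is_clamped` are made up for this illustration.

```python
from pathlib import Path


def read_khz(cpufreq_dir, name):
    """Read one cpufreq sysfs value (in kHz) as an integer."""
    return int((Path(cpufreq_dir) / name).read_text().strip())


def policy_is_clamped(cpufreq_dir="/sys/devices/system/cpu/cpu0/cpufreq"):
    """Return (clamped, policy_max, hw_max) for one CPU.

    `clamped` is True when the policy maximum is below the hardware
    maximum, i.e. when something has lowered the allowed upper limit.
    """
    hw_max = read_khz(cpufreq_dir, "cpuinfo_max_freq")
    policy_max = read_khz(cpufreq_dir, "scaling_max_freq")
    return policy_max < hw_max, policy_max, hw_max
```

Calling `policy_is_clamped()` on the affected machine would show whether the upper limit has been lowered, but not who lowered it; for that, the cpufreq.debug=7 output is still needed.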
Created attachment 136872 [details] dmesg output with cpufreq.debug=7 set and printk inserted

dmesg output from kernel-2.6.17-1.2647 with cpufreq.debug=7 set and printk calls inserted at various spots in the requested functions.

* Next time I promise to remember to terminate my strings; it has been too long since I coded anything real.
Thanks for the debug log, David. It seems that the ACPI thermal module is trying to passively cool this system by lowering the maximum frequency, and that cpufreq is doing exactly as requested. After that fails, the thermal module tries to enable a fan, which claims to fail. (We should probably have some debug messages in thermal to make this easier to discover...)

What do you see when you dump the contents of /proc/acpi/thermal_zone/*/*?
cooling mode: active
<polling disabled>
state: passive
temperature: 53 C
critical (S5): 60 C
passive: 50 C: tc1=4 tc2=3 tsp=60 devices=0xffff810003f6a298
active[0]: 50 C: devices=0xffff81007ff8c810
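The numbers above already explain the passive state: the reported temperature is above both the passive and active[0] trip points but below the critical one. A small Python sketch of that comparison follows. It assumes the line layout shown in this comment; `parse_thermal` and `exceeded_trip_points` are hypothetical helper names, and the sample text is copied from the output above.

```python
import re

SAMPLE = """\
cooling mode: active
<polling disabled>
state: passive
temperature: 53 C
critical (S5): 60 C
passive: 50 C: tc1=4 tc2=3 tsp=60 devices=0xffff810003f6a298
active[0]: 50 C: devices=0xffff81007ff8c810
"""

# Matches only lines that start with a temperature or trip-point label.
LINE = re.compile(r"\s*(temperature|critical \(S5\)|passive|active\[\d+\]):\s*(\d+) C")


def parse_thermal(text):
    """Collect the temperature and trip points (degrees C) by label."""
    values = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            values[m.group(1)] = int(m.group(2))
    return values


def exceeded_trip_points(values):
    """List the trip points the current temperature has reached or passed."""
    current = values["temperature"]
    return [k for k, v in values.items() if k != "temperature" and current >= v]


values = parse_thermal(SAMPLE)
exceeded = exceeded_trip_points(values)
```

With these values, both the passive and active[0] thresholds (50 C) are exceeded at 53 C, which is consistent with the thermal module requesting a lower maximum frequency and trying to switch on the fan.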
This is definitely happening for me on an Intel Pentium M laptop. The system has been up for 1:20 and CPU scaling was working for some part of that time. dmesg does show this kernel error (seems unrelated though):

=============================================
[ INFO: possible recursive locking detected ]
2.6.17-1.2647.fc6 #1
---------------------------------------------
java/3787 is trying to acquire lock:
 (slock-AF_INET6){-+..}, at: [<c05b392e>] sk_clone+0xd4/0x2d8

but task is already holding lock:
 (slock-AF_INET6){-+..}, at: [<f8b1d4c9>] tcp_v6_rcv+0x327/0x736 [ipv6]

other info that might help us debug this:
1 lock held by java/3787:
 #0: (slock-AF_INET6){-+..}, at: [<f8b1d4c9>] tcp_v6_rcv+0x327/0x736 [ipv6]

stack backtrace:
 [<c04051ee>] show_trace_log_lvl+0x58/0x171
 [<c0405802>] show_trace+0xd/0x10
 [<c040591b>] dump_stack+0x19/0x1b
 [<c043b9e1>] __lock_acquire+0x778/0x99c
 [<c043c176>] lock_acquire+0x4b/0x6d
 [<c061539b>] _spin_lock+0x19/0x28
 [<c05b392e>] sk_clone+0xd4/0x2d8
 [<c05dc49b>] inet_csk_clone+0xf/0x72
 [<c05ed2d9>] tcp_create_openreq_child+0x1b/0x3a1
 [<f8b1c155>] tcp_v6_syn_recv_sock+0x271/0x5b3 [ipv6]
 [<c05ed834>] tcp_check_req+0x1d5/0x2e9
 [<f8b1b441>] tcp_v6_do_rcv+0x142/0x340 [ipv6]
 [<f8b1d883>] tcp_v6_rcv+0x6e1/0x736 [ipv6]
 [<f8b03a6f>] ip6_input+0x1c3/0x296 [ipv6]
 [<f8b03fdf>] ipv6_rcv+0x1d2/0x21f [ipv6]
 [<c05b9ab6>] netif_receive_skb+0x2e2/0x366
 [<c05bb42f>] process_backlog+0x99/0xfa
 [<c05bb612>] net_rx_action+0x9d/0x196
 [<c04293bf>] __do_softirq+0x78/0xf2
 [<c040668b>] do_softirq+0x5a/0xbe
 [<c04291b6>] local_bh_enable_ip+0xa9/0xcf
 [<c0615339>] _spin_unlock_bh+0x25/0x28
 [<c05b272f>] release_sock+0xb0/0xb8
 [<c05f5552>] inet_stream_connect+0x113/0x206
 [<c05b1692>] sys_connect+0x67/0x84
 [<c05b1d04>] sys_socketcall+0x8c/0x186
 [<c0403faf>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:

I'm going to reboot with cpufreq.debug=7 and see if that reveals more.
I guess I should add that I'm running 2.6.17-1.2647.fc6.
Created attachment 136983 [details] dmesg since boot with cpufreq.debug=7

With 2.6.18-1.2689.fc6 and cpufreq.debug=7, the initial range is (600000 - 2000000 kHz) and later gets changed to (600000 - 600000 kHz).
This bug is like one I had in FC4 (#137995), where it would get my max CPU freq wrong, and the same workaround applies to this bug:

echo -n "2000000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq

It went away with FC5 but is back again. With the 2689 kernel, this is the only way I can get my CPU above 600 MHz (the minimum).
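Since the limit may be lowered again after it is raised, it can help to read the value back after writing it. A minimal Python sketch of the same workaround follows, assuming the `scaling_max_freq` file used in the command above; `set_and_verify_max_freq` is a hypothetical helper, and writing to the real sysfs file requires root.

```python
from pathlib import Path


def set_and_verify_max_freq(cpufreq_dir, khz):
    """Write a new policy maximum (kHz) and return the value read back.

    If the returned value differs from `khz`, the request was not
    accepted as written.
    """
    path = Path(cpufreq_dir) / "scaling_max_freq"
    path.write_text(str(khz))  # same effect as: echo -n "<khz>" > scaling_max_freq
    return int(path.read_text().strip())
```

For example, `set_and_verify_max_freq("/sys/devices/system/cpu/cpu0/cpufreq", 2000000)` should return 2000000 if the write took effect. Re-reading the file some time later shows whether the limit has been lowered again.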
I have to disappoint you: manually setting the frequency does not work around the issue. The ACPI system appears to be locking the frequency because it mistakenly thinks it can't turn on the fan. It appears to be a malfunctioning failsafe.
It is related to the thermal driver. Rightly or wrongly, the thermal driver thinks that the temperature is too high and tries to reduce the frequency to control the temperature and/or tries to turn the fan on.

David: What do you see when you dump the contents of /proc/acpi/thermal_zone/*/*?
I assume by dump you mean read (for which I used cat); for that you can see comment #16, but here goes another sampling for my friends at Intel.

cooling mode: active
<polling disabled>
state: passive
temperature: 54 C
critical (S5): 60 C
passive: 50 C: tc1=4 tc2=3 tsp=60 devices=0xffff810003f6c298
active[0]: 50 C: devices=0xffff81007ff8e810

The fan is running at what's called "smart fan" in the BIOS. I could adjust that to have it running at full speed, making all manner of noise, to see if it continues to scale down even if thermal issues are absolutely not present.

I also checked the CPU fan for technical errors or excessive amounts of dust; none were present that I could determine. No problems were present on the chipset fan either.
son of a .....

After updating my system and doing a reboot, I found this:

analyzing CPU 1:
  driver: powernow-k8
  CPUs which need to switch frequency at the same time: 0 1
  hardware limits: 1000 MHz - 2.20 GHz
  available frequency steps: 2.20 GHz, 2.00 GHz, 1.80 GHz, 1000 MHz
  available cpufreq governors: ondemand, userspace, performance
  current policy: frequency should be within 1000 MHz and 2.20 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 2.20 GHz (asserted by call to hardware).

I tested that it worked by switching governor, and it seems that, whatever Dave did in the recent update volley, the issue is gone. I don't like it when things just stop being broken for no apparent reason, but in this case I'm overjoyed.
The bug appears to be back with kernel-2.6.18-1.2699.fc6; it was perfectly fine with kernel-2.6.18-1.2693.fc6.
Created attachment 138235 [details] disassembled DSDT for the system

I've been playing a bit with various settings, and it seems to work somewhat reliably if you set the low noise fan setting, but not when using the default smart fan setting or the ultra low noise fan setting.

I've attached a disassembled DSDT for the system in the hope that it might help.
This seems to have gone away. Is anyone else still experiencing this bug? If not, I think we can close this.