The patch to fix this problem, linux-2.6-acpi-check-_pss-frequency-to-prevent-cpufreq-crash.patch, causes my HP Pavilion a530n to be stuck at the top CPU frequency.

The BIOS problem is that there are two entries with bad frequencies (0x9999999!) followed by three fine entries.

Before this patch, the bad entries were ignored (with warnings) and the good entries were accepted. Here's an extract from dmesg:

powernow-k8: Pre-initialization of ACPI failed
powernow-k8: Found 1 AMD Athlon(tm) 64 Processor 3200+ processors (1 cpu cores) (version 2.20.00)
powernow-k8: invalid freq entries 3300000 kHz vs. 2147483048 kHz
powernow-k8: invalid freq entries 3300000 kHz vs. 2147483048 kHz
powernow-k8: 0 : fid 0xc (2000 MHz), vid 0x2
powernow-k8: 1 : fid 0xa (1800 MHz), vid 0x6
powernow-k8: 2 : fid 0x0 (800 MHz), vid 0xa
powernow-k8: ph2 null fid transition 0xc

After this patch, the existence of the bad entries means that all entries are ignored. Here's an extract from dmesg:

powernow-k8: Pre-initialization of ACPI failed
powernow-k8: Found 1 AMD Athlon(tm) 64 Processor 3200+ processors (1 cpu cores) (version 2.20.00)
ACPI: Invalid BIOS _PSS frequency: 0x9999999 MHz
powernow-k8: BIOS error: maxvid exceeded with pstate 2

Is there any way that the bad entries can be skipped but the good ones accepted? Before the patch, the bad entries were skipped by code in kernel-2.6.18/vanilla/arch/i386/kernel/cpu/cpufreq/powernow-k8.c and kernel-2.6.18/linux-2.6.18.x86_64/arch/i386/kernel/cpu/cpufreq/powernow-k8.c. Look for "invalid freq entries".

Alternatively, is there a kernel parameter that I could use to bypass this problem?

There is no chance that the BIOS will be fixed at this late date.
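The difference between the two behaviours described in this report can be modelled in a few lines. The following is a minimal Python sketch, not the kernel code: the function names are hypothetical, and the validity test (non-zero, and still representable in 32 bits after conversion from MHz to kHz) is an assumption about what the patch checks.

```python
# Simplified model only; this is not the kernel source.
U32_MAX = 0xFFFFFFFF


def freq_is_valid(mhz):
    """Assumed check: non-zero and still fits in 32 bits once converted to kHz."""
    return mhz != 0 and mhz * 1000 <= U32_MAX


def skip_bad_entries(pss_mhz):
    """Behaviour before the patch, as reported: drop bad entries, keep the rest."""
    return [mhz for mhz in pss_mhz if freq_is_valid(mhz)]


def reject_on_any_bad_entry(pss_mhz):
    """Behaviour after the patch, as reported: one bad entry discards the table."""
    if all(freq_is_valid(mhz) for mhz in pss_mhz):
        return list(pss_mhz)
    return []


# Frequencies taken from the dmesg extracts: two bogus entries and three good ones.
pss = [0x9999999, 0x9999999, 2000, 1800, 800]

print(skip_bad_entries(pss))         # [2000, 1800, 800]
print(reject_on_any_bad_entry(pss))  # []
```

With the second policy the driver is left with no usable P-states, which matches the symptom of the CPU staying at its top frequency.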
Hugh, Could you please do a x86info -a | grep Pstate and paste the contents in this BZ? Thanks, P.
There is no standard out from that pipeline (i.e. Pstate does not appear). stderr gets "munmap: Invalid argument".

Peering into the output of the x86info command, I wonder if this is what you need to see. It comes at the end.

FID changes won't happen
VID changes won't happen
Voltage ID codes: Maximum=2.000V Startup=1.900V Currently=2.000V
Frequency ID codes: Maximum=9.0x Startup=9.0x Currently=9.0x
Decoding BIOS PST tables (maxfid=c, startvid=2)
Found PSB header at 0x2b10db6d30a0
Table version: 0x14
Sorry, only v1.2 tables supported right now

[For the benefit of other readers: I originally tacked my comment onto https://bugzilla.redhat.com/show_bug.cgi?id=500311. Prarit decided it should be a separate bz.]
Hugh, could you run sosreport on your system and attach the report here? Thanks, P.
It looks as if the sosreport program might gather confidential information. Certainly sosreport says "[the information collected] will be considered confidential information." I don't think that you can do that with a bz attachment.

I should say that I am not currently a Red Hat customer. I'm using CentOS.

What would you like to know about my system? Some information:
- it is running CentOS 5.4 on x86_64 with all updates
- the last released kernel that allowed clock scaling was 2.6.18-128.7.1
- the next kernel that I tried did not do clock scaling: 2.6.18-164.el5
- system web page from vendor: http://h10025.www1.hp.com/ewfrf/wc/product?product=404646&lc=en&cc=ca&dlc=en&lang=en&tmp_track_link=ot_we/prodlink/en_ca/404646/loc:0&cc=ca
- motherboard specifications: http://h10025.www1.hp.com/ewfrf/wc/document?docname=c00064822&lc=en&dlc=en&cc=ca&product=404646&lang=en

I will attach dmesg output for each. That should show a fair bit of detail about the hardware. The first message highlights the bit of the dmesg log that seems relevant.
Created attachment 387222 [details] dmesg output from 2.6.18-128.7.1.el5 -- scaling worked
Created attachment 387224 [details] dmesg output from 2.6.18-164.el5 -- scaling does not work
Reporter refuses to do a sosreport and is running CentOS. CLOSED as INSUFFICIENT_DATA. P.
If you give me a secure and confidential way to submit an sosreport, I will do so. I built kernel 2.6.18-164.11.1.el5, suppressing the patch linux-2.6-acpi-check-_pss-frequency-to-prevent-cpufreq-crash.patch. The resulting kernel seems to work and does allow frequency scaling. I will attach dmesg output.
Created attachment 387493 [details] dmesg output from 2.6.18-164.el5.bz559357 -- scaling works with offending patch removed
Created attachment 387665 [details] acpidump of misbehaving machine
This should be the only system information required to understand why the kernel code can no longer scale the CPU frequency. When the acpidump was executed, it produced the message "Wrong checksum for OEMB!".
Created attachment 387666 [details] _PSS disassembled, with added comments
This is the "human readable" version of _PSS on the system. It turns out that the invalid entries are at the end. If the working entries are to be believed, the power consumption is over 50 Watts higher in idle mode when CPU frequency scaling is not used. Ouch.
See also http://bugzilla.kernel.org/show_bug.cgi?id=15174
Hugh, I *think* I might have a quick solution. Do you have the ability to install the kernel source, patch, and build? (It seems like you do given the data you've provided me in this BZ) P.
Thanks, Prarit Bhargava.

Yes, I have the ability to patch, build, and install a kernel. My base is a CentOS kernel. I don't know all the differences between a CentOS and RHEL kernel, but any patch you propose would surely apply.

Since upstream has expressed interest in this problem, you might wish to post to the bz entry I mentioned in #13. There is no point in RHEL and the Mainline kernel diverging uselessly.

I don't urgently need a fix since I have one already, as mentioned in #9. Even so, I would be happy to test a fix you propose. It would be really good if it could also be tested on a machine that required the original linux-2.6-acpi-check-_pss-frequency-to-prevent-cpufreq-crash.patch.
Created attachment 388367 [details] Initial patch
Hugh, Here's a patch that might fix the problem. Please let me know how it goes... If it succeeds or fails, could you please post the dmesg log from the boot? Thanks, P.
Thanks, Prarit. Do you wish me to test with or without linux-2.6-acpi-check-_pss-frequency-to-prevent-cpufreq-crash.patch?
(In reply to comment #17)
> Thanks, Prarit.
>
> Do you wish me to test with or without
> linux-2.6-acpi-check-_pss-frequency-to-prevent-cpufreq-crash.patch?

Hey Hugh -- I'd like you to test with linux-2.6-acpi-check-_pss-frequency-to-prevent-cpufreq-crash.patch.

P.
Created attachment 388688 [details] dmesg output from 2.6.18-164.el5.bz559357b16 -- scaling does not work
dmesg from stock 2.6.18-164.el5 + "initial patch" (see #16). This kernel does not frequency scale as far as I can tell.
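One way to confirm whether a booted kernel can scale at all is to look at the cpufreq sysfs directory for a CPU. The following is a minimal Python 3 sketch, assuming the driver exposes the standard scaling_available_frequencies file (frequency-table drivers such as powernow-k8 normally do); the function names are hypothetical. When the cpufreq driver fails to initialise, the directory or the file is typically absent, which the sketch treats as "no frequencies".

```python
import os

DEFAULT_CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"


def available_frequencies_khz(cpufreq_dir=DEFAULT_CPUFREQ_DIR):
    """Return the frequencies (kHz) the driver advertises, or [] if unavailable."""
    path = os.path.join(cpufreq_dir, "scaling_available_frequencies")
    try:
        with open(path) as f:
            return [int(token) for token in f.read().split()]
    except OSError:
        # Missing directory or file: the cpufreq driver did not register.
        return []


def scaling_possible(cpufreq_dir=DEFAULT_CPUFREQ_DIR):
    """Scaling needs at least two distinct frequencies to choose between."""
    return len(set(available_frequencies_khz(cpufreq_dir))) > 1


if __name__ == "__main__":
    print(available_frequencies_khz())
    print(scaling_possible())
```

The directory is a parameter so that the check can be pointed at another CPU, or at a copy of the sysfs files taken from a different machine.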
I just went back and checked the BUILD directory. Yes, the patch is reflected in the source code that I built and tested. Prarit: could you have a look at the patch I put in the kernel.org bz entry? I don't know enough to be sure that it is the right approach, but I imagine that you do.
After working on this for a while upstream, this is the conclusion:

The patch linux-2.6-acpi-check-_pss-frequency-to-prevent-cpufreq-crash.patch, as backported into the RHEL kernel, introduced the problem observed on my machine.

The original patch, in the upstream kernel, would not have caused this problem. The reason is that the upstream patch was applied after http://git.moblin.org/cgit.cgi/acpica/commit/?id=ffd0eca830ee3f762e387fe5519fe34fc44b0231

That commit, which is missing from the RHEL kernel, eliminates package elements beyond the number that the package specifies. In the case of my machine, the _PSS package says it has 3 elements. It actually has more, but the extra ones are invalid. The invalid elements cause code from linux-2.6-acpi-check-_pss-frequency-to-prevent-cpufreq-crash.patch to discard my machine's whole _PSS. Upstream would only consider the 3 valid elements.

So: since linux-2.6-acpi-check-_pss-frequency-to-prevent-cpufreq-crash.patch has been backported and included in the RHEL kernel, I suggest that ffd0eca830ee3f762e387fe5519fe34fc44b0231 (or a successor) be adopted.

To understand this better, please read https://bugzilla.kernel.org/show_bug.cgi?id=15174
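The interaction between the two patches can be illustrated with a small model. This is a Python sketch under stated assumptions, not the ACPICA or kernel code: the helper names are hypothetical, the validity rule (non-zero, and fits in 32 bits once converted to kHz) is assumed, and the element values are the frequencies reported earlier in this bz.

```python
# Simplified model only; this is not the ACPICA or kernel source.
U32_MAX = 0xFFFFFFFF


def all_frequencies_valid(entries_mhz):
    """Assumed check: every frequency is non-zero and fits in 32 bits as kHz."""
    return all(mhz != 0 and mhz * 1000 <= U32_MAX for mhz in entries_mhz)


def truncate_to_declared_count(declared_count, elements):
    """Model of the missing commit: ignore elements beyond the declared count."""
    return elements[:declared_count]


# The package declares 3 elements but carries 2 extra, invalid ones at the end.
declared_count = 3
raw_elements_mhz = [2000, 1800, 800, 0x9999999, 0x9999999]

without_truncation = all_frequencies_valid(raw_elements_mhz)
with_truncation = all_frequencies_valid(
    truncate_to_declared_count(declared_count, raw_elements_mhz)
)

print(without_truncation)  # False: the whole _PSS would be discarded
print(with_truncation)     # True: only the 3 declared entries are examined
```

In this model the frequency check itself is unchanged; only the input it sees differs, which is why adopting the truncation should not weaken the protection the backported patch was meant to provide.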
Could this be retested with the latest 5.x driver? AMD has changed the way we handle p-states and I don't think this should be an issue any longer.
@Mark Langsdorf: I don't know what you mean by the latest 5.5 driver. I have updated to kernel-2.6.18-194.17.1.el5 and the problem is still there. As I understand it, the problem is in code common to Intel and AMD, before the AMD-specific code is executed. Furthermore, the problem is in RHEL, not kernel.org: RHEL cherry-picked patches; they adopted one that created this problem (and solved others for Intel) but not another that avoided it.
Sorry D. Hugh, I've been really busy with some other critical issues and finally had time to come back to this. I'm putting together a patch based on the suggested patch in the kernel.org bug. I'm testing it on some "known good" systems and then I'll attach it to this BZ for you to test. Would that work for you? Again, sorry for the long delay, P.
@Prarit:

That would be great for me. I just wonder if it is worthwhile for Red Hat.

1) I haven't heard of others hitting this problem, so it probably doesn't affect many. (I admit that not everyone affected by a problem reports it or even recognizes it.)

2) There is a small chance that the "fix" would break other systems that are currently working. Not only does the code have to be correct, it has to not tickle any BIOS bugs that are currently latent. I imagine that chance is very slight since kernel.org has used this fix already, but it still must be considered.

3) I can live with my current work-around: whenever a new kernel is shipped, I simply rebuild it with Patch 24199 removed.
(In reply to comment #25)
> @Prarit:
>
> That would be great for me. I just wonder if it is worthwhile for Red Hat.
>
> 1) I haven't heard of others hitting this problem, so it probably doesn't
> affect many. (I admit that not everyone affected by a problem reports it or
> even recognizes it.)

Right -- and I suspect that some people may not have noticed.

> 2) There is a small chance that the "fix" would break other systems that are
> currently working. Not only does the code have to be correct, it has to not
> tickle any BIOS bugs that are currently latent. I imagine that chance is very
> slight since kernel.org has used this fix already, but it still must be
> considered.

Yeah -- I'm thinking of using a boot parameter to enable the new check.

> 3) I can live with my current work-around: whenever a new kernel is shipped, I
> simply rebuild it with Patch 24199 removed.

Okay ... you're more than welcome to do that. I'll close this as WONTFIX for now. If you have any problems please feel free to ping me directly.

P.