Bug 500311 - Kernel panic when loading cpufreq_governor
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Prarit Bhargava
QA Contact: Red Hat Kernel QE team
Keywords: Regression
Duplicates: 467941
Depends On:
Blocks: 467941 480792
Reported: 2009-05-12 04:37 EDT by Klaus Ethgen
Modified: 2013-01-10 02:59 EST

Doc Type: Bug Fix
Last Closed: 2009-09-02 05:01:16 EDT

Attachments
A description, so that Bugzilla is satisfied (505.39 KB, image/jpeg)
2009-05-12 04:38 EDT, Klaus Ethgen
Screenshot (3.07 MB, image/jpeg)
2009-05-14 05:20 EDT, Klaus Ethgen
sosreport (2.65 MB, application/bzip2)
2009-05-14 05:22 EDT, Klaus Ethgen
RHEL5 debug RPM (17.30 MB, application/x-rpm)
2009-05-20 11:35 EDT, Prarit Bhargava
pmtools RPM for x86_64 (22.14 KB, application/x-rpm)
2009-05-20 13:09 EDT, John Villalovos
acpi.dump.bz2 (25.70 KB, application/bzip2)
2009-05-22 08:53 EDT, Klaus Ethgen
i686 PAE kernel to test (15.66 MB, application/x-rpm)
2009-05-22 09:53 EDT, John Villalovos
Check return value from acpi_processor_preregister_performance (533 bytes, patch)
2009-05-26 11:05 EDT, Matthew Garrett
patch vs 2.6.30-rc7 to sanity check _PSS frequency (2.14 KB, patch)
2009-05-26 15:21 EDT, Len Brown
patch vs 2.6.30-rc7 to sanity check _PSS frequency (2.14 KB, patch)
2009-05-26 15:29 EDT, Len Brown
patch above, refreshed against 2.6.18.8 (2.12 KB, patch)
2009-05-26 15:32 EDT, Len Brown
RHEL5 RPM with fix (15.67 MB, patch)
2009-05-27 12:02 EDT, Prarit Bhargava
RHEL5 RPM with fix (15.68 MB, application/octet-stream)
2009-05-28 08:44 EDT, Prarit Bhargava
RHEL5 fix for this issue (2.09 KB, patch)
2009-05-28 11:55 EDT, Prarit Bhargava

Description Klaus Ethgen 2009-05-12 04:37:19 EDT
Description of problem:
When the cpuspeed init script loads the cpufreq_governor on RHEL 5.3, the kernel panic shown in the attachment occurs.

The problem seems to be CPU-specific, as it works on some machines.

The bug happens on both 64-bit and 32-bit kernels.

Version-Release number of selected component (if applicable):
Kernel 2.6.18-92 works well
Kernel 2.6.18-128 is broken

Actual results:
Kernel panic during boot

Expected results:
No kernel panic
Comment 1 Klaus Ethgen 2009-05-12 04:38:47 EDT
Created attachment 343552
A description, so that Bugzilla is satisfied
Comment 2 Klaus Ethgen 2009-05-12 05:28:50 EDT
CPU is a Xeon E5430
Comment 3 Prarit Bhargava 2009-05-13 08:33:31 EDT
Klaus, could you boot a "known good" kernel, run sosreport, and then attach the output to this BZ?

Thanks,

P.
Comment 5 Prarit Bhargava 2009-05-13 08:48:06 EDT
Klaus, could you also boot with the parameter "vga=791" and attach a picture of the panic?  There are a few lines above the panic that I need ... hopefully you'll be able to capture those.

Thanks,

P.
Comment 6 Klaus Ethgen 2009-05-14 05:18:19 EDT
Hello,

I do not have the old kernel on the system, so I created the sosreport with the recent kernel after disabling cpuspeed with "chmod 000 /etc/init.d/cpuspeed". I will try to upload the sosreport to Bugzilla.

I also took a screenshot of the panic. Unfortunately the panic was not printed completely, and the system hung in the stack trace. (Kernel 2.6.18-128.1.10.el5PAE)
Comment 7 Klaus Ethgen 2009-05-14 05:20:53 EDT
Created attachment 343929
Screenshot
Comment 8 Klaus Ethgen 2009-05-14 05:22:50 EDT
Created attachment 343930
sosreport
Comment 9 Prarit Bhargava 2009-05-20 08:39:08 EDT
If I didn't know any better I'd swear that the BIOS is handing us a system where p-states were not active/not initialized (not sure what the proper nomenclature is for this state).

Later on in the boot, the kernel attempts to access the current p-state, and then it panics because one doesn't exist.

Seems like a BIOS/ACPI issue to me -- jvillalo?

P.
Comment 10 John Villalovos 2009-05-20 09:43:47 EDT
Can we get a copy of the output from acpidump posted here?

It is in the pmtools package.
Comment 11 Klaus Ethgen 2009-05-20 10:00:47 EDT
I'd like to do so but there is no pmtools package in the enterprise solution.
Comment 12 Klaus Ethgen 2009-05-20 10:04:36 EDT
To #9:

Let me ask two questions:
1. Why is the bug only in the version >= *-128?
2. Shouldn't it be caught if there is no p-state?

So, for me it is irrelevant whether the p-state (whatever that is) is available. If it is not, the kernel should work around it (maybe with a small log warning or so).
Comment 13 John Villalovos 2009-05-20 10:32:19 EDT
Klaus,

A few things.

The pmtools source can be found at:
http://www.lesswatts.org/projects/acpi/utilities.php

Also, I was able to build the Fedora SRPM on my RHEL5 build root:
ftp://mirrors.kernel.org/fedora/releases/10/Everything/source/SRPMS/pmtools-20071116-1.fc9.src.rpm


As for why things could have changed: RHEL 5.3 uses the acpi-cpufreq driver, where it previously used the speedstep-centrino driver.  That change was made in Bug 449787.
Comment 14 Prarit Bhargava 2009-05-20 10:58:08 EDT
(In reply to comment #12)
> To #9:
> 
> Let me ask two questions:
> 1. Why is the bug only in the version >= *-128?

An update to the ACPI code probably causes this to happen >= *-128.

> 2. Shouldn't it be caught if there is no p-state?
> 
> So, for me it is irrelevant if the p-state (whatever that is) is available. If
> not the kernel should work around it (Maybe with a small log warning or so).  

Yes, but my previous statement was a stab-in-the-dark.  I have *no* idea if that is really the case and that's why doing an 'acpidump' on your system would be helpful.

P.
Comment 15 Prarit Bhargava 2009-05-20 11:35:53 EDT
Created attachment 344831
RHEL5 debug RPM

Klaus, could you try booting with this RPM and report back with a result?

Thanks,

P.
Comment 16 Len Brown 2009-05-20 12:22:40 EDT
Is this a production system with a production BIOS?

DMI says:
        Product Name: S5000PAL

        Vendor: Intel Corporation
        Version: S5000.86B.10.00.0094.101320081858
        Release Date: 10/13/2008

Is EIST enabled or disabled by default in the BIOS?

Does this panic happen both with EIST enabled and disabled in BIOS Setup?

This looks like a duplicate of bug #467941
Comment 17 John Villalovos 2009-05-20 13:09:23 EDT
Created attachment 344846
pmtools RPM for x86_64

I have added a pmtools RPM compiled for RHEL 5.
Comment 18 Len Brown 2009-05-20 13:19:49 EDT
Looking at support.intel.com S5000PAL page...

While it appears that there is indeed a more recent BIOS
Build Stamp : S5000.86B.11.00.0096.01132009

Build Date  : January 13, 2009

clearly this board has been shipping in production with
the installed BIOS 94 for some time, and with even older
BIOS for some time before that.

So I do not recommend upgrading the BIOS, even if that
would actually help.  Instead, we need to root-cause the failure.

After answering the questions above, please try
Youquan Song's patch here:
http://patchwork.kernel.org/patch/23336/
Comment 19 Prarit Bhargava 2009-05-20 13:33:02 EDT
(In reply to comment #18)

> After answering the questions above, please try 
> Youquan, Song's patch here:
> http://patchwork.kernel.org/patch/23336/  

This is very similar to the patch I put into the previously attached binary.

Please try running that kernel RPM and see if it resolves the problem.

P.
Comment 20 Len Brown 2009-05-20 18:08:48 EDT
> This is very similar to the patch I put into the previously attached binary.

great.
Note, however, that it is a debug patch that will not go upstream.
we need to find the root cause, and for that we need the acpidump output.
Comment 21 John Villalovos 2009-05-22 08:40:24 EDT
Klaus,

Feedback from testing of the kernel from comment 15 would be greatly appreciated.  If we get some feedback then we can try to put a fix into the next release of RHEL 5.

Also the acpidump output would be very useful too.

Thanks.
Comment 22 Klaus Ethgen 2009-05-22 08:52:18 EDT
Steady, steady, yesterday was a public holiday in Switzerland. :-)

I will add the acpidump output as an attachment. I hope the tool collected the right thing; the source RPM has no build requirements. However, it looks good to me.
Comment 23 Klaus Ethgen 2009-05-22 08:53:29 EDT
Created attachment 345089
acpi.dump.bz2
Comment 24 Klaus Ethgen 2009-05-22 09:42:14 EDT
@#15, Prarit Bhargava: Unfortunately the RPM is an x86_64 kernel, but the system is installed with a 32-bit PAE kernel.

(64-bit is mostly a pain in the ... RH sometimes has trouble choosing the right RPM to install. But that is another problem.)
Comment 25 Klaus Ethgen 2009-05-22 09:46:13 EDT
@#16, Len Brown: I am not authorized to see the bug you referred to. (Let me say that I personally find the policy of non-public bugs ... crappy!)
Comment 26 John Villalovos 2009-05-22 09:53:36 EDT
Created attachment 345093
i686 PAE kernel to test

Klaus,

Here is an i686 PAE kernel for you to test.  Also, I have opened up Bug 467941 to the world.
Comment 27 Klaus Ethgen 2009-05-22 10:07:20 EDT
Thanks. Now I have a better view of the bug, and I can also understand the question about the production BIOS in #16.

So, yes, this is a production BIOS, shipped with the complete system by our hardware supplier.

And yes, the system is about to go into production, so I have to wait until Monday to check with the person responsible whether I can reboot the machine into the test kernel. But tell me, what exactly should the test kernel do? Is it meant to crash when cpufreq is started? Then I have to re-enable it. Or is it just a kernel with more debugging information?
Comment 28 John Villalovos 2009-05-22 10:15:20 EDT
Klaus,

The kernel is compiled with the patch from:
https://bugzilla.redhat.com/show_bug.cgi?id=467941#c39

So my hope is that it will fix the problem and there will be no crash.
Comment 29 John Villalovos 2009-05-22 11:04:14 EDT
Klaus,

When you are doing the testing of the patched kernel it might be a good data point if you could also install the un-patched kernel from:
http://people.redhat.com/dzickus/el5/149.el5/i686/

Make sure that the un-patched kernel crashes and the patched kernel from Comment 26 fixes the issue.
Comment 30 Prarit Bhargava 2009-05-22 14:03:41 EDT
(In reply to comment #29)

> http://people.redhat.com/dzickus/el5/149.el5/i686/
> 
> Make sure that the un-patched kernel crashes and the patched kernel from
> Comment 26 fixes the issue.  

Actually, try 

http://people.redhat.com/dzickus/el5/150.el5/i686/

as well.  We pulled a few patches ...

P.
Comment 31 Song, Youquan 2009-05-25 06:12:20 EDT
From the acpi.dump information, this is the same bug as 467941.
The root cause is that when CPU frequency scaling is disabled, some BIOSes report a _PSS with every field set to 0x80000000.  If the kernel treats this case as valid, it crashes during boot when the cpufreq governors are loaded.

So Len, can you approve my patch for upstream to fix this kind of bug? Thanks a lot.
Comment 32 John Villalovos 2009-05-26 09:15:33 EDT
*** Bug 467941 has been marked as a duplicate of this bug. ***
Comment 33 John Villalovos 2009-05-26 09:17:11 EDT
Klaus,

Were you able to test the kernel on Monday?
Comment 34 Klaus Ethgen 2009-05-26 10:18:58 EDT
Sorry for the late answer. I had to wait until today.

Now, to debug the problem I renamed the init script that enables cpuspeed, booted into the kernel, and started it manually by hand.

The result for 2.6.18-149.el5.bz467941_1PAE was the following lines in the dmesg:
ACPI: P-states disabled in the BIOS
ACPI: P-states disabled in the BIOS
ACPI: P-states disabled in the BIOS
ACPI: P-states disabled in the BIOS
ACPI: P-states disabled in the BIOS
ACPI: P-states disabled in the BIOS
ACPI: P-states disabled in the BIOS
ACPI: P-states disabled in the BIOS

and no other problem.

Using the 2.6.18-149.el5PAE kernel I have problems generating the initrd.
Comment 35 Prarit Bhargava 2009-05-26 10:41:44 EDT
Just as an FYI, this is what I put into the debug kernel (sorry for the cut-and-paste):

diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 38430d8..7ba6427 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -314,6 +314,17 @@ static int acpi_processor_get_performance_states(struct acpi_processor *pr)
                        kfree(pr->performance->states);
                        goto end;
                }
+
+               /* Some Intel boxes report 0x80000000 for the core_frequency
+                * when the BIOS has disabled p-state transitions on the system.
+                */
+               if (px->core_frequency == 0x80000000) {
+                       printk(KERN_ERR PREFIX
+                              "Invalid _PSS data: BIOS disabled p-states\n");
+                       result = -EFAULT;
+                       kfree(pr->performance->states);
+                       goto end;
+               }
        }

Len, I've read the other BZ, and I'm not 100% sure that I've captured what your objections to the above patch are.  My apologies, but could you reiterate them here?

Either way, I think that I will submit this patch internally for review.  We have certified systems that are currently panic'ing.

Thanks,

P.
Comment 36 Matthew Garrett 2009-05-26 11:03:33 EDT
It makes the assumption that this specific value has a meaning, which is fine until another BIOS vendor does the same thing with a different value and we have precisely the same bug again. The difficulty with validating _PSS is that there aren't any constraints on any of the fields - in theory they could all have any value. In reality it's obviously unlikely that things like frequency or busmastering latency are going to rise above certain thresholds, so we could apply some heuristics.

What may be a better bet is validating the _PSD contents. The coordination type is limited to a small set of values and this BIOS also overwrites this field. This should generate an error, but right now we appear to ignore the return code from that (in acpi_processor_preregister_performance) and so continue anyway. I'm not clear on why that's the case - it's possible that it causes us to bail on some systems that would otherwise work, but I haven't got an answer on that point yet. I'll attach a patch I've suggested upstream - it should be possible to apply it to RHEL by hand.
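
The idea of validating the _PSD coordination type can be sketched in plain C. This is only an illustration, not the attached patch: the function name psd_coord_type_is_valid and the enum names are hypothetical, and the three accepted values (0xFC, 0xFD, 0xFE) are the SW_ALL, SW_ANY and HW_ALL coordination types from the ACPI specification.

```c
#include <stdint.h>

/* Coordination types that the ACPI specification defines for _PSD.
 * The enum names are made up for this sketch. */
enum {
    PSD_COORD_SW_ALL = 0xFC,
    PSD_COORD_SW_ANY = 0xFD,
    PSD_COORD_HW_ALL = 0xFE
};

/* Hypothetical check: returns 1 if the coordination type read from _PSD
 * is one of the three defined values, 0 otherwise. */
static int psd_coord_type_is_valid(uint64_t coord_type)
{
    switch (coord_type) {
    case PSD_COORD_SW_ALL:
    case PSD_COORD_SW_ANY:
    case PSD_COORD_HW_ALL:
        return 1;
    default:
        return 0;
    }
}
```

A BIOS that overwrites the field with 0x80000000, as described above, would be rejected by such a check. Whether rejecting it is safe on systems with a broken _PSD but working P-states is exactly the open question raised in this comment.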
Comment 37 Matthew Garrett 2009-05-26 11:05:31 EDT
Created attachment 345477
Check return value from acpi_processor_preregister_performance
Comment 39 Luming Yu 2009-05-26 11:11:15 EDT
The magic number 0x80000000 is not defined in the ACPI spec.

A more reasonable sanity check, which would also cover more insane BIOS data, could be:

if ( px->core_frequency > known_cpu_freq* 100)
 Invalid
if ( px->core_frequency < known_cpu_freq /100)
 Invalid

Len, Is that correct?
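
As a minimal standalone sketch, the two comparisons above could be written as follows. The function name pss_mhz_near_known is hypothetical, and known_mhz stands for a CPU frequency obtained from some source other than _PSS, which this comment does not specify. The upper bound is computed in 64 bits so that the multiplication by 100 cannot itself overflow.

```c
#include <stdint.h>

/* Hypothetical helper: accept a _PSS frequency only if it lies within a
 * factor of 100 of a frequency known from elsewhere (known_mhz). */
static int pss_mhz_near_known(uint64_t pss_mhz, uint32_t known_mhz)
{
    uint64_t upper = (uint64_t)known_mhz * 100; /* computed in 64 bits */
    uint64_t lower = known_mhz / 100;           /* integer division */

    return pss_mhz >= lower && pss_mhz <= upper;
}
```

With known_mhz = 2660, the bogus value 0x80000000 is far above the upper bound of 266000 and is rejected, while 2660 itself is accepted.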
Comment 40 Matthew Garrett 2009-05-26 11:29:52 EDT
I'm not sure - it depends on whether there are systems that have invalid _PSD blocks but which have otherwise functional P-state code.
Comment 41 Len Brown 2009-05-26 13:36:29 EDT
I'm not excited about validating the BIOS P-state support
by checking _PSD, because _PSD is notoriously unreliable --
to the extent that we are hoping that in the future Linux
can ignore _PSD altogether.

BTW. the actual crash is here:

drivers/cpufreq/cpufreq_userspace.c

static int cpufreq_governor_userspace(struct cpufreq_policy *policy,
                                   unsigned int event)
{
        unsigned int cpu = policy->cpu;
        switch (event) {
        case CPUFREQ_GOV_START:
                if (!cpu_online(cpu))
                        return -EINVAL;
136:              BUG_ON(!policy->cur);


where...

struct cpufreq_policy {

        unsigned int            cur;    /* in kHz, only needed if cpufreq
                                         * governors are used */


So ACPI specifies frequency in MHz,
but Linux cpufreq stores frequency as KHz in a u32.
Thus, any ACPI frequency that overflows
a u32 when multiplied by 1024 is fundamentally illegal for cpufreq.

So if we are going to use a heuristic to validate MHz, it would
not be a comparison for == 0x80000000, but >= 0x400000,
since 0x400000 * 0x400 overflows 32-bits.
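
The overflow arithmetic can be checked with a small standalone C sketch. The helper names are hypothetical, and the code assumes the conversion behaves like a multiplication whose result is kept in a 32-bit unsigned field, as the description of policy->cur suggests.

```c
#include <stdint.h>

/* Hypothetical helper: scale a MHz value by a factor and keep only the
 * low 32 bits, as storing the product in a u32 would. */
static uint32_t scale_to_u32(uint32_t mhz, uint32_t factor)
{
    uint64_t wide = (uint64_t)mhz * factor; /* exact product */

    return (uint32_t)wide;                  /* truncated to 32 bits */
}

/* Returns 1 if the exact product no longer fits in 32 bits. */
static int scale_overflows_u32(uint32_t mhz, uint32_t factor)
{
    return ((uint64_t)mhz * factor) > UINT32_MAX;
}
```

Under that assumption, 0x3FFFFF * 0x400 still fits (0xFFFFFC00), while 0x400000 * 0x400 is exactly 2^32 and truncates to 0. The BIOS value 0x80000000 also truncates to 0, for a factor of 1024 as well as 1000, which would be consistent with BUG_ON(!policy->cur) firing.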
Comment 42 Venkatesh Pallipadi 2009-05-26 14:24:58 EDT
_PSD is optional as per ACPI and I don't think we should fail on invalid _PSD.

Also, we use cpufreq_freq = acpi_core_frequency * 1000 in acpi-cpufreq.c, not 1024. So the above check would technically have to be
freq > (0xffffffff / 1000) or something like that.
Adding an upper-bound check here seems a good thing, next to the already existing check for freq == 0.
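
A minimal sketch of such a combined check, written as standalone C rather than kernel code. The function name pss_mhz_fits_khz_u32 and the macro KHZ_PER_MHZ are hypothetical, and this is not necessarily the form of the patch that was eventually attached.

```c
#include <stdint.h>

#define KHZ_PER_MHZ 1000u

/* Hypothetical check: accept a _PSS frequency (in MHz) only if it is
 * nonzero and its kHz equivalent still fits in an unsigned 32-bit value. */
static int pss_mhz_fits_khz_u32(uint64_t mhz)
{
    return mhz != 0 && mhz <= UINT32_MAX / KHZ_PER_MHZ;
}
```

Here 0xffffffff / 1000 is 4294967, so 4294967 MHz is the largest value accepted and 0x80000000 MHz is rejected. Unlike a comparison against one magic number, the bound follows from the u32 kHz representation itself.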
Comment 43 Len Brown 2009-05-26 15:21:33 EDT
Created attachment 345504
patch vs 2.6.30-rc7 to sanity check _PSS frequency

patch candidate for upstream
Comment 44 Len Brown 2009-05-26 15:29:46 EDT
Created attachment 345506
patch vs 2.6.30-rc7 to sanity check _PSS frequency

wups, correct typo in previous patch
Comment 45 Len Brown 2009-05-26 15:32:29 EDT
Created attachment 345507
patch above, refreshed against 2.6.18.8
Comment 46 Prarit Bhargava 2009-05-27 08:29:04 EDT
Klaus,

I'm going to take the patch in comment #45, build a binary x86 32-bit RPM, and attach it here for you to test.

P.
Comment 47 Prarit Bhargava 2009-05-27 12:02:18 EDT
Created attachment 345635
RHEL5 RPM with fix

Klaus, please test this i686 RPM.

Thanks,

P.
Comment 48 Klaus Ethgen 2009-05-28 08:31:17 EDT
@Prarit:
   1. This is not a PAE kernel, is it?
   2. Can you please upload kernels _not_ as type text/plain?
Comment 49 Prarit Bhargava 2009-05-28 08:42:44 EDT
(In reply to comment #48)
> @Prarit:
>    1. This is not a PAE kernel, is it?

It shouldn't really matter for the test.  But I'll upload a PAE kernel.

>    2. Can you please upload kernels _not_ as type text/plain.  

Huh .... that's weird. 

/me goes off to file a bug against bugzilla. <grumble>

P.
Comment 50 Prarit Bhargava 2009-05-28 08:44:50 EDT
Created attachment 345747
RHEL5 RPM with fix

RHEL5 i686 PAE kernel

(Hopefully bugzilla doesn't munge this...)

P.
Comment 51 Klaus Ethgen 2009-05-28 10:30:31 EDT
Tested, nearly the same result as above:
ACPI: Invalid BIOS _PSS frequency: 0x80000000 MHz
ACPI: Invalid BIOS _PSS frequency: 0x80000000 MHz
ACPI: Invalid BIOS _PSS frequency: 0x80000000 MHz
ACPI: Invalid BIOS _PSS frequency: 0x80000000 MHz
ACPI: Invalid BIOS _PSS frequency: 0x80000000 MHz
ACPI: Invalid BIOS _PSS frequency: 0x80000000 MHz
ACPI: Invalid BIOS _PSS frequency: 0x80000000 MHz
ACPI: Invalid BIOS _PSS frequency: 0x80000000 MHz
Comment 52 Prarit Bhargava 2009-05-28 11:30:42 EDT
Cool -- thanks :)

P.
Comment 53 Prarit Bhargava 2009-05-28 11:55:05 EDT
Created attachment 345785
RHEL5 fix for this issue

Possible final patch.  I sent Len Brown some subtle patch changes that are reflected in this version.

P.
Comment 55 RHEL Product and Program Management 2009-05-28 13:10:35 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 56 Len Brown 2009-06-01 15:33:46 EDT
34d531e640cb805973cf656b15c716b961565cea
"ACPI: sanity check _PSS frequency to prevent cpufreq crash"

is shipping upstream after Linux-2.6.30-rc7-git4, so
it should be in -rc8 and 2.6.30.
Comment 57 Don Zickus 2009-06-04 12:07:47 EDT
in kernel-2.6.18-152.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 59 Klaus Ethgen 2009-06-17 07:49:21 EDT
Today a new kernel was released for RHEL5. But as far as I can read in the changelog, this bug is still not fixed. (Not explicitly tested.)

As this problem is very urgent, I want to ask how I can get the fix in a RHEL5 release.
Comment 60 Prarit Bhargava 2009-06-17 08:13:05 EDT
(In reply to comment #59)
> Today there is a new kernel in RHEL5 released. But as I read in the changelog
> this bug is still not fixed. (not explicite tested)
> 
> As this problem is very urgent I want to ask how I can see it in RHEL5 release?  

Klaus, I'm not sure what you're talking about.  RHEL5.4 is not scheduled to release for a few months IIRC (It might be sooner than that).

The patch for this has been committed to 152.el5.

If you need a current RHEL5 kernel, you can get one from 

http://people.redhat.com/dzickus/el5/

P.
Comment 61 Klaus Ethgen 2009-06-17 09:46:08 EDT
Ahem, I have seen that today there was a new kernel for RHEL5 (the current release), and this kernel does not have the fix for this bug.

What I need, and what we need here, is a current release with a working kernel! I see (from the messages above) that 152 has the patch, but the kernel in the current release does not, so it is unusable for some of our machines here.

This bug is a big show stopper for Red Hat, and it was introduced in the middle of a "stable" release (EL5). It is very urgent that you fix the bug as fast as possible! I do not think that the fix can wait for a release that will come in some months!
Comment 62 John Villalovos 2009-06-17 10:11:59 EDT
Klaus,

Is there any chance that your system vendor can provide you a better BIOS?  This issue is triggered by a bad BIOS ACPI implementation.  The fix that was done is to work around this BIOS bug.

The patch in comment 53 has a good description of the issue and maybe it could be forwarded to the OEM to help them fix the issue:
---------------------------------------------------------
Bogus values in the _PSS ACPI struct leads to a panic in the CPU frequency
code.

When BIOS SETUP is changed to disable EIST, some BIOS
hand the OS an un-initialized _PSS:

        Name (_PSS, Package (0x06)
        {
            Package (0x06)
            {
                0x80000000,	// frequency [MHz]
                0x80000000,	// power [mW]
                0x80000000,	// latency [us]
                0x80000000,	// BM latency [us]
                0x80000000,	// control
                0x80000000	// status
            },
	    ...

These are outrageous values for frequency,
power and latency, raising the question where to draw
the line between legal and illegal.  We tend to survive
garbage in the power and latency fields, but we can BUG_ON
when garbage is in the frequency field.
--------------------------------------------
Comment 63 Klaus Ethgen 2009-06-17 10:30:45 EDT
It might be that the BIOS is not as you would like to have it. But it is a fact that earlier releases of EL5 worked well and that the bug was introduced in the middle of the stable release.

It is also a fact that this BIOS is part of a production system delivered by the vendor. And for me the kernel must be able to run with a production BIOS. So from my point of view this is clearly a kernel bug.

However, it might be that the vendor will "fix" this unwanted behaviour in the future. But that is out of our control.
Comment 64 Don Zickus 2009-06-17 10:35:22 EDT
Klaus,

This particular bug is not scheduled to be fixed in 5.3.z, which is why the current kernel you installed did not have the fix.  If you find this bug extremely urgent and cannot wait for 5.4, you will have to contact your Technical Account Manager (TAM) or the person from whom you bought RHEL5 support.

As supporting fixes in our z-stream has costs, we unfortunately can not backport every fix we find in 5.4 to 5.3.  Therefore we rely on a justifiable business impact to determine which fixes are backported.  Contacting the person you bought the support contract from and explaining to them your business impact would strongly help your case in getting a fix to you quickly (the next z-stream update should be in two weeks).

-Don
Comment 68 Chris Ward 2009-07-03 14:43:44 EDT
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.
Comment 72 Chris Ward 2009-07-10 15:13:12 EDT
~~ Attention Partners - RHEL 5.4 Snapshot 1 Released! ~~

RHEL 5.4 Snapshot 1 has been released on partners.redhat.com. If you have already reported your test results, you can safely ignore this request. Otherwise, please notice that there should be a fix available now that addresses this particular request. Please test and report back your results here, at your earliest convenience. The RHEL 5.4 exception freeze is quickly approaching.

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Do not flip the bug status to VERIFIED. Instead, please set your Partner ID in the Verified field above if you have successfully verified the resolution of this issue. 

Further questions can be directed to your Red Hat Partner Manager or other appropriate customer representative.
Comment 76 errata-xmlrpc 2009-09-02 05:01:16 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
Comment 77 D. Hugh Redelmeier 2010-01-27 14:34:25 EST
The patch to fix this problem, linux-2.6-acpi-check-_pss-frequency-to-prevent-cpufreq-crash.patch, causes my HP Pavilion a530n to be stuck at the top CPU frequency.

The BIOS problem is that there are two entries with bad frequencies (0x9999999!) followed by three fine entries.

Before this patch, the bad entries were ignored (with warnings) and the good entries were accepted.  Here's an extract from dmesg:
powernow-k8: Pre-initialization of ACPI failed
powernow-k8: Found 1 AMD Athlon(tm) 64 Processor 3200+ processors (1 cpu cores) (version 2.20.00)
powernow-k8: invalid freq entries 3300000 kHz vs. 2147483048 kHz
powernow-k8: invalid freq entries 3300000 kHz vs. 2147483048 kHz
powernow-k8: 0 : fid 0xc (2000 MHz), vid 0x2
powernow-k8: 1 : fid 0xa (1800 MHz), vid 0x6
powernow-k8: 2 : fid 0x0 (800 MHz), vid 0xa
powernow-k8: ph2 null fid transition 0xc

After this patch, the existence of the bad entries means that all entries are ignored.  Here's an extract from dmesg:
powernow-k8: Pre-initialization of ACPI failed
powernow-k8: Found 1 AMD Athlon(tm) 64 Processor 3200+ processors (1 cpu cores) (version 2.20.00)
ACPI: Invalid BIOS _PSS frequency: 0x9999999 MHz
powernow-k8: BIOS error: maxvid exceeded with pstate 2

Is there any way that the bad entries can be skipped but the good ones accepted?

Before the patch, the bad entries were skipped by code in kernel-2.6.18/vanilla/arch/i386/kernel/cpu/cpufreq/powernow-k8.c and kernel-2.6.18/linux-2.6.18.x86_64/arch/i386/kernel/cpu/cpufreq/powernow-k8.c.  Look for "invalid freq entries".

Alternatively, is there a kernel parameter that I could use to bypass this problem?  There is no chance that the BIOS will be fixed at this late date.
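
What "skip the bad entries but accept the good ones" could look like is sketched below in standalone C. This is purely illustrative and is not how the kernel handles _PSS: struct pss_entry, pss_freq_ok and pss_keep_valid are hypothetical names, and the validity rule assumed here is that the frequency is nonzero and still fits in 32 bits after conversion to kHz.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of one _PSS entry. */
struct pss_entry {
    uint64_t freq_mhz;
    uint64_t control;
};

/* Assumed validity rule: nonzero, and the kHz value fits in 32 bits. */
static int pss_freq_ok(uint64_t mhz)
{
    return mhz != 0 && mhz <= UINT32_MAX / 1000u;
}

/* Compacts the table in place, keeping only entries with a usable
 * frequency, and returns how many entries remain. */
static size_t pss_keep_valid(struct pss_entry *tbl, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (pss_freq_ok(tbl[i].freq_mhz))
            tbl[kept++] = tbl[i];
    }
    return kept;
}
```

For a table like the one described here, with two 0x9999999 MHz entries followed by 2000, 1800 and 800 MHz, such a filter would keep the last three entries instead of rejecting the whole table.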
Comment 78 D. Hugh Redelmeier 2010-01-29 14:16:55 EST
my previous comment was turned into a separate bz: https://bugzilla.redhat.com/show_bug.cgi?id=559357
Comment 79 Francesco Allara 2010-04-26 12:11:12 EDT
I would like to say that I had the same problem.
I fixed it by disabling the overclock profile. When overclocking is on, the BIOS does not activate "Cool'n'Quiet" technology and the _PSS object isn't available.

Hope it helps.
