Bug 430938 - RHEL5.2: Improved Power Now! in Xen support on 2nd generation Opteron systems
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Assigned To: Bhavna Sarathy
QA Contact: Martin Jenner
Keywords: OtherQA, Regression
Duplicates: 431788
Blocks: 253746
Reported: 2008-01-30 13:30 EST by Bhavna Sarathy
Modified: 2009-06-19 22:13 EDT
CC List: 7 users

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 11:08:24 EDT


Attachments
resync TSC extrapolation (742 bytes, patch)
2008-02-15 13:40 EST, Bhavna Sarathy
add vcpu lock/unlock functions (4.51 KB, patch)
2008-02-15 13:41 EST, Bhavna Sarathy
xen change freq hypercall (1.94 KB, patch)
2008-02-15 13:41 EST, Bhavna Sarathy
fix 16core xen power now (729 bytes, patch)
2008-02-15 13:42 EST, Bhavna Sarathy

Description Bhavna Sarathy 2008-01-30 13:30:08 EST
This patch set fixes the issue that was causing time to go backwards on RevF.  

http://xenbits.xensource.com/staging/xen-unstable.hg?rev/63275fd1596a
http://xenbits.xensource.com/staging/xen-unstable.hg?rev/7327e1c2a42c
http://xenbits.xensource.com/staging/xen-unstable.hg?rev/a4bd1371196e

These fixes are still in testing, but we've seen the time regressions go
from 4-7 with every frequency change to 1 regression every 100+ changes 
(still counting).
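To illustrate why a "resync TSC extrapolation" fix matters, here is a minimal Python sketch under a simplified model (an assumption, not the Xen code or the posted patch): the guest extrapolates time linearly from a (TSC, time) snapshot, and on Rev F the TSC tick rate follows the core frequency. If the scale is not refreshed when the frequency changes, the extrapolated clock runs ahead, and the later correction back to true time shows up as time going backwards. `TscClock` and `overshoot_ns` are hypothetical names.

```python
# Hypothetical model of TSC-based time extrapolation; not the Xen code.
NS_PER_S = 1_000_000_000


class TscClock:
    """Extrapolate time as base_ns + (tsc - base_tsc) * ns_per_tick."""

    def __init__(self, hz):
        self.base_tsc = 0
        self.base_ns = 0.0
        self.ns_per_tick = NS_PER_S / hz

    def read(self, tsc):
        return self.base_ns + (tsc - self.base_tsc) * self.ns_per_tick

    def resync(self, tsc, true_ns, new_hz):
        # Take a fresh (tsc, time) snapshot and a scale for the new frequency.
        self.base_tsc = tsc
        self.base_ns = true_ns
        self.ns_per_tick = NS_PER_S / new_hz


def overshoot_ns(resync_on_change):
    """How far the extrapolated clock runs ahead of true time.

    Any overshoot must later be taken back when the clock is recalibrated,
    which a guest observes as time going backwards.
    """
    clock = TscClock(hz=1e9)                       # core (and TSC) at 1 GHz
    tsc_change, ns_change = 1_000_000_000, 1.0e9   # at t = 1 s the frequency
    if resync_on_change:                           # doubles to 2 GHz
        clock.resync(tsc_change, ns_change, new_hz=2e9)
    tsc_now = tsc_change + 1_000_000_000           # 0.5 s later at 2 GHz
    ns_now = 1.5e9
    return max(0.0, clock.read(tsc_now) - ns_now)


print(overshoot_ns(resync_on_change=False))  # 500000000.0
print(overshoot_ns(resync_on_change=True))   # 0.0
```

With the stale 1 GHz scale the clock is 0.5 s ahead after only 0.5 s at 2 GHz; resyncing at the frequency change removes the overshoot entirely in this idealized model.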
Comment 1 Russell Doty 2008-01-30 13:52:25 EST
AMD - please update this BZ with the final patches after completing testing.
This is too late to go into RHEL 5.2 Beta 1, so it will need to be processed as
an exception for one of the RHEL 5.2 snapshots.
Comment 2 Bhavna Sarathy 2008-02-15 13:40:02 EST
I have submitted the patches to the mailing lists.  I'll attach them here as
well.  I delayed submitting this based on Russ's recommendation that we need to
complete testing. The patches have been tested for 2 weeks with positive results.

Is this really a kernel issue or a kernel-xen issue? I'll let you sort that out internally.

Bhavana
Comment 3 Bhavna Sarathy 2008-02-15 13:40:52 EST
Created attachment 295028 [details]
resync TSC extrapolation
Comment 4 Bhavna Sarathy 2008-02-15 13:41:20 EST
Created attachment 295029 [details]
add vcpu lock/unlock functions
Comment 5 Bhavna Sarathy 2008-02-15 13:41:41 EST
Created attachment 295030 [details]
xen change freq hypercall
Comment 6 Bhavna Sarathy 2008-02-15 13:42:06 EST
Created attachment 295031 [details]
fix 16core xen power now
Comment 7 Rik van Riel 2008-02-15 15:29:52 EST
These patches fix regressions introduced in 5.2, hence this bug should be
granted exception status.
Comment 11 Rik van Riel 2008-02-20 11:40:29 EST
*** Bug 431788 has been marked as a duplicate of this bug. ***
Comment 12 Chris Lalancette 2008-02-20 15:26:13 EST
I looked over the 4 posted patches for this issue.  The first 3 look OK, and
seem to be needed for the powernow stuff.  The 4th one looks like a fix for a
different issue, and is a little bit scary anyway.  I'm inclined to take the
first 3 patches and push the fourth to 5.3, unless it can be shown that it also
impacts this issue (or is significant enough to impact a lot of systems; it
would need a separate BZ then).

Chris Lalancette
Comment 14 Bhavna Sarathy 2008-02-21 09:16:36 EST
Can you explain what is scary about the 4th patch??   I'd prefer not to have a
hole in R5.2 when we know exactly what is wrong AND have a fix.

Bhavana
Comment 15 Chris Lalancette 2008-02-21 09:38:54 EST
What's scary about it is that it mucks with the ACPI ID to APIC ID mappings, which
can have unintended consequences.  Don't get me wrong; I'm not against the patch
in general.  It's just that we are now post-beta, so the amount of testing it
will have (and the amount of time we have to find any problems and fix them) is
much less now.  That's why I'm trying to get a feeling for what systems it
affects, exactly.

Chris Lalancette
Comment 16 Bhavna Sarathy 2008-02-21 11:07:56 EST
Chris, this fix affects all AMD 16-core systems. At this point there are not
too many such users (think Rev F 8p/16c systems), though this will change with
Barcelona (4p/16c). As you are very well aware, Xen always does its own
thing, and we are in the situation of fixing in Xen something that was fixed
for bare metal in RHEL4.5, and that RHEL5 bare metal inherited from the
upstream fix (2.6.18). We have found the issue and fixed it.

I want to reiterate that this affects AMD systems only, as their APIC IDs are
lifted, i.e. they start at 4 instead of 0. Once the number of 16-core users
increases, this will become an issue. The patch will not create issues for you;
it fixes the issue on all 16-core AMD systems for Xen users.
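A hypothetical Python sketch of why lifted APIC IDs can bite only on 16-core systems: it assumes (for illustration only, this is not the actual Xen code or patch) a fixed 16-slot per-CPU table indexed directly by APIC ID, and ACPI processor IDs 0..n-1 mapping to APIC IDs starting at the lift of 4. `lifted_apic_ids`, `naive_lost_cpus`, `translated_table`, and `TABLE_SIZE` are made-up names.

```python
# Hypothetical illustration of "lifted" APIC IDs; not the actual Xen patch.
TABLE_SIZE = 16  # hypothetical fixed-size per-CPU table


def lifted_apic_ids(num_cores, lift):
    """Map ACPI processor IDs 0..n-1 to APIC IDs lift..lift+n-1."""
    return {acpi_id: acpi_id + lift for acpi_id in range(num_cores)}


def naive_lost_cpus(mapping, size=TABLE_SIZE):
    """Indexing a table directly by APIC ID: return the ACPI IDs whose
    APIC ID falls outside the table (they would be dropped or overrun it)."""
    return [acpi for acpi, apic in sorted(mapping.items()) if apic >= size]


def translated_table(mapping):
    """Index by ACPI ID and store the APIC ID, so the lift does not matter."""
    return [mapping[acpi_id] for acpi_id in sorted(mapping)]


print(naive_lost_cpus(lifted_apic_ids(16, 4)))       # [12, 13, 14, 15]
print(naive_lost_cpus(lifted_apic_ids(8, 4)))        # []
print(translated_table(lifted_apic_ids(16, 4))[-1])  # 19
```

In this model an 8-core box (APIC IDs 4..11) fits in the table, while a 16-core box (APIC IDs 4..19) loses four CPUs, matching the point that the problem appears only at 16 cores.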

I'll post this in response to the RHML posting as well.

BTW, I may have some slides on APIC ID lifting and can share them with you.

Bhavana
Comment 17 Don Zickus 2008-02-22 16:36:48 EST
in 2.6.18-83.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 19 John Poelstra 2008-03-20 23:59:06 EDT
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot1--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you
Comment 20 John Poelstra 2008-04-02 17:39:55 EDT
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot3--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

Comment 21 John Poelstra 2008-04-09 18:45:38 EDT
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot4--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

Comment 22 Frank Arnold 2008-04-14 15:55:58 EDT
Tried on two boxes:

Asus M2N-MX SE Plus (nVidia MCP61)
  - Installed RHEL 5.2 snap4
  - Refuses to work, fails even on bare metal
  - System panics if 'MCP61 ACPI HPET TABLE' is enabled in the BIOS, which
    seems to be needed to get PowerNow! working
  - Not RHEL specific

Asus M2A-MX (AMD 690V / SB600)
  - Installed RHEL 5.2 snap4
  - Modified /etc/sysconfig/cpuspeed, GOVERNOR=ondemand
  - Started 3 guests and ran some tests for 56 hours:
    1:
      Guest: suse_sles10_64b_smp
      Guest config:
        vcpus=1; memory=1024; acpi=1; apic=1; pae=1; shadow_memory=10
      Test program: LTP-full-20060412
    2:
      Guest: ms_vista_32b_up
      Guest config:
        vcpus=1; memory=1024; acpi=1; apic=1; pae=1; shadow_memory=10
      Test program: AMD WinSST 4.7.4
    3:
      Guest: redhat_rhel5u1_32bpae_smp_up
      Guest config:
        vcpus=2; memory=1024; acpi=1; apic=1; pae=1; shadow_memory=10
      Test program: CTCS 1.3.0
  - Guest 3 showed time acceleration issues, with and without frequency scaling
    - Workaround: passed kernel parameter hpet=disable to guest kernel
    - Might be related to bug 249521
  - Other guests ran stable without notable time issues
  - Started up guest #3 again and generated alternating load for 5 hours

Result:
  - uptime 3 days, 3:35
  - dmesg | grep -c "Time went backwards" yields 6
     - before the alternating load test we had 5 of these entries
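The before/after count above can be scripted. A minimal Python sketch, assuming the kernel message contains the exact phrase used in the `grep -c` command and that `dmesg` is runnable by the current user (it may need root); `count_backwards` and `read_dmesg` are hypothetical helper names.

```python
import subprocess

MARKER = "Time went backwards"


def count_backwards(log_text):
    """Count lines containing MARKER, like `grep -c`."""
    return sum(1 for line in log_text.splitlines() if MARKER in line)


def read_dmesg():
    """Return the kernel ring buffer text; requires the `dmesg` binary."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

Usage: take `before = count_backwards(read_dmesg())` before starting the load and subtract it from a second reading afterwards; for the run above that is 6 - 5 = 1 new entry during the 5-hour load test.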
Comment 24 errata-xmlrpc 2008-05-21 11:08:24 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html
