Bug 430938 - RHEL5.2: Improved Power Now! in Xen support on 2nd generation Opteron systems
Summary: RHEL5.2: Improved Power Now! in Xen support on 2nd generation Opteron systems
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Assignee: Bhavna Sarathy
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 431788
Depends On:
Blocks: 253746
Reported: 2008-01-30 18:30 UTC by Bhavna Sarathy
Modified: 2009-06-20 02:13 UTC (History)

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:08:24 UTC
Target Upstream Version:
Embargoed:


Attachments
resync TSC extrapolation (742 bytes, patch)
2008-02-15 18:40 UTC, Bhavna Sarathy
add vcpu lock/unlock functions (4.51 KB, patch)
2008-02-15 18:41 UTC, Bhavna Sarathy
xen change freq hypercall (1.94 KB, patch)
2008-02-15 18:41 UTC, Bhavna Sarathy
fix 16core xen power now (729 bytes, patch)
2008-02-15 18:42 UTC, Bhavna Sarathy


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0314 0 normal SHIPPED_LIVE Updated kernel packages for Red Hat Enterprise Linux 5.2 2008-05-20 18:43:34 UTC

Description Bhavna Sarathy 2008-01-30 18:30:08 UTC
This patch set fixes the issue that was causing time to go backwards on RevF.  

http://xenbits.xensource.com/staging/xen-unstable.hg?rev/63275fd1596a
http://xenbits.xensource.com/staging/xen-unstable.hg?rev/7327e1c2a42c
http://xenbits.xensource.com/staging/xen-unstable.hg?rev/a4bd1371196e

These fixes are still in testing, but we've seen the time regressions go
from 4-7 with every frequency change to 1 regression every 100+ changes
(still counting).
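
For readers unfamiliar with the failure mode, here is a toy sketch of why time can go backwards when a TSC-based clock changes frequency without resynchronizing. It is not the code from the changesets above; the class and method names are hypothetical, and it assumes a TSC whose tick rate follows the core frequency.

```python
from dataclasses import dataclass


@dataclass
class TscClock:
    """Toy per-CPU clock that extrapolates time from a TSC stamp (hypothetical)."""
    stamp_tsc: int      # TSC value at the last synchronization point
    stamp_ns: float     # system time in ns at that point
    ns_per_tick: float  # current TSC-to-nanoseconds scale

    def now_ns(self, tsc):
        # Extrapolate from the last stamp using the current scale.
        return self.stamp_ns + (tsc - self.stamp_tsc) * self.ns_per_tick

    def change_freq_naive(self, new_hz):
        # Only the scale changes; the old stamp is kept.
        self.ns_per_tick = 1e9 / new_hz

    def change_freq_resync(self, tsc, new_hz):
        # Re-stamp using the old scale first, then switch to the new scale.
        self.stamp_ns = self.now_ns(tsc)
        self.stamp_tsc = tsc
        self.ns_per_tick = 1e9 / new_hz


def demo():
    tsc = 1_000_000_000  # ticks counted while running at 1 GHz (one second)
    naive = TscClock(0, 0.0, 1.0)
    synced = TscClock(0, 0.0, 1.0)
    before = naive.now_ns(tsc)
    naive.change_freq_naive(2e9)          # speed up to 2 GHz, no resync
    synced.change_freq_resync(tsc, 2e9)   # speed up to 2 GHz, with resync
    return before, naive.now_ns(tsc), synced.now_ns(tsc)
```

In this toy model the naive clock reports less elapsed time right after the switch than it did just before it, while the resynchronized clock stays monotonic.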

Comment 1 Russell Doty 2008-01-30 18:52:25 UTC
AMD - please update this BZ with the final patches after completing testing.
This is too late to go into RHEL 5.2 Beta 1, so it will need to be processed as
an exception for one of the RHEL 5.2 snapshots.

Comment 2 Bhavna Sarathy 2008-02-15 18:40:02 UTC
I have submitted the patches to the mailing lists.  I'll attach them here as
well.  I delayed submitting this based on Russ's recommendation that we need to
complete testing.   The patches have been tested for 2 weeks with positive results.

Is this really a kernel issue or kernel-xen?  I'll let you sort that out internally.

Bhavana

Comment 3 Bhavna Sarathy 2008-02-15 18:40:52 UTC
Created attachment 295028
resync TSC extrapolation

Comment 4 Bhavna Sarathy 2008-02-15 18:41:20 UTC
Created attachment 295029
add vcpu lock/unlock functions

Comment 5 Bhavna Sarathy 2008-02-15 18:41:41 UTC
Created attachment 295030
xen change freq hypercall

Comment 6 Bhavna Sarathy 2008-02-15 18:42:06 UTC
Created attachment 295031
fix 16core xen power now

Comment 7 Rik van Riel 2008-02-15 20:29:52 UTC
These patches fix regressions introduced in 5.2, hence this bug should be
granted exception status.

Comment 11 Rik van Riel 2008-02-20 16:40:29 UTC
*** Bug 431788 has been marked as a duplicate of this bug. ***

Comment 12 Chris Lalancette 2008-02-20 20:26:13 UTC
I looked over the 4 posted patches for this issue.  The first 3 look OK, and
seem to be needed for the powernow stuff.  The 4th one looks like a fix for a
different issue, and is a little bit scary anyway.  I'm inclined to take the
first 3 patches and push the fourth to 5.3, unless it can be shown that it also
impacts this issue (or is significant enough to impact a lot of systems; it
would need a separate BZ then).

Chris Lalancette

Comment 14 Bhavna Sarathy 2008-02-21 14:16:36 UTC
Can you explain what is scary about the 4th patch??   I'd prefer not to have a
hole in R5.2 when we know exactly what is wrong AND have a fix.

Bhavana

Comment 15 Chris Lalancette 2008-02-21 14:38:54 UTC
What's scary about it is that it mucks with the acpiid to apicid mappings, which
can have unintended consequences.  Don't get me wrong; I'm not against the patch
in general.  It's just that we are now post-beta, so the amount of testing it
will have (and the amount of time we have to find any problems and fix them) is
much less now.  That's why I'm trying to get a feeling for what systems it
affects, exactly.

Chris Lalancette

Comment 16 Bhavna Sarathy 2008-02-21 16:07:56 UTC
Chris, this fix affects all AMD 16 core systems.  At this point there are not
too many such users (think Rev F 8p/16c systems), though this will change with
Barcelona (4p/16c).  As you are very well aware, Xen always does their own
thing, so we are now fixing something that was already fixed on bare metal in
RHEL4.5 and that RHEL5 inherited through the upstream fix (2.6.18).  We have
found the issue and fixed it.

I want to reiterate that this affects AMD systems only, as the APIC IDs are
lifted, i.e. they start at 4 instead of 0.  Once the number of 16 core users
increases this will become an issue.   The patch will not create issues for
you; it fixes the issue on all 16 core AMD systems for Xen users.
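
As an illustration only (this is not the attached patch), the sketch below shows why code that treats an APIC ID as a zero-based CPU index can break once the IDs are lifted. The function names are hypothetical, and it assumes the 16 lifted IDs are contiguous, 4 through 19.

```python
NR_CPUS = 16

# Assumption: 16 cores whose APIC IDs are lifted to start at 4 (4..19).
lifted_ids = list(range(4, 4 + NR_CPUS))


def naive_lookup(apic_id, nr_cpus=NR_CPUS):
    """Treat the APIC ID as the logical CPU index (breaks when IDs are lifted)."""
    if apic_id >= nr_cpus:
        raise IndexError("APIC ID %d is outside 0..%d" % (apic_id, nr_cpus - 1))
    return apic_id


def build_apicid_to_cpu(apic_ids):
    """Map each reported APIC ID to a dense logical CPU index instead."""
    return {apic_id: cpu for cpu, apic_id in enumerate(apic_ids)}
```

With the explicit map, APIC ID 4 resolves to logical CPU 0 and APIC ID 19 to logical CPU 15, whereas the naive lookup rejects IDs 16 through 19 and mislabels the rest.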

I'll post this in response to the RHML posting as well.

BTW, I may have some slides on APIC ID lifting and can share them with you.

Bhavana

Comment 17 Don Zickus 2008-02-22 21:36:48 UTC
Included in kernel 2.6.18-83.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 19 John Poelstra 2008-03-21 03:59:06 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot1--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you

Comment 20 John Poelstra 2008-04-02 21:39:55 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot3--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.



Comment 21 John Poelstra 2008-04-09 22:45:38 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot4--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.



Comment 22 Frank Arnold 2008-04-14 19:55:58 UTC
Tried on two boxes:

Asus M2N-MX SE Plus (nVidia MCP61)
  - Installed RHEL 5.2 snap4
  - Refuses to work, fails even on bare metal
  - System panics if 'MCP61 ACPI HPET TABLE' is enabled in the BIOS, which
    seems to be needed to get PowerNow! working
  - Not RHEL specific

Asus M2A-MX (AMD 690V / SB600)
  - Installed RHEL 5.2 snap4
  - Modified /etc/sysconfig/cpuspeed, GOVERNOR=ondemand
  - Started 3 guests and ran some tests for 56 hours:
    1:
      Guest: suse_sles10_64b_smp
      Guest config:
        vcpus=1; memory=1024; acpi=1; apic=1; pae=1; shadow_memory=10
      Test program: LTP-full-20060412
    2:
      Guest: ms_vista_32b_up
      Guest config:
        vcpus=1; memory=1024; acpi=1; apic=1; pae=1; shadow_memory=10
      Test program: AMD WinSST 4.7.4
    3:
      Guest: redhat_rhel5u1_32bpae_smp_up
      Guest config:
        vcpus=2; memory=1024; acpi=1; apic=1; pae=1; shadow_memory=10
      Test program: CTCS 1.3.0
  - Guest 3 showed time acceleration issues, with and without frequency scaling
    - Workaround: passed kernel parameter hpet=disable to guest kernel
    - Might be related to bug 249521
  - Other guests ran stable without notable time issues
  - Started up guest #3 again and generated alternating load for 5 hours

Result:
  - uptime 3 days, 3:35
  - dmesg | grep -c "Time went backwards" yields 6
     - before the alternating load test we had 5 of these entries
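
For anyone repeating this check at intervals, a minimal Python equivalent of the grep count is sketched below. The helper name is hypothetical and the sample log lines are made up for illustration; they are not output from the systems above.

```python
def count_time_regressions(log_text, needle="Time went backwards"):
    """Count log lines containing the needle, like `grep -c`."""
    return sum(1 for line in log_text.splitlines() if needle in line)


# Made-up sample text, only to show the call.
sample_log = "\n".join([
    "example: unrelated message",
    "example: Time went backwards (illustrative line 1)",
    "example: another unrelated message",
    "example: Time went backwards (illustrative line 2)",
])

regressions = count_time_regressions(sample_log)
```

On a live system the text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, and comparing the count before and after a load test gives the same 5-versus-6 style comparison as above.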

Comment 24 errata-xmlrpc 2008-05-21 15:08:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


