This patch set fixes the issue that was causing time to go backwards on RevF.

http://xenbits.xensource.com/staging/xen-unstable.hg?rev/63275fd1596a
http://xenbits.xensource.com/staging/xen-unstable.hg?rev/7327e1c2a42c
http://xenbits.xensource.com/staging/xen-unstable.hg?rev/a4bd1371196e

These fixes are still in testing, but we've seen the time regressions go from 4-7 with every frequency change to 1 regression every 100+ changes (still counting).
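For anyone reproducing the measurement, here is a minimal sketch of how regressions like the ones counted above could be tallied from a series of timestamp samples. It is not part of the patch set: the function name `count_time_regressions` and the sample values are hypothetical, and it only illustrates the "time went backwards" check from user space.

```python
def count_time_regressions(samples):
    """Count how many samples are earlier than the sample just before them."""
    regressions = 0
    for previous, current in zip(samples, samples[1:]):
        if current < previous:
            regressions += 1
    return regressions


# Hypothetical timestamps in seconds; the 3rd and 6th readings step backwards.
samples = [10.000, 10.001, 10.0005, 10.002, 10.003, 10.0029]
print(count_time_regressions(samples))
```

Dividing the count by the number of frequency changes in the same window would give a rate comparable to the "1 regression every 100+ changes" figure.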
AMD - please update this BZ with the final patches after completing testing. This is too late to go into RHEL 5.2 Beta 1, so it will need to be processed as an exception for one of the RHEL 5.2 snapshots.
I have submitted the patches to the mailing lists. I'll attach them here as well. I delayed submitting this based on Russ's recommendation that we complete testing first. The patches have been tested for 2 weeks with positive results. Is this really a kernel issue or kernel-xen? I'll let you sort that out internally. Bhavana
Created attachment 295028: resync TSC extrapolation
Created attachment 295029: add vcpu lock/unlock functions
Created attachment 295030: xen change freq hypercall
Created attachment 295031: fix 16core xen power now
These patches fix regressions introduced in 5.2, hence this bug should be granted exception status.
*** Bug 431788 has been marked as a duplicate of this bug. ***
I looked over the 4 posted patches for this issue. The first 3 look OK, and seem to be needed for the powernow stuff. The 4th one looks like a fix for a different issue, and is a little bit scary anyway. I'm inclined to take the first 3 patches and push the fourth to 5.3, unless it can be shown that it also impacts this issue (or is significant enough to impact a lot of systems; it would need a separate BZ then). Chris Lalancette
Can you explain what is scary about the 4th patch?? I'd prefer not to have a hole in R5.2 when we know exactly what is wrong AND have a fix. Bhavana
What's scary about it is that it mucks with the acpiid to apicid mappings, which can have unintended consequences. Don't get me wrong; I'm not against the patch in general. It's just that we are now post-beta, so the amount of testing it will have (and the amount of time we have to find any problems and fix them) is much less now. That's why I'm trying to get a feeling for what systems it affects, exactly. Chris Lalancette
Chris, this fix affects all AMD 16 core systems. At this point there are not too many such users (think Rev F 8p/16c systems), though this will change with Barcelona (4p/16c). As you are very well aware, Xen always does its own thing, and we are in the situation of fixing something that was already fixed on bare metal in RHEL4.5 and that RHEL5 inherited from the upstream fix (2.6.18). We have found the issue and fixed it. I want to reiterate that this affects AMD systems only, as the APIC IDs are lifted, i.e. they start at 4 instead of 0. Once the number of 16 core users increases this will become an issue. The patch will not create issues for you; it fixes the issue on all 16 core AMD systems for Xen users. I'll post this in response to the RHML posting as well. BTW, I may have some slides on APIC ID lifting and can share them with you. Bhavana
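To make the APIC ID lifting concrete, here is a small sketch of why code that assumes the APIC ID equals the ACPI processor ID goes wrong on such a system. It is an illustration only, not the content of the fourth patch: the helper `build_acpi_to_apic_map` is hypothetical, and it assumes the lifted IDs are contiguous starting at 4, which real hardware need not guarantee (the actual pairs come from the ACPI MADT).

```python
APIC_ID_BASE = 4   # assumption from the comment above: IDs start at 4, not 0
NUM_CORES = 16     # 16 core system, e.g. 8p/16c or 4p/16c


def build_acpi_to_apic_map(num_cores, apic_base):
    """Return {acpi_id: apic_id} for a contiguous, lifted APIC ID range."""
    return {acpi_id: apic_base + acpi_id for acpi_id in range(num_cores)}


mapping = build_acpi_to_apic_map(NUM_CORES, APIC_ID_BASE)

# Cores for which "apic_id == acpi_id" would be a wrong assumption.
mismatched = [acpi for acpi, apic in mapping.items() if acpi != apic]
print(mapping[0], mapping[15], len(mismatched))
```

With a base of 0 the two ID spaces would coincide and the shortcut would happen to work, which may be why the problem only shows up on systems with lifted IDs.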
in 2.6.18-83.el5. You can download this test kernel from http://people.redhat.com/dzickus/el5
Greetings Red Hat Partner, A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot1--available now on partners.redhat.com. Please test and confirm that your issue is fixed. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED. If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Thank you
Greetings Red Hat Partner, A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot3--available now on partners.redhat.com. Please test and confirm that your issue is fixed. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED. If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Thank you
Greetings Red Hat Partner, A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot4--available now on partners.redhat.com. Please test and confirm that your issue is fixed. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED. If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Thank you
Tried on two boxes:

Asus M2N-MX SE Plus (nVidia MCP61)
- Installed RHEL 5.2 snap4
- Refuses to work, fails even on bare metal
- System panics if 'MCP61 ACPI HPET TABLE' is enabled in the BIOS, which seems to be needed to get PowerNow! working
- Not RHEL specific

Asus M2A-MX (AMD 690V / SB600)
- Installed RHEL 5.2 snap4
- Modified /etc/sysconfig/cpuspeed, GOVERNOR=ondemand
- Started 3 guests and ran some tests for 56 hours:
  1: Guest: suse_sles10_64b_smp
     Guest config: vcpus=1; memory=1024; acpi=1; apic=1; pae=1; shadow_memory=10
     Test program: LTP-full-20060412
  2: Guest: ms_vista_32b_up
     Guest config: vcpus=1; memory=1024; acpi=1; apic=1; pae=1; shadow_memory=10
     Test program: AMD WinSST 4.7.4
  3: Guest: redhat_rhel5u1_32bpae_smp_up
     Guest config: vcpus=2; memory=1024; acpi=1; apic=1; pae=1; shadow_memory=10
     Test program: CTCS 1.3.0
- Guest 3 showed time acceleration issues, with and without frequency scaling
  - Workaround: passed kernel parameter hpet=disable to guest kernel
  - Might be related to bug 249521
- Other guests ran stable without notable time issues
- Started up guest #3 again and generated varying load for 5 hours

Result:
- uptime 3 days, 3:35
- dmesg |grep -c "Time went backwards" yields 6
- before the varying load test we had 5 of these entries
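For repeating this check on other boxes, a rough Python equivalent of the `grep -c` count, plus the before/after difference, might look like the sketch below. The function name `count_matching_lines` and the log lines are made up for illustration; they are not real kernel output from this test.

```python
def count_matching_lines(log_text, needle="Time went backwards"):
    """Count lines containing the needle, like `grep -c` on dmesg output."""
    return sum(1 for line in log_text.splitlines() if needle in line)


# Made-up log excerpt, only to exercise the function.
sample_log = "\n".join([
    "placeholder: boot message",
    "placeholder: Time went backwards (sample entry)",
    "placeholder: unrelated message",
    "placeholder: Time went backwards (sample entry)",
])
print(count_matching_lines(sample_log))

# Counts reported in the result above: 5 before the load test, 6 after it.
entries_before, entries_after = 5, 6
new_during_load_test = entries_after - entries_before
print(new_during_load_test)
```

By that arithmetic, one new entry appeared during the 5 hour load test on guest #3.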
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html