Bug 222973 - [RHEL5 Xen]: Re-enable the hypervisor pm-timer?
[RHEL5 Xen]: Re-enable the hypervisor pm-timer?
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen (Show other bugs)
5.3
i386 Linux
medium Severity medium
: ---
: ---
Assigned To: Xen Maintainance List
Red Hat Kernel QE team
:
: 445363 551153 (view as bug list)
Depends On:
Blocks: 514491 563345
  Show dependency treegraph
 
Reported: 2007-01-17 03:37 EST by John Lau
Modified: 2010-10-22 03:54 EDT (History)
18 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-06-23 06:17:33 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
My sysreport (441.32 KB, application/x-bzip2)
2007-01-17 03:44 EST, John Lau
no flags Details
cpuinfo from 2.6.23.1-42.fc8 non-xen kernel (1.10 KB, text/plain)
2008-04-23 00:46 EDT, John Lau
no flags Details
dmesg and xm dmesg of 2.6.18-90.el5xen and 2.6.21.7-3.fc8xen kernel (11.28 KB, application/x-gzip)
2008-04-23 01:31 EDT, John Lau
no flags Details

  None (edit)
Description John Lau 2007-01-17 03:37:23 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.8) Gecko/20061105 Red Hat/1.5.0.8-0.1.el4 Firefox/1.5.0.8 pango-text

Description of problem:
When I boot the xen kernel 2.6.18-1.2961.el5xen, there are many printk message console: 

Jan 17 18:59:27 localhost kernel: Timer ISR/0: Time went backwards: delta=-14251821 delta_cpu=69748179 shadow=14423492596895 off=785179739 processed=14424292000000 cpu_processed=14424208000000
Jan 17 18:59:27 localhost kernel:  0: 14424208000000
Jan 17 18:59:27 localhost kernel:  1: 14424292000000
Jan 17 18:59:32 localhost kernel: printk: 55 messages suppressed.
Jan 17 18:59:32 localhost kernel: Timer ISR/1: Time went backwards: delta=-57603086 delta_cpu=82396914 shadow=14428863196453 off=423229749 processed=14429344000000 cpu_processed=14429204000000
Jan 17 18:59:32 localhost kernel:  0: 14429344000000
Jan 17 18:59:32 localhost kernel:  1: 14429204000000

These messages shows once in every seconds. And the system clock run about 3 times faster then normal. 

The same problem happened when I install FC6 xen kernel.

My machines is a ASUS A8F series laptop, with T2060 1.6G Centrino Duo CPU.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-1.2961.el5

How reproducible:
Always


Steps to Reproduce:
1. Install RHEL5 beta 2 
2. Boot with xen kernel
3. Wait a few seconds

Actual Results:
There are many printk message console (the numbers are different every time): 

Jan 17 18:59:27 localhost kernel: Timer ISR/0: Time went backwards: delta=-14251821 delta_cpu=69748179 shadow=14423492596895 off=785179739 processed=14424292000000 cpu_processed=14424208000000
Jan 17 18:59:27 localhost kernel:  0: 14424208000000
Jan 17 18:59:27 localhost kernel:  1: 14424292000000
Jan 17 18:59:32 localhost kernel: printk: 55 messages suppressed.
Jan 17 18:59:32 localhost kernel: Timer ISR/1: Time went backwards: delta=-57603086 delta_cpu=82396914 shadow=14428863196453 off=423229749 processed=14429344000000 cpu_processed=14429204000000
Jan 17 18:59:32 localhost kernel:  0: 14429344000000
Jan 17 18:59:32 localhost kernel:  1: 14429204000000

Expected Results:
No such messages. And system clock run correctly.

Additional info:
May be similar to this: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=195
Comment 1 John Lau 2007-01-17 03:44:38 EST
Created attachment 145779 [details]
My sysreport

It is the sysreport of my laptop. Don't know if it is useful.
Comment 2 Steven Rostedt 2007-01-22 12:22:43 EST
Could you try the latest release (>3002) to see if it makes a difference.
Comment 3 John Lau 2007-01-23 23:20:38 EST
I update the xen kernel to a nightly build 20070120.   The kernel version
kernel-xen-2.6.18-4.el5xen. The problem still exist. 

[root@localhost Server]# rpm -qpi kernel-xen-2.6.18-4.el5.i686.rpm 
warning: kernel-xen-2.6.18-4.el5.i686.rpm: Header V3 DSA signature: NOKEY, key
ID 37017186
Name        : kernel-xen                   Relocations: (not relocatable)
Version     : 2.6.18                            Vendor: Red Hat, Inc.
Release     : 4.el5                         Build Date: Thu 18 Jan 2007 12:50:41
PM HKT
Install Date: (not installed)               Build Host: ls20-bc1-14.build.redhat.com
Group       : System Environment/Kernel     Source RPM: kernel-2.6.18-4.el5.src.rpm
Size        : 36768128                         License: GPLv2
Signature   : DSA/SHA1, Thu 18 Jan 2007 03:36:19 PM HKT, Key ID 5326810137017186
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : The Linux kernel compiled for Xen VM operations
Description :
This package includes a version of the Linux kernel which
runs in Xen VM. It works for both priviledged and unpriviledged guests.
[root@localhost Server]# uname -a
Linux localhost.localdomain 2.6.18-4.el5xen #1 SMP Wed Jan 17 23:31:02 EST 2007
i686 i686 i386 GNU/Linux
Comment 4 Red Hat Bugzilla 2007-07-24 20:31:55 EDT
change QA contact
Comment 5 James Laska 2007-10-23 16:10:58 EDT
Seeing this on RHEL5.1-Server-20071017.0 (kernel-xen-2.6.18-53.el5,
xen-3.0.3-41.el5) on an Intel Bensley (quad-core harpertown).

[ cpu flags ]
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor
ds_cpl vmx est tm2 cx16 xtpr lahf_lm

[ /var/log/messages output ]
Oct 23 16:09:24 localhost kernel: printk: 88 messages suppressed.
Oct 23 16:09:24 localhost kernel: Timer ISR/1: Time went backwards:
delta=-16292353 delta_cpu=11707647 shadow=12259248375707 off=431476910
processed=12259696000000 cpu_processed=12259668000000
Oct 23 16:09:24 localhost kernel:  0: 12259692000000
Oct 23 16:09:24 localhost kernel:  1: 12259668000000
Oct 23 16:09:24 localhost kernel: Timer ISR/1: Time went backwards:
delta=-16968342 delta_cpu=11031658 shadow=12259248375707 off=502656161
processed=12259768000000 cpu_processed=12259740000000
Oct 23 16:09:24 localhost kernel:  0: 12259764000000
Oct 23 16:09:24 localhost kernel:  1: 12259740000000
Oct 23 16:09:24 localhost kernel: Timer ISR/1: Time went backwards:
delta=-14239850 delta_cpu=13760150 shadow=12259248375707 off=601385025
processed=12259864000000 cpu_processed=12259836000000
Oct 23 16:09:24 localhost kernel:  0: 12259860000000
Oct 23 16:09:24 localhost kernel:  1: 12259836000000
Oct 23 16:09:24 localhost kernel: Timer ISR/1: Time went backwards:
delta=-12237891 delta_cpu=11762109 shadow=12259248375707 off=611386633
processed=12259872000000 cpu_processed=12259848000000
Oct 23 16:09:24 localhost kernel:  0: 12259872000000
Oct 23 16:09:24 localhost kernel:  1: 12259848000000
Comment 6 RHEL Product and Program Management 2007-10-23 16:34:43 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 Chris Lalancette 2008-02-25 00:01:41 EST
OK.  It looks like this problem still persists through to 5.1....any chance you
can try out 5.2 on these boxes and see if there is any difference?  There have
been a few timer related fixes that went in, but I'm not sure if they will help
or not.  If 5.2 does *not* help, please get the output from "xm dmesg" and
"dmesg" on the dom0, and put it here as attachments.

Thanks,
Chris Lalancette
Comment 8 RHEL Product and Program Management 2008-03-11 15:46:17 EDT
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 9 Bill Burns 2008-04-16 21:13:45 EDT
Several timer fixes have been made in RHEL 5.2 and this problem should be fixed.
Can you please try the -90 kernel to verify? Thanks.
Comment 10 John Lau 2008-04-22 12:04:23 EDT
I try -90 kernel. The result is interesting that the system clock run about 2
times SLOWER then normal this time. And similar messages are printed:

Apr 22 23:21:39 localhost kernel: Warning Timer ISR/0: Time went backwards:
delta=-25347748 delta_cpu=26652252 shadow=80668539637 off=278114507
processed=80972000000 cpu_processed=80920000000
Apr 22 23:21:39 localhost kernel:  0: 80920000000
Apr 22 23:21:39 localhost kernel:  1: 80972000000
Apr 22 23:21:44 localhost kernel: printk: 241 messages suppressed.
Apr 22 23:21:44 localhost kernel: Warning Timer ISR/1: Time went backwards:
delta=-16039717 delta_cpu=95960283 shadow=85611897209 off=576064630
processed=86204000000 cpu_processed=86092000000
Apr 22 23:21:44 localhost kernel:  0: 86204000000
Apr 22 23:21:44 localhost kernel:  1: 86092000000
Apr 22 23:21:50 localhost kernel: printk: 107 messages suppressed.

FYI, I try using Fedora kernel 2.6.21-2957.fc8xen before but it suffered from
the "system clock 3 times faster" problem. But recently I try 2.6.21.7-3.fc8xen
and the problem seems fixed, the system clock run normally.

In the boot process of the -90 kernel, I found this line:

Apr 22 23:20:36 localhost kernel: Calibrating delay using timer specific
routine.. 3991.91 BogoMIPS (lpj=7983834)

while in the boot process of 2.6.21.7-3.fc8xen kernel, I found these 2 lines:

Apr 22 23:23:16 localhost kernel: Calibrating delay using timer specific
routine.. 3194.11 BogoMIPS (lpj=1597057)
<snip>
Apr 22 23:23:16 localhost kernel: Initializing CPU#1
Apr 22 23:23:16 localhost kernel: Calibrating delay using timer specific
routine.. 3191.58 BogoMIPS (lpj=1595790)

I am not sure if it is related this problem.
Comment 11 Chris Lalancette 2008-04-22 12:52:08 EDT
Oh yeah, that is a big problem; being 800MHz off will cause all kinds of havoc.
 And that might also be related to another problem we saw calibrating the
timers.  2 questions:

1)  Can you upload the full dmesg for the 2.6.18-90.el5xen kernel and the
2.6.21.7-3.fc8xen kernel, as well as the "xm dmesg" from both?

2)  Do you know what the actual MHz of the machine is supposed to be?  I'm
assuming it should be around 3.2GHz, but confirming that might be good.  You can
either go by the hardware description, or boot a bare-metal (non-xen) kernel and
get the output from /proc/cpuinfo

Thanks,
Chris Lalancette
Comment 12 John Lau 2008-04-23 00:46:42 EDT
Created attachment 303423 [details]
cpuinfo from 2.6.23.1-42.fc8 non-xen kernel
Comment 13 John Lau 2008-04-23 01:31:18 EDT
Created attachment 303433 [details]
dmesg and xm dmesg of 2.6.18-90.el5xen and 2.6.21.7-3.fc8xen kernel
Comment 14 Chris Lalancette 2008-10-14 12:16:25 EDT
Took another quick look at this.  My guess is that the difference in behavior between the F-8 kernel and the RHEL-5 kernel has to do with the fact that the F-8 kernel you are using is choosing the ACPI PM Timer, while RHEL-5 is choosing the PIT timer.  Now, at one point we tried to turn on the ACPI PM Timer in RHEL-5 as well, but we found that it caused some problems on certain boxes.  It's something we'll want to investigate again, but until we fix those problems we can't turn it on in RHEL-5.

Chris Lalancette
Comment 19 Chris Lalancette 2009-07-29 07:40:36 EDT
*** Bug 445363 has been marked as a duplicate of this bug. ***
Comment 23 Thomas 2010-02-07 18:41:08 EST
Is there a workaround for this issue? I am seeing it on all of my 32-bit servers and it is creating an issue with Likewise Enterprise--the clock skew is preventing some Active Directory-related processes from operating correctly.
Comment 24 Thomas 2010-02-07 18:46:10 EST
Another note of relevance... I am seeing this issue on 5.4.
Comment 26 Andrew Jones 2010-02-08 11:26:49 EST
(In reply to comment #23)
> Is there a workaround for this issue? I am seeing it on all of my 32-bit
> servers and it is creating an issue with Likewise Enterprise--the clock skew is
> preventing some Active Directory-related processes from operating correctly.    

Hi Thomas,

I'm sorry there's no work-around that I know of at this time. Since the time of the reporting of this bug (over 3 years ago), it's very rarely been seen. I've only seen it once and that was a machine that had flakey hardware. I saw lots of MCEs in /var/log/messages and when I ran the mcelog command periodically on a bare-metal kernel. Is your system behaving ok other than this problem? What exactly do you have for hardware?

Andrew
Comment 30 Thomas 2010-02-16 13:19:08 EST
Both servers are operating correctly otherwise, and both are Dell PowerEdge 2600s.

I updated the BIOS and some other firmware on both to rule out problems there, the problem reoccurred on both after the updates.

Interestingly, this issue doesn't occur when ntpd is disabled. As a workaround, I disabled ntpd and added a cronjob to run ntpdate every 4 hours--the time is never more than milliseconds off.
Comment 32 Paolo Bonzini 2010-02-23 10:06:22 EST
*** Bug 551153 has been marked as a duplicate of this bug. ***
Comment 34 Andrew Jones 2010-06-22 13:08:05 EDT
Has an interested parties been able to reproduce this lately? If not, then I would like to close it out until we can.
Comment 37 Andrew Jones 2010-06-23 06:17:33 EDT
Closing as insufficient data, since we don't have a reliable reproducer. If we cross it's path again, and can reproduce it, then we can reopen the bug at that time.
Comment 38 Chris Lalancette 2010-07-19 09:58:31 EDT
Clearing out old flags for reporting purposes.

Chris Lalancette

Note You need to log in before you can comment on or make changes to this bug.