Bug 168255

Summary: powernow clock instability: warning: many lost ticks.
Product: Red Hat Enterprise Linux 4 Reporter: Christopher P Johnson <christopher.p.johnson>
Component: kernel Assignee: Brian Maly <bmaly>
Status: CLOSED NOTABUG QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: andriusb, brett.morrow, jbaron, milan.kerslager, tao
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-09-29 14:29:50 UTC
Attachments:
Description                      Flags
program to print time jumps      none
Results showing time jumps       none

Description Christopher P Johnson 2005-09-14 00:35:21 UTC
Description of problem:

When AMD PowerNow is enabled on x86-64 platforms without BIOS HPET timer
support, the system clock jumps forward and backward by several seconds or more.

Presumably the timekeeping code is not prepared for CPU frequency changes.
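A worked illustration of how a frequency change could confuse cycle-counter timekeeping. It assumes (an assumption, not something established in this report) that the TSC on these CPUs counts at the current core frequency, while the kernel keeps converting cycles to time with its boot-time calibration. The 2600.048 MHz figure is the calibration reported later in this bug; the 1000 MHz reduced frequency and the function name `apparent_seconds` are hypothetical.

```python
def apparent_seconds(real_seconds, actual_hz, calibrated_hz):
    """Elapsed time computed from a cycle counter that really ran at actual_hz
    but is converted using the boot-time calibration calibrated_hz.
    Model only: assumes the counter rate follows the CPU frequency."""
    cycles = real_seconds * actual_hz   # cycles actually counted
    return cycles / calibrated_hz       # what a fixed-rate conversion reports


CALIBRATED_HZ = 2600.048e6   # "Detected 2600.048 MHz processor." (comment 10)
REDUCED_HZ = 1000.0e6        # hypothetical low-power PowerNow frequency

# Under these assumptions, one real second is counted as roughly 0.385 s.
print(f"{apparent_seconds(1.0, REDUCED_HZ, CALIBRATED_HZ):.3f} s")
```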

Version-Release number of selected component (if applicable):

rhel4, rhel4-u1 x86-64

How reproducible:

Enable PowerNow. Run the system under a mix of load and idle. The clock will
begin to jump forward and back (symptoms: "warning: many lost ticks." kernel
messages, the screen saver starting after a few seconds, SCSI timeout
messages). See the attached program, which prints large time jumps.
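The attached program (attachment 118780) is not reproduced in this report. As a hypothetical stand-in, not the reporter's program, the sketch below samples the wall clock at a fixed interval and flags any gap that differs from the expected interval by more than a threshold; a negative gap means the clock stepped backwards. The names `find_jumps` and `watch` and the default interval and threshold are illustrative assumptions.

```python
import time


def find_jumps(samples, interval, threshold):
    """Return (index, delta) for each gap between consecutive wall-clock
    samples that differs from the expected interval by more than threshold.
    A negative delta means the clock stepped backwards."""
    jumps = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if abs(delta - interval) > threshold:
            jumps.append((i, delta))
    return jumps


def watch(interval=1.0, threshold=0.5, count=60):
    """Sample time.time() every `interval` seconds and print suspicious gaps."""
    prev = time.time()
    for i in range(1, count + 1):
        time.sleep(interval)
        now = time.time()
        for _, delta in find_jumps([prev, now], interval, threshold):
            print(f"sample {i}: clock moved {delta:+.3f} s "
                  f"(expected about {interval:.3f} s)")
        prev = now
```

Calling `watch()` samples once per second for a minute. The threshold is kept generous because `time.sleep()` can oversleep on a loaded system, which would otherwise be reported as a small forward jump.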

The system is a Sun Fire X4100, an x86-64 system that does not provide BIOS
HPET support. Note that the problem appears to be masked on systems with
HPET.
  
Actual results:

Bad time: the system clock jumps forward and backward.

Expected results:

System time should advance smoothly.

Additional info:

Comment 1 Christopher P Johnson 2005-09-14 00:35:21 UTC
Created attachment 118780 [details]
program to print time jumps

Comment 2 Christopher P Johnson 2005-09-14 00:37:15 UTC
Created attachment 118781 [details]
Results showing time jumps

Comment 3 Andrius Benokraitis 2005-09-14 04:14:50 UTC
This is a known issue, and a fix has been proposed for (late) inclusion in RHEL4
U2. It is not yet confirmed whether this fix will go into RHEL4 U2 or U3, due to
testing and QA timelines. Would you be willing to help test a beta kernel if
possible?

Comment 4 Andrius Benokraitis 2005-09-14 04:55:35 UTC

*** This bug has been marked as a duplicate of 158847 ***

Comment 5 Milan Kerslager 2005-09-23 09:06:41 UTC
This bug was marked as a duplicate of private bug #158847, so I am reopening this one.

My BIOS only offers AMD Cool'n'Quiet, so losing ticks makes no sense here.
I get these messages from the kernel (quoted here so they can be found by searching):

Losing some ticks... checking if CPU frequency changed.
...
warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip default_idle+0x20/0x23

I'm able to test a beta kernel; I'll try the one from the U2 Beta channel. Or
is there another kernel available for testing?

Comment 6 Jason Baron 2005-09-23 17:52:21 UTC
No, that's the one.

Comment 7 Brett Morrow 2005-09-28 19:20:04 UTC
I have the same problem. I do have the option to turn off PowerNow, but doing
so does not help: I still get the problem and the errors.

(With both the released WS4 and the Beta WS4.)


Comment 8 Brian Maly 2005-09-28 19:26:03 UTC
Which time source is being used (PMTimer, TSC, or PIT)?

Run "dmesg | grep time.c" to find out.

PMTimer is the preferred timekeeping mechanism on AMD systems.
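When several machines need checking, the `time.c` lines quoted in this thread ("Using PIT/TSC based timekeeping.", "Using PM based timekeeping." and "Using 1.193182 MHz PIT timer.") can be parsed from `dmesg` output. A minimal sketch: the function name is hypothetical, the patterns match only the message formats seen in this report, and other kernel versions may word them differently. The two formats appear to report different things (the timekeeping source vs. the timer), so the sketch keeps them separate; that reading is an assumption.

```python
import re

# Patterns for the two time.c message formats quoted in this report.
_TIMEKEEPING = re.compile(r"time\.c: Using (\S+) based timekeeping")
_TIMER = re.compile(r"time\.c: Using ([\d.]+) MHz (\S+) timer")


def timesource_from_dmesg(text):
    """Return (timekeeping, timer) parsed from dmesg text.

    timekeeping is e.g. "PM" or "PIT/TSC"; timer is e.g. "PIT".
    Either element is None if the corresponding line is absent."""
    timekeeping = timer = None
    for line in text.splitlines():
        m = _TIMEKEEPING.search(line)
        if m:
            timekeeping = m.group(1)
        m = _TIMER.search(line)
        if m:
            timer = m.group(2)
    return timekeeping, timer
```

To use it, pass in the kernel log, for example `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`.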


Comment 9 Milan Kerslager 2005-09-28 21:15:12 UTC
Here: time.c: Using PIT/TSC based timekeeping (2.6.9-11.ELsmp, x86_64 kernel).

Comment 10 Brett Morrow 2005-09-28 21:54:23 UTC
dmesg | grep time.c
time.c: Using 1.193182 MHz PIT timer.
time.c: Detected 2600.048 MHz processor.

uname -a
Linux ori.protect.nssl 2.6.9-17.EL #1 Fri Aug 26 10:54:28 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux


Comment 11 Milan Kerslager 2005-09-29 09:47:41 UTC
With 2.6.9-17.ELsmp the lost ticks seem to have stopped. It seems to solve the
problem completely here (kernel-smp-2.6.9-17.EL.x86_64 still detects PIT/TSC based
timekeeping). Thank you!

Comment 12 Brian Maly 2005-09-29 14:29:50 UTC
It's worth mentioning that dual-core SMP PowerNow support made it into the U2
kernel rather late. This included a PowerNow driver update for dual-core SMP
support, as well as a handful of timer fixes for PMTimer, HPET, TSC, and PIT.
You likely need a 2.6.9-22.EL kernel to solve all PowerNow and/or timekeeping
issues; 2.6.9-17.EL did not include these timer changes. Also, the timer code
works differently on SMP kernels.

BTW, feedback on the 2.6.9-22.EL kernel (whether or not it solves the problem)
would be helpful.



Comment 13 Brett Morrow 2005-09-29 17:39:49 UTC
Running kernel 2.6.9-22.ELsmp with PowerNow turned on, I still get these messages:

kernel: warning: many lost ticks.
Sep 29 12:33:37 ori kernel: Your time source seems to be instable or some driver is hogging interupts
Sep 29 12:33:37 ori kernel: rip __do_softirq+0x4d/0xd0


(Although not as frequently.)



Comment 14 Brian Maly 2005-09-29 18:11:47 UTC
Does the system keep good time with the .22 kernel despite the "lost ticks"
message?

Usually the "many lost ticks" message is a symptom of another problem. The
message is printed when the Linux kernel corrects for lost ticks (which is what
you want to happen). The point is that an occasional "many lost ticks" message is
probably not much to worry about, but if it recurs very often, it's a
concern.
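A simplified model of the correction described above, assuming a fixed tick period: on each timer interrupt, the elapsed time reported by the time source is converted into whole ticks, any ticks beyond the one being handled are counted as lost, and the tick counter is advanced by all of them so the clock catches up. This is an illustration, not the RHEL4 `time.c` code; the class, its methods, and the `warn_after` threshold are hypothetical. It also suggests why a mis-scaled time source is a problem: the same correction would then advance the clock by ticks that never happened.

```python
def lost_ticks(elapsed_ns, tick_ns):
    """Whole ticks that elapsed beyond the one being handled right now."""
    return max(0, elapsed_ns // tick_ns - 1)


class LostTickMonitor:
    """Toy model of lost-tick compensation with a one-time warning.
    warn_after is a hypothetical threshold, not the kernel's value."""

    def __init__(self, tick_ns, warn_after=1000):
        self.tick_ns = tick_ns
        self.warn_after = warn_after
        self.events = 0      # interrupts that found at least one lost tick
        self.warned = False
        self.jiffies = 0     # tick counter the clock is derived from

    def on_timer_interrupt(self, elapsed_ns):
        lost = lost_ticks(elapsed_ns, self.tick_ns)
        # Compensate: count this tick plus every tick that was missed.
        self.jiffies += 1 + lost
        if lost:
            self.events += 1
            if self.events >= self.warn_after and not self.warned:
                self.warned = True
                print("warning: many lost ticks.")
        return lost
```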

Comment 15 Milan Kerslager 2005-09-29 20:47:24 UTC
Well, I have no access to the .22 kernel. I have had Cool'n'Quiet
(IIRC the name) disabled the whole time, and there is no PowerNow setting in the BIOS.
The .17 kernel completely avoids 'lost ticks' messages in my kernel log (although
there is currently no load on this machine, as its deployment has been
postponed due to the bugs), whereas .11 always has the problem (a minute or so after
boot, messages about lost ticks appear in the kernel log). I have no physical
access to the machine at the moment :-(

I'm waiting for U2, but if I get access to a .22 test kernel, I'm able to
reboot and test (I'll try to generate some load for .17, though).

AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ on 2.6.9-17.ELsmp x86_64, ASUS A8N-E.

Comment 16 Brett Morrow 2005-09-29 21:26:33 UTC
Yes, the system does keep proper time. To make sure, I gave it several hours
to go wrong if it was going to.

Here is the link to the kernel I am using:

http://people.redhat.com/~jbaron/rhel4/

Comment 17 Milan Kerslager 2005-09-30 09:12:39 UTC
After moving from the .17 kernel to the .22 kernel from the URL above, timekeeping
changed from PIT/TSC based to PM based. No lost ticks yet, so everything has been OK
here since .17. The bug could be closed now or just after the U2 release (how
long will that take?).

time.c: Using PM based timekeeping.