Bug 159733 - Time source instable
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64 Linux
Priority: medium Severity: medium
Assigned To: Brian Maly
QA Contact: Brian Brock
Reported: 2005-06-07 12:19 EDT by EE CAP Admin
Modified: 2008-07-15 15:39 EDT
Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Last Closed: 2007-05-01 18:58:11 EDT
Attachments:
patch to resolve the issue (483 bytes, patch), 2006-09-14 15:01 EDT, Brian Maly

Description EE CAP Admin 2005-06-07 12:19:26 EDT

Description of problem:
We are running RHEL4-x86_64 on our HP ProLiant DL360 server. After we installed the HP support pack, the following error message appears every time we boot the machine:

"Your time source seems to be instable or some driver is hogging interupts".

Version-Release number of selected component (if applicable):
 kernel-2.6.9-5.0.5.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Boot the machine.
2. The error message appears.

Additional info:
Comment 2 EE CAP Admin 2005-06-13 12:11:20 EDT
We have contacted HP about this issue, and their response is as follows:
"
The developers indicate that a difference between RHEL 3 and RHEL 4 is the
cause of this problem.  When a System Management Interrupt is issued (SMI, an
industry-standard method of internal communications; in the case of the
DL380G4 it is supplied by the Intel chipset), clock ticks are suspended.  In
RHEL 4 they changed the sampling rate to 1000, and when 100 ticks are missed
it issues a warning.

Previously it was the other way around, with samples at 100 and an alert if
1000 ticks were missed.  Essentially, they messed up and reversed the values.

I think it would be better if you could contact Red Hat, so that they can
help you resolve it.
"

Could you please do us a favour and have a look at this issue? Many thanks.
Comment 4 Jason Baron 2005-06-21 17:01:29 EDT
OK, so that means that previously we got the alert if 10 seconds were missed,
and now we get it if 1/10 of a second is missed. 10 seconds seems like an
awfully long time to go without an interrupt. If this message only occurs
during bootup, I doubt it causes any problems. If that is the case, we might
want to look into disabling this check during bootup.
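
To make the arithmetic here concrete: the time window covered by a lost-tick threshold is the threshold divided by the tick rate (HZ). The sketch below plugs in the values quoted in HP's response (1000 ticks at HZ=100 versus 100 ticks at HZ=1000). The function name is made up for illustration and this is not kernel code.

```python
def warning_window_seconds(lost_tick_threshold, hz):
    """Return how much wall-clock time the given number of lost ticks spans."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return lost_tick_threshold / hz


# Values as described in HP's response (not verified against kernel source).
old_window = warning_window_seconds(1000, 100)   # HZ=100, 1000 lost ticks
new_window = warning_window_seconds(100, 1000)   # HZ=1000, 100 lost ticks
print(old_window, new_window)
```

The two results correspond to the 10 seconds and 1/10 of a second mentioned above, a factor of 100 in sensitivity.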
Comment 5 EE CAP Admin 2005-06-21 17:11:15 EDT
Yes, we believe it only occurs during boot. It might be nice to disable that
message during boot.

We were just surprised to see an error on a piece of 100% HP/Red Hat certified
hardware.

Thanks.
Comment 6 Jason Baron 2006-03-16 16:42:18 EST
Hmm, does the latest kernel make any difference? There have been numerous
x86_64-related timekeeping fixes.
Comment 7 Brian Maly 2006-09-14 14:45:55 EDT
This is a duplicate of Bug 170043
Comment 8 Brian Maly 2006-09-14 15:01:38 EDT
Created attachment 136295
patch to resolve the issue

HZ was changed to 1000 in the RHEL4 kernel; this patch sets the lost-tick count
threshold accordingly and resolves the issue.
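
The patch itself is not reproduced in this report. Purely as an illustration of what setting the threshold "according to HZ" could mean, the sketch below derives a tick threshold from a fixed time window, so the warning sensitivity stays the same when HZ changes. The function name and the 10-second window are assumptions drawn from the discussion in this thread, not from the attachment.

```python
def lost_tick_threshold(hz, window_seconds):
    """Number of lost ticks that spans window_seconds at tick rate hz."""
    if hz <= 0 or window_seconds <= 0:
        raise ValueError("hz and window_seconds must be positive")
    return round(hz * window_seconds)


# Hypothetical 10-second window, evaluated at both tick rates discussed here.
for hz in (100, 1000):
    print(hz, lost_tick_threshold(hz, 10))
```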
Comment 11 RHEL Product and Program Management 2006-10-11 11:05:38 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 12 Jay Turner 2006-10-17 10:20:52 EDT
QE ack for RHEL4.5.
Comment 13 Jason Baron 2006-12-14 21:30:20 EST
Committed in stream U5 build 42.32. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 17 Red Hat Bugzilla 2007-05-01 18:58:11 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html
Comment 18 Sylvia 2007-07-29 10:17:23 EDT
(In reply to comment #4)

 The message can occur at any time, not only at bootup. Here are a few lines from
 /var/log/messages:

 Jul 18 00:02:51 rac-oracle1 sshd(pam_unix)[3771]: session closed for user oracle
 Jul 18 00:05:06 rac-oracle1 kernel: warning: many lost ticks.
 Jul 18 00:05:06 rac-oracle1 kernel: Your time source seems to be instable or some driver is hogging interupts
 Jul 18 00:05:06 rac-oracle1 kernel: rip __do_softirq+0x4d/0xd0
 Jul 18 00:07:49 rac-oracle1 sshd(pam_unix)[11724]: session opened for user oracle by (uid=0)
 Jul 18 00:07:49 rac-oracle1 sshd(pam_unix)[11724]: session closed for user oracle

 On top of that, RHEL4 U5 does not resolve the problem.
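
For anyone who wants to check how often this happens on their own system, here is a minimal sketch for pulling these warnings out of a syslog file. It assumes the message text and line layout shown in the excerpt above; the function name is illustrative.

```python
def find_lost_tick_warnings(lines):
    """Return (timestamp, host) pairs for lines reporting 'many lost ticks'."""
    hits = []
    for line in lines:
        if "kernel: warning: many lost ticks." in line:
            fields = line.split()
            # Classic syslog layout: month, day, time, host, then the message.
            timestamp = " ".join(fields[:3])
            host = fields[3]
            hits.append((timestamp, host))
    return hits


# Sample lines copied from the log excerpt in this comment.
sample = [
    "Jul 18 00:02:51 rac-oracle1 sshd(pam_unix)[3771]: session closed for user oracle",
    "Jul 18 00:05:06 rac-oracle1 kernel: warning: many lost ticks.",
    "Jul 18 00:05:06 rac-oracle1 kernel: rip __do_softirq+0x4d/0xd0",
]
print(find_lost_tick_warnings(sample))
```

Because the function accepts any iterable of lines, an open file object for /var/log/messages can be passed in directly.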
Comment 20 Brian Maly 2008-07-15 15:39:02 EDT
This is likely not a timekeeping problem. It indicates that some driver may be
buggy or that platform SMIs (at the hardware level) are long enough to cause the
message to be printed.

Please check this system for time skew. If the system keeps time correctly, then
this problem is not a major concern. We can address the time skew in a new BZ if
needed.
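
One simple way to quantify skew is to compare the system clock against a trusted reference at two points in time and express the difference as drift. The sketch below only does the arithmetic. How the reference timestamps are obtained (for example, from an NTP server) is left open, and the function name and sample readings are assumptions for illustration.

```python
def clock_drift_ppm(local_start, local_end, ref_start, ref_end):
    """Drift of the local clock relative to a reference, in parts per million.

    Positive means the local clock ran fast over the interval; negative
    means it ran slow, which is what uncompensated lost ticks could cause.
    """
    ref_elapsed = ref_end - ref_start
    if ref_elapsed <= 0:
        raise ValueError("reference interval must be positive")
    local_elapsed = local_end - local_start
    return (local_elapsed - ref_elapsed) / ref_elapsed * 1e6


# Hypothetical readings in seconds: the local clock advanced 3599.0 s
# while the reference advanced 3600.0 s.
print(clock_drift_ppm(0.0, 3599.0, 0.0, 3600.0))
```

A result near zero over a long interval would suggest the system is keeping time despite the warnings.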

We may consider changing this message so that it is not printed by default and
is only enabled by a boot argument used for debugging.
