From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)

Description of problem:
We are running RHEL4-x86_64 on our HP ProLiant DL360 server. After we installed the HP support pack, the following error message appears every time we boot the machine: "Your time source seems to be instable or some driver is hogging interupts".

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.5.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Boot the machine.
2. The error message comes up.

Additional info:
We have contacted HP about this issue, and their response is as follows: "The developers indicate that there is a difference between RHEL 3 and RHEL 4 that is the cause of this problem. When a System Management Interrupt is issued (SMI, an industry-standard method of internal communications; in the case of the DL380G4 it is supplied by the Intel chipset), clock ticks are suspended. In RHEL 4 they changed the sampling rate to 1000, and when 100 ticks are missed it issues a warning. Previously it was the other way around, with samples at 100 and an alert if 1000 ticks were missed. Essentially, they messed up and reversed the values. I think it would be better if you could contact Red Hat, so that they can help you resolve it." Could you please do us a favour and have a look at this issue? Many thanks.
OK, so that means that previously we got the alert if 10 seconds of ticks were missed, and now we get it if 1/10 of a second is missed. 10 seconds seems like an awfully long time to go without an interrupt. If this message only occurs during bootup, I doubt it causes any problems. If that is the case, we might want to look into disabling the check during bootup.
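The arithmetic behind those two figures can be sketched as follows. This is only an illustration, not kernel code: the function name is made up for this example, and the HZ and threshold values are the ones quoted in HP's response above.

```python
def lost_tick_window_seconds(hz: int, threshold_ticks: int) -> float:
    """Return how many seconds of missed timer ticks trigger the warning.

    hz is the timer interrupt frequency (ticks per second) and
    threshold_ticks is the number of missed ticks that raises the alert.
    Both values are taken from HP's description, not from kernel source.
    """
    if hz <= 0:
        raise ValueError("hz must be positive")
    return threshold_ticks / hz


# RHEL 3 as described: HZ=100, alert after 1000 missed ticks.
rhel3_window = lost_tick_window_seconds(hz=100, threshold_ticks=1000)

# RHEL 4 as described: HZ=1000, alert after 100 missed ticks.
rhel4_window = lost_tick_window_seconds(hz=1000, threshold_ticks=100)

print(f"RHEL 3 window: {rhel3_window} s, RHEL 4 window: {rhel4_window} s")
```

With these inputs the window shrinks by a factor of 100 between the two releases, which is why SMI pauses that were previously ignored can now trip the warning.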
We believe it is only occurring during boot up, so yes, it might be nice to disable that message during boot. We were just surprised to see an error on a piece of 100% HP/Red Hat certified hardware. Thanks.
Hmm, does the latest kernel make any difference? There have been numerous x86_64-related timekeeping fixes.
This is a duplicate of Bug 170043
Created attachment 136295
patch to resolve the issue

HZ was changed to 1000 in the RHEL4 kernel; this patch sets the lost count threshold accordingly and resolves the issue.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
QE ack for RHEL4.5.
committed in stream U5 build 42.32. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html
(In reply to comment #4)
The message can occur at any time, not only at bootup. Here are a few lines from /var/log/messages:

Jul 18 00:02:51 rac-oracle1 sshd(pam_unix)[3771]: session closed for user oracle
Jul 18 00:05:06 rac-oracle1 kernel: warning: many lost ticks.
Jul 18 00:05:06 rac-oracle1 kernel: Your time source seems to be instable or some driver is hogging interupts
Jul 18 00:05:06 rac-oracle1 kernel: rip __do_softirq+0x4d/0xd0
Jul 18 00:07:49 rac-oracle1 sshd(pam_unix)[11724]: session opened for user oracle by (uid=0)
Jul 18 00:07:49 rac-oracle1 sshd(pam_unix)[11724]: session closed for user oracle

On top of that, RHEL4 U5 does not resolve the problem.
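To see how often and when the warning fires (for example, whether it is limited to boot), one option is to scan the log for the "many lost ticks" line. The following is a minimal sketch that assumes the syslog line format shown in the excerpt above; the function name and regular expression are illustrative only.

```python
import re

# Matches lines such as:
#   Jul 18 00:05:06 rac-oracle1 kernel: warning: many lost ticks.
# Assumes the traditional syslog prefix "Mon DD HH:MM:SS host".
LOST_TICKS = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+"
    r"kernel: warning: many lost ticks\."
)


def find_lost_tick_warnings(lines):
    """Return (timestamp, hostname) for every lost-ticks warning in lines."""
    hits = []
    for line in lines:
        match = LOST_TICKS.match(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


sample = [
    "Jul 18 00:02:51 rac-oracle1 sshd(pam_unix)[3771]: session closed for user oracle",
    "Jul 18 00:05:06 rac-oracle1 kernel: warning: many lost ticks.",
    "Jul 18 00:05:06 rac-oracle1 kernel: rip __do_softirq+0x4d/0xd0",
]

for timestamp, host in find_lost_tick_warnings(sample):
    print(timestamp, host)
```

On a live system the same function could be fed an open file handle for /var/log/messages instead of the sample list.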
This is likely not a timekeeping problem. It indicates that some driver may be buggy or that platform SMIs (at the hardware level) are long enough to cause the message to be printed. Please check this system for time skew. If the system keeps time, then this problem is not a major concern. We can address the time skew in a new BZ if needed. We may consider changing this message so that it is not printed by default and is only enabled by a boot argument used for debugging.
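One rough way to quantify time skew is to take two readings of the system clock alongside two readings from a trusted reference clock and compare the elapsed times. The sketch below only does the arithmetic; how the reference readings are obtained (for example, from an NTP server) is left out, and the function name and sample numbers are hypothetical.

```python
def clock_drift_ppm(local_start, local_end, ref_start, ref_end):
    """Return the local clock drift relative to a reference, in parts per million.

    All arguments are timestamps in seconds. A positive result means the
    local clock runs fast; a negative result means it runs slow (which is
    what lost timer ticks would be expected to cause).
    """
    ref_elapsed = ref_end - ref_start
    if ref_elapsed <= 0:
        raise ValueError("reference interval must be positive")
    local_elapsed = local_end - local_start
    return (local_elapsed - ref_elapsed) / ref_elapsed * 1_000_000


# Hypothetical readings: over one hour of reference time, the local clock
# advanced 0.36 s less than it should have.
drift = clock_drift_ppm(
    local_start=0.0, local_end=3599.64, ref_start=0.0, ref_end=3600.0
)
print(f"drift: {drift:.1f} ppm")
```

A drift that stays small over many hours would support the view that the warning is cosmetic on this system.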