4826 – Timed regularly crashes system

Summary: Timed regularly crashes system

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	timed
Sub Component:
Version:	6.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Preston Brown
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	1999-09-01 14:06 UTC by swietanowski
Modified:	2008-05-01 15:37 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	1999-12-22 15:09:21 UTC
Embargoed:

Description swietanowski 1999-09-01 14:06:05 UTC

I set up a small cluster. The master makes NFS shares available. For
normal operation of e.g. make on the NFS share, the time needs to
be very tightly synchronized.

I start timed as a master on the master PC and as a slave on each
slave:
(master) in /etc/rc.d/rc.local I added
    /usr/sbin/timed -M -n cluster.priv -t
(slave) I created a script /etc/rc.d/init.d/timed which issues
    daemon timed -t
  and is called via appropriate symbolic links.

After periods of long operation of the master system (days or even
weeks), timed on master appears to go bonkers (a technical term). On
different occasions I observed the following phenomena:
(a) master system time slows down or even goes backwards (recently
     a few days in just a few minutes),
(b) timed uses up a lot of processor time (50%) until it is killed (this
     happened once -- this is how I figured out who may be the culprit),
(c) the system becomes unstable and requires reboot (always).

As for (c) above, different things are happening on different occasions:
(a) X started displaying something like a TV signal between the
     stations -- like modulated white noise, I had to kill it, kill timed and
     reboot (this was on the same occasion that timed was taking up 50%
     of CPU),
(b) unrelated applications (e.g. F95 compiler) start behaving strangely,
(c) other things which I can't recall now.

The problem *may* be related to manual or automatic adjustments of
time: once a day a cron process adjusts the time on the master
with an external clock (much more accurate):
   /usr/bin/rdate -sp aurora
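
To check whether the clock jumps coincide with the daily rdate run, a small logger along the following lines might help. This is only a sketch: the function name classify_step, the 2-second default tolerance and the sampling loop in the comment are assumptions for illustration, not something that was actually run on the cluster.

```shell
#!/bin/sh
# Classify the change between two clock readings (seconds since the epoch)
# that were taken `interval` seconds of real time apart.
# Usage: classify_step PREV NOW INTERVAL [TOLERANCE]
# Prints "backwards N", "step N" or "ok N", where N is the deviation in
# seconds from the expected advance.
classify_step() {
    prev=$1 now=$2 interval=$3 tolerance=${4:-2}
    drift=$(( now - prev - interval ))
    if [ "$now" -lt "$prev" ]; then
        echo "backwards $drift"      # clock went back in time
    elif [ "$drift" -gt "$tolerance" ] || [ "$drift" -lt "-$tolerance" ]; then
        echo "step $drift"           # clock ran too fast or too slow
    else
        echo "ok $drift"
    fi
}

# Possible sampling loop (hypothetical, not executed here; relies on a
# date that supports +%s):
#   prev=$(date +%s)
#   while sleep 60; do
#       now=$(date +%s)
#       echo "$(date) $(classify_step "$prev" "$now" 60)"
#       prev=$now
#   done
```

Redirecting the loop's output to a file on a local disk (not the NFS share) would leave a record of when the clock first misbehaves relative to the cron job.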

Comment 1 Jeff Johnson 1999-09-01 18:57:59 UTC

Here's a workaround for your timed problems:

Try using xntpd, it handles time warps much more gracefully than
timed.

The hardest part about xntp is choosing a server. See the url or
/usr/doc/xntp* for guidance there.

Then on the local server, you will need to do
	chkconfig --add xntpd
	echo your_xntp_server > /etc/ntp/step-tickers
Add to /etc/ntp.conf
	server your_xntp_server

If you can, enable multicast by commenting out multicastclient and
adding broadcast as below:

driftfile /etc/ntp/drift
#multicastclient                        # listen on default 224.0.1.1
broadcast       224.0.1.1       key 65535
broadcastdelay  0.008

On each client, then do
	chkconfig --add xntpd
	echo the_local_server > /etc/ntp/step-tickers

Restart the xntpd daemons on the local server and clients, wait 15
minutes, and use "ntpq -p" to verify operation (an asterisk precedes the
server that is currently being used for synchronization).
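
If you want to script that check, something like the following could work. It is a minimal sketch that assumes the usual "ntpq -p" layout, where the tally character is the first character of each peer line; the function name selected_peer is made up for this example.

```shell
#!/bin/sh
# Print the peer that ntpq marks with a leading asterisk (the source
# currently used for synchronization). Reads "ntpq -p" output on stdin
# and exits non-zero if no peer has been selected yet.
selected_peer() {
    awk '/^\*/ { print substr($1, 2); found = 1 } END { exit !found }'
}

# Typical use on a running host (not executed here):
#   ntpq -p | selected_peer || echo "not synchronized yet"
```

On a client this should print the_local_server once synchronization has settled; right after a restart it will typically report nothing for a while.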

If you do the above, all your clients will initially set the clock
from the_local_server using ntpdate, and then receive synchronization
from multicast packets.

Comment 2 Jeff Johnson 1999-12-22 15:09:59 UTC

This problem appears resolved. Please reopen if I'm wrong.
