Red Hat Bugzilla – Bug 170105
hourly lost responses to control messages
Last modified: 2013-02-14 07:13:54 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE; rv:1.7.7) Gecko/20050414 Firefox/1.0.3
Description of problem:
The following was observed on the RHEL3U4 platform running ntp-4.1.2-4.EL3.1.
We have a (middleware) program that queries ntpd every 5 seconds with a mode 6 control message to read variable "peer" of the local host. The program issues a receive timeout if no answer is received within 3s.
This program showed receive timeouts happening at exactly the same minute and second every hour. Sniffing the network, I saw that indeed ntpd did not answer the request at those times.
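For anyone trying to reproduce this without our middleware, the query described above can be approximated with the minimal Python sketch below. It is not the middleware's actual code. The 12-byte header layout is assumed to follow the control message format in RFC 1305 Appendix B, the protocol version in the header is assumed to be 2, and the function names build_read_request and query_once are hypothetical.

```python
import socket
import struct

NTP_PORT = 123
MODE_CONTROL = 6          # NTP mode 6 = control message
OP_READ_VARIABLES = 2     # opcode "read variables"


def build_read_request(sequence, variables="peer", association_id=0, version=2):
    """Build a mode 6 'read variables' request (layout assumed per RFC 1305, App. B)."""
    data = variables.encode("ascii")
    first_byte = (version << 3) | MODE_CONTROL   # leap indicator bits left at 0
    header = struct.pack(
        "!BBHHHHH",
        first_byte,
        OP_READ_VARIABLES,     # response, error and more bits are all 0 in a request
        sequence & 0xFFFF,
        0,                     # status
        association_id,        # 0 addresses the system variables of the host
        0,                     # offset
        len(data),             # count of data octets
    )
    padding = b"\x00" * (-len(data) % 4)         # pad data to a 32-bit boundary
    return header + data + padding


def query_once(host="localhost", sequence=1, timeout=3.0):
    """Send one request; return the raw reply, or None if nothing arrives in time."""
    request = build_read_request(sequence)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, NTP_PORT))
        try:
            reply, _ = sock.recvfrom(1024)
        except socket.timeout:
            return None
    return reply
```

Calling query_once every 5 seconds and logging the wall-clock time of each None result should give the same kind of timeout record as our program produces.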
I also simulated our program with a shell script loop of the command
ntpq -c "debug more" -c "debug more" -c "debug more" -c "debug more" -c "debug more" -c "timeout 1500" -c "timeout" -c "addvars peer" -c "rl" localhost
This also showed that at hourly intervals, the request was not immediately answered but was retried.
There were no syslog entries written by ntpd at these times, and looking at it with "ntpq -p" also showed no change in its peers.
Deactivating drift file generation by commenting out the driftfile line in ntp.conf resolved the problem: no more unanswered requests.
Version-Release number of selected component (if applicable): ntp-4.1.2-4.EL3.1
Steps to Reproduce:
1. start ntpd with driftfile activated in ntp.conf
2. send control request to read the "peer" variable in a fast loop with low timeout
3. observe timeout (due to no response) at regular hourly intervals
Whether it is observed in a particular hour depends, I suppose, on the relative timing of the requests and whatever is keeping ntpd busy.
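To check whether logged timeouts really recur at the same point in every hour, the timestamps can be reduced to their offset within the hour and counted. The sketch below is one way to do that, assuming the timeouts have already been collected as Python datetime objects; the function names and the one-second tolerance are hypothetical choices, not part of our middleware.

```python
from collections import Counter


def offsets_within_hour(timestamps):
    """Count timeouts by their offset (in seconds) from the start of the hour."""
    return Counter(ts.minute * 60 + ts.second for ts in timestamps)


def dominant_offset(timestamps, tolerance=1):
    """Return (minute, second, hits) for the most common offset, or None if empty.

    Offsets within `tolerance` seconds of each other are counted together,
    because the observed time is only roughly accurate to the second.
    """
    counts = offsets_within_hour(timestamps)
    if not counts:
        return None
    best, best_hits = None, 0
    for offset in counts:
        hits = 0
        for other, n in counts.items():
            distance = abs(other - offset)
            distance = min(distance, 3600 - distance)   # wrap around the hour
            if distance <= tolerance:
                hits += n
        if hits > best_hits:
            best, best_hits = offset, hits
    minute, second = divmod(best, 60)
    return minute, second, best_hits
```

If the hit count for the dominant offset is close to the number of hours observed, the timeouts are tied to an hourly event rather than being randomly distributed.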
Actual Results: Almost every hour, a response from ntpd to a control request was missing at a particular time (roughly accurate to the second).
Expected Results: The control request should be answered.
Hardware is HP Proliant DL380G4.
Contents of ntp.conf:
server ntpsrv1 version 4 iburst
server ntpsrv2 version 4 iburst
Product Management has reviewed and declined this request. You may appeal this
decision by reopening this request.
As we have to prioritize the amount of change we introduce into a RHEL release
and given the fact that RHEL3 is in transition from an extended Full Support
phase (ended with U8) to the Maintenance phase, we have very strict inclusion
criteria for 3.9.
As this problem is not considered to have a high impact on production systems
and an easy workaround is available, we do not think that it meets these criteria.
If you disagree, please re-open the case via Red Hat support.
Please also note that Bugzilla is a development tool, not a Support tool.
Therefore, not all comments (especially process-related ones) are publicly
visible. Detailed closure reasons, for example, are often communicated only by the
Red Hat support organization. We therefore ask our customers to file requests
important for their production systems via our Support service. Only then can we
ensure consistent communication.