Bug 1121266 - oo-diagnostics should check that a node's clock is synchronized with the broker's
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Miciah Dashiel Butler Masters
QA Contact: libra bugs
URL:
Whiteboard:
Depends On: 1121267 1122194
Blocks:
Reported: 2014-07-18 19:52 UTC by Miciah Dashiel Butler Masters
Modified: 2014-08-04 13:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The oo-diagnostics script did not check whether a node host's clock was in sync with the associated broker host's clock. MCollective ignores a message when the sender's timestamp on it is more than 60 seconds behind the recipient's clock at the time of receipt, so communications between the broker and node hosts could be lost. This bug fix updates the oo-diagnostics script to add the test_node_clock_in_synch_with_broker check, which sends an HTTP request to the broker (as specified by the BROKER_HOST parameter in the /etc/openshift/node.conf file) and compares the time in the "Date:" header of the response with the node host's clock. As a result, the oo-diagnostics script now warns if the clocks are out of sync by five or more seconds, and it fails if the clocks are out of sync by 55 or more seconds.
Clone Of:
: 1121267 (view as bug list)
Environment:
Last Closed: 2014-08-04 13:28:02 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:0999 normal SHIPPED_LIVE Red Hat OpenShift Enterprise 2.1.4 bug fix and enhancement update 2014-08-04 17:26:43 UTC

Description Miciah Dashiel Butler Masters 2014-07-18 19:52:09 UTC
Description of problem:

MCollective ignores a message when the sender's timestamp on it is more than 60 seconds behind the recipient's clock.  OpenShift broker and node hosts use MCollective for communication.  Consequently, oo-diagnostics should detect when a node's clock is out of synch with its broker's clock.
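
To illustrate why skew in either direction is a problem, here is a minimal Python sketch of the rule as stated above. It is not MCollective's code: the names is_dropped and lost_directions are hypothetical, the 60-second limit is the figure given in this report, and network latency is assumed to be zero.

```python
MAX_AGE_SECONDS = 60  # limit stated in this report; an assumption about MCollective


def is_dropped(sent_at, received_at, max_age=MAX_AGE_SECONDS):
    """Return True if the recipient would ignore the message.

    sent_at is the sender's clock reading stamped on the message;
    received_at is the recipient's clock reading on receipt.
    Both are in seconds.
    """
    return received_at - sent_at > max_age


def lost_directions(node_offset):
    """List the directions in which messages are lost.

    node_offset is the node's clock minus the broker's clock, in seconds.
    """
    broker_clock = 1_000_000  # arbitrary reference time
    node_clock = broker_clock + node_offset
    lost = []
    if is_dropped(sent_at=node_clock, received_at=broker_clock):
        lost.append("node->broker")
    if is_dropped(sent_at=broker_clock, received_at=node_clock):
        lost.append("broker->node")
    return lost


print(lost_directions(-90))  # node 90 seconds behind the broker
print(lost_directions(90))   # node 90 seconds ahead of the broker
print(lost_directions(30))   # within the limit
```

Under this model, a node that is behind loses its own messages to the broker, and a node that is ahead ignores the broker's messages, so the check needs to cover both cases.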


How reproducible:

Completely.


Steps to Reproduce:

1. Install an OpenShift Enterprise PaaS with 1 node host and 1 distinct broker host.

2. Set the node's clock 30 seconds ahead of the broker's and run oo-diagnostics on the node.

3. Set the node's clock 30 seconds behind the broker's and run oo-diagnostics on the node.

4. Set the node's clock 90 seconds ahead of the broker's and run oo-diagnostics on the node.

5. Set the node's clock 90 seconds behind the broker's and run oo-diagnostics on the node.


Actual results:

oo-diagnostics does not complain about the clock.


Expected results:

At Steps 2 and 3, oo-diagnostics should give a warning because the node's clock is significantly off from the broker's.

At Steps 4 and 5, oo-diagnostics should give an error because the node's clock is sufficiently far off from the broker's to disrupt communications.


Additional info:

In situations where the clocks are so far out of synch as to disrupt communications, the broker has no good way to discover nodes because it uses MCollective, which is disrupted by the problem.  However, a node can identify its broker by the BROKER_HOST setting in its /etc/openshift/node.conf configuration file, so it would be feasible for a node to check that it is in synch with its broker.
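
A rough Python sketch of the node-side check suggested above follows. It is not the oo-diagnostics implementation: the function names are hypothetical, node.conf is assumed to use shell-style KEY="value" lines, and the 5-second and 55-second thresholds are the ones given in the Doc Text of this bug. Fetching the broker's "Date:" header over HTTP is left out so that the sketch runs without a network.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

WARN_SECONDS = 5   # warn threshold from the Doc Text
FAIL_SECONDS = 55  # fail threshold from the Doc Text


def broker_host_from_conf(conf_text):
    """Return the BROKER_HOST value from node.conf-style text, or None."""
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "BROKER_HOST":
            # Drop any trailing comment, then surrounding quotes.
            value = value.split("#", 1)[0].strip()
            return value.strip("\"'")
    return None


def clock_skew_seconds(date_header, local_now):
    """Return local time minus broker time; positive means the node is ahead."""
    broker_time = parsedate_to_datetime(date_header)
    return (local_now - broker_time).total_seconds()


def classify_skew(skew_seconds):
    """Map a skew in seconds to OK, WARN, or FAIL."""
    magnitude = abs(skew_seconds)
    if magnitude >= FAIL_SECONDS:
        return "FAIL"
    if magnitude >= WARN_SECONDS:
        return "WARN"
    return "OK"


# Hypothetical inputs for illustration only.
conf = 'BROKER_HOST="broker.example.com"\n'
now = datetime(2014, 7, 24, 4, 0, 39, tzinfo=timezone.utc)
skew = clock_skew_seconds("Thu, 24 Jul 2014 04:00:00 GMT", now)
print(broker_host_from_conf(conf), skew, classify_skew(skew))
```

The HTTP "Date:" header has one-second resolution and the request itself takes time, which is one reason a check of this kind would warn only at a skew of several seconds rather than at any difference.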

Comment 1 Miciah Dashiel Butler Masters 2014-07-22 16:09:15 UTC
PR: https://github.com/openshift/enterprise-server/pull/332

Comment 4 Anping Li 2014-07-24 04:02:17 UTC
Verified and pass on puddle-2-1-2014-07-22.


Result:
When the node is 30 to 60 seconds ahead or behind, a warning message is shown.
When the node is more than 60 seconds ahead or behind, an error message is shown.

step 1: No complaint about the clock.

step 2: Ahead > 30, show warning message
INFO: running: test_node_clock_in_synch_with_broker
WARN: test_node_clock_in_synch_with_broker
        The local host's clock is ahead of br200.osegeo-20140724.com.cn's
        by 39 seconds.

step 3: Behind > 30, show warning message
WARN: test_node_clock_in_synch_with_broker
        The local host's clock is behind br200.osegeo-20140724.com.cn's
        by 34 seconds.  Note that a host will drop messages that it receives

step 4: Ahead > 90, show error message 
FAIL: test_node_clock_in_synch_with_broker
        The local host's clock is ahead of br200.osegeo-20140724.com.cn's
        by 90 seconds.

step 5: Behind > 90, show error message
FAIL: test_node_clock_in_synch_with_broker
        The local host's clock is behind br200.osegeo-20140724.com.cn's
        by 92 seconds.

step 6: Ahead > 60, show error message
FAIL: test_node_clock_in_synch_with_broker
        The local host's clock is ahead of br200.osegeo-20140724.com.cn's
        by 64 seconds

step 7: Behind > 60, show error message
FAIL: test_node_clock_in_synch_with_broker
        The local host's clock is behind br200.osegeo-20140724.com.cn's
        by 65 seconds

Comment 6 errata-xmlrpc 2014-08-04 13:28:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0999.html

