Description of problem:
MCollective ignores a message when the sender's timestamp on that message is more than 60 seconds behind the recipient's clock. OpenShift broker and node hosts use MCollective for communication. Consequently, oo-diagnostics should detect when a node's clock is out of synch with its broker's clock.

How reproducible:
Completely.

Steps to Reproduce:
1. Install an OpenShift Enterprise PaaS with 1 node host and 1 distinct broker host.
2. Set the node's clock 30 seconds ahead of the broker's and run oo-diagnostics on the node.
3. Set the node's clock 30 seconds behind the broker's and run oo-diagnostics on the node.
4. Set the node's clock 90 seconds ahead of the broker's and run oo-diagnostics on the node.
5. Set the node's clock 90 seconds behind the broker's and run oo-diagnostics on the node.

Actual results:
oo-diagnostics does not complain about the clock.

Expected results:
At Steps 2 and 3, oo-diagnostics should give a warning because the node's clock is significantly off from the broker's.
At Steps 4 and 5, oo-diagnostics should give an error because the node's clock is sufficiently far off from the broker's to disrupt communications.

Additional info:
When the clocks are so far out of synch that communications are disrupted, the broker has no good way to discover nodes, because it uses MCollective, which is itself disrupted by the problem. However, a node can identify its broker from the BROKER_HOST setting in its /etc/openshift/node.conf configuration file, so it would be feasible for a node to check that it is in synch with its broker.
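To illustrate the node-side lookup suggested under "Additional info", here is a minimal Python sketch of reading BROKER_HOST out of node.conf-style text. It is not the oo-diagnostics implementation. The function name parse_broker_host is hypothetical, and the sketch assumes the file uses shell-style KEY="value" lines with optional trailing # comments. How the node then obtains the broker's current time is left out, since this report does not say.

```python
import re


def parse_broker_host(conf_text):
    """Return the BROKER_HOST value from node.conf-style text, or None.

    Assumption: settings are shell-style KEY=value or KEY="value" lines,
    and '#' starts a comment. This helper is hypothetical.
    """
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        match = re.match(r'^BROKER_HOST\s*=\s*"?([^"#\s]+)"?', line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    sample = '# sample settings\nBROKER_HOST="broker.example.com"  # broker\n'
    print(parse_broker_host(sample))
```

On a real node, the text would be read from /etc/openshift/node.conf and the returned host name used as the target of the clock comparison.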
PR: https://github.com/openshift/enterprise-server/pull/332
Verified and passed on puddle-2-1-2014-07-22.

Result:
When the node's clock is between 30 and 60 seconds ahead of or behind the broker's, a warning message is shown.
When the node's clock is more than 60 seconds ahead of or behind the broker's, an error message is shown.

Step 1: No complaint about the clock.

Step 2: Ahead > 30, shows warning message
INFO: running: test_node_clock_in_synch_with_broker
WARN: test_node_clock_in_synch_with_broker
The local host's clock is ahead of br200.osegeo-20140724.com.cn's by 39 seconds.

Step 3: Behind > 30, shows warning message
WARN: test_node_clock_in_synch_with_broker
The local host's clock is behind br200.osegeo-20140724.com.cn's by 34 seconds. Note that a host will drop messages that it receives

Step 4: Ahead > 90, shows error message
FAIL: test_node_clock_in_synch_with_broker
The local host's clock is ahead of br200.osegeo-20140724.com.cn's by 90 seconds.

Step 5: Behind > 90, shows error message
FAIL: test_node_clock_in_synch_with_broker
The local host's clock is behind br200.osegeo-20140724.com.cn's by 92 seconds.

Step 6: Ahead > 60, shows error message
FAIL: test_node_clock_in_synch_with_broker
The local host's clock is ahead of br200.osegeo-20140724.com.cn's by 64 seconds

Step 7: Behind > 60, shows error message
FAIL: test_node_clock_in_synch_with_broker
The local host's clock is behind br200.osegeo-20140724.com.cn's by 65 seconds
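The warning and error behavior observed in this verification can be summarized as a small classification rule. The Python sketch below is only an illustration and is not the code from the PR. The function name classify_skew and the two threshold constants are hypothetical, and the handling of a skew of exactly 30 or exactly 60 seconds is an assumption, because those boundary values were not tested above. The message wording follows the output quoted in this comment.

```python
# Thresholds inferred from the verification results; exact boundary
# behavior (skew of exactly 30 or 60 seconds) is an assumption.
WARN_THRESHOLD = 30
FAIL_THRESHOLD = 60


def classify_skew(skew_seconds, broker_host):
    """Classify node-minus-broker clock skew as OK, WARN, or FAIL.

    A positive skew means the local (node) clock is ahead of the broker's.
    Returns a (level, message) tuple; message is None when level is "OK".
    """
    magnitude = abs(skew_seconds)
    if magnitude > FAIL_THRESHOLD:
        level = "FAIL"
    elif magnitude > WARN_THRESHOLD:
        level = "WARN"
    else:
        return ("OK", None)
    direction = "ahead of" if skew_seconds > 0 else "behind"
    message = "The local host's clock is %s %s's by %d seconds." % (
        direction, broker_host, magnitude)
    return (level, message)


if __name__ == "__main__":
    for skew in (10, 39, -34, 64, -92):
        print(classify_skew(skew, "broker.example.com"))
```

With these thresholds, the skews reported in Steps 2 and 3 (39 and 34 seconds) map to WARN, and those in Steps 4 through 7 (90, 92, 64, and 65 seconds) map to FAIL.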
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0999.html