Bug 1121267 - oo-diagnostics should check that a node's clock is synchronized with the broker's
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jhon Honce
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1121266
Reported: 2014-07-18 19:54 UTC by Miciah Dashiel Butler Masters
Modified: 2015-05-14 23:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1121266
Environment:
Last Closed: 2014-10-10 00:46:12 UTC



Description Miciah Dashiel Butler Masters 2014-07-18 19:54:56 UTC
+++ This bug was initially created as a clone of Bug #1121266 +++

Description of problem:

MCollective ignores a message when the sender's timestamp on that message is more than 60 seconds behind the recipient's clock.  OpenShift broker and node hosts use MCollective for communication.  Consequently, oo-diagnostics should detect when a node's clock is out of synch with its broker's clock.
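
To illustrate why skew in either direction matters, here is a minimal Python sketch of the drop rule as stated in this report. It is not MCollective's code: the function names, the fixed broker time, and the assumption of instant delivery are all hypothetical, and MCollective itself may apply further checks.

```python
MCOLLECTIVE_TTL_SECONDS = 60  # threshold quoted in this report


def is_dropped(sender_timestamp, recipient_now, ttl=MCOLLECTIVE_TTL_SECONDS):
    """Return True if the message looks more than `ttl` seconds old to the recipient."""
    return recipient_now - sender_timestamp > ttl


def simulate(node_offset):
    """Model one message in each direction, delivered instantly.

    node_offset is the node clock minus the broker clock, in seconds.
    Returns (broker_to_node_dropped, node_to_broker_dropped).
    """
    broker_now = 1000  # arbitrary broker clock reading (hypothetical)
    node_now = broker_now + node_offset
    broker_to_node = is_dropped(sender_timestamp=broker_now, recipient_now=node_now)
    node_to_broker = is_dropped(sender_timestamp=node_now, recipient_now=broker_now)
    return broker_to_node, node_to_broker


print(simulate(90))   # node ahead: the node drops the broker's messages
print(simulate(-90))  # node behind: the broker drops the node's messages
print(simulate(30))   # within the limit: nothing is dropped
```

Under this model, a node that is 90 seconds ahead discards the broker's requests, and a node that is 90 seconds behind has its replies discarded by the broker, so communication breaks in both cases.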


How reproducible:

Completely.


Steps to Reproduce:

1. Install an OpenShift Enterprise PaaS with 1 node host and 1 distinct broker host.

2. Set the node's clock 30 seconds ahead of the broker's and run oo-diagnostics on the node.

3. Set the node's clock 30 seconds behind the broker's and run oo-diagnostics on the node.

4. Set the node's clock 90 seconds ahead of the broker's and run oo-diagnostics on the node.

5. Set the node's clock 90 seconds behind the broker's and run oo-diagnostics on the node.


Actual results:

oo-diagnostics does not complain about the clock.


Expected results:

At Steps 2 and 3, oo-diagnostics should give a warning because the node's clock is significantly off from the broker's.

At Steps 4 and 5, oo-diagnostics should give an error because the node's clock is sufficiently far off from the broker's to disrupt communications.


Additional info:

In situations where the clocks are so far out of synch as to disrupt communications, the broker has no good way to discover nodes because it uses MCollective, which is disrupted by the problem.  However, a node can identify its broker by the BROKER_HOST setting in its /etc/openshift/node.conf configuration file, so it would be feasible for a node to check that it is in synch with its broker.
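
As a rough sketch of the lookup described above, the following Python reads a setting such as BROKER_HOST from configuration text. It assumes node.conf uses shell-style KEY="value" lines with optional # comments. The function name read_setting and the sample values are hypothetical, and this is not how oo-diagnostics itself reads the file.

```python
import shlex


def read_setting(conf_text, key):
    """Return the value of `key` from shell-style KEY=VALUE text, or None if absent."""
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            # shlex removes surrounding quotes and any trailing "# comment"
            tokens = shlex.split(value, comments=True)
            return tokens[0] if tokens else ""
    return None


# Hypothetical sample content; a real node would read /etc/openshift/node.conf.
sample = (
    "# node settings\n"
    'PUBLIC_IP="10.0.0.5"\n'
    'BROKER_HOST="broker.example.com"   # hypothetical broker host\n'
)
print(read_setting(sample, "BROKER_HOST"))
```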

Comment 1 Miciah Dashiel Butler Masters 2014-07-18 20:04:14 UTC
PR: https://github.com/openshift/origin-server/pull/5630

Comment 2 openshift-github-bot 2014-07-18 23:11:52 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/9cd67dca0dc94ece82801652554a5d6a7061a3a7
oo-diagnostics: Add test_node_clock_in_synch_with_broker

Add a test_node_clock_in_synch_with_broker check to oo-diagnostics.  This
test is specific to node hosts.  It sends an HTTP request to the host
identified by the BROKER_HOST setting in /etc/openshift/node.conf and
compares the date in the Date: header of the response to the date of the
node host.  If the difference is equal to or more than 55 seconds, the
check gives an error.  Else, if the difference is equal to or more than
5 seconds, the check gives a warning.

This commit fixes bug 1121267.
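
The comparison described in this commit message can be sketched in Python as follows. This is an illustration of the stated thresholds, not the oo-diagnostics code: the function names and the "PASS" label are assumptions, and the HTTP request itself is left out so that the example runs without a broker.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

FAIL_THRESHOLD_SECONDS = 55  # error at or above this difference
WARN_THRESHOLD_SECONDS = 5   # warning at or above this difference


def clock_skew_seconds(date_header, local_now):
    """Return local time minus the broker's Date: header, in seconds.

    A positive result means the local clock is ahead of the broker's.
    """
    broker_time = parsedate_to_datetime(date_header)
    return (local_now - broker_time).total_seconds()


def classify(skew_seconds):
    """Map a clock difference to FAIL, WARN, or PASS (PASS is a hypothetical label)."""
    magnitude = abs(skew_seconds)
    if magnitude >= FAIL_THRESHOLD_SECONDS:
        return "FAIL"
    if magnitude >= WARN_THRESHOLD_SECONDS:
        return "WARN"
    return "PASS"


# Hypothetical header value and local time, 46 seconds apart.
header = "Wed, 23 Jul 2014 08:14:30 GMT"
local = datetime(2014, 7, 23, 8, 15, 16, tzinfo=timezone.utc)
skew = clock_skew_seconds(header, local)
print(skew, classify(skew))
```

The Date: header has one-second resolution and the response takes time to arrive, so a comparison of this kind is only approximate. That may be why the error threshold sits below the 60-second limit, although the commit message does not say so.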

Comment 3 Meng Bo 2014-07-23 08:14:30 UTC
The time sync check was added in devenv_4998, and its behavior matches the description in comment #2.


WARN: test_node_clock_in_synch_with_broker
        The local host's clock is ahead of 10.169.54.13's
        by 46 seconds.  Note that a host will drop messages that it receives
        over MCollective if the sender's timestamp on those messages is more
        than 60 seconds in the past by the recipient's clock.  This means that
        if a node's clock is too far off from the broker's, the node will
        effectively be invisible to the broker.

        Please ensure that all hosts' clocks are synchronized, and consider
        configuring ntpd to keep their clocks synchronized.

WARN: test_node_clock_in_synch_with_broker
        The local host's clock is behind 10.169.54.13's
        by 16 seconds.  Note that a host will drop messages that it receives
        over MCollective if the sender's timestamp on those messages is more
        than 60 seconds in the past by the recipient's clock.  This means that
        if a node's clock is too far off from the broker's, the node will
        effectively be invisible to the broker.

        Please ensure that all hosts' clocks are synchronized, and consider
        configuring ntpd to keep their clocks synchronized.

FAIL: test_node_clock_in_synch_with_broker
        The local host's clock is behind 10.169.54.13's
        by 55 seconds.  Note that a host will drop messages that it receives
        over MCollective if the sender's timestamp on those messages is more
        than 60 seconds in the past by the recipient's clock.  This means that
        if a node's clock is too far off from the broker's, the node will
        effectively be invisible to the broker.

        Please ensure that all hosts' clocks are synchronized, and consider
        configuring ntpd to keep their clocks synchronized.

FAIL: test_node_clock_in_synch_with_broker
        The local host's clock is ahead of 10.169.54.13's
        by 70 seconds.  Note that a host will drop messages that it receives
        over MCollective if the sender's timestamp on those messages is more
        than 60 seconds in the past by the recipient's clock.  This means that
        if a node's clock is too far off from the broker's, the node will
        effectively be invisible to the broker.

        Please ensure that all hosts' clocks are synchronized, and consider
        configuring ntpd to keep their clocks synchronized.

