Bug 1387714 - OpenShift Router should not bind to ports if it can't talk to the masters
Summary: OpenShift Router should not bind to ports if it can't talk to the masters
Keywords:
Status: CLOSED DUPLICATE of bug 1383663
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
: ---
Assignee: Ben Bennett
QA Contact: Xiaoli Tian
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-21 15:56 UTC by Eric Rich
Modified: 2020-06-11 13:02 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-07 15:26:56 UTC
Target Upstream Version:
Embargoed:



Description Eric Rich 2016-10-21 15:56:58 UTC
Description of problem:

There are situations in which the OpenShift control plane can crash completely. When such a catastrophic failure occurs, routers can no longer pull updates from the API servers.

Because of this, no route changes propagate to the routers. This is generally OK because platform-level changes (in the control plane) should not be changing the environment during the outage.

> If changes do occur, the "current" routes should be able to handle outages seen in the data layer, such as pods that die.
>> The routes health-check these and disable them at some point (on a catastrophic failure of an app, a 503 will be seen for the route).

However, while the OpenShift control plane is down, it is possible for the OpenShift router to "restart" (due to a catastrophic failure of its own). This restart would then lose the current configuration of routes, and the router would be forced to communicate with the OpenShift control plane to repopulate that configuration, which it cannot do while the control plane is down.

This combination of failures should be something that the platform accounts for!

Comment 1 Eric Rich 2016-10-21 16:00:51 UTC
One suggestion for this problem: if the OpenShift router restarts, it should not bind to the host ports (80, 443, etc.) if it cannot communicate with the control plane at startup.

> Note: if this occurs, some logging should be provided to help operators understand that the issue is not with the router but with its inability to talk to the control plane.

This causes connections to the routers to be dropped or rejected at the TCP layer, allowing external load balancing to better handle error notifications to users.

This also better indicates to operators and admins that there is a "catastrophic" issue at hand that needs to be investigated.
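
To make the suggestion concrete, here is a rough sketch of the "check first, bind second" startup order. It is not how the OpenShift router is implemented; it only illustrates the idea. The function names (control_plane_reachable, bind_if_ready), the health URL, and the use of plain TCP listeners are all assumptions made for this example.

```python
import http.client
import logging
import socket
import urllib.request

log = logging.getLogger("router-startup")


def control_plane_reachable(health_url, timeout=2.0):
    """Return True only if the (assumed) health endpoint answers HTTP 200."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses,
        # and malformed URLs.
        return False


def bind_if_ready(ports, is_reachable, host="0.0.0.0"):
    """Bind listening sockets only when is_reachable() returns True.

    Returns the list of listening sockets, or an empty list if the
    control plane could not be reached (hypothetical helper).
    """
    if not is_reachable():
        # Log clearly that the router itself is fine but has no route
        # configuration source, as requested in the note above.
        log.error("control plane unreachable at startup; not binding %s", ports)
        return []

    listeners = []
    try:
        for port in ports:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            listeners.append(sock)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((host, port))
            sock.listen()
    except OSError:
        # Do not leave a half-bound set of ports behind.
        for sock in listeners:
            sock.close()
        raise
    return listeners
```

A caller might run something like bind_if_ready([80, 443], lambda: control_plane_reachable(url)), with url pointing at whatever health endpoint the masters expose, and retry periodically while the result is empty. With no listener on the ports, clients and external load balancers see refused TCP connections rather than responses served from an empty route table.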

Comment 2 Ben Bennett 2016-10-28 16:57:40 UTC
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1383663

Comment 3 Ben Bennett 2016-11-07 15:26:56 UTC

*** This bug has been marked as a duplicate of bug 1383663 ***

