Description of problem:
When IP addresses assigned to an Amazon ELB that load balances incoming requests for the Master API are removed, any atomic-openshift-node service that was connected to an IP address that no longer load balances Master API requests enters a NotReady state.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Actual results: A change in the Amazon ELB IP addresses causes atomic-openshift-node to remain in the NotReady state for 15 minutes.
Expected results: We expect atomic-openshift-node to gracefully handle the loss of connectivity to a specific IP address associated with a load-balanced Master API FQDN and not attempt to reuse a half-closed connection.
Supporting upstream Kubernetes documentation:
This bug looks similar to bug 1464653, which should be fixed in 3.7.
(In reply to Ryan Howe from comment #2)
> This bug looks similar to bug 1464653 which should be fixed in 3.7.
Never mind, it looks as though this issue can still happen even with the fix in bug 1464653.
PR that was closed:
*** Bug 1577695 has been marked as a duplicate of this bug. ***
I'm already working on it and will continue checking.
Tested on OCP with version:
Set up an HA environment with an ELB on an AWS cluster. Before the outage there were 3 ELB IPs, each matched to one of qe-geliu-elbmaster-etcd-zone1, qe-geliu-elbmaster-etcd-zone2-1, and qe-geliu-elbmaster-etcd-zone2-2. I removed 1 IP from the ELB in the AWS UI, then checked that no node stayed in NotReady status for a long time. The atomic-openshift-node log shows none of the critical errors referenced in comment 1 above.
*** Bug 1584471 has been marked as a duplicate of this bug. ***