Description of problem:
We found a bug in the implementation where clearing the NodeNetworkUnavailable condition could fail when there is a race between the master and the node trying to update the node status. Since we get node events for every kubelet node status update, the condition is eventually cleared.
Proposed https://github.com/openshift/origin/pull/18758 to fix this cleanly.
Version-Release number of selected component (if applicable):
oc v3.10.0-alpha.0+1d01229-4-dirty (also valid for older releases)
How reproducible:
Not always (easy to reproduce with some instrumentation)
Please ensure there is no regression on GCP with this fix.
Commit pushed to master at https://github.com/openshift/origin
Merge pull request #18758 from pravisankar/fix-clear-nodenetwork
Automatic merge from submit-queue.
Bug 1550266 - Fix clearInitialNodeNetworkUnavailableCondition() in sdn master
This change fixes these 2 issues:
- Currently, clearing NodeNetworkUnavailable node condition only works
if we are successful in updating the node status during the first iteration.
Subsequent retries will not work because:
1. knode != node
2. node.Status is updated in memory
(3) will have no effect as in step (2) node.Status is updated but not knode.Status
- The Node object passed to this method is a pointer to an item in the informer
cache, and it should not be modified directly.
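The two issues above can be illustrated with a small self-contained model. This is only a sketch, not the code from the pull request: the `Node` type, `fakeAPI`, `clearBuggy`, `clearFixed` and `raceScenario` are hypothetical stand-ins, and it assumes that step (3) is the status update sent using knode. A single integer resource version imitates the API server's conflict detection.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a simplified, hypothetical stand-in for a Kubernetes node object.
type Node struct {
	Name               string
	ResourceVersion    int
	NetworkUnavailable bool
}

var errConflict = errors.New("conflict: stale resource version")

// fakeAPI imitates an API server that stores one node and rejects stale writes.
type fakeAPI struct {
	stored Node
}

// Get returns a fresh copy of the stored node.
func (a *fakeAPI) Get() *Node {
	n := a.stored
	return &n
}

// UpdateStatus writes the status of n if its resource version is current.
func (a *fakeAPI) UpdateStatus(n *Node) error {
	if n.ResourceVersion != a.stored.ResourceVersion {
		return errConflict
	}
	a.stored.NetworkUnavailable = n.NetworkUnavailable
	a.stored.ResourceVersion++
	return nil
}

// clearBuggy models the failure mode: on a retry it re-fetches knode but
// keeps editing node, so the update it sends carries no change.
func clearBuggy(api *fakeAPI, node *Node) error {
	knode := node
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		if attempt > 0 {
			knode = api.Get() // 1. knode != node
		}
		node.NetworkUnavailable = false // 2. node is updated in memory (and so is the cache)
		if err = api.UpdateStatus(knode); err == nil { // assumed step 3: sends knode
			return nil
		}
	}
	return err
}

// clearFixed edits a private copy and always modifies the object it sends.
func clearFixed(api *fakeAPI, cached *Node) error {
	knode := *cached // copy, so the cached item is never mutated
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		if attempt > 0 {
			knode = *api.Get()
		}
		knode.NetworkUnavailable = false
		if err = api.UpdateStatus(&knode); err == nil {
			return nil
		}
	}
	return err
}

// raceScenario returns a server whose node is newer than the cached copy,
// as if the kubelet had updated the status in between.
func raceScenario() (*fakeAPI, *Node) {
	api := &fakeAPI{stored: Node{Name: "node-1", ResourceVersion: 2, NetworkUnavailable: true}}
	cached := &Node{Name: "node-1", ResourceVersion: 1, NetworkUnavailable: true}
	return api, cached
}

func main() {
	api, cached := raceScenario()
	err := clearBuggy(api, cached)
	fmt.Println("buggy:", err, "server still unavailable:", api.stored.NetworkUnavailable,
		"cache mutated:", !cached.NetworkUnavailable)

	api, cached = raceScenario()
	err = clearFixed(api, cached)
	fmt.Println("fixed:", err, "server still unavailable:", api.stored.NetworkUnavailable,
		"cache mutated:", !cached.NetworkUnavailable)
}
```

In this model the buggy variant returns no error after the retry, yet the server-side condition stays set and the cached object has been changed; the fixed variant clears the condition and leaves the cached object alone.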
Avoid NodeNetworkUnavailable condition check for every node status update
- We know that kubelet sets NodeNetworkUnavailable condition when the node is
created/registered with api server.
- So we only need to call clearInitialNodeNetworkUnavailableCondition()
for the first time and not during subsequent node status update events.
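One way to picture the "first time only" behavior is a handler that remembers which nodes it has already processed. This is a rough sketch under assumptions, not the merged implementation: the `handler` type, its `seen` set keyed by node UID, and the `clears` counter (standing in for a call to clearInitialNodeNetworkUnavailableCondition()) are all hypothetical.

```go
package main

import "fmt"

// handler tracks which node UIDs have already been processed (hypothetical type).
type handler struct {
	seen   map[string]bool
	clears int // counts stand-in calls to clearInitialNodeNetworkUnavailableCondition()
}

func newHandler() *handler {
	return &handler{seen: make(map[string]bool)}
}

// handle is invoked for every add or status-update event of a node.
// Only the first event per UID triggers the clear attempt.
func (h *handler) handle(uid string) {
	if h.seen[uid] {
		return // later status updates skip the condition check
	}
	h.seen[uid] = true
	h.clears++
}

// forget drops a node on deletion so that a re-registered node is handled again.
func (h *handler) forget(uid string) {
	delete(h.seen, uid)
}

func main() {
	h := newHandler()
	// One add event followed by periodic status updates for uid-a, then a new node.
	for _, uid := range []string{"uid-a", "uid-a", "uid-a", "uid-b"} {
		h.handle(uid)
	}
	fmt.Println("clear attempts:", h.clears) // prints "clear attempts: 2"
}
```

With this shape, periodic kubelet status updates no longer trigger a condition lookup, so the single clear attempt has to rely on its own retry logic to succeed.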
No issues found during regression testing on GCP with v3.10.0-0.54.0.
OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
kernel: Linux qe-310-crio-master-etcd-1 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.