Description of problem:
We found a bug in the implementation where clearing the NodeNetworkUnavailable condition could fail when there is a race between the master and the node trying to update the node status. Since we get node events for every kubelet node status update, the condition is eventually cleared. Proposed https://github.com/openshift/origin/pull/18758 to fix this cleanly.

Version-Release number of selected component (if applicable):
oc v3.10.0-alpha.0+1d01229-4-dirty (also valid for older releases)
kubernetes v1.9.1+a0ce1bc657

How reproducible:
Not always (easy with some instrumentation)

Verification/Testing:
Please ensure there is no regression on GCP with this fix.
https://github.com/openshift/origin/pull/18758
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/308bb2e8f4f0a198e92993f9ec7a8f5d8ca7e349
Merge pull request #18758 from pravisankar/fix-clear-nodenetwork

Automatic merge from submit-queue.

Bug 1550266 - Fix clearInitialNodeNetworkUnavailableCondition() in sdn master

This change fixes these 2 issues:
- Currently, clearing NodeNetworkUnavailable node condition only works if we are successful in updating the node status during the first iteration. Subsequent retries will not work because:
  1. knode != node
  2. node.Status is updated in memory
  3. UpdateNodeStatus(knode)
  (3) will have no effect as in step (2) node.Status is updated but not knode.Status
- Node object passed to this method is pointer to an item in the informer cache and it should not be modified directly.

Avoid NodeNetworkUnavailable condition check for every node status update
- We know that kubelet sets NodeNetworkUnavailable condition when the node is created/registered with api server.
- So we only need to call clearInitialNodeNetworkUnavailableCondition() for the first time and not during subsequent node status update events.
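
The retry problem described in the commit message above can be illustrated with a small standalone sketch. This is not the code from the pull request: Node, Condition, fakeAPI, pendingHeartbeats and clearNetworkUnavailable are hypothetical stand-ins for the Kubernetes types and client, and the integer ResourceVersion only mimics the API server's optimistic concurrency check. The point it tries to show is that each retry re-fetches the node and modifies the same object that is then sent in the update, and that the cached object is copied rather than mutated.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the Kubernetes node types.
const NodeNetworkUnavailable = "NetworkUnavailable"

type Condition struct {
	Type   string
	Status string
}

type Node struct {
	Name            string
	ResourceVersion int
	Conditions      []Condition
}

// deepCopy returns a copy that shares no slice storage with the original.
func (n *Node) deepCopy() *Node {
	c := *n
	c.Conditions = append([]Condition(nil), n.Conditions...)
	return &c
}

var errConflict = errors.New("conflict: stale resource version")

// fakeAPI imitates an API server that rejects updates made from a stale
// object. pendingHeartbeats simulates kubelet status updates that land
// just before the master's update and win the race.
type fakeAPI struct {
	stored            Node
	pendingHeartbeats int
}

func (a *fakeAPI) Get() *Node { return a.stored.deepCopy() }

func (a *fakeAPI) UpdateStatus(n *Node) error {
	if a.pendingHeartbeats > 0 {
		a.pendingHeartbeats--
		a.stored.ResourceVersion++ // kubelet updated the node first
	}
	if n.ResourceVersion != a.stored.ResourceVersion {
		return errConflict
	}
	a.stored.Conditions = append([]Condition(nil), n.Conditions...)
	a.stored.ResourceVersion++
	return nil
}

// clearCondition sets NetworkUnavailable to "False" on n and reports
// whether anything changed.
func clearCondition(n *Node) bool {
	for i := range n.Conditions {
		c := &n.Conditions[i]
		if c.Type == NodeNetworkUnavailable && c.Status != "False" {
			c.Status = "False"
			return true
		}
	}
	return false
}

// clearNetworkUnavailable never touches cached (the informer-cache item).
// On every retry it fetches a fresh node and modifies that same object
// before sending it, so the update always carries the cleared condition.
func clearNetworkUnavailable(api *fakeAPI, cached *Node, maxRetries int) error {
	knode := cached.deepCopy()
	for attempt := 0; attempt < maxRetries; attempt++ {
		if attempt > 0 {
			knode = api.Get()
		}
		if !clearCondition(knode) {
			return nil // already cleared, nothing to update
		}
		err := api.UpdateStatus(knode)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return fmt.Errorf("node %s: still conflicting after %d attempts", cached.Name, maxRetries)
}

func main() {
	api := &fakeAPI{
		stored: Node{
			Name:            "node-1",
			ResourceVersion: 1,
			Conditions:      []Condition{{Type: NodeNetworkUnavailable, Status: "True"}},
		},
		pendingHeartbeats: 1,
	}
	cached := api.Get()
	err := clearNetworkUnavailable(api, cached, 3)
	fmt.Println("error:", err)
	fmt.Println("server condition:", api.stored.Conditions[0].Status)
	fmt.Println("cached condition:", cached.Conditions[0].Status)
}
```

In this sketch the first update attempt hits a simulated conflict and the second succeeds. The failure mode described in the commit message corresponds to modifying one object (node) while sending another (knode), in which case the retried update would carry an unchanged status.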
No issues found during regression testing on GCP with v3.10.0-0.54.0.

OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
kernel: Linux qe-310-crio-master-etcd-1 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816