Created attachment 1180204 [details]
atomic-openshift-node log taken during the node entering notready state

Description of problem:
We have several clusters where one node is reported as being NotReady for a few seconds before going back to Ready.

Version-Release number of selected component (if applicable):
atomic-openshift-node-3.2.1.7-1.git.0.2702170.el7.x86_64
Some older versions are also affected.

How reproducible:
a few times /day

Steps to Reproduce:
1. oc get nodes --watch

Actual results:
On some clusters, one node will briefly report NotReady, then go back to Ready.

Expected results:
Nodes should be stable.

Additional info:
In the attached log, we're doing automated tests where one script creates its own namespace and deploys openshift/hello-openshift, while another also creates its own namespace, and does an STI build into the namespace. Seems interesting that the NotReady issue occurred during those deploys.
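To line the NotReady flaps up against the node and master logs, it may help to timestamp each status change seen in the `oc get nodes --watch` output. The following is only a minimal sketch, not something taken from this report: the helper name `status_transitions` is hypothetical, and it assumes the watch output uses whitespace-separated columns with the node name first and the status second (for example `Ready`, `NotReady`, or `Ready,SchedulingDisabled`).

```python
from datetime import datetime, timezone


def status_transitions(lines, now=None):
    """Yield (timestamp, node, old_status, new_status) on each status change.

    `lines` is any iterable of text lines in the assumed
    "NAME STATUS AGE" layout; `now` is an optional clock callable,
    injectable so the function can be exercised without real time.
    """
    if now is None:
        now = lambda: datetime.now(timezone.utc)
    last_seen = {}  # node name -> most recent status
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        node = fields[0]
        # Keep only the readiness part, e.g. "Ready,SchedulingDisabled" -> "Ready".
        status = fields[1].split(",")[0]
        previous = last_seen.get(node)
        last_seen[node] = status
        # The first sighting of a node is a baseline, not a transition.
        if previous is not None and previous != status:
            yield now(), node, previous, status
```

One possible use is to pipe the watch output into a small wrapper that passes `sys.stdin` to this function and prints each tuple, so the times of the Ready to NotReady changes can be compared with the atomic-openshift-node log entries.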
Sten, does it happen with the same node or with different nodes at different times in the same cluster?
Sten, could you share more details about your setup, such as the nodes in your cluster and the scripts? I can try something similar myself, but I'm curious whether the scripts are doing anything else too.
Sten, the next time this happens, could you please attach both the master (api + controller) & node logs?
Sten, has this happened any more?
I've been watching on 3.2.1.7, 3.2.1.9 and 3.2.1.10 clusters, haven't seen it resurface.
Sten, please reopen if needed. Thanks.