Bug 1357078

Summary: Nodes briefly enter and then leave NotReady state
Product: OpenShift Container Platform
Reporter: Sten Turpin <sten>
Component: Node
Assignee: Avesh Agarwal <avagarwa>
Status: CLOSED WORKSFORME
QA Contact: DeShuai Ma <dma>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.2.1
CC: agoldste, aos-bugs, jokerman, mmccomas, sten
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-08 13:43:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1303130    
Attachments:
Description: atomic-openshift-node log taken during the node entering notready state
Flags: none

Description Sten Turpin 2016-07-15 16:10:53 UTC
Created attachment 1180204
atomic-openshift-node log taken during the node entering notready state

Description of problem: On several of our clusters, one node is reported as NotReady for a few seconds before going back to Ready.


Version-Release number of selected component (if applicable): atomic-openshift-node-3.2.1.7-1.git.0.2702170.el7.x86_64

Some older versions are also affected


How reproducible: A few times per day


Steps to Reproduce:
1. oc get nodes --watch


Actual results:
On some clusters, one node will briefly report NotReady, then go back to Ready
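
Because the NotReady state lasts only a few seconds, it is easy to miss when watching by eye. The following is a minimal sketch, not something taken from this report, of a filter that timestamps each status change seen in the watch output. It assumes the first two whitespace-separated columns of `oc get nodes --watch` are NAME and STATUS; the function name `find_transitions` and the script name `watch_nodes.py` are made up for illustration.

```python
import sys
from datetime import datetime, timezone


def utc_now():
    """Return the current time in UTC."""
    return datetime.now(timezone.utc)


def find_transitions(lines, now=utc_now):
    """Yield (timestamp, node, old_status, new_status) on every status change.

    Assumption: each row starts with the node name followed by its status
    (for example "Ready" or "NotReady"). Adjust the indices if your
    client prints a different column layout.
    """
    last = {}  # node name -> most recently seen status
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        node, status = fields[0], fields[1]
        previous = last.get(node)
        if previous is not None and previous != status:
            yield now(), node, previous, status
        last[node] = status


def main():
    """Read watch output from stdin and print one line per transition."""
    for stamp, node, old, new in find_transitions(sys.stdin):
        print(f"{stamp.isoformat()} {node}: {old} -> {new}", flush=True)
```

To use it, add a call to `main()` at the end of the file and run `oc get nodes --watch | python watch_nodes.py`. The printed timestamps can then be matched against the atomic-openshift-node and master logs for the same moment.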

Expected results:
Nodes should be stable


Additional info:
In the attached log, we are running automated tests in which one script creates its own namespace and deploys openshift/hello-openshift, while another creates its own namespace and runs an STI build into it. It seems notable that the NotReady issue occurred during those deploys.

Comment 1 Avesh Agarwal 2016-07-15 17:33:45 UTC
Sten, does it happen with the same node, or with different nodes at different times in the same cluster?

Comment 2 Avesh Agarwal 2016-07-15 19:12:15 UTC
Sten, could you share more details about your setup, such as the nodes in your cluster and the scripts? I can try something similar myself, but I'm curious whether the scripts are doing anything else.

Comment 3 Andy Goldstein 2016-07-22 18:20:01 UTC
Sten, the next time this happens, could you please attach both the master (api + controller) & node logs?

Comment 4 Andy Goldstein 2016-08-04 10:52:58 UTC
Sten, has this happened any more?

Comment 5 Sten Turpin 2016-08-04 16:58:45 UTC
I've been watching on 3.2.1.7, 3.2.1.9 and 3.2.1.10 clusters, haven't seen it resurface.

Comment 6 Andy Goldstein 2016-08-08 13:43:59 UTC
Sten, please reopen if needed. Thanks.