Bug 1357078 - Nodes briefly enter and then leave NotReady state
Summary: Nodes briefly enter and then leave NotReady state
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Avesh Agarwal
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks: OSOPS_V3
 
Reported: 2016-07-15 16:10 UTC by Sten Turpin
Modified: 2016-08-08 13:43 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-08 13:43:59 UTC
Target Upstream Version:
Embargoed:


Attachments
atomic-openshift-node log taken while the node was entering the NotReady state (13.96 KB, text/plain)
2016-07-15 16:10 UTC, Sten Turpin

Description Sten Turpin 2016-07-15 16:10:53 UTC
Created attachment 1180204 [details]
atomic-openshift-node log taken while the node was entering the NotReady state

Description of problem: We have several clusters where one node is reported as NotReady for a few seconds before going back to Ready.


Version-Release number of selected component (if applicable): atomic-openshift-node-3.2.1.7-1.git.0.2702170.el7.x86_64

Some older versions are also affected.


How reproducible: A few times per day


Steps to Reproduce:
1. oc get nodes --watch
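A minimal watch loop to timestamp the Ready/NotReady transitions (a sketch, assuming oc is already logged in to the affected cluster; the log file name is illustrative):

# Record every node status update with a UTC timestamp so flaps can be
# correlated with the node and master logs.
oc get nodes --watch --no-headers | while read -r node status rest; do
    echo "$(date -u +%FT%TZ) $node $status"
done | tee node-readiness.log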


Actual results:
On some clusters, one node will briefly report NotReady, then go back to Ready.

Expected results:
Nodes should remain Ready without flapping to NotReady.


Additional info:
In the attached log, we're running automated tests: one script creates its own namespace and deploys openshift/hello-openshift, while another creates its own namespace and runs an S2I (STI) build in it. It seems noteworthy that the NotReady transition occurred during those deploys.
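A rough approximation of that workload, for reference (the project names and build source below are illustrative assumptions; the actual scripts were not attached):

# Sketch of the described test workload; project names and the Git repo are assumptions.
oc new-project flap-test-deploy
oc new-app openshift/hello-openshift                            # deploy the hello-openshift image
oc new-project flap-test-build
oc new-app https://github.com/openshift/ruby-hello-world.git    # kick off an S2I (STI) build in the new namespace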

Comment 1 Avesh Agarwal 2016-07-15 17:33:45 UTC
Sten, does it happen with the same node, or with different nodes at different times in the same cluster?

Comment 2 Avesh Agarwal 2016-07-15 19:12:15 UTC
Sten, could you share more details about your setup, such as the nodes in your cluster and the scripts? (I can try something similar myself, but I'm curious whether the scripts are doing anything else.)

Comment 3 Andy Goldstein 2016-07-22 18:20:01 UTC
Sten, the next time this happens, could you please attach both the master (api + controller) & node logs?
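One way to capture both around the flap window on a systemd-based 3.2 install (a sketch; the unit names assume a single, non-HA master and are not taken from this bug):

# Collect master and node logs for the last 15 minutes (unit names are assumptions).
journalctl -u atomic-openshift-master --since "15 min ago" > master.log
journalctl -u atomic-openshift-node --since "15 min ago" > node.log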

Comment 4 Andy Goldstein 2016-08-04 10:52:58 UTC
Sten, has this happened again?

Comment 5 Sten Turpin 2016-08-04 16:58:45 UTC
I've been watching 3.2.1.7, 3.2.1.9, and 3.2.1.10 clusters and haven't seen it resurface.

Comment 6 Andy Goldstein 2016-08-08 13:43:59 UTC
Sten, please reopen if needed. Thanks.

