Bug 1357078 - Nodes briefly enter and then leave NotReady state
Summary: Nodes briefly enter and then leave NotReady state
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Avesh Agarwal
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks: OSOPS_V3
 
Reported: 2016-07-15 16:10 UTC by Sten Turpin
Modified: 2016-08-08 13:43 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-08 13:43:59 UTC
Target Upstream Version:
Embargoed:


Attachments
atomic-openshift-node log taken while the node was entering the NotReady state (13.96 KB, text/plain)
2016-07-15 16:10 UTC, Sten Turpin

Description Sten Turpin 2016-07-15 16:10:53 UTC
Created attachment 1180204 [details]
atomic-openshift-node log taken while the node was entering the NotReady state

Description of problem: We have several clusters where one node is reported as NotReady for a few seconds before going back to Ready.


Version-Release number of selected component (if applicable): atomic-openshift-node-3.2.1.7-1.git.0.2702170.el7.x86_64

Some older versions are also affected.


How reproducible: A few times per day


Steps to Reproduce:
1. oc get nodes --watch
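A minimal watch loop to timestamp the Ready/NotReady transitions (a sketch, assuming oc is already logged in to the affected cluster; the log file name is illustrative):

# Record every node status update with a UTC timestamp so flaps can be
# correlated with the node and master logs.
oc get nodes --watch --no-headers | while read -r node status rest; do
    echo "$(date -u +%FT%TZ) $node $status"
done | tee node-readiness.log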


Actual results:
On some clusters, one node will briefly report NotReady, then go back to Ready.

Expected results:
Nodes should remain Ready without flapping to NotReady.


Additional info:
In the attached log, we're running automated tests: one script creates its own namespace and deploys openshift/hello-openshift, while another creates its own namespace and runs an S2I (STI) build in it. It seems noteworthy that the NotReady transition occurred during those deploys.
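A rough approximation of that workload, for reference (the project names and build source below are illustrative assumptions; the actual scripts were not attached):

# Sketch of the described test workload; project names and the Git repo are assumptions.
oc new-project flap-test-deploy
oc new-app openshift/hello-openshift                            # deploy the hello-openshift image
oc new-project flap-test-build
oc new-app https://github.com/openshift/ruby-hello-world.git    # kick off an S2I (STI) build in the new namespace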

Comment 1 Avesh Agarwal 2016-07-15 17:33:45 UTC
Sten, does it happen with the same node, or with different nodes at different times in the same cluster?

Comment 2 Avesh Agarwal 2016-07-15 19:12:15 UTC
Sten, could you share more details about your setup, such as the nodes in your cluster and the scripts? (I can try something similar myself, but I'm curious whether the scripts are doing anything else.)

Comment 3 Andy Goldstein 2016-07-22 18:20:01 UTC
Sten, the next time this happens, could you please attach both the master (api + controller) & node logs?
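One way to capture both around the flap window on a systemd-based 3.2 install (a sketch; the unit names assume a single, non-HA master and are not taken from this bug):

# Collect master and node logs for the last 15 minutes (unit names are assumptions).
journalctl -u atomic-openshift-master --since "15 min ago" > master.log
journalctl -u atomic-openshift-node --since "15 min ago" > node.log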

Comment 4 Andy Goldstein 2016-08-04 10:52:58 UTC
Sten, has this happened again?

Comment 5 Sten Turpin 2016-08-04 16:58:45 UTC
I've been watching 3.2.1.7, 3.2.1.9, and 3.2.1.10 clusters and haven't seen it resurface.

Comment 6 Andy Goldstein 2016-08-08 13:43:59 UTC
Sten, please reopen if needed. Thanks.

