
Bug 1608971

Summary:	controller deleting roles from nodes it can't contact
Product:	OpenShift Container Platform
Component:	Node
Version:	3.9.0
Status:	CLOSED DUPLICATE
Severity:	high
Priority:	unspecified
Keywords:	OpsBlocker
Hardware:	All
OS:	Linux
Type:	Bug
Reporter:	Alex Chvatal <achvatal>
Assignee:	Seth Jennings <sjenning>
QA Contact:	DeShuai Ma <dma>
CC:	aos-bugs, jgoulding, jokerman, mmccomas
Last Closed:	2018-07-30 20:16:00 UTC

Description Alex Chvatal 2018-07-26 15:25:20 UTC

Description of problem:
The node controller is deleting roles from nodes that can't be contacted.


Version-Release number of selected component (if applicable):
kubernetes v1.9.1+a0ce1bc657
openshift v3.9.31
kubernetes v1.9.1+a0ce1bc657


How reproducible:
very


Steps to Reproduce:
1. stop node
2. wait


Actual results:
controller removes the labels from the node


Expected results:
the node keeps its labels

Comment 1 Seth Jennings 2018-07-30 20:16:00 UTC

This bug description is pretty terse. Are you referring to the node as "the atomic-openshift-node systemd service" or "the underlying instance"?

Considering the behavior, I understand you to mean the latter.  I also assume you are running with cloud provider integration (likely AWS), otherwise the node controller would not remove your Node from the API server.

Thus this happens when:
1) master-controllers is using AWS cloudprovider integration
2) the instance is shut down
3) after the 5m timeout, the Node resource relating to that instance is removed
4) if the instance is restarted, a new Node is created and it does not have the correct labels
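
The sequence above can be illustrated with a small toy model. This is only a sketch of the behavior as described in this comment, not the actual node controller code. The names ToyCluster, register, stop_instance, reconcile and demo are hypothetical, and the label key node-role.kubernetes.io/compute is used only as an example of a role label that was added after registration.

```python
from dataclasses import dataclass, field

TIMEOUT_SECONDS = 5 * 60  # the "5m timeout" from step 3


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical, simplified stand-in for the API server plus node controller."""

    def __init__(self):
        self.nodes = {}       # Node resources currently stored
        self.running = set()  # instances the cloud provider reports as running
        self.down_since = {}  # instance name -> time it was shut down

    def register(self, name):
        # A (re)starting node creates its Node resource if it is missing.
        # A newly created Node only carries default labels (assumption: hostname only).
        self.running.add(name)
        self.down_since.pop(name, None)
        if name not in self.nodes:
            self.nodes[name] = Node(name, {"kubernetes.io/hostname": name})
        return self.nodes[name]

    def stop_instance(self, name, now):
        # Step 2: the underlying instance is shut down.
        self.running.discard(name)
        self.down_since[name] = now

    def reconcile(self, now):
        # Step 3: remove Nodes whose instance has been down for the full timeout.
        for name in list(self.nodes):
            if name in self.running:
                continue
            down_at = self.down_since.get(name)
            if down_at is not None and now - down_at >= TIMEOUT_SECONDS:
                del self.nodes[name]


def demo():
    cluster = ToyCluster()
    node = cluster.register("node-1")
    node.labels["node-role.kubernetes.io/compute"] = "true"  # role label added later

    cluster.stop_instance("node-1", now=0)
    cluster.reconcile(now=TIMEOUT_SECONDS)  # Node resource is removed here

    restarted = cluster.register("node-1")  # step 4: a brand new Node is created
    return restarted.labels


if __name__ == "__main__":
    print(demo())  # the role label is no longer present
```

In this model the labels are lost because the Node resource itself is deleted and recreated, which matches the duplicate-bug diagnosis above: nothing strips labels from an existing Node.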

I'm assuming this is the case and, if so, this is a dup of 1559271, which was closed because we got blocked upstream.  There is some interest in reopening, however, and really forcing the issue with upstream.

*** This bug has been marked as a duplicate of bug 1559271 ***