Bug 1608971

Summary: controller deleting roles from nodes it can't contact
Product: OpenShift Container Platform
Reporter: Alex Chvatal <achvatal>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED DUPLICATE
QA Contact: DeShuai Ma <dma>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.9.0
CC: aos-bugs, jgoulding, jokerman, mmccomas
Target Milestone: ---
Keywords: OpsBlocker
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-30 20:16:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Alex Chvatal 2018-07-26 15:25:20 UTC
Description of problem:
The node controller is deleting roles from nodes that can't be contacted.


Version-Release number of selected component (if applicable):
kubernetes v1.9.1+a0ce1bc657
openshift v3.9.31
kubernetes v1.9.1+a0ce1bc657


How reproducible:
very


Steps to Reproduce:
1. stop node
2. wait


Actual results:
controller removes the labels from the node


Expected results:
the node keeps its labels

Comment 1 Seth Jennings 2018-07-30 20:16:00 UTC
This bug description is pretty terse. Are you referring to the node as "the atomic-openshift-node systemd service" or "the underlying instance"?

Considering the behavior, I understand you to mean the latter.  I also assume you are running with cloud provider integration (likely AWS), otherwise the node controller would not remove your Node from the API server.

Thus this happens when:
1) master-controllers is using AWS cloudprovider integration
2) the instance is shut down
3) after the 5m timeout, the Node resource relating to that instance is removed
4) if the instance is restarted, a new Node is created and it does not have the correct labels
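
The sequence above can be illustrated with a small toy model. This is only a sketch of the steps as described in this comment, not the actual node controller or cloud provider code: the ToyCluster class, its method names, and the "region" label are hypothetical, and the timeout is simply the 5m figure from step 3.

```python
from dataclasses import dataclass, field

NODE_TIMEOUT_SECONDS = 5 * 60  # the "5m timeout" from step 3 (assumed value)


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical stand-in for the API server plus the node controller."""

    def __init__(self):
        self.nodes = {}        # Node resources, keyed by name
        self.running = set()   # instances the cloud provider reports as running
        self.down_since = {}   # name -> time the instance was stopped

    def register(self, name):
        # A starting node creates its Node resource only if none exists yet.
        self.running.add(name)
        self.down_since.pop(name, None)
        if name not in self.nodes:
            self.nodes[name] = Node(name, {"kubernetes.io/hostname": name})
        return self.nodes[name]

    def stop_instance(self, name, now):
        self.running.discard(name)
        self.down_since[name] = now

    def controller_sync(self, now):
        # Remove Node resources whose instance has been down past the timeout.
        for name, since in list(self.down_since.items()):
            if name not in self.running and now - since >= NODE_TIMEOUT_SECONDS:
                self.nodes.pop(name, None)
                del self.down_since[name]


cluster = ToyCluster()
node = cluster.register("node-1")
node.labels["region"] = "infra"          # label applied after registration
cluster.stop_instance("node-1", now=0)   # step 2
cluster.controller_sync(now=NODE_TIMEOUT_SECONDS)  # step 3: Node is removed
restarted = cluster.register("node-1")   # step 4: a fresh Node is created
print(restarted.labels)                  # the "region" label is gone
```

In this model the labels are not stripped from an existing Node; they disappear because the Node resource itself is deleted and then recreated with only its default labels.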

I'm assuming this is the case and, if so, this is a dup of 1559271, which was closed because we got blocked upstream.  There is some interest in reopening, however, and really forcing the issue with upstream.

*** This bug has been marked as a duplicate of bug 1559271 ***