Description of problem:
Shut down my labelled AWS nodes overnight to save money. After restarting them the next morning, no pods would start because they failed to match the region label selector that was set during install. `oc get nodes --show-labels` showed the region label missing. The nodes were installed using the AWS private DNS names, which do not change across a shutdown/restart.

Version-Release number of selected component (if applicable): 3.2.0.14

How reproducible: Always

Steps to Reproduce:
1. Install a cluster of 4 nodes on AWS: master, a router/registry node (labelled region=infra), and 2 app nodes (labelled region=primary). Install using the private DNS names for the master and all nodes so that the hostnames remain valid after restart.
2. After install, set the cluster default project node selector to region=primary.
3. After install, set an annotation on the default namespace to use a selector of region=infra.
4. Create router and registry dcs and pods; verify they run on the region=infra node.
5. Create some app pods; verify they run on the region=primary nodes.
6. Stop the nodes (not the master). In my case it was overnight.
7. Restart the nodes; verify oc get nodes shows Ready.

Actual results:
1. All pods stuck in Pending. kubectl get events shows FailedScheduling and "failed to fit any node".
2. Labeling the nodes allows containers to start. Note: --overwrite is not needed; the region label does not exist at all.

Expected results:
After restart, nodes keep their labels and do not require re-labeling. Pods restart successfully when nodes restart.

Additional info:
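For reference, steps 2-3 and the re-labeling workaround can be sketched roughly as follows (the node name is hypothetical, and the commands assume OpenShift 3.x conventions; the cluster-wide default selector is set in master-config.yaml rather than via a command):

```shell
# Cluster-wide default project selector (master-config.yaml):
#   projectConfig:
#     defaultNodeSelector: "region=primary"

# Per-namespace override for the default namespace:
oc annotate namespace default openshift.io/node-selector=region=infra

# Verify node labels before and after the restart:
oc get nodes --show-labels

# Re-label a node after its labels are lost (workaround; node name is an example):
oc label node ip-10-0-0-1.ec2.internal region=primary
```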
If you want to preserve labels, the only way to do this today is the following. Edit node-config.yaml so that you have a section like this:

kubeletArguments:
  node-labels:
  - "a=b"
  - "c=d"

This only takes effect when the node registers with the master. If your node has already registered and you want to modify its labels *and* preserve them in the future (in the event you stop the node and it is therefore unregistered), you'll need to:

1. oc label node/$node ....
2. Edit node-config.yaml as described above.

It is not currently possible to modify the node's labels solely via 'oc label' and have those modifications preserved across an unregistration/reregistration. Do you believe this is sufficient, or would you prefer we try to find a way to preserve label modifications made after the node has registered?
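As a concrete sketch of the two-step workaround above (label value, file path, and service name are assumptions for a typical OSE 3.x install; verify against your environment):

```shell
# 1. Label the currently-registered node so running pods can schedule now:
oc label node/$node region=primary

# 2. Persist the label so it survives unregistration/reregistration, by adding
#    to /etc/origin/node/node-config.yaml:
#
#   kubeletArguments:
#     node-labels:
#     - "region=primary"
#
#    then restarting the node service so the change is picked up on the next
#    registration:
systemctl restart atomic-openshift-node
```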
From an end user point of view, my expectation would be that user-provided node labels (whether set at install time or subsequently from the CLI) would be persisted, especially since they can affect the ability of pods to find a suitable node. Since labels are configurable in kubeletArguments, that provides a workaround for the install scenario. I'll make sure I can set the labels in openshift_node_kubelet_args in the Ansible inventory. Having them settable in openshift_node_labels at install time makes them appear more "permanent" than they are.
Alternatively, if openshift_node_labels is set at install time, the values should find their way into node-config.yaml kubeletArguments during install.
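One way this could look in an openshift-ansible inventory, with both variables carrying the same label so the kubelet re-applies it on every registration (the host name is an example, and the exact variable formats should be verified against your openshift-ansible version):

```ini
[nodes]
node1.example.com openshift_node_labels="{'region': 'primary'}" openshift_node_kubelet_args="{'node-labels': ['region=primary']}"
```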
Given that the labels live in etcd, it makes sense for the node to query them before updating on startup. xref: https://github.com/kubernetes/kubernetes/issues/25811
I know that upstream wanted to have a thorough discussion about this, which is why the "node-labels" flag you can pass to the kubelet is considered alpha at this time.
This is in the 1.4 bucket but has no assignee - https://github.com/kubernetes/kubernetes/issues/28051
Needs community agreement on how to proceed. Will not make 3.3.
assigning to paul to track upstream discussion.
This is an RFE and should be treated as such. This is not a blocker bug.
This bug has been identified as a dated bug (created more than 3 months ago). This bug has been triaged (has a Trello card linked to it), or reviewed by Engineering/PM, and has been put into the product backlog; however, this bug has not been slated for a currently planned release (3.9, 3.10, or 3.11), which cover our releases for the rest of the calendar year. As a result of this bug's age, its state on the current roadmap, and its PM Score (below 70), this bug is being Closed - Deferred, as it is currently not part of the product's immediate priorities. Please see: https://docs.google.com/document/d/1zdqF4rB3ea8GmVIZ7qWCVYUaQ7-EexUrQEF0MTwdDkw/edit for more details.
This issue is being tracked here for anyone stumbling across the bz https://bugzilla.redhat.com/show_bug.cgi?id=1559271