Bug 1326732 - node label lost while node shutdown overnight
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Derek Carr
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks: 1435401
 
Reported: 2016-04-13 11:48 UTC by Mike Fiedler
Modified: 2021-06-10 11:15 UTC
CC: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-12 13:54:36 UTC
Target Upstream Version:



Description Mike Fiedler 2016-04-13 11:48:33 UTC
Description of problem:

Shut down my labeled AWS nodes overnight to save money. After restarting them the next morning, no pods would start because they failed to match the region label selector that was set during install. oc get nodes --show-labels showed the region label missing.

The nodes were installed using the AWS private DNS names, which do not change across a shutdown/restart.


Version-Release number of selected component (if applicable): 3.2.0.14


How reproducible: Always


Steps to Reproduce:
1. Install a cluster of 4 nodes on AWS:
   master
   router/registry (labeled with region=infra)
   2 nodes (labeled with region=primary)

Install using the private DNS names for the master and all nodes so that the hostnames remain valid after restart.

2. After install, set the cluster default project node selector to region=primary.
3. After install, set an annotation on the default namespace to use a selector of region=infra.
4. Create router and registry deployment configs and pods; verify they are running on the region=infra node.
5. Create some app pods; verify they are running on the region=primary nodes.
6. Stop the nodes (not the master). In my case it was overnight.
7. Restart the nodes; verify oc get nodes shows Ready.

Actual results:

1. All pods are stuck in Pending. kubectl get events shows FailedScheduling and "failed to fit any node".

2. Labeling the nodes allows containers to start. Note: --overwrite is not needed; the region label does not exist at all.
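The failure mode above is plain label-selector subset matching. A minimal illustrative sketch (not the actual scheduler code; node names and label sets are hypothetical) of why the pods stay Pending once the region label disappears:

```python
def matches(node_labels: dict, selector: dict) -> bool:
    # A node fits when every key/value pair in the selector
    # is present on the node with the same value.
    return all(node_labels.get(k) == v for k, v in selector.items())

selector = {"region": "primary"}
before = {"kubernetes.io/hostname": "ip-10-0-1-23", "region": "primary"}
after_restart = {"kubernetes.io/hostname": "ip-10-0-1-23"}  # region label lost

print(matches(before, selector))         # True: pod can schedule
print(matches(after_restart, selector))  # False: FailedScheduling, stuck Pending
```

Once no node carries region=primary, no node can satisfy the selector, which matches the "failed to fit any node" events seen here.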


Expected results:

After restart, nodes maintain their labels and do not require re-labeling. Pods restart successfully when nodes restart.

Additional info:

Comment 1 Andy Goldstein 2016-04-13 12:55:53 UTC
If you want to preserve labels, the only way to do this today is the following:

Edit node-config.yaml so that you have a section like this:

kubeletArguments:
  node-labels:
    - "a=b"
    - "c=d"

This only takes effect when the node registers with the master. If your node has already registered and you want to modify its labels *and* preserve them in the future (in the event you stop the node, which unregisters it), you'll need to:

1. oc label node/$node ....
2. edit node-config.yaml as described above

It is not currently possible to modify a node's labels solely via 'oc label' and have those modifications preserved across an unregistration/re-registration.

Do you believe this is sufficient, or would you prefer we try to find a way to preserve label modifications made after the node has registered?

Comment 2 Mike Fiedler 2016-04-13 13:10:34 UTC
From an end user's point of view, my expectation would be that user-provided node labels (set at install time or subsequently from the CLI) would be persisted, especially since they can affect pods' ability to find a suitable node.

Since they are configurable in kubeletArguments, I think that provides a workaround for the install scenario. I'll make sure I can set the labels in openshift_node_kubelet_args in the Ansible inventory. Having them settable in openshift_node_labels at install time makes them appear more "permanent" than they are.

Comment 3 Mike Fiedler 2016-04-13 14:59:59 UTC
Alternatively, if openshift_node_labels is set at install time, the values should find their way into the node-config.yaml kubeletArguments during install.
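For reference, this is roughly what the install-time labeling looks like in the Ansible inventory; the host names here are hypothetical, and the dict-string format is the one openshift_node_labels expects:

```
[nodes]
master.example.com
infra.example.com  openshift_node_labels="{'region': 'infra'}"
node1.example.com  openshift_node_labels="{'region': 'primary'}"
node2.example.com  openshift_node_labels="{'region': 'primary'}"
```

The suggestion above is that the installer should translate these values into the node-labels entry under kubeletArguments in each node's node-config.yaml, so they survive re-registration.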

Comment 4 Timothy St. Clair 2016-05-18 15:04:44 UTC
Given that the labels live in etcd, it makes sense for the node to query the existing labels before updating them on startup.

xref: https://github.com/kubernetes/kubernetes/issues/25811
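A hedged sketch of that query-before-update idea (illustrative only, not kubelet code; function name and shapes are assumptions): on registration, merge the labels already stored for the node with the config-file labels instead of overwriting them.

```python
def register_node(config_labels, existing=None):
    # Current behavior (per this bug): registration effectively keeps only the
    # labels from node-config.yaml, dropping labels added later via `oc label`.
    # Proposed: query the existing labels first and merge, with the
    # config-file labels taking precedence on conflict.
    merged = dict(existing or {})
    merged.update(config_labels)
    return merged

# Node registered with no config labels, then labeled via `oc label`,
# then stopped and restarted:
print(register_node({}, {"region": "primary"}))  # {'region': 'primary'} preserved
print(register_node({}, None))                   # fresh registration: {}
```

Under this scheme the region label set via 'oc label' in the reproducer would survive the overnight shutdown, while config-file labels would still win if both sources define the same key.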

Comment 5 Andy Goldstein 2016-05-27 15:05:05 UTC
I know that upstream wanted to have a thorough discussion about this, which is why the "node-labels" flag you can pass to the kubelet is considered alpha at this time.

Comment 6 Timothy St. Clair 2016-07-26 19:58:25 UTC
This is in the 1.4 bucket but has no assignee: https://github.com/kubernetes/kubernetes/issues/28051

Comment 7 Andy Goldstein 2016-08-08 20:20:58 UTC
Needs community agreement on how to proceed. Will not make 3.3.

Comment 8 Derek Carr 2016-09-30 14:19:28 UTC
Assigning to Paul to track the upstream discussion.

Comment 9 Derek Carr 2016-10-26 17:57:21 UTC
This is an RFE and should be treated as such.  This is not a blocker bug.

Comment 14 Eric Rich 2018-03-12 13:54:36 UTC
This bug has been identified as a dated bug (created more than 3 months ago).
This bug has been triaged (has a Trello card linked to it) or reviewed by Engineering/PM and has been put into the product backlog;
however, it has not been slated for a currently planned release (3.9, 3.10, or 3.11), which cover our releases for the rest of the calendar year.

As a result of this bug's age, its state on the current roadmap, and its PM Score (below 70), this bug is being Closed - Deferred,
as it is currently not part of the product's immediate priorities.

Please see: https://docs.google.com/document/d/1zdqF4rB3ea8GmVIZ7qWCVYUaQ7-EexUrQEF0MTwdDkw/edit for more details.

Comment 15 Seth Jennings 2018-07-09 02:20:45 UTC
This issue is being tracked here for anyone stumbling across this BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1559271

