Bug 1740137 - Trigger evacuation on cordon (watch for kubectl cordon NODE taint)
Summary: Trigger evacuation on cordon (watch for kubectl cordon NODE taint)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 2.1.0
Assignee: Marc Sluiter
QA Contact: Denys Shchedrivyi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-12 11:02 UTC by Fabian Deutsch
Modified: 2019-11-04 15:07 UTC (History)
CC: 7 users

Fixed In Version: hco-bundle-registry:v2.1.0-21
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-04 15:07:45 UTC
Target Upstream Version:
Embargoed:



Description Fabian Deutsch 2019-08-12 11:02:51 UTC
Description of problem:
Today the NodeMaintenance CR is used to put a node into maintenance. This does not tie in with the general "kubectl drain …" flow.

In order to tie in with this flow, KubeVirt needs to watch for the cordon/drain-related taints on nodes (to be identified).

These taints then need to be set as the "nodeDrainTaintKey" in the kubevirt-config, so that KubeVirt triggers live migration off the node once a drain is triggered.

Version-Release number of selected component (if applicable):
2.1

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Fabian Deutsch 2019-08-12 11:03:23 UTC
This should also be documented here:
https://github.com/kubevirt/user-guide/blob/master/administration/node-eviction.adoc

Comment 2 Fabian Deutsch 2019-08-12 12:18:51 UTC
The taint to watch for is probably
Taints:             node.kubernetes.io/unschedulable:NoSchedule
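Cordoning a node sets `spec.unschedulable`, and the node lifecycle controller then mirrors that condition into the taint above. On a cordoned node, the node spec looks roughly like this (illustrative fragment, not taken from this bug's environment):

```yaml
spec:
  unschedulable: true          # set directly by "kubectl cordon NODE"
  taints:
  - key: node.kubernetes.io/unschedulable   # added by the node lifecycle controller
    effect: NoSchedule
```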

Comment 3 Martin Sivák 2019-08-14 13:15:17 UTC
It is all fine to watch for a taint, but that does not directly satisfy the maintenance epic acceptance criteria:

- We want to be able to record a reason for the maintenance
- and (more importantly) we need the UI to be able to put a node to maintenance

Also, the taint itself does not allow any status reporting, and the migrations might take some time.

So I would actually propose to keep the NodeMaintenance CR as the primary trigger for CNV, which then _uses_ the taint and drain APIs, which are in turn detected by KubeVirt and converted into live migrations.

Btw, the documentation about cordon is pretty explicit in saying it does not trigger evictions by itself. And there is no taint for drain/eviction.

"Marking a node as unschedulable prevents new pods from being scheduled to that node, but does not affect any existing pods on the node. This is useful as a preparatory step before a node reboot, etc. For example, to mark a node unschedulable, run this command: kubectl cordon $NODENAME" -> this creates the taint only.


I believe we somehow managed to invert the requested relationship between node maintenance and cordon/drain here. The admin can use cordon/drain manually and kubevirt should notice the evictions and perform live migration. But the NodeMaintenance is just an abstraction on top of cordon/drain that performs the same thing as the command line tool. Those two can be used at the same time (there can be multiple NoSchedule taints at the same time).

Comment 6 Fabian Deutsch 2019-08-16 08:29:46 UTC
This is a pragmatic _stop gap_ solution.
We are working on a proper fix for a later release, but it depends on K8s 1.15 features: https://jira.coreos.com/browse/CNV-2727

I understand that the workaround suggested in this bug has the drawback that VMs will also be evacuated (restarted or migrated) when the node is cordoned with "oc adm cordon".

Comment 7 Marc Sluiter 2019-08-19 13:53:53 UTC
Hey Fabian, 

I need some more details:

- is this about starting live migrations only, or also about restarting non-migratable VMs, as your last comment suggests?
- for migrations: I understand we have this feature already; it was implemented in [0]. Two conditions need to be met for this to work:
  1. the correct taint key needs to be set in the config
  2. the VMs need to have the EvictionStrategy set to LiveMigrate
- for restarts: restarts would need to be implemented for VMs which are a) not live migratable or b) don't have the EvictionStrategy set to LiveMigrate. Makes sense?

[0] https://github.com/kubevirt/kubevirt/pull/2091
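The two conditions Marc lists, plus the restart fallback, can be sketched as a small decision function. This is a hypothetical helper illustrating the logic discussed in this thread, not KubeVirt source code:

```python
def eviction_action(node_taints, drain_taint_key, eviction_strategy, live_migratable):
    """Decide what happens to a VM when its node is drained.

    Illustrative sketch of the behavior described in this bug;
    the function name and signature are made up for this example.
    """
    # Condition 1: the configured drain taint key must appear on the node.
    drained = any(t.get("key") == drain_taint_key for t in node_taints)
    if not drained:
        return "none"
    # Condition 2: live migration applies only if the VM opted in and can migrate.
    if eviction_strategy == "LiveMigrate" and live_migratable:
        return "migrate"
    # Otherwise the VM would have to be restarted on another node.
    return "restart"
```

With the key from comment 8, a cordoned node would trigger "migrate" for a LiveMigrate VM and "restart" for any other VM.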

Comment 8 Fabian Deutsch 2019-08-23 07:23:39 UTC
The goal is to tie in with the _default OCP flow_ for evicting pods on nodes.

Thus: if a node is drained, its pods get evicted; then some VMs should be migrated away and others should be restarted elsewhere, whatever is configured with the evictionStrategy: https://kubevirt.io/api-reference/master/definitions.html#_v1_evictionstrategy

The scope of this bug is simply to change the taint key to "node.kubernetes.io/unschedulable:NoSchedule"; all other functionality can stay as it is.
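A sketch of what that configuration change looks like in the kubevirt-config ConfigMap. This is illustrative only; the exact spelling of the key in a given release may differ from what is shown here:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-config
  namespace: kubevirt
data:
  # Point KubeVirt's drain detection at the taint set by cordon/drain.
  # Key name assumed from the description above; verify against your release.
  node-drain-taint-key: node.kubernetes.io/unschedulable
```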

Comment 9 Marc Sluiter 2019-08-26 09:44:50 UTC
PR for the node drain key: https://github.com/kubevirt/hyperconverged-cluster-operator/pull/247

Still need to test if restarts work as expected out of the box.

Comment 11 Denys Shchedrivyi 2019-09-03 21:29:08 UTC
Verified on hco-bundle-registry:v2.1.0-29

