Bug 1706251

Summary: Node Maintenance does not initiate LiveMigration when non default nodeDrainTaintKey is used
Product: Container Native Virtualization (CNV) Reporter: Denys Shchedrivyi <dshchedr>
Component: Virtualization Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED NOTABUG QA Contact: Israel Pinto <ipinto>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 2.0 CC: cnv-qe-bugs, fdeutsch, ipinto, yquinn
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-05-06 15:06:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Denys Shchedrivyi 2019-05-03 21:58:11 UTC
Description of problem:

  After running node maintenance, this taint is added to the node:

Taints:             kubevirt.io/drain:NoSchedule

  And all VMs with an eviction strategy are migrated successfully.

  But if I change nodeDrainTaintKey from kubevirt.io/drain to something else (e.g. test/test) in the kubevirt config, LiveMigration does not happen.



Version-Release number of selected component (if applicable):
2.0


Steps to Reproduce:
1. Add nodeDrainTaintKey with a non-default value to kubevirt-config (e.g. test/test)
2. Create a VM and run it
3. Run node maintenance

Actual results:
 Taint kubevirt.io/drain:NoSchedule is added to the node, LiveMigration does not start, and all pods are terminated and recreated on a new node.


Expected results:
 Taint test/test:NoSchedule should be added to the node, LiveMigration should start, and the source pods should be deleted only after the migration completes successfully.

Comment 1 Yanir Quinn 2019-05-04 17:34:23 UTC
As a first step for node maintenance operator we hard coded the kubevirt.io/drain:NoSchedule taint to be added.
We want to enhance the behaviour to add taints and labels to the CRD see:
https://github.com/kubevirt/node-maintenance-operator/issues/13

Comment 2 Israel Pinto 2019-05-05 14:20:40 UTC
(In reply to Yanir Quinn from comment #1)
> As a first step for node maintenance operator we hard coded the
> kubevirt.io/drain:NoSchedule taint to be added.
> We want to enhance the behaviour to add taints and labels to the CRD see:
> https://github.com/kubevirt/node-maintenance-operator/issues/13

Yanir, 
I understand it is part of this release. But how is it handled via the UI?
And do we document it for this version?

Comment 3 Fabian Deutsch 2019-05-06 11:40:57 UTC
I think this is not a bug.

We speak about two components here:
(1) Maintenance Operator
(2) KubeVirt

This bug is about the taint key that (2) is listening for.

And if (1) is used, then it's clear that (2) needs to be configured to listen for the key that (1) is setting.

And it is obvious that if (2) is configured wrongly, then it will not work if (1) is used.

Denys, why do you consider this bug to be a bug?
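
For illustration only, the interaction between the two components can be modelled in a few lines of Python. This is a simplified sketch, not the actual code of either project: the function names are hypothetical, and matching on the taint key alone is an assumption made to keep the example short.

```python
from dataclasses import dataclass

# Taint key that the maintenance operator adds to the node
# (hard-coded for now, per comment 1).
OPERATOR_TAINT_KEY = "kubevirt.io/drain"


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str = "NoSchedule"


def operator_taints():
    """Hypothetical model of (1): the taints the maintenance operator sets."""
    return [Taint(OPERATOR_TAINT_KEY)]


def should_live_migrate(node_taints, node_drain_taint_key):
    """Hypothetical model of (2): KubeVirt reacts only to the key it is
    configured to listen for (nodeDrainTaintKey)."""
    return any(t.key == node_drain_taint_key for t in node_taints)


# Default configuration: the keys match, so migration is triggered.
print(should_live_migrate(operator_taints(), "kubevirt.io/drain"))
# Changed configuration from this report: the keys differ, so it is not.
print(should_live_migrate(operator_taints(), "test/test"))
```

With the default key the check succeeds; with test/test it does not, which matches the behaviour described in the report.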

Comment 4 Israel Pinto 2019-05-06 12:01:25 UTC
(In reply to Fabian Deutsch from comment #3)
> I think this is not a bug.
> 
> We speak about two components here:
> (1) Maintenance Operator
> (2) KubeVirt
> 
> This bug is about the taint key that (2) is listening for.
> 
> And if (1) is used, then it's clear that (2) needs to be configured to
> listen for the key that (1) is setting.
> 
> And it is obvious that if (2) is configured wrongly, then it will not work
> if (1) is used.
> 
> Denys, why do you consider this bug to be a bug?

That partially answers my question to Yanir. It will not run standalone; the customer will use it from the UI, or will use both components.
Just to be sure, is it documented?

Denys, any other issue here?

Comment 5 Fabian Deutsch 2019-05-06 12:51:54 UTC
HCO is responsible for configuring the components to work correctly (if any configuration is necessary).

Thus: If HCO is used for deployment, then all components should work together.
It is _not_ expected that they work together if they are deployed individually.

In that sense: If KubeVirt is deployed by itself / standalone, then it is not expected to work with Node Maintenance Controller out of the box. (Although in this case it might be the case).

Comment 7 Fabian Deutsch 2019-05-06 14:45:10 UTC
Changing the kubevirt-config is like changing any other config file in Linux: If you mess it up, then your system will likely fall apart. In that sense: Customers can edit the file, but they should know what they are doing.

kubevirt-config is the config file of KubeVirt, thus any other component is free to read it as well, but it's not intended to configure another component.

Comment 8 Denys Shchedrivyi 2019-05-06 14:59:42 UTC
If I change nodeDrainTaintKey in kubevirt-config, I want to be able to run maintenance with the new configuration. Otherwise, I can't use this option because it leads to a non-working environment.

Comment 10 Fabian Deutsch 2019-05-06 15:05:46 UTC
Again: This is a config file, it is configured to have a working system; if a person changes the configuration, then it can break. But that's up to the person, this is nothing we can prevent.

If you change a cron path, then you cannot expect that systemd is picking up this new path.
Applied to this situation: if you change the "kubevirt watch taint", then you cannot expect that the node maintenance controller is picking up this new taint.

It's two components (kubevirt and node maintenance controller), and one is configurable (kubevirt), the other is not (yet) (node maintenance controller).
If you break the configuration it is your fault.

We guarantee that the shipped configuration works.

Comment 11 Fabian Deutsch 2019-05-06 15:06:25 UTC
HCO is the component which can detect a wrong configuration between KubeVirt and NodeMaintenance operator.
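
As a rough sketch of what such a detection could look like (this is not HCO code; the function name and the dict-shaped input are assumptions for illustration), a consistency check only needs to compare the key KubeVirt is configured to watch with the key the maintenance operator sets:

```python
# Default taint key mentioned in this report.
DEFAULT_DRAIN_TAINT_KEY = "kubevirt.io/drain"


def check_drain_key(migrations_config, operator_taint_key=DEFAULT_DRAIN_TAINT_KEY):
    """Return None if the configuration is consistent, else a message.

    migrations_config is a hypothetical, already-parsed dict holding the
    KubeVirt settings; a missing nodeDrainTaintKey is treated as the default.
    """
    configured = migrations_config.get("nodeDrainTaintKey", DEFAULT_DRAIN_TAINT_KEY)
    if configured == operator_taint_key:
        return None
    return (
        f"KubeVirt watches taint key {configured!r}, but the node "
        f"maintenance operator sets {operator_taint_key!r}"
    )


# Shipped configuration: consistent.
print(check_drain_key({}))
# Configuration from this report: mismatch is reported.
print(check_drain_key({"nodeDrainTaintKey": "test/test"}))
```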