Bug 2192858

Summary:	kubevirt-job pod ignores node placement configuration
Product:	Container Native Virtualization (CNV)	Reporter:	nijin ashok <nashok>
Component:	Virtualization	Assignee:	Prita Narayan <prnaraya>
Status:	CLOSED ERRATA	QA Contact:	Denys Shchedrivyi <dshchedr>
Severity:	high	Docs Contact:
Priority:	high
Version:	4.12.2	CC:	acardace, dshchedr, gveitmic, kbidarka, prnaraya
Target Milestone:	---
Target Release:	4.12.4
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	v4.12.4-35	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-06-27 19:10:40 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description nijin ashok 2023-05-03 10:50:10 UTC

Description of problem:

Add tolerations in node placement rules as per https://docs.openshift.com/container-platform/4.12/virt/install/virt-specifying-nodes-for-virtualization-components.html.

These rules are not getting propagated to the kubevirt-xxx-jobxxx pod. 

When all nodes have taints set, the pod will be stuck in "Pending" status and virt-oprator will fail to deploy virt components.


~~~
2023-05-02T17:48:15.692306010Z {"component":"virt-operator","kind":"","level":"error","msg":"Waiting on install strategy to be posted from job kubevirt-6dd2f5f71de2802413f34578306382b99a65c12f-jobk8c54","name":"kubevirt-6dd2f5f71de2802413f34578306382b99a65c12f-jobk8c54","namespace":"openshift-cnv","pos":"kubevirt.go:948","timestamp":"2023-05-02T17:48:15.692261Z","uid":"2a2df195-2806-4d1a-a5d6-f135f56f44bb"}
~~~
  
Version-Release number of selected component (if applicable):

OpenShift Virtualization 4.12.2

How reproducible:

100%

Steps to Reproduce:

1. Add node placement rules on hco object.
2. Check the placement rules are applied to kubevirt-xxx-jobxxx pod.

Actual results:

kubevirt-job pod ignores node placement configuration

Expected results:

Node placement configuration should be propagated to kubevirt-job pod.

Additional info:

This looks to be already fixed upstream https://github.com/kubevirt/kubevirt/commit/166f5063d1cc839fa1811871418cc528e829ca94

Comment 1 Denys Shchedrivyi 2023-06-16 22:20:24 UTC

 I verified on CNV-v4.12.4-52 - it seems not everything works as expected. 2 questions to clarify:

1. .spec.infra.nodePlacement.tolerations successfully applied to virt-controller and virt-api, but was *not applied to virt-operator*

 steps: 
 1) set taints on the nodes:
  $ oc adm taint nodes cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com key1=value1:NoSchedule
 
 2) added tolerations to HCO
   infra:
    nodePlacement:
      tolerations:
      - effect: NoSchedule
        key: key1
        operator: Equal
        value: value1

  3) check it was propagated to kubevirt:
   $ oc get kubevirt -o json | jq .items[0].spec.infra
   {
     "nodePlacement": {
       "tolerations": [
         {
           "effect": "NoSchedule",
           "key": "key1",
           "operator": "Equal",
           "value": "value1"
         }
       ]
     }
   }

  4) check virt pods:
 VIRT-API and VIRT-CONTROLLER pods have it:
   $oc get pod virt-api-769645b799-2z5kp -o json | jq .spec.tolerations
      {
        "effect": "NoSchedule",
        "key": "key1",
        "operator": "Equal",
        "value": "value1"
      },
 
  VIRT-OPERATOR - does not have it:
   $  oc get pod virt-operator-6c675b7888-9vlsx -o json | jq .spec.tolerations
   [
     {
       "key": "CriticalAddonsOnly",
       "operator": "Exists"
     },
     {
       "effect": "NoExecute",
       "key": "node.kubernetes.io/not-ready",
       "operator": "Exists",
       "tolerationSeconds": 300
     },
     {
       "effect": "NoExecute",
       "key": "node.kubernetes.io/unreachable",
       "operator": "Exists",
       "tolerationSeconds": 300
     },
     {
       "effect": "NoSchedule",
       "key": "node.kubernetes.io/memory-pressure",
       "operator": "Exists"
     }
   ]

  As result, if I remove virt-operator pod - it will be re-created and stuck in Pending state:
    virt-operator-6c675b7888-rt8sj                         0/1     Pending   0          40s

 Probably virt-operator pods should also include necessary tolerations



2. And another question about .spec.workloads 
 When I set workloads tolerations to the hco - it is applied to virt-handler pods, but does not apply to virt-launcher pods, so no any VMs can be created on that node
 $ oc describe pod virt-launcher-vm-label-mdfqt
 .
   Warning  FailedScheduling  22m   default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {key1: value1}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.

Comment 2 Antonio Cardace 2023-06-19 14:32:37 UTC

(In reply to Denys Shchedrivyi from comment #1)
>  I verified on CNV-v4.12.4-52 - it seems not everything works as expected. 2
> questions to clarify:
> 
> 1. .spec.infra.nodePlacement.tolerations successfully applied to
> virt-controller and virt-api, but was *not applied to virt-operator*
> 
>  steps: 
>  1) set taints on the nodes:
>   $ oc adm taint nodes cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com
> key1=value1:NoSchedule
>  
>  2) added tolerations to HCO
>    infra:
>     nodePlacement:
>       tolerations:
>       - effect: NoSchedule
>         key: key1
>         operator: Equal
>         value: value1
> 
>   3) check it was propagated to kubevirt:
>    $ oc get kubevirt -o json | jq .items[0].spec.infra
>    {
>      "nodePlacement": {
>        "tolerations": [
>          {
>            "effect": "NoSchedule",
>            "key": "key1",
>            "operator": "Equal",
>            "value": "value1"
>          }
>        ]
>      }
>    }
> 
>   4) check virt pods:
>  VIRT-API and VIRT-CONTROLLER pods have it:
>    $oc get pod virt-api-769645b799-2z5kp -o json | jq .spec.tolerations
>       {
>         "effect": "NoSchedule",
>         "key": "key1",
>         "operator": "Equal",
>         "value": "value1"
>       },
>  
>   VIRT-OPERATOR - does not have it:
>    $  oc get pod virt-operator-6c675b7888-9vlsx -o json | jq
> .spec.tolerations
>    [
>      {
>        "key": "CriticalAddonsOnly",
>        "operator": "Exists"
>      },
>      {
>        "effect": "NoExecute",
>        "key": "node.kubernetes.io/not-ready",
>        "operator": "Exists",
>        "tolerationSeconds": 300
>      },
>      {
>        "effect": "NoExecute",
>        "key": "node.kubernetes.io/unreachable",
>        "operator": "Exists",
>        "tolerationSeconds": 300
>      },
>      {
>        "effect": "NoSchedule",
>        "key": "node.kubernetes.io/memory-pressure",
>        "operator": "Exists"
>      }
>    ]
> 
>   As result, if I remove virt-operator pod - it will be re-created and stuck
> in Pending state:
>     virt-operator-6c675b7888-rt8sj                         0/1     Pending  
> 0          40s
> 
>  Probably virt-operator pods should also include necessary tolerations

You're right, but I think that's trickier than expected as KubeVirt starts "living" when virt-operator starts running, so by that time virt-operator pods have already been scheduled and placed onto nodes (before the KubeVirt CR is created).

If we want to do this then HCO needs to install the virt-operator with the scheduling hints already in place in its deployment spec, so I guess a different HCO bug must be filed for this to happen.
  
> 
> 
> 
> 2. And another question about .spec.workloads 
>  When I set workloads tolerations to the hco - it is applied to virt-handler
> pods, but does not apply to virt-launcher pods, so no any VMs can be created
> on that node
>  $ oc describe pod virt-launcher-vm-label-mdfqt
>  .
>    Warning  FailedScheduling  22m   default-scheduler  0/3 nodes are
> available: 3 node(s) had untolerated taint {key1: value1}. preemption: 0/3
> nodes are available: 3 Preemption is not helpful for scheduling.

This sounds like a bug, though a different one.

In terms of this bug, I'd say everything works as expected actually even with the shortcomings you highlighted.

Comment 3 Denys Shchedrivyi 2023-06-20 17:29:57 UTC

Based on commment #2 closing this bug

For virt-launcher pods opened new one: bug 2216276

Comment 9 errata-xmlrpc 2023-06-27 19:10:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.12.4 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:3889