Bug 2192858 - kubevirt-job pod ignores node placement configuration
Summary: kubevirt-job pod ignores node placement configuration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 4.12.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.12.4
Assignee: Prita Narayan
QA Contact: Denys Shchedrivyi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-05-03 10:50 UTC by nijin ashok
Modified: 2023-06-30 16:51 UTC
5 users

Fixed In Version: v4.12.4-35
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-27 19:10:40 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt pull 9699 0 None Merged [release-0.58] Set infra placement on the installstrategy job 2023-05-31 12:28:53 UTC
Red Hat Issue Tracker CNV-28524 0 None None None 2023-05-03 10:51:41 UTC
Red Hat Knowledge Base (Solution) 7010990 0 None None None 2023-05-03 21:44:32 UTC
Red Hat Product Errata RHEA-2023:3889 0 None None None 2023-06-27 19:10:51 UTC

Description nijin ashok 2023-05-03 10:50:10 UTC
Description of problem:

Add tolerations in the node placement rules as described in https://docs.openshift.com/container-platform/4.12/virt/install/virt-specifying-nodes-for-virtualization-components.html.

These rules are not propagated to the kubevirt-xxx-jobxxx pod.

When all nodes have taints set, the pod gets stuck in "Pending" status and virt-operator fails to deploy the virt components.


~~~
2023-05-02T17:48:15.692306010Z {"component":"virt-operator","kind":"","level":"error","msg":"Waiting on install strategy to be posted from job kubevirt-6dd2f5f71de2802413f34578306382b99a65c12f-jobk8c54","name":"kubevirt-6dd2f5f71de2802413f34578306382b99a65c12f-jobk8c54","namespace":"openshift-cnv","pos":"kubevirt.go:948","timestamp":"2023-05-02T17:48:15.692261Z","uid":"2a2df195-2806-4d1a-a5d6-f135f56f44bb"}
~~~
  
Version-Release number of selected component (if applicable):

OpenShift Virtualization 4.12.2

How reproducible:

100%

Steps to Reproduce:

1. Add node placement rules to the HCO object.
2. Check whether the placement rules are applied to the kubevirt-xxx-jobxxx pod (see the sketch below).
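
A minimal sketch of these two steps, assuming a default OpenShift Virtualization install (the HCO CR name "kubevirt-hyperconverged", the "openshift-cnv" namespace, and the grep pattern are assumptions, not taken from this report):

~~~
# Add a toleration under spec.infra.nodePlacement in the HCO CR:
$ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
#   spec:
#     infra:
#       nodePlacement:
#         tolerations:
#         - effect: NoSchedule
#           key: key1
#           operator: Equal
#           value: value1

# Find the install-strategy job pod created by virt-operator and inspect its
# tolerations (expected to contain the toleration above, but it does not):
$ oc get pods -n openshift-cnv -o name | grep 'kubevirt-.*-job'
$ oc get pod <kubevirt-xxx-jobxxx> -n openshift-cnv -o jsonpath='{.spec.tolerations}'
~~~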

Actual results:

kubevirt-job pod ignores node placement configuration

Expected results:

Node placement configuration should be propagated to kubevirt-job pod.

Additional info:

This looks to be already fixed upstream: https://github.com/kubevirt/kubevirt/commit/166f5063d1cc839fa1811871418cc528e829ca94

Comment 1 Denys Shchedrivyi 2023-06-16 22:20:24 UTC
I verified on CNV-v4.12.4-52 - it seems not everything works as expected. Two questions to clarify:

1. .spec.infra.nodePlacement.tolerations is successfully applied to virt-controller and virt-api, but is *not applied to virt-operator*

 steps: 
 1) set taints on the nodes:
  $ oc adm taint nodes cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com key1=value1:NoSchedule
 
 2) add tolerations to the HCO:
   infra:
    nodePlacement:
      tolerations:
      - effect: NoSchedule
        key: key1
        operator: Equal
        value: value1

  3) check it was propagated to kubevirt:
   $ oc get kubevirt -o json | jq .items[0].spec.infra
   {
     "nodePlacement": {
       "tolerations": [
         {
           "effect": "NoSchedule",
           "key": "key1",
           "operator": "Equal",
           "value": "value1"
         }
       ]
     }
   }

  4) check virt pods:
 VIRT-API and VIRT-CONTROLLER pods have it:
   $ oc get pod virt-api-769645b799-2z5kp -o json | jq .spec.tolerations
      {
        "effect": "NoSchedule",
        "key": "key1",
        "operator": "Equal",
        "value": "value1"
      },
 
  VIRT-OPERATOR - does not have it:
   $  oc get pod virt-operator-6c675b7888-9vlsx -o json | jq .spec.tolerations
   [
     {
       "key": "CriticalAddonsOnly",
       "operator": "Exists"
     },
     {
       "effect": "NoExecute",
       "key": "node.kubernetes.io/not-ready",
       "operator": "Exists",
       "tolerationSeconds": 300
     },
     {
       "effect": "NoExecute",
       "key": "node.kubernetes.io/unreachable",
       "operator": "Exists",
       "tolerationSeconds": 300
     },
     {
       "effect": "NoSchedule",
       "key": "node.kubernetes.io/memory-pressure",
       "operator": "Exists"
     }
   ]

  As a result, if I delete the virt-operator pod, it is re-created and gets stuck in Pending state:
    virt-operator-6c675b7888-rt8sj                         0/1     Pending   0          40s

 Probably the virt-operator pods should also include the necessary tolerations.



2. And another question about .spec.workloads:
 When I set workloads tolerations on the HCO, they are applied to the virt-handler pods but not to the virt-launcher pods, so no VMs can be created on that node:
 $ oc describe pod virt-launcher-vm-label-mdfqt
 .
   Warning  FailedScheduling  22m   default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {key1: value1}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
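
 For reference, a minimal sketch of the workloads placement in the HCO spec that this refers to (values mirror the infra example above and are illustrative only):

~~~
spec:
  workloads:
    nodePlacement:
      tolerations:
      - effect: NoSchedule
        key: key1
        operator: Equal
        value: value1
~~~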

Comment 2 Antonio Cardace 2023-06-19 14:32:37 UTC
(In reply to Denys Shchedrivyi from comment #1)
>  I verified on CNV-v4.12.4-52 - it seems not everything works as expected. 2
> questions to clarify:
> 
> 1. .spec.infra.nodePlacement.tolerations successfully applied to
> virt-controller and virt-api, but was *not applied to virt-operator*
> 
>  steps: 
>  1) set taints on the nodes:
>   $ oc adm taint nodes cnv-qe-infra-04.cnvqe3.lab.eng.rdu2.redhat.com
> key1=value1:NoSchedule
>  
>  2) added tolerations to HCO
>    infra:
>     nodePlacement:
>       tolerations:
>       - effect: NoSchedule
>         key: key1
>         operator: Equal
>         value: value1
> 
>   3) check it was propagated to kubevirt:
>    $ oc get kubevirt -o json | jq .items[0].spec.infra
>    {
>      "nodePlacement": {
>        "tolerations": [
>          {
>            "effect": "NoSchedule",
>            "key": "key1",
>            "operator": "Equal",
>            "value": "value1"
>          }
>        ]
>      }
>    }
> 
>   4) check virt pods:
>  VIRT-API and VIRT-CONTROLLER pods have it:
>    $oc get pod virt-api-769645b799-2z5kp -o json | jq .spec.tolerations
>       {
>         "effect": "NoSchedule",
>         "key": "key1",
>         "operator": "Equal",
>         "value": "value1"
>       },
>  
>   VIRT-OPERATOR - does not have it:
>    $  oc get pod virt-operator-6c675b7888-9vlsx -o json | jq
> .spec.tolerations
>    [
>      {
>        "key": "CriticalAddonsOnly",
>        "operator": "Exists"
>      },
>      {
>        "effect": "NoExecute",
>        "key": "node.kubernetes.io/not-ready",
>        "operator": "Exists",
>        "tolerationSeconds": 300
>      },
>      {
>        "effect": "NoExecute",
>        "key": "node.kubernetes.io/unreachable",
>        "operator": "Exists",
>        "tolerationSeconds": 300
>      },
>      {
>        "effect": "NoSchedule",
>        "key": "node.kubernetes.io/memory-pressure",
>        "operator": "Exists"
>      }
>    ]
> 
>   As result, if I remove virt-operator pod - it will be re-created and stuck
> in Pending state:
>     virt-operator-6c675b7888-rt8sj                         0/1     Pending  
> 0          40s
> 
>  Probably virt-operator pods should also include necessary tolerations

You're right, but I think that's trickier than expected: KubeVirt only starts "living" once virt-operator is running, so by that time the virt-operator pods have already been scheduled and placed onto nodes (before the KubeVirt CR is even created).

If we want to do this, then the HCO needs to install virt-operator with the scheduling hints already in place in its deployment spec, so I guess a different HCO bug must be filed for this to happen.
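
One way such scheduling hints could be put in place is via OLM's Subscription config, which applies tolerations to the operator deployments it installs. A minimal sketch (the Subscription name and toleration values are illustrative, not taken from this bug):

~~~
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub        # illustrative name
  namespace: openshift-cnv
spec:
  # channel/source fields of the existing Subscription omitted
  config:
    tolerations:
    - effect: NoSchedule
      key: key1
      operator: Equal
      value: value1
~~~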
  
> 
> 
> 
> 2. And another question about .spec.workloads 
>  When I set workloads tolerations to the hco - it is applied to virt-handler
> pods, but does not apply to virt-launcher pods, so no any VMs can be created
> on that node
>  $ oc describe pod virt-launcher-vm-label-mdfqt
>  .
>    Warning  FailedScheduling  22m   default-scheduler  0/3 nodes are
> available: 3 node(s) had untolerated taint {key1: value1}. preemption: 0/3
> nodes are available: 3 Preemption is not helpful for scheduling.

This sounds like a bug, though a different one.

In terms of this bug, I'd say everything actually works as expected, even with the shortcomings you highlighted.

Comment 3 Denys Shchedrivyi 2023-06-20 17:29:57 UTC
Based on comment #2, closing this bug.

For the virt-launcher pods, opened a new one: bug 2216276

Comment 9 errata-xmlrpc 2023-06-27 19:10:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.12.4 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:3889

