Bug 1881676 - Cordon of nodes should not trigger VMI Migration.
Summary: Cordon of nodes should not trigger VMI Migration.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 2.5.0
Assignee: Daniel Belenky
QA Contact: zhe peng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-22 20:59 UTC by Kedar Bidarkar
Modified: 2020-11-17 13:24 UTC (History)

Fixed In Version: hco-bundle-registry:v2.5.0-405 hyperconverged-cluster-operator-container-v2.5.0-52
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 13:24:24 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github kubevirt hyperconverged-cluster-operator pull 823 0 None closed Disable VMI migration upon K8s unschedulable taint 2021-01-08 10:00:52 UTC
Github kubevirt hyperconverged-cluster-operator pull 904 0 None closed [release-1.2] Disable VMI migration upon K8s unschedulable taint (#823) 2021-01-08 10:00:52 UTC
Red Hat Product Errata RHEA-2020:5127 0 None None None 2020-11-17 13:24:39 UTC

Description Kedar Bidarkar 2020-09-22 20:59:15 UTC
Description of problem:

Raising this bug to track the changes around node cordon, which currently triggers migration.

1) We currently trigger evacuation on cordon too.
See bug https://bugzilla.redhat.com/show_bug.cgi?id=1740137

2) The kubevirt-config cm contains the taint key below.

$ oc get cm kubevirt-config -n openshift-cnv -o yaml | grep migration
  migrations: '{"nodeDrainTaintKey" : "node.kubernetes.io/unschedulable"}'

With the changes around the Eviction Webhook, a drain should now trigger a migration on its own.


Version-Release number of selected component (if applicable):
CNV-2.6

How reproducible:


Steps to Reproduce:
1. "oc adm cordon <node-name>" ( Yes, Cordon, not Drain )
2. migrations: '{"nodeDrainTaintKey" : "node.kubernetes.io/unschedulable"}' in kubevirt-config cm 
3.

Actual results:

 migrations: '{"nodeDrainTaintKey" : "node.kubernetes.io/unschedulable"}'

Due to the above taint key, cordoning a node currently triggers VMI migration.
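
The behaviour described above can be modelled with a short Python sketch. It is a simplification, not KubeVirt's controller code: it assumes evacuation starts whenever a node carries a `NoSchedule` taint whose key equals the configured `nodeDrainTaintKey`, that `kubevirt.io/drain` is the key used when the ConfigMap sets none, and that the `migrations` value can be parsed as JSON (the value in this report is JSON-compatible). The function names are hypothetical.

```python
import json

# Assumed default key when kubevirt-config has no "migrations" entry.
DEFAULT_DRAIN_TAINT_KEY = "kubevirt.io/drain"
# Taint Kubernetes places on a node once it is cordoned.
CORDON_TAINT_KEY = "node.kubernetes.io/unschedulable"


def drain_taint_key(configmap_data):
    """Return the drain taint key configured in the kubevirt-config data."""
    raw = configmap_data.get("migrations")
    if not raw:
        return DEFAULT_DRAIN_TAINT_KEY
    return json.loads(raw).get("nodeDrainTaintKey", DEFAULT_DRAIN_TAINT_KEY)


def cordon_triggers_evacuation(configmap_data):
    """True if the cordon taint alone matches the configured drain key."""
    cordon_taints = [{"key": CORDON_TAINT_KEY, "effect": "NoSchedule"}]
    key = drain_taint_key(configmap_data)
    return any(
        t["key"] == key and t["effect"] == "NoSchedule" for t in cordon_taints
    )


# ConfigMap data as reported in this bug (taint key set to the cordon taint).
before_fix = {
    "migrations": '{"nodeDrainTaintKey" : "node.kubernetes.io/unschedulable"}'
}
# ConfigMap data without a "migrations" entry.
after_fix = {"feature-gates": "LiveMigration"}

print(cordon_triggers_evacuation(before_fix))  # True
print(cordon_triggers_evacuation(after_fix))   # False
```

Under these assumptions, dropping the `migrations` entry is enough to stop a plain cordon from matching the drain key.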

Expected results:

We may want to update the kubevirt-config cm 
and drop the taint key.

Cordon of nodes should not trigger VMI Migration.

Additional info:

We probably just want to revert the changes introduced by bug
https://bugzilla.redhat.com/show_bug.cgi?id=1740137
now that we have the new "Eviction Webhook".

Comment 4 zhe peng 2020-11-02 04:09:55 UTC
I can reproduce this.
Verified with build hco-bundle-registry:v2.5.0-405.
Steps:
1. Create a VM and confirm it is running on a node.
2. Cordon the node:
$ oc adm cordon <node-name>
3. Check the node status:
zpeng-ocp46-gs6cv-worker-0-wnx4w   Ready,SchedulingDisabled   worker   2d15h   v1.19.0+d59ce34
4. Check the VM pod:
$ oc get pods
NAME                            READY   STATUS    RESTARTS   AGE
virt-launcher-vm-fedora-8dz6x   1/1     Running   0          38m
No migration started.
5. Check the kubevirt-config cm for the taint key:
$ oc get cm kubevirt-config -n openshift-cnv -o yaml | grep migration
(no output)
Moving to verified.

Comment 5 Ruth Netser 2020-11-10 14:57:00 UTC
Reopened on hco-bundle-registry-container-v2.5.0-427:

Node cordon triggers VMI migration

wind-template-node-cordon-and-drain-1605018436-7155707   61s   Running   10.131.1.65   cnv-qe-04.cnvqe.lab.eng.rdu2.redhat.com
wind-template-node-cordon-and-drain-1605018436-7155707   64s   Running   10.131.1.65   cnv-qe-05.cnvqe.lab.eng.rdu2.redhat.com


$ oc get pod -n virt-migration-and-maintenance-test-node-maintenance -owide
NAME                                                              READY   STATUS      RESTARTS   AGE     IP             NODE                                      NOMINATED NODE   READINESS GATES
virt-launcher-wind-template-node-cordon-and-drain-160501842dsq6   0/1     Completed   0          6m9s    10.128.3.251   cnv-qe-05.cnvqe.lab.eng.rdu2.redhat.com   <none>           <none>
virt-launcher-wind-template-node-cordon-and-drain-16050184ccwk2   1/1     Running     0          5m51s   10.131.1.67    cnv-qe-04.cnvqe.lab.eng.rdu2.redhat.com   <none>           <none>
virt-launcher-wind-template-node-cordon-and-drain-16050184d958x   0/1     Completed   0          5m59s   10.129.2.96    cnv-qe-06.cnvqe.lab.eng.rdu2.redhat.com   <none>           <none>


$ oc get virtualmachineinstancemigration -n virt-migration-and-maintenance-test-node-maintenance 
NAME                        AGE
kubevirt-evacuation-5ppcr   19s
kubevirt-evacuation-w4lpq   37s

$ oc describe virtualmachineinstancemigration -n virt-migration-and-maintenance-test-node-maintenance kubevirt-evacuation-5ppcr
Name:         kubevirt-evacuation-5ppcr
Namespace:    virt-migration-and-maintenance-test-node-maintenance
Labels:       <none>
Annotations:  kubevirt.io/latest-observed-api-version: v1alpha3
              kubevirt.io/storage-observed-api-version: v1alpha3
API Version:  kubevirt.io/v1alpha3
Kind:         VirtualMachineInstanceMigration
Metadata:
  Creation Timestamp:  2020-11-10T14:28:30Z
  Generate Name:       kubevirt-evacuation-
  Generation:          1
  Managed Fields:
    API Version:  kubevirt.io/v1alpha3
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubevirt.io/latest-observed-api-version:
          f:kubevirt.io/storage-observed-api-version:
        f:generateName:
      f:spec:
        .:
        f:vmiName:
      f:status:
        .:
        f:phase:
    Manager:         virt-controller
    Operation:       Update
    Time:            2020-11-10T14:28:39Z
  Resource Version:  37886083
  Self Link:         /apis/kubevirt.io/v1alpha3/namespaces/virt-migration-and-maintenance-test-node-maintenance/virtualmachineinstancemigrations/kubevirt-evacuation-5ppcr
  UID:               08df6e27-f3a0-4225-8c38-59d1cf37db2f
Spec:
  Vmi Name:  wind-template-node-cordon-and-drain-1605018436-7155707
Status:
  Phase:  Succeeded
Events:
  Type    Reason               Age   From                       Message
  ----    ------               ----  ----                       -------
  Normal  SuccessfulCreate     36s   virtualmachine-controller  Created migration target pod virt-launcher-wind-template-node-cordon-and-drain-16050184ccwk2
  Normal  SuccessfulHandOver   31s   virtualmachine-controller  Migration target pod is ready for preparation by virt-handler.
  Normal  SuccessfulMigration  27s   virtualmachine-controller  Source node reported migration succeeded

$ oc describe virtualmachineinstancemigration -n virt-migration-and-maintenance-test-node-maintenance kubevirt-evacuation-w4lpq
Name:         kubevirt-evacuation-w4lpq
Namespace:    virt-migration-and-maintenance-test-node-maintenance
Labels:       <none>
Annotations:  kubevirt.io/latest-observed-api-version: v1alpha3
              kubevirt.io/storage-observed-api-version: v1alpha3
API Version:  kubevirt.io/v1alpha3
Kind:         VirtualMachineInstanceMigration
Metadata:
  Creation Timestamp:  2020-11-10T14:28:12Z
  Generate Name:       kubevirt-evacuation-
  Generation:          1
  Managed Fields:
    API Version:  kubevirt.io/v1alpha3
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubevirt.io/latest-observed-api-version:
          f:kubevirt.io/storage-observed-api-version:
        f:generateName:
      f:spec:
        .:
        f:vmiName:
      f:status:
        .:
        f:phase:
    Manager:         virt-controller
    Operation:       Update
    Time:            2020-11-10T14:28:21Z
  Resource Version:  37885290
  Self Link:         /apis/kubevirt.io/v1alpha3/namespaces/virt-migration-and-maintenance-test-node-maintenance/virtualmachineinstancemigrations/kubevirt-evacuation-w4lpq
  UID:               dfda2310-b0c5-431b-a70b-22eae49fb6de
Spec:
  Vmi Name:  wind-template-node-cordon-and-drain-1605018436-7155707
Status:
  Phase:  Succeeded
Events:
  Type    Reason               Age   From                       Message
  ----    ------               ----  ----                       -------
  Normal  SuccessfulCreate     57s   virtualmachine-controller  Created migration target pod virt-launcher-wind-template-node-cordon-and-drain-160501842dsq6
  Normal  SuccessfulHandOver   52s   virtualmachine-controller  Migration target pod is ready for preparation by virt-handler.
  Normal  SuccessfulMigration  48s   virtualmachine-controller  Source node reported migration succeeded


$ oc get cm -n openshift-cnv kubevirt-config -oyaml
apiVersion: v1
data:
  default-network-interface: masquerade
  feature-gates: DataVolumes,SRIOV,LiveMigration,CPUManager,CPUNodeDiscovery,Sidecar,Snapshot
  machine-type: pc-q35-rhel8.2.0
  selinuxLauncherType: virt_launcher.process
  smbios: |-
    Family: Red Hat
    Product: Container-native virtualization
    Manufacturer: Red Hat
    Sku: 2.5.0
    Version: 2.5.0
kind: ConfigMap
metadata:
  creationTimestamp: "2020-11-05T22:23:53Z"
  labels:
    app: kubevirt-hyperconverged
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:default-network-interface: {}
        f:feature-gates: {}
        f:selinuxLauncherType: {}
        f:smbios: {}
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"8b934e34-7628-492b-bf9e-e4d85debb3ad"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
    manager: hyperconverged-cluster-operator
    operation: Update
    time: "2020-11-05T22:23:53Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:machine-type: {}
    manager: OpenAPI-Generator
    operation: Update
    time: "2020-11-05T22:53:09Z"
  name: kubevirt-config
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: hco.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: HyperConverged
    name: kubevirt-hyperconverged
    uid: 8b934e34-7628-492b-bf9e-e4d85debb3ad
  resourceVersion: "25322787"
  selfLink: /api/v1/namespaces/openshift-cnv/configmaps/kubevirt-config
  uid: 011f6be1-3b08-46e9-b337-377617d3e460

Comment 6 sgott 2020-11-10 15:34:13 UTC
Updated the fixed-in version to include a specific deployment of HCO.

Comment 7 sgott 2020-11-10 15:35:31 UTC
Apologies, Comment #6 was incomplete. Ruth, can you please verify you were using hyperconverged-cluster-operator-container-v2.5.0-52 or newer when you observed this?

Comment 9 Ruth Netser 2020-11-10 16:10:51 UTC
Moving to verified; will fix the automation test (due to bug 1888790, there are leftover migration jobs)
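
One way an automation test could tell leftover migration objects from ones created by the cordon itself is to compare their creation timestamps with the time of the cordon. The Python sketch below illustrates the idea; the helper names are hypothetical, the migration names and timestamps are taken from the output in Comment 5, and the cordon time is an assumed value for illustration only (it is not recorded in this report).

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a Kubernetes-style UTC timestamp such as 2020-11-10T14:28:30Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def split_by_cordon_time(migrations, cordon_time):
    """Split (name, creationTimestamp) pairs into (leftover, new) name lists.

    Migrations created before the cordon are treated as leftovers from
    earlier activity; those created at or after it are treated as new.
    """
    cutoff = parse_ts(cordon_time)
    leftover, new = [], []
    for name, created in migrations:
        if parse_ts(created) < cutoff:
            leftover.append(name)
        else:
            new.append(name)
    return leftover, new


migrations = [
    ("kubevirt-evacuation-5ppcr", "2020-11-10T14:28:30Z"),
    ("kubevirt-evacuation-w4lpq", "2020-11-10T14:28:12Z"),
]

# Hypothetical cordon time, chosen only to demonstrate the split.
leftover, new = split_by_cordon_time(migrations, "2020-11-10T14:30:00Z")
print("leftover:", leftover)
print("new:", new)
```

A test built this way would fail only when the `new` list is non-empty, so migration objects that predate the cordon would not cause a false failure.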

Comment 12 errata-xmlrpc 2020-11-17 13:24:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 2.5.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5127

