Description of problem:
After OCP upgrade, a VMI with runStrategy: Manual and evictionStrategy: LiveMigrate is not running.

Version-Release number of selected component (if applicable):
OCP 4.6.16 + CNV 2.5.3 -> OCP 4.7.0 rc0

How reproducible:
100% (seen on 2 clusters)

Steps to Reproduce:
1. Cluster with OCP 4.6.16 + CNV 2.5.3
2. Create a VM with runStrategy: Manual and evictionStrategy: LiveMigrate. The VM is using a RWX PVC.
3. Upgrade OCP

Actual results:
VMI phase is "Succeeded"; the VMI is not running.

Expected results:
The VMI should be running throughout the upgrade and after it.

Additional info:
$ oc get vmi
rhel8-nfs   3h49m   Succeeded   10.131.1.93   ssp05-4fpbn-worker-0-vsfvz

$ oc get events -A | grep -vi normal | grep rhel8-nfs
b4-upgrade   158m   Warning   SyncFailed   virtualmachineinstance/rhel8-nfs   unknown error encountered sending command SyncVMI: rpc error: code = DeadlineExceeded desc = context deadline exceeded
b4-upgrade   165m   Warning   NodeNotReady   pod/virt-launcher-rhel8-nfs-6l62f   Node is not ready
153m   Warning   NodeNotReady   pod/virt-launcher-rhel8-nfs-6l62f   Node is not ready
148m   Normal   TaintManagerEviction   pod/virt-launcher-rhel8-nfs-6l62f   Marking for deletion Pod b4-upgrade/virt-launcher-rhel8-nfs-6l62f
144m   Normal   TaintManagerEviction   pod/virt-launcher-rhel8-nfs-6l62f   Cancelling deletion of Pod b4-upgrade/virt-launcher-rhel8-nfs-6l62f
144m   Normal   Killing   pod/virt-launcher-rhel8-nfs-6l62f   Stopping container compute

===========================

$ oc describe vmi rhel8-nfs
Name:         rhel8-nfs
Namespace:    b4-upgrade
Labels:       flavor.template.kubevirt.io/tiny=true
              kubevirt.io/domain=rhel8-nfs
              kubevirt.io/nodeName=ssp05-4fpbn-worker-0-vsfvz
              kubevirt.io/size=tiny
              os.template.kubevirt.io/rhel8.3=true
              vm.kubevirt.io/name=rhel8-nfs
              workload.template.kubevirt.io/server=true
Annotations:  kubevirt.io/latest-observed-api-version: v1alpha3
              kubevirt.io/storage-observed-api-version: v1alpha3
API Version:  kubevirt.io/v1alpha3
Kind:         VirtualMachineInstance
Metadata:
  Creation Timestamp:
2021-02-10T11:55:04Z Generation: 45 Managed Fields: API Version: kubevirt.io/v1alpha3 Fields Type: FieldsV1 fieldsV1: f:status: f:interfaces: f:migrationMethod: f:phase: Manager: virt-handler Operation: Update Time: 2021-02-10T13:11:40Z API Version: kubevirt.io/v1alpha3 Fields Type: FieldsV1 fieldsV1: f:metadata: f:annotations: .: f:kubevirt.io/latest-observed-api-version: f:kubevirt.io/storage-observed-api-version: f:labels: .: f:flavor.template.kubevirt.io/tiny: f:kubevirt.io/domain: f:kubevirt.io/nodeName: f:kubevirt.io/size: f:os.template.kubevirt.io/rhel8.3: f:vm.kubevirt.io/name: f:workload.template.kubevirt.io/server: f:ownerReferences: f:spec: .: f:domain: .: f:cpu: .: f:cores: f:sockets: f:threads: f:devices: .: f:disks: f:interfaces: f:networkInterfaceMultiqueue: f:rng: f:firmware: .: f:uuid: f:machine: .: f:type: f:resources: .: f:requests: .: f:memory: f:evictionStrategy: f:hostname: f:networks: f:terminationGracePeriodSeconds: f:volumes: f:status: .: f:conditions: f:guestOSInfo: f:nodeName: f:qosClass: Manager: virt-controller Operation: Update Time: 2021-02-10T13:11:43Z Owner References: API Version: kubevirt.io/v1alpha3 Block Owner Deletion: true Controller: true Kind: VirtualMachine Name: rhel8-nfs UID: e715a3b4-4b49-4da6-a527-c903a3990e77 Resource Version: 997581 Self Link: /apis/kubevirt.io/v1alpha3/namespaces/b4-upgrade/virtualmachineinstances/rhel8-nfs UID: b6e26b13-77cd-4382-b3d4-abcfbb0b14b5 Spec: Domain: Cpu: Cores: 1 Sockets: 1 Threads: 1 Devices: Disks: Disk: Bus: virtio Name: cloudinitdisk Boot Order: 1 Disk: Bus: virtio Name: rootdisk Interfaces: Masquerade: Model: virtio Name: nic-0 Network Interface Multiqueue: true Rng: Features: Acpi: Enabled: true Firmware: Uuid: 8ce719db-adcd-5abe-99ec-813760c30897 Machine: Type: pc-q35-rhel8.2.0 Resources: Requests: Cpu: 100m Memory: 1536Mi Eviction Strategy: LiveMigrate Hostname: rhel8-nfs Networks: Name: nic-0 Pod: Termination Grace Period Seconds: 180 Volumes: Cloud Init No Cloud: User Data: 
#cloud-config user: cloud-user password: redhat chpasswd: expire: false Name: cloudinitdisk Data Volume: Name: rhel8-nfs-rootdisk-9k9cs Name: rootdisk Status: Conditions: Last Probe Time: <nil> Last Transition Time: <nil> Status: True Type: LiveMigratable Guest OS Info: Interfaces: Interface Name: eth0 Ip Address: 10.131.1.93 Ip Addresses: 10.131.1.93 Mac: 02:00:00:6a:8d:cd Name: nic-0 Migration Method: BlockMigration Node Name: ssp05-4fpbn-worker-0-vsfvz Phase: Succeeded Qos Class: Burstable Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal SuccessfulCreate 3h48m disruptionbudget-controller Created PodDisruptionBudget kubevirt-disruption-budget-szvgb Normal SuccessfulCreate 3h48m virtualmachine-controller Created virtual machine pod virt-launcher-rhel8-nfs-6l62f Normal Started 3h48m virt-handler VirtualMachineInstance started. Normal Created 163m (x84 over 3h48m) virt-handler VirtualMachineInstance defined. Warning SyncFailed 154m (x2 over 155m) virt-handler unknown error encountered sending command SyncVMI: rpc error: code = DeadlineExceeded desc = context deadline exceeded =========================== $ oc get vm rhel8-nfs -oyaml apiVersion: kubevirt.io/v1alpha3 kind: VirtualMachine metadata: annotations: kubevirt.io/latest-observed-api-version: v1alpha3 kubevirt.io/storage-observed-api-version: v1alpha3 name.os.template.kubevirt.io/rhel8.3: Red Hat Enterprise Linux 8.0 or higher creationTimestamp: "2021-02-10T10:56:15Z" generation: 4 labels: app: rhel8-nfs flavor.template.kubevirt.io/tiny: "true" os.template.kubevirt.io/rhel8.3: "true" vm.kubevirt.io/template: rhel8-server-tiny-v0.11.3 vm.kubevirt.io/template.namespace: openshift vm.kubevirt.io/template.revision: "1" vm.kubevirt.io/template.version: v0.12.4 workload.template.kubevirt.io/server: "true" managedFields: - apiVersion: kubevirt.io/v1alpha3 fieldsType: FieldsV1 fieldsV1: f:metadata: f:annotations: .: {} f:name.os.template.kubevirt.io/rhel8.3: {} f:labels: .: {} f:app: {} 
f:flavor.template.kubevirt.io/tiny: {} f:os.template.kubevirt.io/rhel8.3: {} f:vm.kubevirt.io/template: {} f:vm.kubevirt.io/template.namespace: {} f:vm.kubevirt.io/template.revision: {} f:vm.kubevirt.io/template.version: {} f:workload.template.kubevirt.io/server: {} f:spec: .: {} f:dataVolumeTemplates: {} f:runStrategy: {} f:template: .: {} f:metadata: .: {} f:labels: .: {} f:flavor.template.kubevirt.io/tiny: {} f:kubevirt.io/domain: {} f:kubevirt.io/size: {} f:os.template.kubevirt.io/rhel8.3: {} f:vm.kubevirt.io/name: {} f:workload.template.kubevirt.io/server: {} f:spec: .: {} f:domain: .: {} f:cpu: .: {} f:cores: {} f:sockets: {} f:threads: {} f:devices: .: {} f:disks: {} f:interfaces: {} f:networkInterfaceMultiqueue: {} f:rng: {} f:machine: .: {} f:type: {} f:resources: .: {} f:requests: .: {} f:memory: {} f:evictionStrategy: {} f:hostname: {} f:networks: {} f:terminationGracePeriodSeconds: {} f:volumes: {} manager: Mozilla operation: Update time: "2021-02-10T11:55:00Z" - apiVersion: kubevirt.io/v1alpha3 fieldsType: FieldsV1 fieldsV1: f:metadata: f:annotations: f:kubevirt.io/latest-observed-api-version: {} f:kubevirt.io/storage-observed-api-version: {} f:status: .: {} f:created: {} manager: virt-controller operation: Update time: "2021-02-10T13:11:40Z" name: rhel8-nfs namespace: b4-upgrade resourceVersion: "997362" selfLink: /apis/kubevirt.io/v1alpha3/namespaces/b4-upgrade/virtualmachines/rhel8-nfs uid: e715a3b4-4b49-4da6-a527-c903a3990e77 spec: dataVolumeTemplates: - apiVersion: cdi.kubevirt.io/v1alpha1 kind: DataVolume metadata: creationTimestamp: null name: rhel8-nfs-rootdisk-9k9cs spec: pvc: accessModes: - ReadWriteMany resources: requests: storage: 30Gi storageClassName: nfs volumeMode: Filesystem source: pvc: name: rhel8 namespace: openshift-virtualization-os-images runStrategy: Manual template: metadata: creationTimestamp: null labels: flavor.template.kubevirt.io/tiny: "true" kubevirt.io/domain: rhel8-nfs kubevirt.io/size: tiny 
os.template.kubevirt.io/rhel8.3: "true" vm.kubevirt.io/name: rhel8-nfs workload.template.kubevirt.io/server: "true" spec: domain: cpu: cores: 1 sockets: 1 threads: 1 devices: disks: - disk: bus: virtio name: cloudinitdisk - bootOrder: 1 disk: bus: virtio name: rootdisk interfaces: - masquerade: {} model: virtio name: nic-0 networkInterfaceMultiqueue: true rng: {} machine: type: pc-q35-rhel8.2.0 resources: requests: memory: 1536Mi evictionStrategy: LiveMigrate hostname: rhel8-nfs networks: - name: nic-0 pod: {} terminationGracePeriodSeconds: 180 volumes: - cloudInitNoCloud: userData: | #cloud-config user: cloud-user password: redhat chpasswd: expire: false name: cloudinitdisk - dataVolume: name: rhel8-nfs-rootdisk-9k9cs name: rootdisk status: created: true =========================== ===========================
This excerpt from an email thread is a comment from Roman:
--------------
Was looking with Ruth at the latest occurrence: in the events it looks like a taint on the node causes the pods to be deleted before the migration-via-eviction kicks in:
```
153m Warning NodeNotReady pod/virt-launcher-rhel8-nfs-6l62f Node is not ready
148m Normal TaintManagerEviction pod/virt-launcher-rhel8-nfs-6l62f Marking for deletion Pod b4-upgrade/virt-launcher-rhel8-nfs-6l62f
144m Normal TaintManagerEviction pod/virt-launcher-rhel8-nfs-6l62f Cancelling deletion of Pod b4-upgrade/virt-launcher-rhel8-nfs-6l62f
144m Normal Killing pod/virt-launcher-rhel8-nfs-6l62f Stopping container compute
```
--------------
So we know that the VMI was stopped because the node was tainted, and the VMI wasn't restarted because the RunStrategy was set to Manual. We need to ascertain what that taint was and why it was added.

Ruth, are you able to reproduce this error and note what taints are being applied to the unresponsive nodes?
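For reference, the two fields under discussion sit at different levels of the VirtualMachine spec. A minimal sketch (the VM name and the omitted domain content are illustrative; only the two annotated fields are at issue here) of why the VMI stays down with this combination:

```yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: example-vm            # illustrative name
spec:
  runStrategy: Manual         # controller only honors explicit start/stop requests;
                              # it does NOT restart a VMI that was stopped externally
  template:
    spec:
      evictionStrategy: LiveMigrate   # on an API-driven eviction, live-migrate the
                                      # VMI instead of shutting it down
      domain:
        devices: {}           # rest of the domain omitted for brevity
```

With runStrategy: Always (or running: true) the controller would recreate the VMI after its launcher pod disappears; with Manual it stays in its final phase, which matches the "Succeeded" state seen above.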
Reproduced: the taint is added, virt-launcher pods are terminating, and there is no live migration.

The VMI is running on the node:
rhel8-nfs   42m   Running   10.131.0.47   ssp04-rvkqg-worker-0-zvfkw
rhel8-nfs   42m   Running   10.131.0.47   ssp04-rvkqg-worker-0-zvfkw

During the upgrade, the following taints are added to the node:
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false

$ oc get node ssp04-rvkqg-worker-0-zvfkw
NAME                         STATUS     ROLES    AGE   VERSION
ssp04-rvkqg-worker-0-zvfkw   NotReady   worker   22h   v1.19.0+e49167a

virt-launcher pods of the VMIs running on the node are Terminating; live migration is not performed:

===========================================

$ oc get pod
NAME                            READY   STATUS        RESTARTS   AGE
virt-launcher-rhel8-nfs-rr9tx   1/1     Terminating   0          102m
virt-launcher-win10-ocs-2kx2v   1/1     Terminating   0          89m

===========================================

$ oc describe node ssp04-rvkqg-worker-0-zvfkw
Name:   ssp04-rvkqg-worker-0-zvfkw
Roles:  worker
Labels: beta.kubernetes.io/arch=amd64
        beta.kubernetes.io/instance-type=ci.nested.virt.m1.xlarge
        beta.kubernetes.io/os=linux
        cluster.ocs.openshift.io/openshift-storage=
        cpumanager=true
        failure-domain.beta.kubernetes.io/zone=nova
        feature.node.kubernetes.io/cpu-feature-aes=true
        feature.node.kubernetes.io/cpu-feature-avx=true
        feature.node.kubernetes.io/cpu-feature-avx2=true
        feature.node.kubernetes.io/cpu-feature-bmi1=true
        feature.node.kubernetes.io/cpu-feature-bmi2=true
        feature.node.kubernetes.io/cpu-feature-erms=true
        feature.node.kubernetes.io/cpu-feature-f16c=true
        feature.node.kubernetes.io/cpu-feature-fma=true
        feature.node.kubernetes.io/cpu-feature-fsgsbase=true
        feature.node.kubernetes.io/cpu-feature-invpcid=true
        feature.node.kubernetes.io/cpu-feature-movbe=true
        feature.node.kubernetes.io/cpu-feature-pcid=true
        feature.node.kubernetes.io/cpu-feature-pclmuldq=true
        feature.node.kubernetes.io/cpu-feature-popcnt=true
        feature.node.kubernetes.io/cpu-feature-rdrand=true
        feature.node.kubernetes.io/cpu-feature-rdtscp=true
feature.node.kubernetes.io/cpu-feature-smep=true feature.node.kubernetes.io/cpu-feature-spec-ctrl=true feature.node.kubernetes.io/cpu-feature-sse4.2=true feature.node.kubernetes.io/cpu-feature-svm=true feature.node.kubernetes.io/cpu-feature-tsc-deadline=true feature.node.kubernetes.io/cpu-feature-vme=true feature.node.kubernetes.io/cpu-feature-x2apic=true feature.node.kubernetes.io/cpu-feature-xsave=true feature.node.kubernetes.io/cpu-model-Haswell-noTSX=true feature.node.kubernetes.io/cpu-model-Haswell-noTSX-IBRS=true feature.node.kubernetes.io/cpu-model-IvyBridge=true feature.node.kubernetes.io/cpu-model-IvyBridge-IBRS=true feature.node.kubernetes.io/cpu-model-Nehalem=true feature.node.kubernetes.io/cpu-model-Nehalem-IBRS=true feature.node.kubernetes.io/cpu-model-Opteron_G1=true feature.node.kubernetes.io/cpu-model-Opteron_G2=true feature.node.kubernetes.io/cpu-model-Penryn=true feature.node.kubernetes.io/cpu-model-SandyBridge=true feature.node.kubernetes.io/cpu-model-SandyBridge-IBRS=true feature.node.kubernetes.io/cpu-model-Westmere=true feature.node.kubernetes.io/cpu-model-Westmere-IBRS=true feature.node.kubernetes.io/cpu-model-kvm32=true feature.node.kubernetes.io/cpu-model-kvm64=true feature.node.kubernetes.io/cpu-model-qemu32=true feature.node.kubernetes.io/cpu-model-qemu64=true feature.node.kubernetes.io/kvm-info-cap-hyperv-base=true feature.node.kubernetes.io/kvm-info-cap-hyperv-frequencies=true feature.node.kubernetes.io/kvm-info-cap-hyperv-ipi=true feature.node.kubernetes.io/kvm-info-cap-hyperv-reenlightenment=true feature.node.kubernetes.io/kvm-info-cap-hyperv-reset=true feature.node.kubernetes.io/kvm-info-cap-hyperv-runtime=true feature.node.kubernetes.io/kvm-info-cap-hyperv-synic=true feature.node.kubernetes.io/kvm-info-cap-hyperv-synic2=true feature.node.kubernetes.io/kvm-info-cap-hyperv-synictimer=true feature.node.kubernetes.io/kvm-info-cap-hyperv-time=true feature.node.kubernetes.io/kvm-info-cap-hyperv-tlbflush=true 
feature.node.kubernetes.io/kvm-info-cap-hyperv-vpindex=true kubernetes.io/arch=amd64 kubernetes.io/hostname=ssp04-rvkqg-worker-0-zvfkw kubernetes.io/os=linux kubevirt.io/schedulable=false node-role.kubernetes.io/worker= node.kubernetes.io/instance-type=ci.nested.virt.m1.xlarge node.openshift.io/os_id=rhcos topology.cinder.csi.openstack.org/zone=nova topology.kubernetes.io/zone=nova topology.rook.io/rack=rack2 Annotations: csi.volume.kubernetes.io/nodeid: {"cinder.csi.openstack.org":"885b9301-803e-43b9-a6c8-c07324f015e6","manila.csi.openstack.org":"ssp04-rvkqg-worker-0-zvfkw","openshift-stor... kubevirt.io/heartbeat: 2021-02-11T14:46:59Z machine.openshift.io/machine: openshift-machine-api/ssp04-rvkqg-worker-0-zvfkw machineconfiguration.openshift.io/currentConfig: rendered-worker-e32d0c5fbb975fcd1cd2ff67ba0f0965 machineconfiguration.openshift.io/desiredConfig: rendered-worker-e32d0c5fbb975fcd1cd2ff67ba0f0965 machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done node-labeller-feature.node.kubernetes.io/cpu-feature-aes: true node-labeller-feature.node.kubernetes.io/cpu-feature-avx: true node-labeller-feature.node.kubernetes.io/cpu-feature-avx2: true node-labeller-feature.node.kubernetes.io/cpu-feature-bmi1: true node-labeller-feature.node.kubernetes.io/cpu-feature-bmi2: true node-labeller-feature.node.kubernetes.io/cpu-feature-erms: true node-labeller-feature.node.kubernetes.io/cpu-feature-f16c: true node-labeller-feature.node.kubernetes.io/cpu-feature-fma: true node-labeller-feature.node.kubernetes.io/cpu-feature-fsgsbase: true node-labeller-feature.node.kubernetes.io/cpu-feature-invpcid: true node-labeller-feature.node.kubernetes.io/cpu-feature-movbe: true node-labeller-feature.node.kubernetes.io/cpu-feature-pcid: true node-labeller-feature.node.kubernetes.io/cpu-feature-pclmuldq: true node-labeller-feature.node.kubernetes.io/cpu-feature-popcnt: true node-labeller-feature.node.kubernetes.io/cpu-feature-rdrand: true 
node-labeller-feature.node.kubernetes.io/cpu-feature-rdtscp: true node-labeller-feature.node.kubernetes.io/cpu-feature-smep: true node-labeller-feature.node.kubernetes.io/cpu-feature-spec-ctrl: true node-labeller-feature.node.kubernetes.io/cpu-feature-sse4.2: true node-labeller-feature.node.kubernetes.io/cpu-feature-svm: true node-labeller-feature.node.kubernetes.io/cpu-feature-tsc-deadline: true node-labeller-feature.node.kubernetes.io/cpu-feature-vme: true node-labeller-feature.node.kubernetes.io/cpu-feature-x2apic: true node-labeller-feature.node.kubernetes.io/cpu-feature-xsave: true node-labeller-feature.node.kubernetes.io/cpu-model-Haswell-noTSX: true node-labeller-feature.node.kubernetes.io/cpu-model-Haswell-noTSX-IBRS: true node-labeller-feature.node.kubernetes.io/cpu-model-IvyBridge: true node-labeller-feature.node.kubernetes.io/cpu-model-IvyBridge-IBRS: true node-labeller-feature.node.kubernetes.io/cpu-model-Nehalem: true node-labeller-feature.node.kubernetes.io/cpu-model-Nehalem-IBRS: true node-labeller-feature.node.kubernetes.io/cpu-model-Opteron_G1: true node-labeller-feature.node.kubernetes.io/cpu-model-Opteron_G2: true node-labeller-feature.node.kubernetes.io/cpu-model-Penryn: true node-labeller-feature.node.kubernetes.io/cpu-model-SandyBridge: true node-labeller-feature.node.kubernetes.io/cpu-model-SandyBridge-IBRS: true node-labeller-feature.node.kubernetes.io/cpu-model-Westmere: true node-labeller-feature.node.kubernetes.io/cpu-model-Westmere-IBRS: true node-labeller-feature.node.kubernetes.io/cpu-model-kvm32: true node-labeller-feature.node.kubernetes.io/cpu-model-kvm64: true node-labeller-feature.node.kubernetes.io/cpu-model-qemu32: true node-labeller-feature.node.kubernetes.io/cpu-model-qemu64: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-base: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-frequencies: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-ipi: true 
node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-reenlightenment: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-reset: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-runtime: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-synic: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-synic2: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-synictimer: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-time: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-tlbflush: true node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-vpindex: true volumes.kubernetes.io/controller-managed-attach-detach: true CreationTimestamp: Wed, 10 Feb 2021 17:12:28 +0000 Taints: node.kubernetes.io/unreachable:NoExecute node.kubernetes.io/unreachable:NoSchedule Unschedulable: false Lease: HolderIdentity: ssp04-rvkqg-worker-0-zvfkw AcquireTime: <unset> RenewTime: Thu, 11 Feb 2021 14:47:18 +0000 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- MemoryPressure Unknown Thu, 11 Feb 2021 14:47:20 +0000 Thu, 11 Feb 2021 14:48:01 +0000 NodeStatusUnknown Kubelet stopped posting node status. DiskPressure Unknown Thu, 11 Feb 2021 14:47:20 +0000 Thu, 11 Feb 2021 14:48:01 +0000 NodeStatusUnknown Kubelet stopped posting node status. PIDPressure Unknown Thu, 11 Feb 2021 14:47:20 +0000 Thu, 11 Feb 2021 14:48:01 +0000 NodeStatusUnknown Kubelet stopped posting node status. Ready Unknown Thu, 11 Feb 2021 14:47:20 +0000 Thu, 11 Feb 2021 14:48:01 +0000 NodeStatusUnknown Kubelet stopped posting node status. 
Addresses: InternalIP: 192.168.1.21 Hostname: ssp04-rvkqg-worker-0-zvfkw Capacity: attachable-volumes-cinder: 256 cpu: 8 devices.kubevirt.io/kvm: 110 devices.kubevirt.io/tun: 110 devices.kubevirt.io/vhost-net: 110 ephemeral-storage: 41391084Ki hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 16418260Ki ovs-cni.network.kubevirt.io/br0: 1k pods: 250 Allocatable: attachable-volumes-cinder: 256 cpu: 7500m devices.kubevirt.io/kvm: 110 devices.kubevirt.io/tun: 110 devices.kubevirt.io/vhost-net: 110 ephemeral-storage: 37072281128 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 15267284Ki ovs-cni.network.kubevirt.io/br0: 1k pods: 250 System Info: Machine ID: 885b9301803e43b9a6c8c07324f015e6 System UUID: 885b9301-803e-43b9-a6c8-c07324f015e6 Boot ID: 8e64b1bd-f422-411c-a633-c451e40469ec Kernel Version: 4.18.0-193.41.1.el8_2.x86_64 OS Image: Red Hat Enterprise Linux CoreOS 46.82.202101301821-0 (Ootpa) Operating System: linux Architecture: amd64 Container Runtime Version: cri-o://1.19.1-7.rhaos4.6.git6377f68.el8 Kubelet Version: v1.19.0+e49167a Kube-Proxy Version: v1.19.0+e49167a ProviderID: openstack:///885b9301-803e-43b9-a6c8-c07324f015e6 Non-terminated Pods: (70 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE --------- ---- ------------ ---------- --------------- ------------- --- b4-upgarde virt-launcher-rhel8-nfs-rr9tx 100m (1%) 100m (1%) 1709Mi (11%) 40M (0%) 95m b4-upgarde virt-launcher-win10-ocs-2kx2v 100m (1%) 100m (1%) 4481613825 (28%) 40M (0%) 82m openshift-cluster-csi-drivers openstack-cinder-csi-driver-node-h4pp8 20m (0%) 0 (0%) 100Mi (0%) 0 (0%) 41m openshift-cluster-node-tuning-operator tuned-7ll2f 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 40m openshift-cnv bridge-marker-n9pqq 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv cdi-apiserver-fd7f4fb6-q2b8v 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv cdi-deployment-65c8dbfdc7-hxms2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv cdi-uploadproxy-c9ff9fc78-qp99p 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv 
cluster-network-addons-operator-69dd97cb44-l26h9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 14m openshift-cnv hostpath-provisioner-operator-55d7b8b595-zhbk9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 14m openshift-cnv hostpath-provisioner-vhvr7 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv kube-cni-linux-bridge-plugin-zvvcf 60m (0%) 0 (0%) 30Mi (0%) 0 (0%) 20h openshift-cnv kubevirt-node-labeller-qmmb6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv nmstate-handler-ssqsc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv ovs-cni-amd64-t8cbz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv virt-api-6d765b6dd5-t2v5l 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv virt-controller-68985f6974-mv2vb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv virt-handler-mzfgg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv virt-operator-7d57664b89-nzjcl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 14m openshift-cnv virt-template-validator-65dbfc87d8-56dks 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv virt-template-validator-65dbfc87d8-89rvn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7m25s openshift-cnv vm-import-controller-74d785b999-4vrkx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-cnv vm-import-operator-7d856d4f4b-kdzh6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7m25s openshift-dns dns-default-pn9k5 65m (0%) 0 (0%) 110Mi (0%) 512Mi (3%) 20m openshift-image-registry node-ca-lmnqz 10m (0%) 0 (0%) 10Mi (0%) 0 (0%) 40m openshift-ingress-canary ingress-canary-gn4fs 10m (0%) 0 (0%) 20Mi (0%) 0 (0%) 41m openshift-ingress router-default-6c5c8d967b-vm8qb 100m (1%) 0 (0%) 256Mi (1%) 0 (0%) 14m openshift-local-storage local-block-local-diskmaker-8zkvh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-local-storage local-block-local-provisioner-phdhq 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-machine-config-operator machine-config-daemon-sqjbv 40m (0%) 0 (0%) 100Mi (0%) 0 (0%) 17m openshift-manila-csi-driver csi-nodeplugin-nfsplugin-frgkn 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 40m openshift-manila-csi-driver openstack-manila-csi-nodeplugin-v9889 15m (0%) 0 (0%) 70Mi (0%) 0 (0%) 40m 
openshift-marketplace certified-operators-5ghdn 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 13m openshift-marketplace hco-catalogsource-s5p9d 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 20h openshift-marketplace ocs-catalogsource-m5974 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 7m25s openshift-marketplace redhat-marketplace-qfmcp 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 7m26s openshift-marketplace redhat-operators-vhszj 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 14m openshift-monitoring alertmanager-main-0 8m (0%) 0 (0%) 270Mi (1%) 0 (0%) 14m openshift-monitoring alertmanager-main-1 8m (0%) 0 (0%) 270Mi (1%) 0 (0%) 40m openshift-monitoring kube-state-metrics-6c798c69f5-sdkph 4m (0%) 0 (0%) 120Mi (0%) 0 (0%) 7m28s openshift-monitoring node-exporter-x6lq2 9m (0%) 0 (0%) 210Mi (1%) 0 (0%) 41m openshift-monitoring openshift-state-metrics-6bd7d6f65-b7c9x 3m (0%) 0 (0%) 190Mi (1%) 0 (0%) 14m openshift-monitoring prometheus-adapter-6575b658bf-qnxj6 1m (0%) 0 (0%) 25Mi (0%) 0 (0%) 14m openshift-monitoring prometheus-adapter-6575b658bf-rzxht 1m (0%) 0 (0%) 25Mi (0%) 0 (0%) 7m27s openshift-monitoring prometheus-k8s-0 76m (1%) 0 (0%) 1204Mi (8%) 0 (0%) 7m26s openshift-monitoring prometheus-k8s-1 76m (1%) 0 (0%) 1204Mi (8%) 0 (0%) 41m openshift-monitoring thanos-querier-565fc8859b-pkzx2 9m (0%) 0 (0%) 92Mi (0%) 0 (0%) 14m openshift-multus multus-4w72h 10m (0%) 0 (0%) 150Mi (1%) 0 (0%) 24m openshift-multus network-metrics-daemon-nm5zq 20m (0%) 0 (0%) 120Mi (0%) 0 (0%) 25m openshift-network-diagnostics network-check-target-4bsnh 10m (0%) 0 (0%) 15Mi (0%) 0 (0%) 27m openshift-openstack-infra coredns-ssp04-rvkqg-worker-0-zvfkw 100m (1%) 0 (0%) 200Mi (1%) 0 (0%) 21h openshift-openstack-infra keepalived-ssp04-rvkqg-worker-0-zvfkw 200m (2%) 0 (0%) 400Mi (2%) 0 (0%) 21h openshift-openstack-infra mdns-publisher-ssp04-rvkqg-worker-0-zvfkw 100m (1%) 0 (0%) 200Mi (1%) 0 (0%) 21h openshift-sdn ovs-k77kz 15m (0%) 0 (0%) 400Mi (2%) 0 (0%) 23m openshift-sdn sdn-srksb 110m (1%) 0 (0%) 220Mi (1%) 0 (0%) 27m openshift-storage 
csi-cephfsplugin-provisioner-6bc7cbf6f-88vxh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 14m openshift-storage csi-cephfsplugin-xjcf6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-storage csi-rbdplugin-8rtwc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-storage csi-rbdplugin-provisioner-7699b8c4b8-vtf9b 0 (0%) 0 (0%) 0 (0%) 0 (0%) 14m openshift-storage noobaa-endpoint-7584c5969f-rwn7z 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7m25s openshift-storage noobaa-operator-5855b5688-cfn67 0 (0%) 0 (0%) 0 (0%) 0 (0%) 14m openshift-storage ocs-metrics-exporter-5d66d5fc59-fbqm6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7m27s openshift-storage ocs-operator-cd5b866f5-29q72 0 (0%) 0 (0%) 0 (0%) 0 (0%) 7m26s openshift-storage rook-ceph-crashcollector-ssp04-rvkqg-worker-0-zvfkw-666867qr7vg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-786bdbf7qv4qj 0 (0%) 0 (0%) 0 (0%) 0 (0%) 14m openshift-storage rook-ceph-mgr-a-5576d7458b-sdsz2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 14m openshift-storage rook-ceph-mon-b-76cb57664f-v7bhl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-storage rook-ceph-operator-6cfc658d4b-prp6k 0 (0%) 0 (0%) 0 (0%) 0 (0%) 14m openshift-storage rook-ceph-osd-1-5946d84494-rkmbx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 20h openshift-storage rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-8b6b579tmgtf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 19h Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
Resource Requests Limits -------- -------- ------ cpu 1350m (18%) 200m (2%) memory 12943622145 (82%) 616870912 (3%) ephemeral-storage 0 (0%) 0 (0%) hugepages-1Gi 0 (0%) 0 (0%) hugepages-2Mi 0 (0%) 0 (0%) attachable-volumes-cinder 0 0 devices.kubevirt.io/kvm 2 2 devices.kubevirt.io/tun 2 2 devices.kubevirt.io/vhost-net 1 1 ovs-cni.network.kubevirt.io/br0 0 0 Events: <none> =========================================== $ oc describe pod virt-launcher-rhel8-nfs-rr9tx Name: virt-launcher-rhel8-nfs-rr9tx Namespace: b4-upgarde Priority: 0 Node: ssp04-rvkqg-worker-0-zvfkw/192.168.1.21 Start Time: Thu, 11 Feb 2021 13:17:48 +0000 Labels: kubevirt.io=virt-launcher kubevirt.io/created-by=fee21d27-2344-4720-8472-c7b9550c1d71 kubevirt.io/domain=rhel8-nfs kubevirt.io/size=tiny Annotations: k8s.v1.cni.cncf.io/network-status: [{ "name": "", "interface": "eth0", "ips": [ "10.131.0.47" ], "default": true, "dns": {} }] k8s.v1.cni.cncf.io/networks-status: [{ "name": "", "interface": "eth0", "ips": [ "10.131.0.47" ], "default": true, "dns": {} }] kubevirt.io/domain: rhel8-nfs openshift.io/scc: kubevirt-controller traffic.sidecar.istio.io/kubevirtInterfaces: k6t-eth0 Status: Terminating (lasts 4m55s) Termination Grace Period: 210s IP: 10.131.0.47 IPs: IP: 10.131.0.47 Controlled By: VirtualMachineInstance/rhel8-nfs Init Containers: container-disk-binary: Container ID: cri-o://c2d1b35e4ba2ad1ac71bb84e506375ead0baf769591313cae4f209fc37cadfd1 Image: registry.redhat.io/container-native-virtualization/virt-launcher@sha256:61b2083d39a867d87b09d56ab8eaca4734ffddfe02e4b25ac50798f7b672811b Image ID: registry.redhat.io/container-native-virtualization/virt-launcher@sha256:61b2083d39a867d87b09d56ab8eaca4734ffddfe02e4b25ac50798f7b672811b Port: <none> Host Port: <none> Command: /usr/bin/cp /usr/bin/container-disk /init/usr/bin/container-disk State: Terminated Reason: Completed Exit Code: 0 Started: Thu, 11 Feb 2021 13:17:52 +0000 Finished: Thu, 11 Feb 2021 13:17:52 +0000 Ready: True Restart Count: 0 
Limits: cpu: 100m memory: 40M Requests: cpu: 10m memory: 1M Environment: <none> Mounts: /init/usr/bin from virt-bin-share-dir (rw) Containers: compute: Container ID: cri-o://b9b7d0d53452e821c48c3b1e7f90b41853d5442f9811b1c4f7de076933711f2a Image: registry.redhat.io/container-native-virtualization/virt-launcher@sha256:61b2083d39a867d87b09d56ab8eaca4734ffddfe02e4b25ac50798f7b672811b Image ID: registry.redhat.io/container-native-virtualization/virt-launcher@sha256:61b2083d39a867d87b09d56ab8eaca4734ffddfe02e4b25ac50798f7b672811b Port: <none> Host Port: <none> Command: /usr/bin/virt-launcher --qemu-timeout 5m --name rhel8-nfs --uid fee21d27-2344-4720-8472-c7b9550c1d71 --namespace b4-upgarde --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 195 --hook-sidecars 0 --less-pvc-space-toleration 10 --ovmf-path /usr/share/OVMF State: Running Started: Thu, 11 Feb 2021 13:17:52 +0000 Ready: True Restart Count: 0 Limits: devices.kubevirt.io/kvm: 1 devices.kubevirt.io/tun: 1 devices.kubevirt.io/vhost-net: 1 Requests: cpu: 100m devices.kubevirt.io/kvm: 1 devices.kubevirt.io/tun: 1 devices.kubevirt.io/vhost-net: 1 memory: 1709Mi Environment: <none> Mounts: /var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw) /var/run/kubevirt-private/vmi-disks/rootdisk from rootdisk (rw) /var/run/kubevirt/container-disks from container-disks (rw) /var/run/kubevirt/sockets from sockets (rw) /var/run/libvirt from libvirt-runtime (rw) Conditions: Type Status Initialized True Ready False ContainersReady True PodScheduled True Volumes: sockets: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> rootdisk: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: rhel8 ReadOnly: false virt-bin-share-dir: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: 
<unset> libvirt-runtime: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> ephemeral-disks: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> container-disks: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> QoS Class: Burstable Node-Selectors: kubevirt.io/schedulable=true Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 103m default-scheduler Successfully assigned b4-upgarde/virt-launcher-rhel8-nfs-rr9tx to ssp04-rvkqg-worker-0-zvfkw Normal AddedInterface 103m multus Add eth0 [10.131.0.47/23] Normal Pulled 103m kubelet Container image "registry.redhat.io/container-native-virtualization/virt-launcher@sha256:61b2083d39a867d87b09d56ab8eaca4734ffddfe02e4b25ac50798f7b672811b" already present on machine Normal Created 103m kubelet Created container container-disk-binary Normal Started 103m kubelet Started container container-disk-binary Normal Pulled 103m kubelet Container image "registry.redhat.io/container-native-virtualization/virt-launcher@sha256:61b2083d39a867d87b09d56ab8eaca4734ffddfe02e4b25ac50798f7b672811b" already present on machine Normal Created 103m kubelet Created container compute Normal Started 103m kubelet Started container compute Warning NodeNotReady 13m node-controller Node is not ready ===========================================
Ruth, is this only for runStrategy: Manual or also happening with running: true?
Happens for either runStrategy (Manual or Always) or running: true. With runStrategy: Manual, the VMI is not restarted. With runStrategy: Always or running: true, the VMI is restarted.

Steps to reproduce:
* Start a VMI:
  rhel8-nfs   14s   Running   10.128.3.14   ssp04-rvkqg-worker-0-9knk9
* Taint the node the VMI is running on:
  oc adm taint nodes <node name> kubevirt.io/drain="":NoExecute
* The virt-launcher pod is terminating:
  virt-launcher-rhel8-nfs-gfstw   1/1   Terminating   0   102s
* A new pod is created:
  virt-launcher-rhel8-nfs-prqk5   1/1   Running       0   21s
* No live migration occurs.
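The reproduction steps above can be scripted roughly as follows. This is a sketch, not part of the original report: the node name is a placeholder, and the commands assume a cluster with CNV installed and the VMI running in the current namespace.

```shell
# Placeholder: substitute the node the VMI is actually scheduled on
NODE=ssp04-rvkqg-worker-0-9knk9

# Confirm which node the VMI runs on
oc get vmi rhel8-nfs -o jsonpath='{.status.nodeName}'

# Apply the NoExecute taint used to reproduce the issue
oc adm taint nodes "$NODE" kubevirt.io/drain="":NoExecute

# Watch the virt-launcher pods: the old one terminates, a new one is created
oc get pods -w

# Check whether a live migration object was created (none appears here,
# which is the bug: the pod is deleted instead of evicted/migrated)
oc get vmim

# Clean up: remove the taint again
oc adm taint nodes "$NODE" kubevirt.io/drain-
```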
Ruth, it looks like the taint used to reproduce this issue with `runStrategy:Always/running:true` is different than that of the issue with `runStrategy:Manual`. Does this issue occur when using taint `node.kubernetes.io/unreachable:NoExecute` with `runStrategy:Always/running: true` as well? The example above uses `kubevirt.io/drain=""` which seems like it would be a different issue?
(In reply to aschuett from comment #8)
> Ruth, it looks like the taint used to reproduce this issue with
> `runStrategy:Always/running:true` is different than that of the issue with
> `runStrategy:Manual`.
>
> Does this issue occur when using taint
> `node.kubernetes.io/unreachable:NoExecute` with `runStrategy:Always/running:true`
> as well? The example above uses `kubevirt.io/drain=""` which seems
> like it would be a different issue?

Ruth just verified that *any* NoExecute taint has the effect of bluntly deleting pods instead of issuing an eviction call through the API.
So, I talked with Ryan and also checked the taint again: Ryan confirms that during a normal, healthy upgrade no NoExecute taints are applied. Further, after he asked me again which taint we see applied, I realized that it is `node.kubernetes.io/unreachable:NoExecute` and NOT `node.kubernetes.io/unschedulable:NoExecute`. This means that everything is fine on our side. The node being NotReady is clearly the reason for this taint being added.
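To double-check which NoExecute taint is actually present, the taints can be listed directly. A sketch (the node name is a placeholder from earlier output; these commands need a live cluster):

```shell
# List every node together with its taints, one node per line
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'

# Or inspect a single suspect node; the Taints field shows e.g.
# node.kubernetes.io/unreachable:NoExecute when the node went NotReady
oc describe node ssp05-4fpbn-worker-0-vsfvz | grep -A3 'Taints'
```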
Testing with small VMs (2 VMs running with Cirros DVs), the VMIs were live migrated.
We have the final confirmation that, if there is no issue during the upgrade, no NoExecute taints appear: both from Ruth verifying it and from the node team. Removing the blocker here.
Marking this as TEST_ONLY because we expect it will be resolved when https://bugzilla.redhat.com/show_bug.cgi?id=1913532 is fixed.
The fix is at: https://bugzilla.redhat.com/show_bug.cgi?id=1929278
Tested with 4.7.0-0.nightly-2021-02-17-224627. Everything was running after the upgrade, but after some time the 2 migratable VMs were signaled to shut down.

VMI with runStrategy: Always:

{"component":"virt-handler","level":"info","msg":"Processing event b4-ugrade/win10-vm-ocs","pos":"vm.go:1175","timestamp":"2021-02-18T16:09:47.623885Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"win10-vm-ocs","namespace":"b4-ugrade","pos":"vm.go:1177","timestamp":"2021-02-18T16:09:47.623910Z","uid":"83dc4f8d-415b-4e0a-a983-2bf61a97bc74"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Running, reason: Unknown\n","name":"win10-vm-ocs","namespace":"b4-ugrade","pos":"vm.go:1182","timestamp":"2021-02-18T16:09:47.623931Z","uid":"83dc4f8d-415b-4e0a-a983-2bf61a97bc74"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Received Domain Event of type MODIFIED","name":"win10-vm-ocs","namespace":"b4-ugrade","pos":"server.go:78","timestamp":"2021-02-18T16:09:47.633615Z","uid":"83dc4f8d-415b-4e0a-a983-2bf61a97bc74"}
{"component":"virt-handler","kind":"","level":"info","msg":"Signaled graceful shutdown for win10-vm-ocs","name":"win10-vm-ocs","namespace":"b4-ugrade","pos":"vm.go:1649","timestamp":"2021-02-18T16:09:47.659708Z","uid":"83dc4f8d-415b-4e0a-a983-2bf61a97bc74"}

VMI with runStrategy: Manual:

{"component":"virt-handler","level":"info","msg":"Processing event b4-ugrade/fed-nfs-vm","pos":"vm.go:1175","timestamp":"2021-02-18T10:44:20.705166Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1177","timestamp":"2021-02-18T10:44:20.705199Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Paused, reason: Migration\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1182","timestamp":"2021-02-18T10:44:20.705219Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Received Domain Event of type MODIFIED","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"server.go:78","timestamp":"2021-02-18T10:44:21.179196Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain is in state Shutoff reason Migrated","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:2175","timestamp":"2021-02-18T10:44:21.179311Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","level":"info","msg":"Processing event b4-ugrade/fed-nfs-vm","pos":"vm.go:1175","timestamp":"2021-02-18T10:44:21.179384Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1177","timestamp":"2021-02-18T10:44:21.179454Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Shutoff, reason: Migrated\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1182","timestamp":"2021-02-18T10:44:21.179474Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","kind":"VirtualMachineInstance","level":"info","msg":"Using cached UID for vmi found in domain cache","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1350","timestamp":"2021-02-18T10:44:21.207149Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","level":"info","msg":"Processing event b4-ugrade/fed-nfs-vm","pos":"vm.go:1175","timestamp":"2021-02-18T10:44:21.207216Z"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Shutoff, reason: Migrated\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1182","timestamp":"2021-02-18T10:44:21.207263Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}

Test setup:
- 3 running VMs:
  * Windows10, OCS, runStrategy: Always
  * Fedora33, NFS, runStrategy: Manual
  * Rhel8.3, HPP
- Started from OCP 4.6.17, CNV 2.5.3
- Upgraded OCP
- VMs were live migrated (checked the running process in the migrated VMIs):

  Type     Reason            Age                  From                         Message
  ----     ------            ----                 ----                         -------
  Normal   SuccessfulCreate  4h8m                 disruptionbudget-controller  Created PodDisruptionBudget kubevirt-disruption-budget-78g8k
  Normal   SuccessfulCreate  4h8m                 virtualmachine-controller    Created virtual machine pod virt-launcher-win10-vm-ocs-vxjps
  Normal   Started           4h8m                 virt-handler                 VirtualMachineInstance started.
  Warning  SyncFailed        162m                 virt-handler                 unknown error encountered sending command SyncVMI: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   Created           126m (x141 over 4h8m)  virt-handler               VirtualMachineInstance defined.
  Normal   SuccessfulCreate  126m                 disruptionbudget-controller  Created Migration kubevirt-evacuation-xjj9z
  Normal   PreparingTarget   123m (x2 over 123m)  virt-handler                 VirtualMachineInstance Migration Target Prepared.
  Normal   PreparingTarget   123m                 virt-handler                 Migration Target is listening at 10.131.0.5, on ports: 39759,40051
  Warning  SyncFailed        122m                 virt-handler                 server error. command Migrate failed: "migration job already executed"
  Normal   SuccessfulCreate  122m                 disruptionbudget-controller  Created Migration kubevirt-evacuation-wfhcz
  Normal   PreparingTarget   120m (x2 over 120m)  virt-handler                 VirtualMachineInstance Migration Target Prepared.
  Normal   PreparingTarget   120m                 virt-handler                 Migration Target is listening at 10.129.2.4, on ports: 34763,43953
  Normal   Created           27m (x132 over 119m) virt-handler                 VirtualMachineInstance defined.
  Normal   ShuttingDown      25s (x369 over 27m)  virt-handler                 Signaled Graceful Shutdown

$ oc get node
NAME                         STATUS   ROLES    AGE   VERSION
ssp09-c7g7r-master-0         Ready    master   26h   v1.20.0+ba45583
ssp09-c7g7r-master-1         Ready    master   26h   v1.20.0+ba45583
ssp09-c7g7r-master-2         Ready    master   26h   v1.20.0+ba45583
ssp09-c7g7r-worker-0-624qp   Ready    worker   26h   v1.20.0+ba45583
ssp09-c7g7r-worker-0-kwzsk   Ready    worker   26h   v1.20.0+ba45583
ssp09-c7g7r-worker-0-ndrjw   Ready    worker   26h   v1.20.0+ba45583

$ oc get vmi
NAME           AGE     PHASE     IP            NODENAME
fed-nfs-vm     8h      Running   10.129.2.46   ssp09-c7g7r-worker-0-624qp
rhel8-hpp-vm   53m     Running   10.131.0.16   ssp09-c7g7r-worker-0-ndrjw
win10-vm-ocs   3h10m   Running   10.129.2.48   ssp09-c7g7r-worker-0-624qp
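The "Signaled Graceful Shutdown" events above originate in virt-handler's structured JSON logs, so a plain grep over the log stream is enough to isolate them. A self-contained sketch: the sample lines are copied from this comment, the file path is illustrative, and on a live cluster the input would come from the virt-handler pod logs instead (the daemonset name/namespace below are assumptions).

```shell
# Write two sample virt-handler log lines (taken from the output above)
cat <<'EOF' > /tmp/virt-handler-sample.log
{"component":"virt-handler","level":"info","msg":"Processing event b4-ugrade/win10-vm-ocs","pos":"vm.go:1175","timestamp":"2021-02-18T16:09:47.623885Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"Signaled graceful shutdown for win10-vm-ocs","name":"win10-vm-ocs","namespace":"b4-ugrade","pos":"vm.go:1649","timestamp":"2021-02-18T16:09:47.659708Z","uid":"83dc4f8d-415b-4e0a-a983-2bf61a97bc74"}
EOF

# Keep only the graceful-shutdown signals; against a live cluster this would be
# something like `oc logs -n openshift-cnv ds/virt-handler | grep ...` (assumed names)
grep 'Signaled graceful shutdown' /tmp/virt-handler-sample.log
```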
@rnetser, the most recent symptoms in comment 16 sound like a different/new bug. Please create a new bug to track its resolution.
NVM; I've split it out to bug #1930630. Moving this bug back to ON_QA and TestOnly to cover the original issue.
Moving this to VERIFIED; with the fix, the VMIs are migrated and the nodes are Ready after the upgrade.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0799