Description of problem (discovered by Ruth):

Tested with 4.7.0-0.nightly-2021-02-17-224627. Everything was running after the upgrade, but after some time the 2 migratable VMs were signaled to be shut down.

VMI with runStrategy: Always:

{"component":"virt-handler","level":"info","msg":"Processing event b4-ugrade/win10-vm-ocs","pos":"vm.go:1175","timestamp":"2021-02-18T16:09:47.623885Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"win10-vm-ocs","namespace":"b4-ugrade","pos":"vm.go:1177","timestamp":"2021-02-18T16:09:47.623910Z","uid":"83dc4f8d-415b-4e0a-a983-2bf61a97bc74"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Running, reason: Unknown\n","name":"win10-vm-ocs","namespace":"b4-ugrade","pos":"vm.go:1182","timestamp":"2021-02-18T16:09:47.623931Z","uid":"83dc4f8d-415b-4e0a-a983-2bf61a97bc74"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Received Domain Event of type MODIFIED","name":"win10-vm-ocs","namespace":"b4-ugrade","pos":"server.go:78","timestamp":"2021-02-18T16:09:47.633615Z","uid":"83dc4f8d-415b-4e0a-a983-2bf61a97bc74"}
{"component":"virt-handler","kind":"","level":"info","msg":"Signaled graceful shutdown for win10-vm-ocs","name":"win10-vm-ocs","namespace":"b4-ugrade","pos":"vm.go:1649","timestamp":"2021-02-18T16:09:47.659708Z","uid":"83dc4f8d-415b-4e0a-a983-2bf61a97bc74"}

VMI with runStrategy: Manual:

{"component":"virt-handler","level":"info","msg":"Processing event b4-ugrade/fed-nfs-vm","pos":"vm.go:1175","timestamp":"2021-02-18T10:44:20.705166Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1177","timestamp":"2021-02-18T10:44:20.705199Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Paused, reason: Migration\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1182","timestamp":"2021-02-18T10:44:20.705219Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Received Domain Event of type MODIFIED","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"server.go:78","timestamp":"2021-02-18T10:44:21.179196Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain is in state Shutoff reason Migrated","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:2175","timestamp":"2021-02-18T10:44:21.179311Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","level":"info","msg":"Processing event b4-ugrade/fed-nfs-vm","pos":"vm.go:1175","timestamp":"2021-02-18T10:44:21.179384Z"}
{"component":"virt-handler","kind":"","level":"info","msg":"VMI is in phase: Running\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1177","timestamp":"2021-02-18T10:44:21.179454Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Shutoff, reason: Migrated\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1182","timestamp":"2021-02-18T10:44:21.179474Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","kind":"VirtualMachineInstance","level":"info","msg":"Using cached UID for vmi found in domain cache","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1350","timestamp":"2021-02-18T10:44:21.207149Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
{"component":"virt-handler","level":"info","msg":"Processing event b4-ugrade/fed-nfs-vm","pos":"vm.go:1175","timestamp":"2021-02-18T10:44:21.207216Z"}
{"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Shutoff, reason: Migrated\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1182","timestamp":"2021-02-18T10:44:21.207263Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"}
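For reference, the run strategy of each VM can be confirmed directly; this is a sketch, assuming the VM objects carry the same names as their VMIs and live in the b4-ugrade namespace shown in the logs:

$ oc get vm win10-vm-ocs -n b4-ugrade -o jsonpath='{.spec.runStrategy}'   # should print: Always
$ oc get vm fed-nfs-vm -n b4-ugrade -o jsonpath='{.spec.runStrategy}'    # should print: Manual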
{"component":"virt-handler","level":"info","msg":"Processing event b4-ugrade/fed-nfs-vm","pos":"vm.go:1175","timestamp":"2021-02-18T10:44:21.207216Z"} {"component":"virt-handler","kind":"Domain","level":"info","msg":"Domain status: Shutoff, reason: Migrated\n","name":"fed-nfs-vm","namespace":"b4-ugrade","pos":"vm.go:1182","timestamp":"2021-02-18T10:44:21.207263Z","uid":"218bb948-3110-4f77-ab9b-0d31403eae89"} - 3 running VMs: Windows10, OCP, runstrategy; Always Fedora33, NFS, runstrategy: Manual Rhel8.3, HPP Started off from OCP 4.6.17, CNV 2.5.3 Upgraded OCP VMs were live migrated (checked running process in the migated VMIs): ---- ------ ---- ---- ------- Normal SuccessfulCreate 4h8m disruptionbudget-controller Created PodDisruptionBudget kubevirt-disruption-budget-78g8k Normal SuccessfulCreate 4h8m virtualmachine-controller Created virtual machine pod virt-launcher-win10-vm-ocs-vxjps Normal Started 4h8m virt-handler VirtualMachineInstance started. Warning SyncFailed 162m virt-handler unknown error encountered sending command SyncVMI: rpc error: code = DeadlineExceeded desc = context deadline exceeded Normal Created 126m (x141 over 4h8m) virt-handler VirtualMachineInstance defined. Normal SuccessfulCreate 126m disruptionbudget-controller Created Migration kubevirt-evacuation-xjj9z Normal PreparingTarget 123m (x2 over 123m) virt-handler VirtualMachineInstance Migration Target Prepared. Normal PreparingTarget 123m virt-handler Migration Target is listening at 10.131.0.5, on ports: 39759,40051 Warning SyncFailed 122m virt-handler server error. command Migrate failed: "migration job already executed" Normal SuccessfulCreate 122m disruptionbudget-controller Created Migration kubevirt-evacuation-wfhcz Normal PreparingTarget 120m (x2 over 120m) virt-handler VirtualMachineInstance Migration Target Prepared. Normal PreparingTarget 120m virt-handler Migration Target is listening at 10.129.2.4, on ports: 34763,43953 Normal Created 27m (x132 over 119m) virt-handler VirtualMachineInstance defined. Normal ShuttingDown 25s (x369 over 27m) virt-handler Signaled Graceful Shutdown $ oc get node NAME STATUS ROLES AGE VERSION ssp09-c7g7r-master-0 Ready master 26h v1.20.0+ba45583 ssp09-c7g7r-master-1 Ready master 26h v1.20.0+ba45583 ssp09-c7g7r-master-2 Ready master 26h v1.20.0+ba45583 ssp09-c7g7r-worker-0-624qp Ready worker 26h v1.20.0+ba45583 ssp09-c7g7r-worker-0-kwzsk Ready worker 26h v1.20.0+ba45583 ssp09-c7g7r-worker-0-ndrjw Ready worker 26h v1.20.0+ba45583 $ oc get vmi NAME AGE PHASE IP NODENAME fed-nfs-vm 8h Running 10.129.2.46 ssp09-c7g7r-worker-0-624qp rhel8-hpp-vm 53m Running 10.131.0.16 ssp09-c7g7r-worker-0-ndrjw win10-vm-ocs 3h10m Running 10.129.2.48 ssp09-c7g7r-worker-0-624qp Version-Release number of selected component (if applicable): 2.6.0 How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
We were seeing bug #1913532 and/or bug #1906496.

There were quite a few OOM issues around Prometheus in 4.7; these led to nodes becoming unready, leading to pods getting deleted, leading to VMs getting shut down.

*** This bug has been marked as a duplicate of bug 1906496 ***
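A sketch of commands that could confirm the OOM/unready-node chain described above, assuming the standard OpenShift monitoring stack and a working metrics API:

$ oc get nodes                                          # look for NotReady nodes
$ oc get events -A --field-selector reason=OOMKilling   # kernel OOM kills reported by the kubelet
$ oc adm top pods -n openshift-monitoring               # memory usage of the prometheus-k8s-* pods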