Description of problem:
Migration of a VM doesn't clean up the target pod in time when the migration fails. This is caused by our detection of whether the istio proxy is present:

{"component":"virt-launcher","level":"error","msg":"dirty virt-launcher shutdown: exit-code 2","pos":"virt-launcher.go:567","timestamp":"2021-10-14T22:04:06.674177Z"}
{"component":"virt-launcher","level":"error","msg":"error when checking for istio-proxy presence","pos":"virt-launcher.go:657","reason":"Get \"http://localhost:15021/healthz/ready\": dial tcp [::1]:15021: connect: connection refused","timestamp":"2021-10-14T22:04:10.781733Z"}
{"component":"virt-launcher","level":"error","msg":"error when checking for istio-proxy presence","pos":"virt-launcher.go:657","reason":"Get \"http://localhost:15021/healthz/ready\": dial tcp [::1]:15021: connect: connection refused","timestamp":"2021-10-14T22:04:13.853706Z"}
{"component":"virt-launcher","level":"error","msg":"error when checking for istio-proxy presence","pos":"virt-launcher.go:657","reason":"Get \"http://localhost:15021/healthz/ready\": dial tcp [::1]:15021: connect: connection refused","timestamp":"2021-10-14T22:04:16.925749Z"}
{"component":"virt-launcher","level":"error","msg":"error when checking for istio-proxy presence","pos":"virt-launcher.go:657","reason":"Get \"http://localhost:15021/healthz/ready\": dial tcp [::1]:15021: connect: connection refused","timestamp":"2021-10-14T22:04:19.998665Z"}

It takes more than 10 seconds to clean up. (A manual sketch of the probe is included under Additional info below.)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start a migration and make it fail.
2. Observe whether the target pod gets cleaned up in less than 10 seconds.

Actual results:

Expected results:

Additional info:
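A minimal sketch of reproducing the probe manually from inside the virt-launcher pod (the endpoint comes from the log above; the pod name placeholder, the "compute" container, and the availability of curl inside it are assumptions for illustration):

$ oc exec <virt-launcher-pod> -c compute -- \
    curl -sf --max-time 1 http://localhost:15021/healthz/ready \
    && echo "istio-proxy is present" \
    || echo "istio-proxy not reachable: connection refused"

When no istio-proxy sidecar is injected, the probe fails with "connection refused" and, as the timestamps in the log show, virt-launcher retries it roughly every 3 seconds, which is what pushes the target pod cleanup past 10 seconds.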
Verified with the following scenario:

1. Create a simple Fedora VM.
2. View the virt-launcher pod:

$ oc get pods
NAME                            READY   STATUS    RESTARTS   AGE
virt-launcher-vm-fedora-bdmqg   2/2     Running   0          100s

3. Start watching the virt-launcher pods (with `oc get pods -w`), while initially only the source virt-launcher pod exists.
4. In a different shell, start a simple migration to migrate the VM (a command sketch follows at the end of this comment).
5. As soon as the migration is running, delete the source virt-launcher pod and follow the pod status in the first terminal:

$ oc delete pod virt-launcher-vm-fedora-bdmqg
[cnv-qe-jenkins@n-yoss-410-sm-chjhf-executor ~]$ oc get pods -w
NAME                            READY   STATUS            RESTARTS   AGE
virt-launcher-vm-fedora-bdmqg   2/2     Running           0          100s
virt-launcher-vm-fedora-x6c6r   0/2     Pending           0          0s
virt-launcher-vm-fedora-x6c6r   0/2     Pending           0          1s
virt-launcher-vm-fedora-x6c6r   0/2     Init:0/2          0          1s
virt-launcher-vm-fedora-bdmqg   2/2     Terminating       0          2m11s
virt-launcher-vm-fedora-x6c6r   0/2     Terminating       0          3s
virt-launcher-vm-fedora-x6c6r   0/2     Terminating       0          4s
virt-launcher-vm-fedora-x6c6r   0/2     Terminating       0          5s
virt-launcher-vm-fedora-x6c6r   0/2     Terminating       0          6s
virt-launcher-vm-fedora-x6c6r   0/2     Terminating       0          6s
virt-launcher-vm-fedora-bdmqg   0/2     Terminating       0          2m16s
virt-launcher-vm-fedora-bdmqg   0/2     Terminating       0          2m16s
virt-launcher-vm-fedora-bdmqg   0/2     Terminating       0          2m17s
virt-launcher-vm-fedora-cc4d4   0/2     Pending           0          0s
virt-launcher-vm-fedora-cc4d4   0/2     Pending           0          0s
virt-launcher-vm-fedora-cc4d4   0/2     Pending           0          0s
virt-launcher-vm-fedora-cc4d4   0/2     Init:0/2          0          1s
virt-launcher-vm-fedora-cc4d4   0/2     Init:0/2          0          3s
virt-launcher-vm-fedora-cc4d4   0/2     Init:1/2          0          4s
virt-launcher-vm-fedora-cc4d4   0/2     PodInitializing   0          5s
virt-launcher-vm-fedora-cc4d4   2/2     Running           0          7s

As can be seen, the first target pod is the one with the "x6c6r" suffix. It starts its expected initialization, and then, when the source pod "bdmqg" is deleted, the target pod also starts terminating. Eventually a new target pod, "cc4d4", is initialized and ends up in the Running state. I can't say exactly how long it took, but it was a matter of a few seconds (less than 10 seconds for sure). I repeated the scenario twice, and in both cases I saw the same outcome.

OCP version: 4.10.0-fc.0
CNV version: 4.10.0
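For reference, a sketch of the commands for steps 4 and 5, assuming virtctl is available and the VM is named vm-fedora (the migration can also be started by creating a VirtualMachineInstanceMigration object instead):

$ virtctl migrate vm-fedora                       # step 4: start the migration
$ oc delete pod virt-launcher-vm-fedora-bdmqg     # step 5: delete the source virt-launcher pod
$ oc get pods -w                                  # watch the old target terminate and a new target reach Running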
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0947