Description of problem:
oc vm delete <vmname> may not complete:
- the pod stays in Terminating state
- the VMI stays in Running status until the guest VM initiates shutdown
- sometimes the VM is deleted but the VMI is not
- could be related to a finalizer
Might be related to BZ1883875, where the guest agent is not running, but that is not always the case.

Version-Release number of selected component (if applicable):
CNV 2.4.3
OCP 4.5.17

How reproducible:
Sporadic; need more details on how to reproduce.

Steps to Reproduce:
1. Create many VMs through automation using the same source VM PV.
2. Delete the VMs.

Actual results:
VMs are deleted only sporadically.

Expected results:
All VMs should be deleted.
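To spot the "VM deleted but VMI not" symptom after a bulk delete, one option is to compare the VM and VMI names left in the namespace. The sketch below is a hypothetical helper, not part of CNV; it assumes `oc` is logged in to the cluster, relies on a VMI started from a VM sharing that VM's name, and uses `oc get <kind> -o name`, which prints one `kind.group/name` line per object.

```python
import subprocess
from typing import Set


def parse_names(output: str) -> Set[str]:
    """Turn `oc get <kind> -o name` output ("kind.group/name" per line) into bare names."""
    return {line.split("/", 1)[1] for line in output.splitlines() if "/" in line}


def list_names(kind: str, namespace: str) -> Set[str]:
    """List object names of one kind ("vm" or "vmi"); assumes `oc` is logged in."""
    result = subprocess.run(
        ["oc", "get", kind, "-n", namespace, "-o", "name"],
        check=True, capture_output=True, text=True,
    )
    return parse_names(result.stdout)


def vmis_without_vm(vm_names: Set[str], vmi_names: Set[str]) -> Set[str]:
    """VMIs whose same-named VM is gone (the "VM deleted but not VMI" symptom).

    Standalone VMIs that were never owned by a VM also show up here and
    need to be filtered out by hand.
    """
    return vmi_names - vm_names
```

Usage, with a hypothetical namespace name: `vmis_without_vm(list_names("vm", "my-ns"), list_names("vmi", "my-ns"))` returns the VMIs that outlived their VM.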
Is it possible to get some more information? In particular, must-gather output would give us more context here.
Without any other context, this sounds like it might be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1896387
That BZ is likely caused by an issue in CRI-O, which is being addressed in OCP 4.7, so I'm deferring this pending a fix in OCP.
To close the loop here, this comment:
https://bugzilla.redhat.com/show_bug.cgi?id=1896387#c8
mentioned the following BZ, which is what I was referring to in Comment #6:
https://bugzilla.redhat.com/show_bug.cgi?id=1883991
I have been able to reproduce this in 4.7.3 with a Windows guest VM. Start a Windows 2019 VM, then try to stop it from the OCP console: the stop seems to hang. If I repeat the same steps, but first start a shutdown from inside the Windows 2019 VM and then stop the VM in the OCP console before Windows finishes shutting down, it stops properly. Given this, do we believe the problem is still related to CRI-O, as indicated in comment 6?
Ben, there is a BZ where the reporter created a Windows VM and then deleted it immediately: https://bugzilla.redhat.com/show_bug.cgi?id=1933043
In some cases this causes graceful shutdown to fail, at which point the VMI waits for terminationGracePeriodSeconds to elapse before it is deleted. This is especially noticeable on Windows because the grace period is quite long (to ensure we don't break Windows updates). Does this look similar to what you're experiencing? What terminationGracePeriodSeconds values were set on the VMs that terminate immediately versus those that hang?
It seems they used 60 seconds for ephemeral VMs and 3600 seconds for VMs that might be created but stay up for a while.
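The effect of the grace period described above can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not KubeVirt's implementation: it assumes the VMI goes away as soon as the guest powers off, and otherwise lingers for the full terminationGracePeriodSeconds before being forcibly stopped.

```python
from typing import Optional


def vmi_linger_seconds(grace_period_s: int, guest_poweroff_s: Optional[float]) -> float:
    """Simplified model of how long a VMI lingers after a delete request.

    grace_period_s: the VMI's terminationGracePeriodSeconds.
    guest_poweroff_s: seconds the guest takes to power off after the
        shutdown request, or None if it never acts on the request
        (for example, a VM deleted while it is still booting).
    """
    if guest_poweroff_s is not None and guest_poweroff_s <= grace_period_s:
        # Graceful path: the guest powers off within the grace period.
        return float(guest_poweroff_s)
    # Graceful shutdown failed or was too slow: the whole grace period
    # elapses before the VMI is forcibly stopped and removed.
    return float(grace_period_s)
```

Under this model, a failed graceful shutdown costs 60 seconds (1 minute) for the ephemeral VMs but 3600 seconds (1 hour) for the long-lived ones, which from the console looks like a hang.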
Verified with build HCO:[v4.8.1-18].
Steps:
1. Create 50 VMs with the same DV source.
2. Start all VMs and wait until they are all in Running status.
3. Delete all VMs, then check the VM and VMI status.
Result: all VMs and VMIs were deleted.
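The final check in the verification steps (all VMs and VMIs gone) could be automated with a polling loop like the one below. This is a hypothetical sketch, not the QE automation actually used: `remaining_vm_objects` assumes `oc` is logged in and that `oc get vm,vmi -o name` lists whatever is left, and the poller takes the listing, sleep, and clock functions as parameters so it can be exercised without a cluster.

```python
import subprocess
import time
from typing import Callable, List


def remaining_vm_objects(namespace: str) -> List[str]:
    """List VM and VMI objects still present in the namespace."""
    result = subprocess.run(
        ["oc", "get", "vm,vmi", "-n", namespace, "-o", "name"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def wait_until_deleted(
    list_remaining: Callable[[], List[str]],
    timeout_s: float,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll until nothing remains or the timeout expires; return the leftovers."""
    deadline = clock() + timeout_s
    remaining = list_remaining()
    while remaining and clock() < deadline:
        sleep(interval_s)
        remaining = list_remaining()
    return remaining
```

Usage: `wait_until_deleted(lambda: remaining_vm_objects("my-ns"), timeout_s=...)`, where the namespace is a placeholder. The timeout should exceed the VMs' terminationGracePeriodSeconds; otherwise a VMI that is still inside its grace period would be reported as stuck.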
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.1 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3259