Bug 2132873
| Summary: | VM is removed before virt-launcher pod exits, new VM with same name points to old VMI/virt-launcher pod still terminating | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Germano Veit Michel <gveitmic> |
| Component: | Virtualization | Assignee: | ffossemo |
| Status: | CLOSED ERRATA | QA Contact: | zhe peng <zpeng> |
| Severity: | medium | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.11.0 | CC: | acardace, fdeutsch, ffossemo, ycui |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 4.13.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | hco-bundle-registry-container-v4.13.0.rhel9-1639 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-18 02:55:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Germano Veit Michel
2022-10-07 04:58:28 UTC
(In reply to Germano Veit Michel from comment #0)
> Duplicate name check does not work after the VM is deleted, customer is able
> to create a new VM with the same name as the old VM that still has its
> virt-launcher pod terminating.

This ended up quite confusing. Let me rephrase:

* After a VM is deleted, one is able to create a new VM with the same name.
* But a deleted VM does not mean its VMI and virt-launcher pod are already gone.
* If one creates a new VM with the same name as an old VMI that is still terminating, they get associated.

Hopefully it's better :)

This is intended behavior overall. From OpenShift Virtualization's point of view, we're relying on the OwnerReference to trigger a cascading delete. Which means we're relying on the kube scheduler to clean up remaining objects, which is not an atomic operation.

For the workflow given in the description, we suggest adding the "--cascade" and "--foreground" flags to the delete VM command; this will result in the desired behavior.

(In reply to sgott from comment #2)
> This is intended behavior overall. From OpenShift Virtualization's point of
> view, we're relying on the OwnerReference to trigger a cascading delete.
> Which means we're relying on the kube scheduler to clean up remaining
> objects, which is not an atomic operation.

I disagree with this. From a user's point of view this is horrible, and we have a customer complaining about it. While the cleanup is being done, a new VM should get its own VMI and so on. It should not link back to an old VMI that is completely unrelated except for its name. The cleanup of old objects is one thing, and it's fine that they are still being deleted; the problem is relating a new object to old objects that have nothing to do with it.

> For the workflow given in the description, we suggest adding the
> "--cascade" and "--foreground" flags to the delete VM command; this will
> result in the desired behavior.

Can you consider doing this on the Web UI then?

WRT deletion I agree that the system behaves as expected and follows the principles used for other objects in the cluster. "delete --foreground" - or "kubectl delete --cascade=foreground" - can mitigate this problem.

However, that the new VM is linked to the old VMI instance is a bug. This must not happen. Consider the following flow:

1. Create VM A
2. Start VM A
3. Delete VM A
4. Create VM A
5. Start VM A

Then we should fail in step #4, or at the latest in step #5, assuming that the old VMI is still running. Because VMs will create VMIs with the same name, it might make sense to block step #4 whenever we detect a VMI with the same name as the VM.

Given that a new VM will most likely conflict with a VMI's settings in some way, it does make sense to block the creation of the VM in step 4 of comment #6.

@fdeutsch @sgott For uniformity, we should do the same in case of creation of VMIs. Let's take an example:

1. Creation of the VM `vm-test`
2. Creation of the VMI `vm-test`
3. Start of the VMI `vm-test`

Currently the behaviour is that the VMI created at point 2 will be deleted as soon as it is created, due to the reconciliation of the VM created at point 1. So it results in the following:

```
fossedihelm@fedora ~/w/kubevirt (name_conflict_vm_vmi) [1]> k create -f examples/vmi-windows.yaml
virtualmachineinstance.kubevirt.io/windows-11-test created
fossedihelm@fedora ~/w/kubevirt (name_conflict_vm_vmi)> k get vmi
No resources found in default namespace.
```

Is it OK that the creation of a VMI checks that there is no VM with the same name?
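For illustration, here is a minimal sketch of the foreground-deletion workaround suggested earlier in this thread. The VM name `testvm` and manifest `testvm.yaml` are hypothetical, and this assumes the VMI carries an OwnerReference with blockOwnerDeletion so the VM object only disappears once its dependents are gone; it is not the project's documented procedure, just one way to avoid re-creating the VM while the old VMI is still terminating:

```
# Sketch of the workaround discussed above (names are hypothetical).
# Foreground cascading delete: the VM object is only removed after its
# dependents (VMI, virt-launcher pod) have been deleted, and kubectl
# blocks until the VM itself is gone.
kubectl delete vm testvm --cascade=foreground

# Alternative: background delete, then explicitly wait for the old VMI
# to disappear before re-creating a VM with the same name.
kubectl delete vm testvm --wait=false
kubectl wait --for=delete vmi/testvm --timeout=180s
kubectl create -f testvm.yaml
```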
Thanks. This is not enough I suppose, and VMIs do not have the problem. The problem exists for VMs only.

Consider the following flow:

1. Create VM A
2. Start VM A (VMI A is getting created)
3. Delete VM A (VMI A starts to be deleted)
4. Create VM A
5. Start VM A (old VMI A was not deleted and is now bound to VM A again)

The bug is step 5: VMI A, which got created in step #2, was marked for deletion in step #3; however, VMI A was NOT deleted before step #5. Thus just before step #5 VM A exists, and VMI A exists. In step #5 VM A is bound to VMI A. This is the bug, because VMI A (in step #5) is from VM A in step #1, and not the instance of VM A from step #4.

As part of this bug, please take care to create a new test case in tier1 (upstream) to test for this scenario/bug. The assumption is:

1. VM created and eventually running
2. Delete VM (background deletion)
3. Quickly create a new VM

Expected outcome: the new VM will eventually be running; it has to wait for the old VMI to be gone.

I can reproduce this bug.

Verified with build: CNV-v4.13.0.rhel9-1639

Steps:

1: Create a new Windows VM with a blank PVC
2: Delete the VM with --cascade=orphan
$ kubectl delete vm vm-win10 --cascade=orphan
virtualmachine.kubevirt.io "vm-win10" deleted
3: Check the VM, VMI and virt-launcher pod
$ oc get vm
No resources found in default namespace.
$ oc get vmi
NAME       AGE   PHASE     IP             NODENAME                             READY
vm-win10   25m   Running   10.129.3.160   c01-zpeng-413-sv526-worker-0-8dp6m   False
$ oc get pods
NAME                           READY   STATUS        RESTARTS   AGE
virt-launcher-vm-win10-h2dhq   2/2     Terminating   0          49m
4: Create a new Windows VM using the same name as in step 1 and start it
$ virtctl start vm-win10
Error starting VirtualMachine Operation cannot be fulfilled on virtualmachine.kubevirt.io "vm-win10": VM is already running

The new VM is not bound to the old VMI. Moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205
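As a rough illustration of the guard suggested in comment #6 (blocking re-creation of a VM while a leftover VMI with the same name still exists), a minimal client-side sketch might look like the following. The VM name `testvm`, namespace `default`, and manifest `testvm.yaml` are hypothetical; the actual fix shipped in the product is implemented in the controller, not as a shell check:

```
# Hypothetical pre-creation check: refuse to (re)create the VM while a VMI
# with the same name still exists (for example, an old instance that is
# still terminating).
if kubectl get vmi testvm -n default >/dev/null 2>&1; then
  echo "VMI testvm still exists (possibly terminating); retry once it is gone" >&2
  exit 1
fi
kubectl create -f testvm.yaml -n default
```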