Description of problem:
After creating a VM using an instance type/preference, a ControllerRevision is created and the VM's spec.instancetype.revisionName field is set. If the ControllerRevision is deleted before the VM (as happens when calling "delete namespace"), the VM gets stuck and cannot be deleted until the revisionName field is removed from the VM.

Version-Release number of selected component (if applicable):
4.13

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with an instance type
2. Start the VM (so the ControllerRevision is created)
3. Delete the namespace

Actual results:
The VM (and namespace) are stuck in the Terminating phase until the spec.instancetype.revisionName field is removed

Expected results:
The VM and namespace are deleted

Additional info:
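The steps above can be sketched as a shell session; the manifest name (vm-with-instancetype.yaml), VM name (my-vm), and namespace (demo) are illustrative placeholders, not taken from this report:

```shell
# Sketch of the reproduction steps; names are hypothetical.
reproduce() {
  kubectl create namespace demo
  # 1. Create a VM referencing a VirtualMachineInstancetype.
  kubectl apply -n demo -f vm-with-instancetype.yaml
  # 2. Start the VM so the ControllerRevision is created and
  #    spec.instancetype.revisionName is stamped onto the VM.
  virtctl start my-vm -n demo
  # 3. Delete the namespace; the VM then hangs in Terminating.
  kubectl delete namespace demo
}
```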
As discussed, let's get an actual example in here, please.
IMHO this is a bug in the virtualMachineControllerFinalize implementation. I can't reproduce c#0, but I can cause removal of the VirtualMachine to get stuck until I remove spec.instancetype. I believe this update and the eventual re-queue allow the controller to remove the finalizer, and that this has nothing to do with instance types etc.

$ ./cluster-up/kubectl.sh apply -f https://k8s.io/examples/admin/namespace-dev.yaml
$ ./cluster-up/kubectl.sh apply -f examples/csmall.yaml -f examples/vm-cirros-csmall.yaml -n development
[..]
$ ./cluster-up/kubectl.sh get -n development vm/vm-cirros-csmall -o json | jq .spec.instancetype
selecting docker as container runtime
{
  "kind": "VirtualMachineInstancetype",
  "name": "csmall",
  "revisionName": "vm-cirros-csmall-csmall-6242f4e8-8855-4c33-8e07-c68840db36c7-1"
}
$ ./cluster-up/kubectl.sh delete namespace/development
[..]
$ ./cluster-up/kubectl.sh get vms -n development vm-cirros-csmall -o json | jq .metadata.finalizers
[
  "kubevirt.io/virtualMachineControllerFinalize"
]
$ ./cluster-up/kubectl.sh get namespace/development -o json | jq .status
{
  "conditions": [
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "All resources successfully discovered",
      "reason": "ResourcesDiscovered",
      "status": "False",
      "type": "NamespaceDeletionDiscoveryFailure"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "All legacy kube types successfully parsed",
      "reason": "ParsedGroupVersions",
      "status": "False",
      "type": "NamespaceDeletionGroupVersionParsingFailure"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "All content successfully deleted, may be waiting on finalization",
      "reason": "ContentDeleted",
      "status": "False",
      "type": "NamespaceDeletionContentFailure"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "Some resources are remaining: virtualmachines.kubevirt.io has 1 resource instances",
      "reason": "SomeResourcesRemain",
      "status": "True",
      "type": "NamespaceContentRemaining"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "Some content in the namespace has finalizers remaining: kubevirt.io/virtualMachineControllerFinalize in 1 resource instances",
      "reason": "SomeFinalizersRemain",
      "status": "True",
      "type": "NamespaceFinalizersRemaining"
    }
  ],
  "phase": "Terminating"
}
$ ./cluster-up/kubectl.sh patch -n development vms/vm-cirros-csmall --type='json' -p '[{"op": "remove", "path": "/spec/instancetype/revisionName"}]'
The request is invalid: spec.instancetype: Failure to find instancetype: virtualmachineinstancetypes.instancetype.kubevirt.io "csmall" not found
$ ./cluster-up/kubectl.sh patch -n development vms/vm-cirros-csmall --type='json' -p '[{"op": "remove", "path": "/spec/instancetype"}]'
virtualmachine.kubevirt.io/vm-cirros-csmall patched
$ ./cluster-up/kubectl.sh get -n development vms/vm-cirros-csmall -o json | jq .status
Error from server (NotFound): namespaces "development" not found
Actually, the update to remove the finalizer might be failing because the instance type and ControllerRevision have already been removed?
Okay, this doesn't appear to happen now with the following in place:

$ git diff HEAD~1
diff --git a/pkg/instancetype/instancetype.go b/pkg/instancetype/instancetype.go
index 75e76eb50..e35cdc7c4 100644
--- a/pkg/instancetype/instancetype.go
+++ b/pkg/instancetype/instancetype.go
@@ -382,7 +382,7 @@ func (m *InstancetypeMethods) ApplyToVmi(field *k8sfield.Path, instancetypeSpec
 }

 func (m *InstancetypeMethods) FindPreferenceSpec(vm *virtv1.VirtualMachine) (*instancetypev1alpha2.VirtualMachinePreferenceSpec, error) {
-	if vm.Spec.Preference == nil {
+	if vm.Spec.Preference == nil || vm.DeletionTimestamp != nil {
 		return nil, nil
 	}

@@ -537,7 +537,7 @@ func (m *InstancetypeMethods) findClusterPreferenceByClient(resourceName string)
 }

 func (m *InstancetypeMethods) FindInstancetypeSpec(vm *virtv1.VirtualMachine) (*instancetypev1alpha2.VirtualMachineInstancetypeSpec, error) {
-	if vm.Spec.Instancetype == nil {
+	if vm.Spec.Instancetype == nil || vm.DeletionTimestamp != nil {
 		return nil, nil
 	}

@@ -669,7 +669,7 @@ func (m *InstancetypeMethods) findClusterInstancetypeByClient(resourceName strin
 }

 func (m *InstancetypeMethods) InferDefaultInstancetype(vm *virtv1.VirtualMachine) (*virtv1.InstancetypeMatcher, error) {
-	if vm.Spec.Instancetype == nil || vm.Spec.Instancetype.InferFromVolume == "" {
+	if vm.Spec.Instancetype == nil || vm.Spec.Instancetype.InferFromVolume == "" || vm.DeletionTimestamp != nil {
 		return nil, nil
 	}
 	defaultName, defaultKind, err := m.inferDefaultsFromVolumes(vm, vm.Spec.Instancetype.InferFromVolume, apiinstancetype.DefaultInstancetypeLabel, apiinstancetype.DefaultInstancetypeKindLabel)
@@ -683,7 +683,7 @@ func (m *InstancetypeMethods) InferDefaultPreference(vm *virtv1.VirtualMachine
 }

 func (m *InstancetypeMethods) InferDefaultPreference(vm *virtv1.VirtualMachine) (*virtv1.PreferenceMatcher, error) {
-	if vm.Spec.Preference == nil || vm.Spec.Preference.InferFromVolume == "" {
+	if vm.Spec.Preference == nil || vm.Spec.Preference.InferFromVolume == "" || vm.DeletionTimestamp != nil {
 		return nil, nil
 	}
 	defaultName, defaultKind, err := m.inferDefaultsFromVolumes(vm, vm.Spec.Preference.InferFromVolume, apiinstancetype.DefaultPreferenceLabel, apiinstancetype.DefaultPreferenceKindLabel)

I do, however, see lots of requeues because the namespace doesn't exist, but the update removing the finalizer eventually makes it through. I'll write this up as an upstream bug later today and post a PR cleaning up the instance type part of this.
https://github.com/kubevirt/kubevirt/issues/9438
Reopening. It was a mistake to close this bug.
*** Bug 2182325 has been marked as a duplicate of this bug. ***
As requested, to work around this without the fix, use the following kubectl commands:

$ kubectl patch -n $VMNamespace vms/$VMName --type='json' -p '[{"op": "remove", "path": "/spec/instancetype"}]'
$ kubectl patch -n $VMNamespace vms/$VMName --type='json' -p '[{"op": "remove", "path": "/spec/preference"}]'

This should result in the update from the VirtualMachine controller removing the kubevirt.io/virtualMachineControllerFinalize finalizer finally being accepted and the VirtualMachine being deleted.
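When an entire namespace is stuck in Terminating, the same patches can be looped over every VirtualMachine in it. A sketch (the function name and namespace argument are illustrative):

```shell
# Apply the workaround to every VirtualMachine in the given namespace by
# dropping both matchers, which lets the finalizer-removing update succeed.
remove_matchers() {
  ns="$1"
  for vm in $(kubectl get vms -n "$ns" -o name); do
    kubectl patch -n "$ns" "$vm" --type='json' \
      -p '[{"op": "remove", "path": "/spec/instancetype"}]'
    kubectl patch -n "$ns" "$vm" --type='json' \
      -p '[{"op": "remove", "path": "/spec/preference"}]'
  done
}
```

Note that `kubectl get -o name` returns type/name pairs (e.g. virtualmachine.kubevirt.io/vm-a), which `kubectl patch` accepts directly.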
Verified on v4.13.0.rhel9-2081
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:3205