Bug 2178037 - VM termination stuck until instancetype/preference revisionName is cleared
Summary: VM termination stuck until instancetype/preference revisionName is cleared
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Infrastructure
Version: 4.13.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.13.0
Assignee: Lee Yarwood
QA Contact: Roni Kishner
URL:
Whiteboard:
: 2182325 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-03-14 09:51 UTC by Roni Kishner
Modified: 2023-05-18 02:58 UTC (History)
4 users (show)

Fixed In Version: v4.13.0-351
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-18 02:58:18 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt kubevirt issues 9438 0 None closed Removing a namespace containing VirtualMachines using Instance Types or Preferences remains blocked 2023-04-11 11:11:55 UTC
Github kubevirt kubevirt pull 9439 0 None Merged webhooks: Return early when VirtualMachine is being deleted 2023-04-11 11:11:57 UTC
Github kubevirt kubevirt pull 9534 0 None Merged [release-0.59] webhooks: Return early when VirtualMachine is being deleted 2023-04-11 11:11:58 UTC
Red Hat Issue Tracker CNV-26866 0 None None None 2023-03-14 09:53:20 UTC
Red Hat Product Errata RHSA-2023:3205 0 None None None 2023-05-18 02:58:25 UTC

Description Roni Kishner 2023-03-14 09:51:40 UTC
Description of problem:
After creating a VM using instancetype/preference a revision controller is created and the VM revisionName field is being added.
If the ControllerRevision is being deleted before deleting the VM (such is the case when calling "delete namespace"), the VM will be stuck and won't be deleted until
the revisionName field is removed from the VM.

Version-Release number of selected component (if applicable):
4.13

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with instanceType
2. Start the VM (so the controllerRevision will be created)
3. Delete the namespace

Actual results:
The VM (and namespace) are stuck on termination phase until the instancetype.revisionName field is removed

Expected results:
The VM and namespace should be deleted

Additional info:

Comment 1 Lee Yarwood 2023-03-15 12:18:23 UTC
As discussed lets get an actual example in here please.

Comment 2 Lee Yarwood 2023-03-15 13:28:12 UTC
IMHO I think this is a bug with the virtualMachineControllerFinalize implementation, I can't reproduce c#0 but I can cause the removal of the VirtualMachine to get stuck until I remove spec.instancetype. 

I believe this update and eventual re-queue is allowing the controller to remove the finalizer and that this has nothing to do with instance types etc.

./cluster-up/kubectl.sh apply -f https://k8s.io/examples/admin/namespace-dev.yaml
./cluster-up/kubectl.sh apply -f examples/csmall.yaml -f examples/vm-cirros-csmall.yaml -n development
[..]
./cluster-up/kubectl.sh get -n development vm/vm-cirros-csmall -o json | jq .spec.instancetype
selecting docker as container runtime
{
  "kind": "VirtualMachineInstancetype",
  "name": "csmall",
  "revisionName": "vm-cirros-csmall-csmall-6242f4e8-8855-4c33-8e07-c68840db36c7-1"
}
./cluster-up/kubectl.sh delete namespace/development
[..]
./cluster-up/kubectl.sh get vms -n development vm-cirros-csmall -o json | jq .metadata.finalizers
selecting docker as container runtime
[
  "kubevirt.io/virtualMachineControllerFinalize"
]
./cluster-up/kubectl.sh get namespace/development -o json | jq .status
selecting docker as container runtime
{
  "conditions": [
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "All resources successfully discovered",
      "reason": "ResourcesDiscovered",
      "status": "False",
      "type": "NamespaceDeletionDiscoveryFailure"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "All legacy kube types successfully parsed",
      "reason": "ParsedGroupVersions",
      "status": "False",
      "type": "NamespaceDeletionGroupVersionParsingFailure"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "All content successfully deleted, may be waiting on finalization",
      "reason": "ContentDeleted",
      "status": "False",
      "type": "NamespaceDeletionContentFailure"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "Some resources are remaining: virtualmachines.kubevirt.io has 1 resource instances",
      "reason": "SomeResourcesRemain",
      "status": "True",
      "type": "NamespaceContentRemaining"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "Some content in the namespace has finalizers remaining: kubevirt.io/virtualMachineControllerFinalize in 1 resource instances",
      "reason": "SomeFinalizersRemain",
      "status": "True",
      "type": "NamespaceFinalizersRemaining"
    }
  ],
  "phase": "Terminating"
}
./cluster-up/kubectl.sh patch -n development vms/vm-cirros-csmall --type='json' -p '[{"op": "remove", "path": "/spec/instancetype/revisionName"}]'
selecting docker as container runtime
The request is invalid: spec.instancetype: Failure to find instancetype: virtualmachineinstancetypes.instancetype.kubevirt.io "csmall" not found

./cluster-up/kubectl.sh patch -n development vms/vm-cirros-csmall --type='json' -p '[{"op": "remove", "path": "/spec/instancetype"}]'
selecting docker as container runtime
virtualmachine.kubevirt.io/vm-cirros-csmall patched

./cluster-up/kubectl.sh get -n development vms/vm-cirros-csmall -o json | jq .status
selecting docker as container runtime
Error from server (NotFound): namespaces "development" not found

Comment 3 Lee Yarwood 2023-03-15 13:31:52 UTC
Actually the update to remove the finalizer might be failing because the instance type and controller revision have already been removed?

Comment 4 Lee Yarwood 2023-03-15 14:01:40 UTC
Okay this doesn't appear to happen now with the following in place:

$ git diff HEAD~1
diff --git a/pkg/instancetype/instancetype.go b/pkg/instancetype/instancetype.go
index 75e76eb50..e35cdc7c4 100644
--- a/pkg/instancetype/instancetype.go
+++ b/pkg/instancetype/instancetype.go
@@ -382,7 +382,7 @@ func (m *InstancetypeMethods) ApplyToVmi(field *k8sfield.Path, instancetypeSpec
 }
 
 func (m *InstancetypeMethods) FindPreferenceSpec(vm *virtv1.VirtualMachine) (*instancetypev1alpha2.VirtualMachinePreferenceSpec, error) {
-       if vm.Spec.Preference == nil {
+       if vm.Spec.Preference == nil || vm.DeletionTimestamp != nil {
                return nil, nil
        }
 
@@ -537,7 +537,7 @@ func (m *InstancetypeMethods) findClusterPreferenceByClient(resourceName string)
 }
 
 func (m *InstancetypeMethods) FindInstancetypeSpec(vm *virtv1.VirtualMachine) (*instancetypev1alpha2.VirtualMachineInstancetypeSpec, error) {
-       if vm.Spec.Instancetype == nil {
+       if vm.Spec.Instancetype == nil || vm.DeletionTimestamp != nil {
                return nil, nil
        }
 
@@ -669,7 +669,7 @@ func (m *InstancetypeMethods) findClusterInstancetypeByClient(resourceName strin
 }
 
 func (m *InstancetypeMethods) InferDefaultInstancetype(vm *virtv1.VirtualMachine) (*virtv1.InstancetypeMatcher, error) {
-       if vm.Spec.Instancetype == nil || vm.Spec.Instancetype.InferFromVolume == "" {
+       if vm.Spec.Instancetype == nil || vm.Spec.Instancetype.InferFromVolume == "" || vm.DeletionTimestamp != nil {
                return nil, nil
        }
        defaultName, defaultKind, err := m.inferDefaultsFromVolumes(vm, vm.Spec.Instancetype.InferFromVolume, apiinstancetype.DefaultInstancetypeLabel, apiinstancetype.DefaultInstancetypeKindLabel)
@@ -683,7 +683,7 @@ func (m *InstancetypeMethods) InferDefaultInstancetype(vm *virtv1.VirtualMachine
 }
 
 func (m *InstancetypeMethods) InferDefaultPreference(vm *virtv1.VirtualMachine) (*virtv1.PreferenceMatcher, error) {
-       if vm.Spec.Preference == nil || vm.Spec.Preference.InferFromVolume == "" {
+       if vm.Spec.Preference == nil || vm.Spec.Preference.InferFromVolume == "" || vm.DeletionTimestamp != nil {
                return nil, nil
        }
        defaultName, defaultKind, err := m.inferDefaultsFromVolumes(vm, vm.Spec.Preference.InferFromVolume, apiinstancetype.DefaultPreferenceLabel, apiinstancetype.DefaultPreferenceKindLabel)


I do however see lots of requeues because the namespace doesn't exist but the update removing the finalizer eventually makes it through.

I'll write this up as an upstream bug later today and get a PR posted cleaning up the instance type part of this.

Comment 5 Lee Yarwood 2023-03-15 15:54:22 UTC
https://github.com/kubevirt/kubevirt/issues/9438

Comment 10 Geetika Kapoor 2023-03-29 11:56:42 UTC
Opening again. It was a mistake to close this bug.

Comment 11 Guohua Ouyang 2023-03-29 12:33:26 UTC
*** Bug 2182325 has been marked as a duplicate of this bug. ***

Comment 12 Lee Yarwood 2023-03-31 15:59:19 UTC
As requested, to workaround this without the fix use the following kubectl commands:

$ kubectl patch -n $VMNamespace vms/$VMName --type='json' -p '[{"op": "remove", "path": "/spec/instancetype"}]'
$ kubectl patch -n $VMNamespace vms/$VMName --type='json' -p '[{"op": "remove", "path": "/spec/preference"}]'

This should result in the update to remove the kubevirt.io/virtualMachineControllerFinalize finalizer from the VirtualMachine controller finally being accepted and the VirtualMachine being deleted.

Comment 13 Roni Kishner 2023-04-17 13:26:27 UTC
Verified on  v4.13.0.rhel9-2081

Comment 15 errata-xmlrpc 2023-05-18 02:58:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205


Note You need to log in before you can comment on or make changes to this bug.