Bug 2178037

Summary: VM termination stuck until instancetype/preference revisionName is cleared
Product: Container Native Virtualization (CNV) Reporter: Roni Kishner <rkishner>
Component: InfrastructureAssignee: Lee Yarwood <lyarwood>
Status: CLOSED ERRATA QA Contact: Roni Kishner <rkishner>
Severity: high Docs Contact:
Priority: high    
Version: 4.13.0CC: gkapoor, gouyang, sbennert, ycui
Target Milestone: ---Keywords: AutomationBlocker, Reopened
Target Release: 4.13.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: v4.13.0-351 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-18 02:58:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Roni Kishner 2023-03-14 09:51:40 UTC
Description of problem:
After creating a VM using instancetype/preference a revision controller is created and the VM revisionName field is being added.
If the ControllerRevision is being deleted before deleting the VM (such is the case when calling "delete namespace"), the VM will be stuck and won't be deleted until
the revisionName field is removed from the VM.

Version-Release number of selected component (if applicable):
4.13

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with instanceType
2. Start the VM (so the controllerRevision will be created)
3. Delete the namespace

Actual results:
The VM (and namespace) are stuck on termination phase until the instancetype.revisionName field is removed

Expected results:
The VM and namespace should be deleted

Additional info:

Comment 1 Lee Yarwood 2023-03-15 12:18:23 UTC
As discussed lets get an actual example in here please.

Comment 2 Lee Yarwood 2023-03-15 13:28:12 UTC
IMHO I think this is a bug with the virtualMachineControllerFinalize implementation, I can't reproduce c#0 but I can cause the removal of the VirtualMachine to get stuck until I remove spec.instancetype. 

I believe this update and eventual re-queue is allowing the controller to remove the finalizer and that this has nothing to do with instance types etc.

./cluster-up/kubectl.sh apply -f https://k8s.io/examples/admin/namespace-dev.yaml
./cluster-up/kubectl.sh apply -f examples/csmall.yaml -f examples/vm-cirros-csmall.yaml -n development
[..]
./cluster-up/kubectl.sh get -n development vm/vm-cirros-csmall -o json | jq .spec.instancetype
selecting docker as container runtime
{
  "kind": "VirtualMachineInstancetype",
  "name": "csmall",
  "revisionName": "vm-cirros-csmall-csmall-6242f4e8-8855-4c33-8e07-c68840db36c7-1"
}
./cluster-up/kubectl.sh delete namespace/development
[..]
./cluster-up/kubectl.sh get vms -n development vm-cirros-csmall -o json | jq .metadata.finalizers
selecting docker as container runtime
[
  "kubevirt.io/virtualMachineControllerFinalize"
]
./cluster-up/kubectl.sh get namespace/development -o json | jq .status
selecting docker as container runtime
{
  "conditions": [
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "All resources successfully discovered",
      "reason": "ResourcesDiscovered",
      "status": "False",
      "type": "NamespaceDeletionDiscoveryFailure"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "All legacy kube types successfully parsed",
      "reason": "ParsedGroupVersions",
      "status": "False",
      "type": "NamespaceDeletionGroupVersionParsingFailure"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "All content successfully deleted, may be waiting on finalization",
      "reason": "ContentDeleted",
      "status": "False",
      "type": "NamespaceDeletionContentFailure"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "Some resources are remaining: virtualmachines.kubevirt.io has 1 resource instances",
      "reason": "SomeResourcesRemain",
      "status": "True",
      "type": "NamespaceContentRemaining"
    },
    {
      "lastTransitionTime": "2023-03-15T13:15:18Z",
      "message": "Some content in the namespace has finalizers remaining: kubevirt.io/virtualMachineControllerFinalize in 1 resource instances",
      "reason": "SomeFinalizersRemain",
      "status": "True",
      "type": "NamespaceFinalizersRemaining"
    }
  ],
  "phase": "Terminating"
}
./cluster-up/kubectl.sh patch -n development vms/vm-cirros-csmall --type='json' -p '[{"op": "remove", "path": "/spec/instancetype/revisionName"}]'
selecting docker as container runtime
The request is invalid: spec.instancetype: Failure to find instancetype: virtualmachineinstancetypes.instancetype.kubevirt.io "csmall" not found

./cluster-up/kubectl.sh patch -n development vms/vm-cirros-csmall --type='json' -p '[{"op": "remove", "path": "/spec/instancetype"}]'
selecting docker as container runtime
virtualmachine.kubevirt.io/vm-cirros-csmall patched

./cluster-up/kubectl.sh get -n development vms/vm-cirros-csmall -o json | jq .status
selecting docker as container runtime
Error from server (NotFound): namespaces "development" not found

Comment 3 Lee Yarwood 2023-03-15 13:31:52 UTC
Actually the update to remove the finalizer might be failing because the instance type and controller revision have already been removed?

Comment 4 Lee Yarwood 2023-03-15 14:01:40 UTC
Okay this doesn't appear to happen now with the following in place:

$ git diff HEAD~1
diff --git a/pkg/instancetype/instancetype.go b/pkg/instancetype/instancetype.go
index 75e76eb50..e35cdc7c4 100644
--- a/pkg/instancetype/instancetype.go
+++ b/pkg/instancetype/instancetype.go
@@ -382,7 +382,7 @@ func (m *InstancetypeMethods) ApplyToVmi(field *k8sfield.Path, instancetypeSpec
 }
 
 func (m *InstancetypeMethods) FindPreferenceSpec(vm *virtv1.VirtualMachine) (*instancetypev1alpha2.VirtualMachinePreferenceSpec, error) {
-       if vm.Spec.Preference == nil {
+       if vm.Spec.Preference == nil || vm.DeletionTimestamp != nil {
                return nil, nil
        }
 
@@ -537,7 +537,7 @@ func (m *InstancetypeMethods) findClusterPreferenceByClient(resourceName string)
 }
 
 func (m *InstancetypeMethods) FindInstancetypeSpec(vm *virtv1.VirtualMachine) (*instancetypev1alpha2.VirtualMachineInstancetypeSpec, error) {
-       if vm.Spec.Instancetype == nil {
+       if vm.Spec.Instancetype == nil || vm.DeletionTimestamp != nil {
                return nil, nil
        }
 
@@ -669,7 +669,7 @@ func (m *InstancetypeMethods) findClusterInstancetypeByClient(resourceName strin
 }
 
 func (m *InstancetypeMethods) InferDefaultInstancetype(vm *virtv1.VirtualMachine) (*virtv1.InstancetypeMatcher, error) {
-       if vm.Spec.Instancetype == nil || vm.Spec.Instancetype.InferFromVolume == "" {
+       if vm.Spec.Instancetype == nil || vm.Spec.Instancetype.InferFromVolume == "" || vm.DeletionTimestamp != nil {
                return nil, nil
        }
        defaultName, defaultKind, err := m.inferDefaultsFromVolumes(vm, vm.Spec.Instancetype.InferFromVolume, apiinstancetype.DefaultInstancetypeLabel, apiinstancetype.DefaultInstancetypeKindLabel)
@@ -683,7 +683,7 @@ func (m *InstancetypeMethods) InferDefaultInstancetype(vm *virtv1.VirtualMachine
 }
 
 func (m *InstancetypeMethods) InferDefaultPreference(vm *virtv1.VirtualMachine) (*virtv1.PreferenceMatcher, error) {
-       if vm.Spec.Preference == nil || vm.Spec.Preference.InferFromVolume == "" {
+       if vm.Spec.Preference == nil || vm.Spec.Preference.InferFromVolume == "" || vm.DeletionTimestamp != nil {
                return nil, nil
        }
        defaultName, defaultKind, err := m.inferDefaultsFromVolumes(vm, vm.Spec.Preference.InferFromVolume, apiinstancetype.DefaultPreferenceLabel, apiinstancetype.DefaultPreferenceKindLabel)


I do however see lots of requeues because the namespace doesn't exist but the update removing the finalizer eventually makes it through.

I'll write this up as an upstream bug later today and get a PR posted cleaning up the instance type part of this.

Comment 5 Lee Yarwood 2023-03-15 15:54:22 UTC
https://github.com/kubevirt/kubevirt/issues/9438

Comment 10 Geetika Kapoor 2023-03-29 11:56:42 UTC
Opening again. It was a mistake to close this bug.

Comment 11 Guohua Ouyang 2023-03-29 12:33:26 UTC
*** Bug 2182325 has been marked as a duplicate of this bug. ***

Comment 12 Lee Yarwood 2023-03-31 15:59:19 UTC
As requested, to workaround this without the fix use the following kubectl commands:

$ kubectl patch -n $VMNamespace vms/$VMName --type='json' -p '[{"op": "remove", "path": "/spec/instancetype"}]'
$ kubectl patch -n $VMNamespace vms/$VMName --type='json' -p '[{"op": "remove", "path": "/spec/preference"}]'

This should result in the update to remove the kubevirt.io/virtualMachineControllerFinalize finalizer from the VirtualMachine controller finally being accepted and the VirtualMachine being deleted.

Comment 13 Roni Kishner 2023-04-17 13:26:27 UTC
Verified on  v4.13.0.rhel9-2081

Comment 15 errata-xmlrpc 2023-05-18 02:58:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205