Bug 1994504 - Interim PVC not cleaned up when Pending DataVolume is deleted
Summary: Interim PVC not cleaned up when Pending DataVolume is deleted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.8.4
Assignee: Michael Henriksen
QA Contact: Jenia Peimer
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-17 12:28 UTC by Adam Litke
Modified: 2022-01-20 17:21 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-20 17:21:21 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2022:0213 (last updated 2022-01-20 17:21:33 UTC)

Description Adam Litke 2021-08-17 12:28:50 UTC
Description of problem: On a cluster using the Trident CSI driver we encountered a failure during the smart clone flow.  A snapshot was created successfully but the restore is failing.  This causes the DV to be stuck in the Pending state.  When the DV is removed, the cdi-tmp PVC in the source namespace is not cleaned up.


Version-Release number of selected component (if applicable): 4.8.1


How reproducible: 100% when cloning fails


Steps to Reproduce:
1. Create a cross-namespace clone DV in an environment where snapshot restore is failing
2. Delete the DV

Actual results: The cdi-tmp PVC in the source ns is not cleaned up


Expected results: The cdi-tmp PVC should also be removed.


Additional info:
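Until a fixed build is available, a manual cleanup along these lines should work (namespace and PVC name are placeholders; the interim PVC name starts with cdi-tmp-):

$ oc get pvc -n <source-namespace> | grep cdi-tmp
$ oc delete pvc -n <source-namespace> cdi-tmp-<uid>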

Comment 1 Yan Du 2021-09-01 12:25:37 UTC
Hi Michael, do we have any update on this bug?

Comment 2 Michael Henriksen 2021-09-29 15:02:19 UTC
Sorry for the delay on this.  The PVC in question should get deleted in this reconcile loop:

https://github.com/kubevirt/containerized-data-importer/blob/main/pkg/controller/datavolume-controller.go#L1196

My guess is that in this specific situation it was hanging around because of a finalizer added by the storage provisioner.
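If a provisioner finalizer is what kept the PVC around, it should be visible on the object; a generic check (placeholder names, not taken from this report):

$ oc get pvc -n <source-namespace> cdi-tmp-<uid> -o jsonpath='{.metadata.finalizers}'

Clearing the finalizer would then let the delete complete, at the cost of skipping whatever cleanup the provisioner intended:

$ oc patch pvc -n <source-namespace> cdi-tmp-<uid> --type merge -p '{"metadata":{"finalizers":null}}'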

Comment 3 Yan Du 2021-12-08 13:21:02 UTC
Based on the above comments, moving the bug to ON_QA.

Comment 4 Jenia Peimer 2021-12-29 16:04:29 UTC
Tried reproducing this with CNV v4.8.4-35 using OCS and everything works as expected.

$ oc create -f dv-src.yaml 
datavolume.cdi.kubevirt.io/cirros-dv created

$ oc get dv -A
NAMESPACE                            NAME             PHASE         PROGRESS   RESTARTS   AGE
openshift-virtualization-os-images   cirros-dv        Succeeded     100.0%                87s

$ oc create -f vm-target.yaml 
virtualmachine.kubevirt.io/vm-dv-clone created

$ oc delete volumesnapshot -n openshift-virtualization-os-images $(oc get volumesnapshot -A | grep cdi)
volumesnapshot.snapshot.storage.k8s.io "cdi-tmp-9acd9c18-112a-44de-b554-64966edeccca" deleted
Error from server (NotFound): volumesnapshots.snapshot.storage.k8s.io "openshift-virtualization-os-images" not found
Error from server (NotFound): volumesnapshots.snapshot.storage.k8s.io "false" not found
Error from server (NotFound): volumesnapshots.snapshot.storage.k8s.io "cirros-dv" not found
Error from server (NotFound): volumesnapshots.snapshot.storage.k8s.io "ocs-storagecluster-rbdplugin-snapclass" not found
Error from server (NotFound): volumesnapshots.snapshot.storage.k8s.io "snapcontent-8f2a3a7b-7a82-47c7-85a4-9e471d127dcd" not found
Error from server (NotFound): volumesnapshots.snapshot.storage.k8s.io "1s" not found
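Note: the NotFound errors above are expected with this one-liner. The unquoted $(oc get volumesnapshot -A | grep cdi) expands to every column of the matching row (namespace, name, readytouse, source PVC, snapshot class, snapshot content, age), so only the NAME field resolves to a real VolumeSnapshot. A variant that passes only resource names would be:

$ oc delete -n openshift-virtualization-os-images $(oc get volumesnapshot -n openshift-virtualization-os-images -o name | grep cdi-tmp)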

$ oc get volumesnapshot -A 
No resources found

$ oc get dv -A
NAMESPACE                            NAME                 PHASE                             PROGRESS   RESTARTS   AGE
default                              target-dv-vm-clone   SnapshotForSmartCloneInProgress                         94s
openshift-virtualization-os-images   cirros-dv            Succeeded                         100.0%                15m


No PVC associated with target VM:

$ oc get pvc -A
NAMESPACE                            NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
openshift-storage                    db-noobaa-db-pg-0             Bound    pvc-f15d2b49-08fc-4dea-bac2-7787350ad56a   50Gi       RWO            ocs-storagecluster-ceph-rbd   5d23h
openshift-storage                    ocs-deviceset-0-data-09bnq2   Bound    local-pv-94284990                          70Gi       RWO            local-block                   5d23h
openshift-storage                    ocs-deviceset-1-data-0bp2nm   Bound    local-pv-b4007a1c                          70Gi       RWO            local-block                   5d23h
openshift-storage                    ocs-deviceset-2-data-0rkswd   Bound    local-pv-c1f38162                          70Gi       RWO            local-block                   5d23h
openshift-virtualization-os-images   cirros-dv                     Bound    pvc-94e8f4e8-5157-4219-a155-f0f33f25db85   30Gi       RWO            ocs-storagecluster-ceph-rbd   16m

$ oc get vmi -A
No resources found

$ oc get vm -A
NAMESPACE   NAME          AGE     VOLUME
default     vm-dv-clone   3m48s   

$ virtctl start vm-dv-clone
Error starting VirtualMachine Operation cannot be fulfilled on virtualmachine.kubevirt.io "vm-dv-clone": Always does not support manual start requests
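This error is expected here: the VM (spec shown below under "Target VM:") is created with running: true, which is equivalent to runStrategy Always, so KubeVirt keeps it running on its own and rejects manual start requests. To drive it manually instead, running would have to be set to false first, e.g. (illustrative, not part of the original run):

$ oc patch vm vm-dv-clone --type merge -p '{"spec":{"running":false}}'
$ virtctl start vm-dv-clone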

$ oc delete vm vm-dv-clone 
virtualmachine.kubevirt.io "vm-dv-clone" deleted

$ oc get vm -A
No resources found

$ oc get dv -A
NAMESPACE                            NAME             PHASE         PROGRESS   RESTARTS   AGE
openshift-virtualization-os-images   cirros-dv        Succeeded     100.0%                21m

$ oc delete dv -n openshift-virtualization-os-images cirros-dv 
datavolume.cdi.kubevirt.io "cirros-dv" deleted

No PVC leftovers:

$ oc get pvc -A
NAMESPACE           NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
openshift-storage   db-noobaa-db-pg-0             Bound    pvc-f15d2b49-08fc-4dea-bac2-7787350ad56a   50Gi       RWO            ocs-storagecluster-ceph-rbd   6d
openshift-storage   ocs-deviceset-0-data-09bnq2   Bound    local-pv-94284990                          70Gi       RWO            local-block                   6d
openshift-storage   ocs-deviceset-1-data-0bp2nm   Bound    local-pv-b4007a1c                          70Gi       RWO            local-block                   6d
openshift-storage   ocs-deviceset-2-data-0rkswd   Bound    local-pv-c1f38162                          70Gi       RWO            local-block                   6d


Source DV:

$ cat dv-src.yaml 
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
 namespace: "openshift-virtualization-os-images"
 name: cirros-dv
spec:
 source:
     http:
        url: "http://.../cirros-0.4.0-x86_64-disk.qcow2"
        secretRef: ""  
 pvc:
   accessModes:
   - ReadWriteOnce
   resources:
     requests:
       storage: 30Gi
   storageClassName: ocs-storagecluster-ceph-rbd
   volumeMode: Block

Target VM:

$ cat vm-target.yaml 
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-dv-clone
  namespace: "default"
  name: vm-dv-clone 
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-dv-clone
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: root-disk
        resources:
          requests:
            memory: 64M
      volumes:
      - dataVolume:
          name: target-dv-vm-clone
        name: root-disk
  dataVolumeTemplates:
  - metadata:
      name: target-dv-vm-clone
    spec:
      pvc:
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 32Gi
      source:
        pvc:
          namespace: "openshift-virtualization-os-images"
          name: "cirros-dv"

Comment 10 errata-xmlrpc 2022-01-20 17:21:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.8.4 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0213

