Bug 1648232

Summary: CDI importer retry logic won't stop even when the DataVolume is deleted
Product: Container Native Virtualization (CNV)
Component: Storage
Version: 1.3
Target Milestone: ---
Target Release: 1.4
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Hardware: Unspecified
OS: Unspecified
Fixed In Version: v1.4.0-6
Reporter: shiyang.wang <shiywang>
Assignee: John Griffith <jgriffith>
QA Contact: shiyang.wang <shiywang>
CC: alitke, cnv-qe-bugs, ncredi, qixuan.wang, sgordon
Last Closed: 2019-02-26 13:24:16 UTC
Type: Bug

Description shiyang.wang 2018-11-09 08:04:02 UTC
Description of problem:
When a DataVolume whose importer pod is crash-looping is deleted, the importer pod keeps retrying and the PVC it uses stays occupied, so neither gets cleaned up.

Version-Release number of selected component (if applicable):
CNV 1.3

How reproducible:


Steps to Reproduce:
1. Create a DataVolume whose source endpoint is invalid, for example:
{
    "kind": "List",
    "apiVersion": "v1",
    "metadata": {},
    "items": [
        {
            "apiVersion": "cdi.kubevirt.io/v1alpha1",
            "kind": "DataVolume",
            "metadata": {
                "name": "datavolume2"
            },
            "spec": {
                "pvc": {
                    "accessModes": [
                        "ReadWriteOnce"
                    ],
                    "resources": {
                        "requests": {
                            "storage": "500Mi"
                        }
                    }
                },
                "source": {
                    "http": {
                        "url": "123aadsfasdsk.img"
                    }
                }
            }
        }
    ]
}

2. The importer pod crashes and retries.
3. Delete the DataVolume: oc delete datavolume datavolume2


Actual results:
The importer pod keeps retrying even though the DataVolume no longer exists, and the PVC stays occupied because the importer pod keeps using it.

Expected results:
The importer pod should be deleted along with the DataVolume, so that it does not keep running after the DataVolume or PVC is gone.

Additional info:

Comment 1 John Griffith 2018-11-14 00:28:22 UTC
Deleting the DV in this case ("oc delete datavolume datavolume2") does delete the DataVolume object, and it also issues the delete call to the PVC as expected, but the PVC stays in a Terminating state because it is still attached to the pod that is in the CrashLoopBackOff or error state.

If you then delete the pod, the PVC deletion completes as well. In the case of pod failures, we leave those objects present in an error state so that they can be debugged.

The pod continuing its retry loop after the DV is deleted should be easy enough to fix. I'll look at adding logic to the DV controller so that a 'delete datavolume' call goes through and cleans up any associated pods and/or PVCs.
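
Roughly the shape that cleanup could take, as a minimal sketch using client-go; the package, function name, and label selector below are illustrative assumptions for this comment, not the actual CDI implementation (which landed in the PR referenced in the next comment):

package cleanup

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupImporterPods deletes the importer pod(s) created for a DataVolume.
// Deleting the pods releases the PVC (its kubernetes.io/pvc-protection
// finalizer clears), so the PVC's pending deletion can finally complete.
func cleanupImporterPods(ctx context.Context, c kubernetes.Interface, namespace, dvName string) error {
	// Assumed label tying importer pods to their DataVolume (placeholder,
	// not necessarily the label CDI actually sets).
	selector := fmt.Sprintf("cdi.kubevirt.io/datavolume=%s", dvName)
	pods, err := c.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		if err := c.CoreV1().Pods(namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
	return nil
}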

Let me know if there are any other details I'm missing here. Thanks!

Comment 2 John Griffith 2018-11-17 18:37:15 UTC
I've submitted a PR that explicitly cleans up pods when a DataVolume is deleted; the upstream PR is here: https://github.com/kubevirt/containerized-data-importer/pull/526

Comment 3 Adam Litke 2019-01-08 13:39:15 UTC
The PR https://github.com/kubevirt/containerized-data-importer/pull/526 has been merged.

Comment 4 Qixuan Wang 2019-01-31 03:57:08 UTC
Tested with OpenShift v3.11.59 and CNV v1.4.0 (http://download-node-02.eng.bos.redhat.com/rhel-7/nightly/CNV/CNV-1.4-RHEL-7-20190128.n.0/containers.list). The bug has been fixed; moving it to VERIFIED. Thanks.

Here are the verification results.

1. A request with an invalid URL will be denied.
[root@cnv-executor-qwang-master1 ~]# oc create -f dv.yaml
Error from server: admission webhook "datavolume-create-validator.cdi.kubevirt.io" denied the request:  spec.source Invalid source URL: 123aadsfasdsk.img
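
For reference, the rule the webhook is enforcing can be sketched in Go with net/url: a source URL must parse and carry a scheme and host, and a bare "123aadsfasdsk.img" has neither. This is an illustration of the check, not the actual CDI webhook code, and the package and function names are made up:

package validate

import (
	"fmt"
	"net/url"
)

// validateSourceURL rejects strings that do not parse as absolute URLs.
// "123aadsfasdsk.img" parses without error but has no scheme or host,
// so it fails this check.
func validateSourceURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return fmt.Errorf("spec.source Invalid source URL: %s", raw)
	}
	return nil
}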

2. With a pod in the retry loop, delete the DataVolume and verify that the pod and PVC are cleaned up.
[root@cnv-executor-qwang-master1 ~]# oc get all
NAME                             READY     STATUS    RESTARTS   AGE
pod/importer-datavolume2-zl9g9   1/1       Running   2          6m

NAME                                                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/glusterfs-dynamic-4ef2bc65-250a-11e9-878f-fa163e29a0b5   ClusterIP   172.30.37.239   <none>        1/TCP     6m

[root@cnv-executor-qwang-master1 ~]# oc logs pod/importer-datavolume2-zl9g9
I0131 03:44:10.042528       1 importer.go:45] Starting importer
I0131 03:44:10.042800       1 importer.go:58] begin import process
I0131 03:44:10.042823       1 importer.go:82] begin import process
I0131 03:44:10.042832       1 dataStream.go:293] copying "https://download.fedoraproject.org/pub/fedora/linux/releases/28/Cloud/x86_64/images/Fedora-Cloud-Base-28-1.1.x86_64.qcow2" to "/data/disk.img"...
I0131 03:44:10.845505       1 prlimit.go:107] ExecWithLimits qemu-img, [info --output=json https://download.fedoraproject.org/pub/fedora/linux/releases/28/Cloud/x86_64/images/Fedora-Cloud-Base-28-1.1.x86_64.qcow2]
I0131 03:44:11.790201       1 prlimit.go:107] ExecWithLimits qemu-img, [convert -p -f qcow2 -O raw json: {"file.driver": "https", "file.url": "https://download.fedoraproject.org/pub/fedora/linux/releases/28/Cloud/x86_64/images/Fedora-Cloud-Base-28-1.1.x86_64.qcow2", "file.timeout": 3600} /data/disk.img]
I0131 03:44:11.804786       1 qemu.go:189] 0.00

[root@cnv-executor-qwang-master1 ~]# oc get dv
NAME          AGE
datavolume2   9m

[root@cnv-executor-qwang-master1 ~]# oc delete dv datavolume2
datavolume.cdi.kubevirt.io "datavolume2" deleted

[root@cnv-executor-qwang-master1 ~]# oc get dv
No resources found.

[root@cnv-executor-qwang-master1 ~]# oc get pvc
No resources found.

[root@cnv-executor-qwang-master1 ~]# oc get all
No resources found.

Comment 7 errata-xmlrpc 2019-02-26 13:24:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0417