2025878 – The import cron pod is not deleted after delete the dataimportcron if the import is failed

Bug 2025878 - The import cron pod is not deleted after delete the dataimportcron if the import is failed


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Arnon Gilboa
QA Contact:	Yan Du
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-11-23 09:40 UTC by Yan Du
Modified:	2022-03-16 15:56 UTC
CC List:	3 users
Fixed In Version:	CNV v4.10.0-605
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-03-16 15:56:33 UTC
Target Upstream Version:
Embargoed:
Dependent Products:


Links
System	ID	Priority	Status	Summary	Last Updated
Github	kubevirt containerized-data-importer pull 2084	None	closed	Set ActiveDeadline for DataImportCron poller job	2022-01-17 14:58:24 UTC
Github	kubevirt containerized-data-importer pull 2088	None	Merged	Cleanup DataImportCron jobs on deletion	2022-01-16 09:48:31 UTC
Github	kubevirt containerized-data-importer pull 2100	None	Merged	[release-v1.43] Cleanup DataImportCron jobs on deletion	2022-01-17 14:58:22 UTC
Red Hat Issue Tracker	CNV-15022	None	None	None	2022-01-17 09:49:57 UTC
Red Hat Product Errata	RHSA-2022:0947	None	None	None	2022-03-16 15:56:49 UTC

Description Yan Du 2021-11-23 09:40:37 UTC

Description of problem:
The import cron pod is not deleted after the DataImportCron is deleted if the import failed.

Version-Release number of selected component (if applicable):
CNV 4.10

How reproducible:
Always

Steps to Reproduce:
1. Create a DataImportCron that references a non-existent ConfigMap
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataImportCron
metadata:
  name: fedora-image-import-cron
  namespace: openshift-virtualization-os-images
spec:
  template:
    spec:
      source:
        registry:
          url: "docker://quay.io/kubevirt/fedora-cloud-registry-disk-demo:latest"
          pullMethod: node
          certConfigMap: no-certs
      storage:
        resources:
          requests:
            storage: 5Gi
        storageClassName: hostpath-provisioner
  schedule: "* * * * *"
  garbageCollect: Outdated
  managedDataSource: fedora

2. Check the CronJobs
$ oc get cronjobs
NAMESPACE                              NAME                                SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
openshift-cnv                          fedora-image-import-cron-415f66b2   * * * * *      False     0        <none>          36s

3. Check the import pod in the openshift-cnv namespace. The import fails because the ConfigMap does not exist.

4. Delete the DataImportCron
$ oc delete DataImportCron fedora-image-import-cron

5. Check the CronJobs. The CronJob is deleted after the DataImportCron is deleted.

6. Check the import pod in the openshift-cnv namespace again


Actual results:
The pod stays in CreateContainerError and is never deleted, even though the DataImportCron has been deleted.
$ oc get pod -n openshift-cnv | grep fedora
fedora-image-import-cron-415f66b2-27294271--1-7wvrw             0/1     CreateContainerError   0            35m
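
The check above can be scripted. The following is only a sketch: `leftover_cron_pods` is a hypothetical helper name, not part of `oc` or CDI. It reads `oc get pod --no-headers` output on stdin and prints the name and status of each pod whose name starts with the given prefix.

```shell
# Hypothetical helper (not part of oc or CDI): filter pod listings by name prefix.
# Expects `oc get pod --no-headers` output on stdin: NAME READY STATUS RESTARTS AGE.
leftover_cron_pods() {
  awk -v prefix="$1" 'index($1, prefix) == 1 { print $1, $3 }'
}

# Usage against a live cluster (namespace and prefix taken from this report):
# oc get pod -n openshift-cnv --no-headers | leftover_cron_pods fedora-image-import-cron
```

Empty output after the DataImportCron has been deleted would match the expected result; any remaining line is a leftover pod.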


Expected results:
The import pod should be deleted.

Additional info:

Comment 1 Arnon Gilboa 2021-11-30 16:28:00 UTC

Yan, reproducing it in my KubeVirt CI environment, I see the CronJob pod (in your case, too, it is not the importer pod) stuck in ContainerCreating with the following warning in its describe output:
  Warning  FailedMount  4m6s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[cdi-cert-vol], unattached volumes=[cdi-cert-vol kube-api-access-d4tjp]: timed out waiting for the condition

What warning do you have in your pod describe output?
It's interesting that your pod got to CreateContainerError, unlike mine.
How long did it take to reach this status?

Comment 2 Yan Du 2021-12-01 03:35:04 UTC

It seems the pod is in Error status now:
$ oc get pod -n openshift-cnv | grep fedora
fedora-image-import-cron-5981a50c-27296764--1-bjps8             0/1     Error               0               6d
fedora-image-import-cron-5981a50c-27296764--1-ftgxf             0/1     ContainerCreating   0               6d
fedora-image-import-cron-5981a50c-27296764--1-kpmhn             0/1     Error               0               6d
fedora-image-import-cron-5981a50c-27296764--1-mctdc             0/1     Error               0               6d
fedora-image-import-cron-9bb74e4e-27296766--1-kw8jv             0/1     ContainerCreating   0               6d
fedora-image-import-cron-fb0f3696-27305437--1-x7x5m             0/1     ContainerCreating   0               20m


Pod describe output:
$ oc describe pod fedora-image-import-cron-5981a50c-27296764--1-bjps8 -n openshift-cnv
Name:         fedora-image-import-cron-5981a50c-27296764--1-bjps8
Namespace:    openshift-cnv
Priority:     0
Node:         yadu-kt9ms-worker-0-tvdwk/192.168.0.209
Start Time:   Thu, 25 Nov 2021 02:04:07 +0000
Labels:       controller-uid=8b956f0d-fa45-4dec-ba4d-e2de0aaaf72c
              job-name=fedora-image-import-cron-5981a50c-27296764
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.134"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.134"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/scc: restricted
Status:       Failed
IP:           10.129.2.134
IPs:
  IP:           10.129.2.134
Controlled By:  Job/fedora-image-import-cron-5981a50c-27296764
Containers:
  cdi-source-update-poller:
    Container ID:  cri-o://98fbaf43b7bee5e3787300769bdc5713b879146150344c8c8d39053c790894c2
    Image:         registry.redhat.io/container-native-virtualization/virt-cdi-importer@sha256:be877b23a9c5df1e7374e6b1029d1e71a5f02e2ffc6788ea7d5648684be4d1a0
    Image ID:      registry.redhat.io/container-native-virtualization/virt-cdi-importer@sha256:ae1c9df86982ae0a650f08cc49faeee4b07dd004e44e5c8d4d86ffb0236ab174
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/cdi-source-update-poller
      -ns
      openshift-virtualization-os-images
      -cron
      fedora-image-import-cron
      -url
      docker://quay.io/kubevirt/fedora-cloud-registry-disk-demo:latest
      -certdir
      /certs
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 25 Nov 2021 02:04:10 +0000
      Finished:     Thu, 25 Nov 2021 02:04:11 +0000
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /certs from cdi-cert-vol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xwrlz (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  cdi-cert-vol:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      some-certs
    Optional:  false
  kube-api-access-xwrlz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>


$ oc logs fedora-image-import-cron-5981a50c-27296764--1-bjps8 -n openshift-cnv
I1125 02:04:10.672251       1 transport.go:227] Inspecting image from 'docker://quay.io/kubevirt/fedora-cloud-registry-disk-demo:latest'
Digest is sha256:6f5afce978111b0968d1d718435df7ef4f0b266715acd41620321b6ed3c28ad6
W1125 02:04:10.892811       1 client_config.go:614] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2021/11/25 02:04:10 Failed getting DataImportCron, cronNamespace openshift-virtualization-os-images cronName fedora-image-import-cron: dataimportcrons.cdi.kubevirt.io "fedora-image-import-cron" not found
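
The describe output shows that the leftover pods are controlled by a Job and carry a `job-name` label, so on an affected build one possible manual cleanup is to delete the owning Job. The sketch below assumes the pod naming pattern seen in this report (Job name followed by a `--1-bjps8`-style suffix); `job_from_pod` is a hypothetical helper, and the `job-name` label remains the authoritative source.

```shell
# Hypothetical helper: derive the owning Job name from a poller pod name by
# dropping everything from the last "--" onward. The naming pattern is an
# assumption inferred from the listings in this report.
job_from_pod() {
  printf '%s\n' "${1%--*}"
}

# More reliable on a live cluster: read the label shown in the describe output.
# oc get pod <pod-name> -n openshift-cnv -o jsonpath='{.metadata.labels.job-name}'

# Possible manual cleanup (not run here):
# oc delete job -n openshift-cnv "$(job_from_pod fedora-image-import-cron-5981a50c-27296764--1-bjps8)"
```

With the default cascading behavior of `oc delete`, removing the Job should also remove its pods. The merged fix handles this cleanup when the DataImportCron is deleted.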

Comment 3 Maya Rashish 2022-01-09 10:37:56 UTC

The PR was closed, so moving back to ASSIGNED.

Comment 4 Yan Du 2022-01-21 10:26:30 UTC

Tested on CNV v4.10.0-605; the issue has been fixed.

Comment 9 errata-xmlrpc 2022-03-16 15:56:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947
