Description of problem:
The importer pod keeps restarting when the DataImportCron references an invalid image sha.

Version-Release number of selected component (if applicable):
CNV-4.10.0-605

How reproducible:
Always

Steps to Reproduce:
1. Make sure enableCommonBootImageImport is enabled:
$ oc get hco -n openshift-cnv -o yaml | grep -i bootimage
    enableCommonBootImageImport: true
2. Annotate the DataImportCron so that it references an invalid image sha:
$ oc annotate dic centos-stream8-image-cron --overwrite cdi.kubevirt.io/storage.import.sourceDesiredDigest=sha256:12345678900987654321
3. Check the import process.

Actual results:
The importer pod keeps restarting:

$ oc get dv centos-stream8-123456789009
NAME                          PHASE              PROGRESS   RESTARTS   AGE
centos-stream8-123456789009   ImportInProgress   N/A        10         29m

$ oc get pvc centos-stream8-123456789009
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
centos-stream8-123456789009   Bound    pvc-47e7df1e-852b-424e-9454-e74ec8697043   68Gi       RWO            hostpath-provisioner   30m

$ oc get pvc centos-stream8-123456789009-scratch
NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
centos-stream8-123456789009-scratch   Bound    pvc-f40008cc-bec4-4635-8e71-ce43f324333d   68Gi       RWO            hostpath-provisioner   30m

Some unfriendly warning messages are also emitted.

From the DV:
Events:
  Type     Reason                Age                 From                   Message
  ----     ------                ----                ----                   -------
  Normal   Pending               20m                 datavolume-controller  PVC centos-stream8-123456789009 Pending
  Normal   CreatingScratchSpace  20m                 datavolume-controller  target PVC centos-stream8-123456789009 Pending and Creating scratch space
  Normal   Claim Pending         20m                 datavolume-controller  target PVC centos-stream8-123456789009 Pending and Claim Pending
  Normal   ImportScheduled       20m                 datavolume-controller  Import into centos-stream8-123456789009 scheduled
  Normal   Claim Pending         20m                 datavolume-controller  Claim Pending
  Normal   Bound                 20m                 datavolume-controller  PVC centos-stream8-123456789009 Bound
  Normal   ImportInProgress      20m                 datavolume-controller  Import into centos-stream8-123456789009 in progress
  Warning  Error                 4m4s (x8 over 20m)  datavolume-controller  Unable to process data: invalid reference format
Could not parse image
kubevirt.io/containerized-data-importer/pkg/importer.readImageSource
	/remote-source/app/pkg/importer/transport.go:73
kubevirt.io/containerized-data-importer/pkg/importer.copyRegistryImage
	/remote-source/app/pkg/importer/transport.go:182
kubevirt.io/containerized-data-importer/pkg/importer.CopyRegistryImage
	/remote-source/app/pkg/importer/transport.go:262
kubevirt.io/containerized-data-importer/pkg/importer.(*RegistryDataSource).Transfer
	/remote-source/app/pkg/importer/registry-datasource.go:83
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:197
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:172
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:138
runtime.main
	/usr/lib/golang/src/runtime/proc.go:255
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1581
Failed to read registry image
kubevirt.io/containerized-data-importer/pkg/importer.(*RegistryDataSource).Transfer
	/remote-source/app/pkg/importer/registry-datasource.go:85
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:197
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:172
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:138
runtime.main
	/usr/lib/golang/src/runtime/proc.go:255
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1581
Unable to transfer source data to scratch space
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:202
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:172
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:138
runtime.main
	/usr/lib/golang/src/runtime/proc.go:255
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1581

From the PVC:
Events:
  Type     Reason                 Age                   From                         Message
  ----     ------                 ----                  ----                         -------
  Normal   WaitForFirstConsumer   32m (x2 over 32m)     persistentvolume-controller  waiting for first consumer to be created before binding
  Normal   WaitForPodScheduled    32m (x2 over 32m)     persistentvolume-controller  waiting for pod importer-centos-stream8-123456789009 to be scheduled
  Normal   ExternalProvisioning   32m (x2 over 32m)     persistentvolume-controller  waiting for a volume to be created, either by external provisioner "kubevirt.io/hostpath-provisioner" or manually created by system administrator
  Normal   Provisioning           32m                   kubevirt.io/hostpath-provisioner_hostpath-provisioner-4hj57_27cafcf6-dd69-4e09-aef9-fbf16a0beed4  External provisioner is provisioning volume for claim "openshift-virtualization-os-images/centos-stream8-123456789009"
  Normal   ProvisioningSucceeded  32m                   kubevirt.io/hostpath-provisioner_hostpath-provisioner-4hj57_27cafcf6-dd69-4e09-aef9-fbf16a0beed4  Successfully provisioned volume pvc-47e7df1e-852b-424e-9454-e74ec8697043
  Warning  ErrImportFailed        110s (x946 over 31m)  import-controller            Unable to process data: invalid reference format
[followed by the same "Could not parse image" / "Failed to read registry image" / "Unable to transfer source data to scratch space" stack traces as in the DV events above]

Expected results:
1. The importer pod should not keep trying to import and restarting.
2. The PVC should not become Bound, since the image reference has an invalid format.
3. It would be better to drop the stack trace from the event message.

Additional info:
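For context: a sha256 image digest is 64 hex characters, while the annotated value above has only 20, which is why the importer's reference parsing can never succeed. A quick way to check, and to fetch a real digest for comparison (registry and tag as used by this cron; the skopeo usage is illustrative):

$ echo -n 12345678900987654321 | wc -c
20
$ skopeo inspect docker://quay.io/containerdisks/centos-stream:8 | grep Digest
    "Digest": "sha256:<64 hex characters>",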
Arnon, I am pretty sure this is behaving as expected, but I want to confirm a few things. When a DataImportCron import is failing, do we emit an event? Are we raising an alert as well? If so, I think we can close this. If not, we must have events and alerts to help customers understand that there is a problem that needs their attention.

I also want to note our discussion about handling the case where an updated image appears in the registry while a DataImportCron is currently failing: we will want to delete the old import and switch to the new image automatically.
Adam,

First of all, when a DataImportCron import is failing, the usual PVC and DV events are emitted, e.g.:

openshift-virtualization-os-images   16m   Warning   Error             datavolume/centos8.4-123456789009              Unable to process data: Unable to transfer source data to scratch space: Failed to read registry image: Could not parse image: invalid reference format
openshift-virtualization-os-images   16m   Warning   ErrImportFailed   persistentvolumeclaim/centos8.4-123456789009   Unable to process data: Unable to transfer source data to scratch space: Failed to read registry image: Could not parse image: invalid reference format

In addition, when the DataImportCron is outdated (as in this case), an alert is raised (kubevirt_cdi_dataimportcron_outdated). When the import pod restarts too many times (also the case here), another alert is triggered (kubevirt_cdi_import_dv_unusual_restartcount_total).

As a manual solution, before deleting the failing import DV, you should first tell the DataImportCron to give up trying to import with the current source digest and to revert to the digest of the last successful import, as recorded in the DataSource. Otherwise, deleting the failing DV will just cause the DataImportCron to recreate it.

$ oc describe das -n openshift-virtualization-os-images centos8.4
...
Spec:
  Source:
    Pvc:
      Name: centos8.4-4a6a0eee7b4a

Use the PVC name suffix to annotate the DataImportCron sourceDesiredDigest:

$ oc annotate --overwrite dic -n openshift-virtualization-os-images centos8.4-import-cron cdi.kubevirt.io/storage.import.sourceDesiredDigest=sha256:4a6a0eee7b4a

Now you can safely delete the failing DV.
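Putting the workaround together, a minimal sketch (resource names as in this example; the jsonpath and suffix extraction are illustrative, not an official procedure):

$ NS=openshift-virtualization-os-images
$ # digest suffix of the last successful import, taken from the DataSource's PVC name
$ SUFFIX=$(oc get das centos8.4 -n "$NS" -o jsonpath='{.spec.source.pvc.name}' | awk -F- '{print $NF}')
$ oc annotate --overwrite dic centos8.4-import-cron -n "$NS" cdi.kubevirt.io/storage.import.sourceDesiredDigest=sha256:${SUFFIX}
$ oc delete dv centos8.4-123456789009 -n "$NS"   # safe now: the cron no longer recreates it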
Tested on CNV-v4.10.1-62; the issue has been fixed.

$ oc get po
NAME                                   READY   STATUS             RESTARTS      AGE
importer-centos-stream8-123456789009   1/2     InvalidImageName   2 (56s ago)   3m8s

  Normal   Created        80s (x2 over 2m22s)  kubelet  Created container importer
  Normal   Started        80s (x2 over 2m22s)  kubelet  Started container importer
  Warning  InspectFailed  80s (x8 over 2m22s)  kubelet  Failed to apply default image tag "quay.io/containerdisks/centos-stream:8@sha256:12345678900987654321": couldn't parse image reference "quay.io/containerdisks/centos-stream:8@sha256:12345678900987654321": invalid reference format
  Warning  Failed         80s (x8 over 2m22s)  kubelet  Error: InvalidImageName
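With the fix, the invalid reference is rejected by the kubelet at image-inspection time instead of looping inside the importer. One way to confirm the waiting reason directly (pod name as in this run):

$ oc get pod importer-centos-stream8-123456789009 -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}'
InvalidImageName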
The fix PR introduced a regression; moving back to ASSIGNED. The dataimportcron-controller now fails to reconcile with an invalid label value: judging by the log below (namespace "openshift-virtualization-os-images", empty name), the cdi.kubevirt.io/dataImportCron label value is apparently composed as <namespace>.<name>, so an empty name leaves a trailing dot, which fails label validation:

{"level":"error","ts":1649760447.0815444,"logger":"controller.dataimportcron-controller","msg":"Reconciler error","name":"","namespace":"openshift-virtualization-os-images","error":"values[0][cdi.kubevirt.io/dataImportCron]: Invalid value: \"openshift-virtualization-os-images.\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","errorCauses":[{"error":"values[0][cdi.kubevirt.io/dataImportCron]: Invalid value: \"openshift-virtualization-os-images.\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
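For illustration, the same label-value validation can be triggered by hand on any labelable object (the PVC name here is hypothetical); a trailing dot is rejected because a label value must end with an alphanumeric character:

$ oc label pvc my-pvc cdi.kubevirt.io/dataImportCron=openshift-virtualization-os-images.
# fails validation with the same "a valid label must ... start and end with an alphanumeric character" message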
Marking "ON_QA" because there's a downstream build of this that QA can try. (MODIFIED is for upstream code merged, but not yet in a downstream release)
Oops, the "Fixed in Version" failed QA; updated it to the build that includes the pull request fixing the regression.
Tested on CNV v4.10.1-78; the issue has been fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.1 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4668