Bug 2046686
Summary: | Importer pod keeps restarting when dataimportcron has a reference to invalid image sha | ||
---|---|---|---|
Product: | Container Native Virtualization (CNV) | Reporter: | Yan Du <yadu> |
Component: | Storage | Assignee: | Arnon Gilboa <agilboa> |
Status: | CLOSED ERRATA | QA Contact: | Yan Du <yadu> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.10.0 | CC: | agilboa, alitke, cnv-qe-bugs, mrashish |
Target Milestone: | --- | ||
Target Release: | 4.10.1 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | CNV v4.10.1-75 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-05-18 20:26:57 UTC | Type: | Bug |
Description
Yan Du
2022-01-27 08:46:18 UTC
Arnon, I am pretty sure this is behaving as expected but I want to confirm a few things. When a DataImportCron import is failing, do we emit an event? Are we raising an alert as well? If so, then I think we can close this. If not, we must have events and alerts to help customers understand that there is a problem that needs their attention.

I also want to note our discussion about handling the case where an updated image appears in the registry while there is a currently failing DataImportCron. We will want to delete the old import and switch to the new image automatically.

Adam,

First of all, when a DataImportCron import is failing, the usual PVC and DV events are emitted, e.g.:

    openshift-virtualization-os-images 16m Warning Error datavolume/centos8.4-123456789009 Unable to process data: Unable to transfer source data to scratch space: Failed to read registry image: Could not parse image: invalid reference format
    openshift-virtualization-os-images 16m Warning ErrImportFailed persistentvolumeclaim/centos8.4-123456789009 Unable to process data: Unable to transfer source data to scratch space: Failed to read registry image: Could not parse image: invalid reference format

In addition, when the DataImportCron is outdated (as in this case), an alert is raised (kubevirt_cdi_dataimportcron_outdated).

When the import pod is restarted too many times (as in this case), another alert is triggered (kubevirt_cdi_import_dv_unusual_restartcount_total).

As a manual solution, before deleting the failing import DV, you should first tell the DataImportCron to give up trying to import using the current source digest and revert to the digest of the last successful import, as noted in the DataSource. Otherwise, deleting the failing DV will cause its recreation by the DataImportCron.

    $ oc describe das -n openshift-virtualization-os-images centos8.4
    ...
    Spec:
      Source:
        Pvc:
          Name: centos8.4-4a6a0eee7b4a

Use the PVC name suffix to annotate the DataImportCron sourceDesiredDigest:

    $ oc annotate --overwrite dic -n openshift-virtualization-os-images centos8.4-import-cron cdi.kubevirt.io/storage.import.sourceDesiredDigest=sha256:4a6a0eee7b4a

Now you can safely delete the failing DV.

Tested on CNV-v4.10.1-62; the issue has been fixed.

    $ oc get po
    NAME                                   READY   STATUS             RESTARTS      AGE
    importer-centos-stream8-123456789009   1/2     InvalidImageName   2 (56s ago)   3m8s

    Normal   Created        80s (x2 over 2m22s)  kubelet  Created container importer
    Normal   Started        80s (x2 over 2m22s)  kubelet  Started container importer
    Warning  InspectFailed  80s (x8 over 2m22s)  kubelet  Failed to apply default image tag "quay.io/containerdisks/centos-stream:8@sha256:12345678900987654321": couldn't parse image reference "quay.io/containerdisks/centos-stream:8@sha256:12345678900987654321": invalid reference format
    Warning  Failed         80s (x8 over 2m22s)  kubelet  Error: InvalidImageName

The fix PR introduced a regression; moving back to ASSIGNED.

    {"level":"error","ts":1649760447.0815444,"logger":"controller.dataimportcron-controller","msg":"Reconciler error","name":"","namespace":"openshift-virtualization-os-images","error":"values[0][cdi.kubevirt.io/dataImportCron]: Invalid value: \"openshift-virtualization-os-images.\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')","errorCauses":[{"error":"values[0][cdi.kubevirt.io/dataImportCron]: Invalid value: \"openshift-virtualization-os-images.\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}

Marking "ON_QA" because there's a downstream build of this that QA can try. (MODIFIED is for upstream code merged, but not yet in a downstream release.)

Oops, the Fixed In Version build failed QA; updated it to the one that includes the pull request fixing the regression.

Tested on CNV v4.10.1-78; the issue has been fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.10.1 Images security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4668
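The manual workaround described in the comments above (take the suffix of the PVC name recorded in the DataSource and use it as the sourceDesiredDigest annotation) could be scripted roughly as follows. This is only a sketch: the helper names `digest_from_pvc_name` and `annotate_command` are hypothetical, and it assumes the PVC name always ends in `-<digest prefix>` as in the centos8.4 example. It only builds the `oc annotate` command line and does not run it.

```python
import shlex
from typing import List

# Annotation key taken from the workaround in the comments above.
ANNOTATION = "cdi.kubevirt.io/storage.import.sourceDesiredDigest"


def digest_from_pvc_name(pvc_name: str) -> str:
    """Return 'sha256:<suffix>', where <suffix> is the text after the last '-'.

    Assumption: the DataSource PVC is named '<prefix>-<digest prefix>'.
    """
    prefix, sep, suffix = pvc_name.rpartition("-")
    if not sep or not prefix or not suffix:
        raise ValueError(f"PVC name {pvc_name!r} has no '-<digest>' suffix")
    return f"sha256:{suffix}"


def annotate_command(namespace: str, cron_name: str, pvc_name: str) -> List[str]:
    """Build (but do not execute) the 'oc annotate' call for a DataImportCron."""
    value = digest_from_pvc_name(pvc_name)
    return [
        "oc", "annotate", "--overwrite", "dic",
        "-n", namespace, cron_name,
        f"{ANNOTATION}={value}",
    ]


if __name__ == "__main__":
    cmd = annotate_command(
        "openshift-virtualization-os-images",
        "centos8.4-import-cron",
        "centos8.4-4a6a0eee7b4a",
    )
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

With the values from this bug, the printed command should match the `oc annotate` line given in the workaround. Printing rather than executing keeps the step reviewable before the failing DV is deleted.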
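The regression log above rejects the label value "openshift-virtualization-os-images." because it ends with a dot. The short check below applies the regex quoted in that error message, together with the 63-character limit Kubernetes places on label values. The `cron_label_value` helper is an assumption: the `<namespace>.<name>` form is inferred from the log (empty "name", value ending in "."), not confirmed from the CDI source.

```python
import re

# Regex copied verbatim from the controller error message.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LEN = 63  # Kubernetes limit on label value length


def is_valid_label_value(value: str) -> bool:
    """True when the whole value matches the regex and fits the length limit."""
    return (
        len(value) <= MAX_LABEL_VALUE_LEN
        and LABEL_VALUE_RE.fullmatch(value) is not None
    )


def cron_label_value(namespace: str, name: str) -> str:
    """Hypothetical '<namespace>.<name>' label value, inferred from the log."""
    return f"{namespace}.{name}"


if __name__ == "__main__":
    ns = "openshift-virtualization-os-images"
    for name in ("", "centos8.4-import-cron"):
        value = cron_label_value(ns, name)
        print(repr(value), is_valid_label_value(value))
```

Under that assumption, an empty cron name yields a value with a trailing dot, which fails validation, while a non-empty name such as centos8.4-import-cron passes.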
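The kubelet message above reports "invalid reference format" for an image pinned to sha256:12345678900987654321. A SHA-256 digest is 64 hexadecimal characters, so a 20-character value cannot be one. The sketch below checks only that property of the part after "@"; it is not a full image reference parser, and the helper names are hypothetical. It applies to image references, not to the shortened value used in the sourceDesiredDigest annotation of the workaround.

```python
import re
from typing import Optional

# A SHA-256 digest is 256 bits, i.e. exactly 64 lowercase hex characters.
SHA256_DIGEST_RE = re.compile(r"sha256:[a-f0-9]{64}")


def digest_of(image_ref: str) -> Optional[str]:
    """Return the part after '@', or None if the reference has no digest."""
    _, sep, digest = image_ref.partition("@")
    return digest if sep else None


def has_wellformed_sha256(image_ref: str) -> bool:
    """True when the reference is pinned to a syntactically valid sha256 digest."""
    digest = digest_of(image_ref)
    return digest is not None and SHA256_DIGEST_RE.fullmatch(digest) is not None


if __name__ == "__main__":
    bad = "quay.io/containerdisks/centos-stream:8@sha256:12345678900987654321"
    print(has_wellformed_sha256(bad))
```

A check like this could flag a malformed digest in a DataImportCron source before any importer pod is created, but whether CDI performs such validation is not stated in this report.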