Created attachment 1723639
Final error dump from failed image-prune job

Description of problem:
The nightly image-pruner job occasionally fails when pruning images. When this occurs, the image-registry operator is in a degraded state until the job is removed.

Version-Release number of selected component (if applicable):
Observed in 4.6.0rc3 and 4.6.0rc4 clusters.

How reproducible:
It seems to be unpredictable. Over the course of the last week I've seen it occur on three separate clusters, not always on the same days. On one cluster it failed two days in a row, succeeded on the third, then failed again on the fourth. On another cluster, it failed on one day, succeeded on the subsequent day, and failed again on the day after that.

Steps to Reproduce:
Has the potential to occur during the nightly pruning job.

Actual results:
The prune job fails. The job log contains the error below. The image SHA256 checksum is different per cluster. The full stack dump from one cluster is included as an attachment. The image-registry operator subsequently goes into a degraded state until the job is deleted.

F1023 00:00:09.700432 1 helpers.go:115] error: image sha256:e57c71e6bb180424e5d4a27d629609847686385880105fb52fc8fe190ff57a4a: failed to delete image sha256:e57c71e6bb180424e5d4a27d629609847686385880105fb52fc8fe190ff57a4a: images.image.openshift.io "sha256:e57c71e6bb180424e5d4a27d629609847686385880105fb52fc8fe190ff57a4a" not found
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc000012001, 0xc00014a700, 0x152, 0x301)
	/go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x4d2aec0, 0xc000000003, 0x0, 0x0, 0xc0000d03f0, 0x48a9af4, 0xa, 0x73, 0x41d400)
	/go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:945 +0x191
k8s.io/klog/v2.(*loggingT).printDepth(0x4d2aec0, 0x3, 0x0, 0x0, 0x2, 0xc00280d738, 0x1, 0x1)
	/go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:718 +0x165
k8s.io/klog/v2.FatalDepth(...)

Expected results:
The prune job should succeed.

Additional info:
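The "not found" in the error above suggests the image had already been removed by the time the pruner tried to delete it, in which case the desired end state already holds. Below is a minimal sketch of a delete wrapper that treats that case as success, assuming the failure can be recognised from the "not found" text on stderr. `delete_tolerant` is a hypothetical helper name, and this is not a claim about how oc itself was changed.

```shell
#!/bin/bash
# delete_tolerant CMD [ARGS...]
# Hypothetical helper: runs the given delete command and treats a failure
# whose stderr mentions "not found" as success, since the object is already gone.
delete_tolerant() {
  local err
  # Capture stderr only; stdout of the wrapped command is discarded.
  if err=$("$@" 2>&1 >/dev/null); then
    return 0
  fi
  case "$err" in
    *"not found"*) return 0 ;;   # already deleted by someone else
  esac
  # Any other failure is reported unchanged.
  printf '%s\n' "$err" >&2
  return 1
}
```

For example, `delete_tolerant ./oc delete image "$digest"` (with `$digest` standing in for an image name) would return 0 whether the image was deleted by this call or was already gone, and would still fail on any other error.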

Verified with 4.8.0-0.nightly-2021-04-22-013545:

1. # oc edit imagepruner
spec:
  failedJobsHistoryLimit: 3
  ignoreInvalidImageReferences: false
  keepTagRevisions: 0
  keepYoungerThan: 0
  logLevel: Normal
  schedule: '*/1 * * * *'
  successfulJobsHistoryLimit: 3
  suspend: false

2. $ cat bug
#!/bin/bash
for (( i=1; i<=100; i++ ))
do
  ./oc new-project wzhengc$i
  ./oc new-app ruby~https://github.com/openshift/ruby-ex
  sleep 20
  ./oc start-build ruby-ex
  ./oc start-build ruby-ex
  ./oc start-build ruby-ex
  sleep 80
  ./oc delete imagestreamtag ruby-ex:latest
  ./oc adm prune images --keep-younger-than=0 --keep-tag-revisions=1 --prune-registry=true --confirm=true --registry-url=default-route-openshift-image-registry.apps.qe-groupd-0422.qe.devcluster.openshift.com
done
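
To check whether a given pruner run hit the failure from this report, its log can be scanned for the "failed to delete image ... not found" signature. Below is a rough sketch that reads log text on stdin and prints each affected digest once. The pattern is copied from the error line quoted in the description, and `prune_not_found_digests` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read pruner log text on stdin and print the digest of
# every image whose deletion failed with "not found", one per line, deduplicated.
prune_not_found_digests() {
  # First grep isolates the failure message; second grep pulls out the digests.
  grep -oE 'failed to delete image sha256:[0-9a-f]{64}: images\.image\.openshift\.io "sha256:[0-9a-f]{64}" not found' \
    | grep -oE 'sha256:[0-9a-f]{64}' \
    | sort -u
}
```

It could be fed from a saved log file, or from `./oc logs` on the pruner pod. Empty output means the signature was not seen.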

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438