Bug 1985192

Summary: ImagePruner hits error "Job has reached the specified backoff limit", which causes the image registry to become degraded
Product: OpenShift Container Platform
Reporter: XiuJuan Wang <xiuwang>
Component: Image Registry
Assignee: Oleg Bulatov <obulatov>
Status: CLOSED DUPLICATE
QA Contact: XiuJuan Wang <xiuwang>
Severity: low
Docs Contact:
Priority: low
Version: 4.7
CC: aos-bugs
Last Closed: 2021-10-11 14:34:41 UTC

Description XiuJuan Wang 2021-07-23 06:46:13 UTC
This bug was initially created as a copy of Bug #1887010

I am copying this bug because: 

Description of problem:
An ImagePruner error causes the image registry to become degraded.


Version-Release number of selected component (if applicable):

4.7.20-x86_64

How reproducible:
10%?

Steps to Reproduce:
1. Set up a cluster

Actual results:
The image registry is degraded with the message "ImagePrunerDegraded: Job has reached the specified backoff limit". Excerpt from the pruner pod spec:
          spec:
            affinity: {}
            containers:
            - args:
              - adm
              - prune
              - images
              - --confirm=true
              - --certificate-authority=/var/run/configmaps/serviceca/service-ca.crt
              - --keep-tag-revisions=3
              - --keep-younger-than=60m
              - --ignore-invalid-refs=true
              - --loglevel=1
              - --prune-registry=true
              - --registry-url=https://image-registry.openshift-image-registry.svc:5000
              command:
              - oc
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a555f211c96700bfe652a6919b8ce0bb1e8b939cd58bb4e874ee11eba183eee
              imagePullPolicy: IfNotPresent


[2021-07-22T06:29:45.114Z] Spec:
[2021-07-22T06:29:45.114Z] Status:
[2021-07-22T06:29:45.114Z]   Conditions:
[2021-07-22T06:29:45.114Z]     Last Transition Time:  2021-07-22T00:04:05Z
[2021-07-22T06:29:45.114Z]     Message:               Available: The registry is ready
[2021-07-22T06:29:45.114Z] ImagePrunerAvailable: Pruner CronJob has been created
[2021-07-22T06:29:45.114Z]     Reason:                Ready
[2021-07-22T06:29:45.114Z]     Status:                True
[2021-07-22T06:29:45.114Z]     Type:                  Available
[2021-07-22T06:29:45.114Z]     Last Transition Time:  2021-07-22T05:57:18Z
[2021-07-22T06:29:45.114Z]     Message:               Progressing: The registry is ready
[2021-07-22T06:29:45.114Z]     Reason:                Ready
[2021-07-22T06:29:45.114Z]     Status:                False
[2021-07-22T06:29:45.114Z]     Type:                  Progressing
[2021-07-22T06:29:45.114Z]     Last Transition Time:  2021-07-21T23:55:32Z
[2021-07-22T06:29:45.114Z]     Message:               ImagePrunerDegraded: Job has reached the specified backoff limit
[2021-07-22T06:29:45.114Z]     Reason:                ImagePrunerJobFailed
[2021-07-22T06:29:45.114Z]     Status:                True
[2021-07-22T06:29:45.114Z]     Type:                  Degraded
[2021-07-22T06:29:45.114Z]   Extension:               <nil>
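
For triage, the Degraded condition shown above could be picked out programmatically. The following is a minimal sketch in Python, assuming the ClusterOperator object has been saved as JSON (for example from `oc get clusteroperator image-registry -o json`). The function name `degraded_conditions` and the inline sample are illustrative only and are not part of any OpenShift tooling.

```python
import json


def degraded_conditions(cluster_operator):
    """Return (reason, message) for every condition with type Degraded and status True."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    return [
        (c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c.get("type") == "Degraded" and c.get("status") == "True"
    ]


# Hypothetical sample that mirrors the conditions reported in this bug.
sample = json.loads("""
{
  "status": {
    "conditions": [
      {"type": "Available", "status": "True", "reason": "Ready",
       "message": "Available: The registry is ready"},
      {"type": "Progressing", "status": "False", "reason": "Ready",
       "message": "Progressing: The registry is ready"},
      {"type": "Degraded", "status": "True", "reason": "ImagePrunerJobFailed",
       "message": "ImagePrunerDegraded: Job has reached the specified backoff limit"}
    ]
  }
}
""")

for reason, message in degraded_conditions(sample):
    print(f"{reason}: {message}")
```

With real data, the sample would be replaced by the parsed output of the `oc get` command, so a CI job could flag the run and keep the cluster alive when the list is non-empty.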


Expected results:
No such error should occur.

Additional info:

Comment 6 XiuJuan Wang 2021-09-06 02:12:44 UTC
We hit this issue several times in CI jobs, but the CI must-gather logs are not sufficient to debug it.
I couldn't reproduce it manually.
I will trigger more jobs and keep the cluster alive when it reproduces.

Comment 8 Oleg Bulatov 2021-10-11 14:34:41 UTC

*** This bug has been marked as a duplicate of bug 1990125 ***