Bug 1879176

Summary: image-registry operator is Degraded after 4.4->4.5 upgrade
Product: OpenShift Container Platform Reporter: Oleg Bulatov <obulatov>
Component: ImageStreams Assignee: Ricardo Maraschini <rmarasch>
Status: CLOSED ERRATA QA Contact: XiuJuan Wang <xiuwang>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.5 CC: aos-bugs, jokerman, wzheng
Target Milestone: ---   
Target Release: 4.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The pruner job degraded the image registry operator when a pod referring to an invalid image reference existed.
Consequence: With the image registry operator degraded, upgrades were not possible. Users had to either remove the "offending" pods and wait for the next pruning execution, or suspend the pruner job, to be able to upgrade.
Fix: 1. The pruner status is no longer taken into account when computing the operator status. 2. A metric and an alert for the problem (pruning failing) were added.
Result: Users are able to upgrade successfully.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-12 15:47:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1880054    
Bug Blocks:    

Description Oleg Bulatov 2020-09-15 15:35:12 UTC
Description of problem:

When a 4.4 cluster is upgraded to 4.5, the pruner can degrade the operator.

It often happens because of invalid references that the pruner cannot parse.
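To illustrate why a placeholder string fails to parse, here is a rough sketch of an image reference check. It assumes the pruner follows a grammar similar to the Docker distribution reference grammar, where repository path components must be lowercase. The regular expressions are a simplified approximation written for this example and the function name is hypothetical; this is not the pruner's actual parser.

```python
import re

# Simplified approximation of the Docker distribution reference grammar.
# Assumption: the pruner rejects references on similar grounds.
_ALNUM = r"[a-z0-9]+"
_SEP = r"(?:[._]|__|-+)"
_COMPONENT = rf"{_ALNUM}(?:{_SEP}{_ALNUM})*"
_HOST_LABEL = r"(?:[A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9-]*[A-Za-z0-9])"
_HOST = rf"{_HOST_LABEL}(?:\.{_HOST_LABEL})*(?::[0-9]+)?"
_NAME = rf"(?:{_HOST}/)?{_COMPONENT}(?:/{_COMPONENT})*"
_TAG = r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}"
_DIGEST = r"[A-Za-z][A-Za-z0-9]*(?:[-_+.][A-Za-z][A-Za-z0-9]*)*:[0-9a-fA-F]{32,}"
_REFERENCE = re.compile(rf"{_NAME}(?::{_TAG})?(?:@{_DIGEST})?")


def looks_like_valid_reference(ref: str) -> bool:
    """Return True if ref matches the simplified grammar above (hypothetical helper)."""
    return _REFERENCE.fullmatch(ref) is not None


if __name__ == "__main__":
    for candidate in ("TO_BE_REPLACED", "FOO_BAR_BAZ", "busybox",
                      "registry.example.com/ns/app:1.0"):
        print(candidate, looks_like_valid_reference(candidate))
```

Under this approximation, `TO_BE_REPLACED` and `FOO_BAR_BAZ` are rejected because a repository name without a registry host must start with a lowercase letter or digit.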

Version-Release number of selected component (if applicable):

4.5.z

How reproducible:

Often

Steps to Reproduce:
1. Create a replicaset with an invalid image reference (for example, `TO_BE_REPLACED` or `FOO_BAR_BAZ`), directly or via a deployment
2. Wait until a new job is created for the image pruner cronjob
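
Step 1 could be sketched as follows. This is a minimal illustration, not the reporter's actual manifest: the function name, the replicaset name `invalid-ref-demo`, the label, and the container name are all made up for the example. The script only prints a ReplicaSet manifest as JSON.

```python
import json


def replicaset_with_image(name: str, image: str) -> dict:
    """Build an apps/v1 ReplicaSet manifest whose single container uses `image`.

    Hypothetical helper: the names and labels are placeholders for illustration.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "ReplicaSet",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": "main", "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # TO_BE_REPLACED is one of the invalid references named in step 1.
    manifest = replicaset_with_image("invalid-ref-demo", "TO_BE_REPLACED")
    print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as `make_rs.py` (a hypothetical file name), the output can be piped to `oc create -f -` in a test project.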

Actual results:

The operator becomes degraded and blocks upgrades.

Expected results:

There should be a way to unblock upgrades.

Additional info:

Comment 3 XiuJuan Wang 2020-09-28 03:13:37 UTC
Steps to verify:

1. Create a replicaset with an invalid image on 4.4.26
pod-pull-by-digests   0/1     InvalidImageName   0          3m53s

2. Enable the image pruner

3. Upgrade to 4.5.0-0.nightly-2020-09-26-194704

4. Check the image registry clusteroperator after the upgrade succeeds.
The image registry co is not Degraded.

5. Check the Alerting page on the web console
ImagePrunerIsFailing is firing

6. Query image_registry_operator_image_pruner_job_status on the metrics page
image_registry_operator_image_pruner_job_status{endpoint="60000",instance="10.129.0.10:60000",job="image-registry-operator",namespace="openshift-image-registry",pod="cluster-image-registry-operator-69bb48d877-kdfx6",service="image-registry-operator"}  1
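
The same check could be scripted against the JSON returned by a Prometheus instant query (`/api/v1/query`). The sketch below only parses a response body; how the response is fetched (URL, token) is left out. The function name is hypothetical, and the example payload is hand-built from the sample above with a placeholder timestamp. What each metric value means is not stated in this report, so the sketch returns the raw values without interpreting them.

```python
import json

METRIC = "image_registry_operator_image_pruner_job_status"


def pruner_job_samples(response_text: str) -> list[tuple[str, float]]:
    """Extract (pod, value) pairs for METRIC from an instant-query response.

    Hypothetical helper; expects the standard Prometheus vector result format.
    """
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    samples = []
    for series in payload["data"]["result"]:
        labels = series["metric"]
        if labels.get("__name__") != METRIC:
            continue
        _timestamp, raw_value = series["value"]  # value is [timestamp, "string"]
        samples.append((labels.get("pod", ""), float(raw_value)))
    return samples


# Hand-built example mirroring the sample above; the timestamp is a placeholder.
EXAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{
            "metric": {
                "__name__": METRIC,
                "namespace": "openshift-image-registry",
                "pod": "cluster-image-registry-operator-69bb48d877-kdfx6",
            },
            "value": [1601262817.0, "1"],
        }],
    },
})

if __name__ == "__main__":
    for pod, value in pruner_job_samples(EXAMPLE_RESPONSE):
        print(pod, value)
```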

Comment 6 errata-xmlrpc 2020-10-12 15:47:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.14 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3843