Bug 1879176 - image-registry operator is Degraded after 4.4->4.5 upgrade
Summary: image-registry operator is Degraded after 4.4->4.5 upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ImageStreams
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.5.z
Assignee: Ricardo Maraschini
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Depends On: 1880054
Blocks:
Reported: 2020-09-15 15:35 UTC by Oleg Bulatov
Modified: 2020-10-12 15:48 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The pruner job degraded the image registry operator when a pod with an invalid image reference existed.
Consequence: Upgrades are not possible while the image registry operator is degraded. Users had to either remove the offending pods and wait for the next pruner run, or suspend the pruner job, before they could upgrade.
Fix: 1. The pruner status is no longer taken into account when computing the operator status. 2. A metric and an alert for failing pruning were added.
Result: Users are able to upgrade successfully.
Clone Of:
Environment:
Last Closed: 2020-10-12 15:47:56 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-image-registry-operator pull 612 | 0 | None | closed | Bug 1879176: Alerting on failed image prune job | 2021-02-01 10:29:19 UTC
Red Hat Product Errata | RHBA-2020:3843 | 0 | None | None | None | 2020-10-12 15:48:20 UTC

Description Oleg Bulatov 2020-09-15 15:35:12 UTC
Description of problem:

When a 4.4 cluster is upgraded to 4.5, the pruner can cause the operator to become Degraded.

This often happens because of invalid image references that the pruner cannot parse.
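
To illustrate why a value such as `TO_BE_REPLACED` is rejected: container image repository names must be lowercase. The following is a minimal Python sketch of that rule, not the pruner's actual parser. The function name `is_valid_repository_path` is hypothetical, and the check deliberately ignores registry hosts, tags, and digests.

```python
import re

# One path component of a repository name: lowercase alphanumerics,
# optionally joined by '.', '_', '__', or a run of '-'.
# Simplified from the image reference grammar; assumes the input has
# no registry host, tag, or digest.
_COMPONENT = re.compile(r"[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*")


def is_valid_repository_path(path: str) -> bool:
    """Return True if every '/'-separated component matches the grammar."""
    return all(_COMPONENT.fullmatch(part) for part in path.split("/"))
```

With this sketch, `is_valid_repository_path("busybox")` is true, while `is_valid_repository_path("TO_BE_REPLACED")` is false because of the uppercase letters.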

Version-Release number of selected component (if applicable):

4.5.z

How reproducible:

Often

Steps to Reproduce:
1. Create a ReplicaSet with an invalid image reference (for example, `TO_BE_REPLACED` or `FOO_BAR_BAZ`), directly or via a Deployment.
2. Wait until a new job is created for the image pruner cronjob.
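
One possible way to carry out step 1 is to generate the ReplicaSet manifest with a short script. This is only a sketch: the resource name `invalid-ref-demo`, the `app` label, and the container name `main` are illustrative assumptions, not values from this report.

```python
import json


def build_replicaset(name: str, image: str) -> dict:
    """Build an apps/v1 ReplicaSet manifest whose only container uses `image`."""
    labels = {"app": name}  # hypothetical label, used only to tie selector and template together
    return {
        "apiVersion": "apps/v1",
        "kind": "ReplicaSet",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": "main", "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # TO_BE_REPLACED is the invalid reference named in the reproduction steps.
    print(json.dumps(build_replicaset("invalid-ref-demo", "TO_BE_REPLACED"), indent=2))
```

The JSON printed by the script can be piped to `oc apply -f -` in a test namespace.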

Actual results:

The operator becomes degraded and blocks upgrades.

Expected results:

There should be a way to unblock upgrades.

Additional info:

Comment 3 XiuJuan Wang 2020-09-28 03:13:37 UTC
Steps to verify:

1. Create a replicaset with an invalid image on 4.4.26:
pod-pull-by-digests   0/1     InvalidImageName   0          3m53s

2. Enable the image pruner.

3. Upgrade to 4.5.0-0.nightly-2020-09-26-194704.

4. Check the image registry clusteroperator after the upgrade succeeds.
The image registry clusteroperator is not degraded.

5. Check the Alerting page in the web console.
The ImagePrunerIsFailing alert is firing.

6. Query image_registry_operator_image_pruner_job_status on the metrics page:
image_registry_operator_image_pruner_job_status{endpoint="60000",instance="10.129.0.10:60000",job="image-registry-operator",namespace="openshift-image-registry",pod="cluster-image-registry-operator-69bb48d877-kdfx6",service="image-registry-operator"}  1
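
The query in step 6 could also be issued outside the web console through the Prometheus HTTP API (`/api/v1/query`). The sketch below only builds the request URL. The base URL is a placeholder assumption, and a real cluster will normally also require authentication, which is not shown here.

```python
from urllib.parse import urlencode

# Metric name taken from step 6 above.
METRIC = "image_registry_operator_image_pruner_job_status"


def build_instant_query_url(prometheus_base_url: str, promql: str) -> str:
    """Return the instant-query URL for `promql` on the given Prometheus server."""
    base = prometheus_base_url.rstrip("/")
    return base + "/api/v1/query?" + urlencode({"query": promql})


if __name__ == "__main__":
    # prometheus.example.com is a placeholder, not a real endpoint.
    print(build_instant_query_url("https://prometheus.example.com", METRIC))
```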

Comment 6 errata-xmlrpc 2020-10-12 15:47:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.14 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3843

