Bug 1857684

Summary: operator interprets running pruning job as success
Product: OpenShift Container Platform
Reporter: Oleg Bulatov <obulatov>
Component: Image Registry
Assignee: Ricardo Maraschini <rmarasch>
Status: CLOSED ERRATA
QA Contact: Wenjing Zheng <wzheng>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.4
CC: aos-bugs, pasik, rmarasch
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The operator took running jobs into account when deriving its own status. Consequence: A running job may not have reached a failed state yet, so the operator could report itself as healthy while the job was still running. Fix: The operator now ignores running jobs when deriving its status. Result: The operator always uses the status of the most recently finished job, so it reports its status correctly.
Story Points: ---
Clone Of:
Clones: 1873534
Environment:
Last Closed: 2020-10-27 16:15:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1873496

Description Oleg Bulatov 2020-07-16 10:58:58 UTC
Description of problem:

When the pruner job has a persistent problem, the operator intermittently reports that the pruner is healthy. This happens while the currently running job has not failed yet. Another problem is that failed pods are removed automatically, so their log output cannot be checked.
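The status-derivation part of this problem, and the fix summarized in the Doc Text, can be illustrated with a small model. This is a hypothetical sketch, not the operator's actual code (the real operator works with Kubernetes Job objects): each pruner run is reduced to a state of `Running`, `Complete`, or `Failed` with integer start and completion times, and the names `PrunerJob`, `degraded_before_fix`, and `degraded_after_fix` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrunerJob:
    """Hypothetical, simplified view of one pruner job run."""
    name: str
    state: str                      # "Running", "Complete", or "Failed"
    started: int                    # start time, e.g. a Unix timestamp
    finished: Optional[int] = None  # completion time; None while running


def degraded_before_fix(jobs: List[PrunerJob]) -> bool:
    """Buggy behavior: look at the newest job, even if it is still running."""
    if not jobs:
        return False
    newest = max(jobs, key=lambda j: j.started)
    # A running job has not failed (yet), so it is treated like a success.
    return newest.state == "Failed"


def degraded_after_fix(jobs: List[PrunerJob]) -> bool:
    """Fixed behavior: ignore running jobs and use the last finished one."""
    finished = [j for j in jobs if j.state in ("Complete", "Failed")]
    if not finished:
        # Assumption of this sketch: nothing finished yet means not degraded.
        return False
    last = max(finished, key=lambda j: j.finished)
    return last.state == "Failed"


# Hypothetical scenario: the previous run failed, the next run is in progress.
jobs = [
    PrunerJob("image-pruner-1", "Failed", started=100, finished=160),
    PrunerJob("image-pruner-2", "Running", started=200),
]
print(degraded_before_fix(jobs))  # False: the running job masks the failure
print(degraded_after_fix(jobs))   # True: the last finished job failed
```

Returning `False` when no job has finished is a simplifying assumption of this sketch; a real controller might report an unknown or progressing state instead.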

Version-Release number of selected component (if applicable):

4.4+?

How reproducible:

Always.

Steps to Reproduce:

1. Create a deployment with an image reference that the pruner cannot parse.
2. Wait until the pruner fails to parse it.
3. Watch the operator conditions.
4. After the failure, try to locate the job pod and read its output (the pod no longer exists).


Actual results:

The operator status flaps (it intermittently reports the pruner as healthy), and the job output log cannot be read.

Expected results:

The operator stays Degraded, and we can see why it is degraded by inspecting the pod log.

Additional info:

Comment 4 Wenjing Zheng 2020-08-20 09:37:05 UTC
Verified on 4.6.0-0.nightly-2020-08-18-165040:
1. Make the image pruner degrade.
2. Create a deployment with an invalid image name.
3. Watch the image registry status: it remains Degraded.
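Step 3 amounts to checking that, while the underlying problem persists, the Degraded condition never flips back to healthy. A minimal sketch, assuming the condition has already been sampled into a time-ordered list of booleans (how it is polled from the cluster is left out, and `flaps` is a hypothetical helper):

```python
from typing import Sequence


def flaps(observations: Sequence[bool]) -> bool:
    """Return True if Degraded was observed and later reported as healthy.

    `observations` holds sampled Degraded values in time order. The sketch
    assumes the underlying problem was never fixed during sampling, so any
    Degraded -> healthy transition counts as a flake.
    """
    seen_degraded = False
    for degraded in observations:
        if degraded:
            seen_degraded = True
        elif seen_degraded:
            return True
    return False


# Illustrative sequences only, not recorded data.
print(flaps([False, True, False, True]))  # True: status flapped
print(flaps([False, True, True, True]))   # False: stayed Degraded
```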

Comment 5 Ricardo Maraschini 2020-08-28 13:57:17 UTC
*** Bug 1857687 has been marked as a duplicate of this bug. ***

Comment 7 errata-xmlrpc 2020-10-27 16:15:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196