Bug 1749152

Summary: When cluster operator object is deleted, samples operator does not restore the Degraded condition to status for 5+ minutes
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Samples
Assignee: Gabe Montero <gmontero>
Status: CLOSED ERRATA
QA Contact: XiuJuan Wang <xiuwang>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.2.0
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-16 06:40:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Clayton Coleman 2019-09-05 03:10:51 UTC
If the co/samples object is deleted, the samples operator sometimes fails to restore the Degraded condition (status false) to the conditions array.  This leads to visual inconsistency and may cause generic operator consumers (test cases, user interfaces, etc.) to display an error.

Cluster operators are required to report all three conditions (Available, Progressing, Degraded) with the appropriate true/false/unknown status. Although it is allowable to delay slightly (seconds) before filling out conditions, it should not be multiple minutes before the condition is set to false.
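
As a rough illustration of that requirement, here is a minimal Python sketch that checks a conditions array for the three required types. The helper name `condition_problems` and the dict layout are assumptions made for this example (they mirror the YAML in this report) and are not taken from the samples operator code.

```python
# Hypothetical checker, for illustration only; not part of any OpenShift tooling.
REQUIRED_TYPES = ("Available", "Progressing", "Degraded")
VALID_STATUSES = {"True", "False", "Unknown"}


def condition_problems(conditions):
    """Return a list of problems found in a ClusterOperator conditions array."""
    by_type = {c.get("type"): c for c in conditions}
    problems = []
    for cond_type in REQUIRED_TYPES:
        cond = by_type.get(cond_type)
        if cond is None:
            problems.append(f"{cond_type}: missing")
        elif cond.get("status") not in VALID_STATUSES:
            problems.append(f"{cond_type}: invalid status {cond.get('status')!r}")
    return problems


# Conditions as they would look while Degraded is still absent.
incomplete = [
    {"type": "Available", "status": "True"},
    {"type": "Progressing", "status": "False"},
]
print(condition_problems(incomplete))
```

A generic consumer (test case or UI) could treat any non-empty result as the inconsistency described in this report.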

apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-09-05T03:00:41Z"
  generation: 1
  name: openshift-samples
  resourceVersion: "48455"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-samples
  uid: 5897ab71-cf89-11e9-9e6a-126169521af2
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-09-05T03:00:41Z"
    message: Samples installation successful at 4.2.0-0.ci-2019-09-04-220355
    status: "True"
    type: Available
  - lastTransitionTime: "2019-09-05T03:00:41Z"
    message: Samples installation successful at 4.2.0-0.ci-2019-09-04-220355
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-09-05T03:07:24Z"
    status: "False"
    type: Degraded

Note the transition times: Degraded was set at 03:07:24, almost seven minutes after Available and Progressing (03:00:41). Before this, the condition was absent.
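
The delay can be computed directly from the lastTransitionTime values in the status above. This is a small Python sketch; the function names `parse_ts` and `transition_lag_seconds` are invented for this example.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an RFC 3339 UTC timestamp of the form used in the status above."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def transition_lag_seconds(conditions, late_type="Degraded"):
    """Seconds between the earliest other condition's transition and late_type's."""
    times = {c["type"]: parse_ts(c["lastTransitionTime"]) for c in conditions}
    earliest_other = min(t for name, t in times.items() if name != late_type)
    return (times[late_type] - earliest_other).total_seconds()


# Values copied from the ClusterOperator status in this report.
conditions = [
    {"type": "Available", "lastTransitionTime": "2019-09-05T03:00:41Z"},
    {"type": "Progressing", "lastTransitionTime": "2019-09-05T03:00:41Z"},
    {"type": "Degraded", "lastTransitionTime": "2019-09-05T03:07:24Z"},
]
lag = transition_lag_seconds(conditions)
print(lag)  # 403.0 seconds, i.e. 6 minutes 43 seconds
```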

It would be acceptable to set Degraded false at the same time Available and Progressing are set, and set Degraded true when an error is actually detected.
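
One way to picture that suggestion is a sketch in which all three conditions are always built together, so Degraded can never be absent. This is plain Python for illustration only: `build_conditions` and its parameters are hypothetical and do not describe how the samples operator actually builds its status.

```python
def build_conditions(available, progressing, error, now):
    """Build all three conditions in one pass so that none is ever missing.

    error is None when nothing is wrong; otherwise it is the message to report.
    """
    def cond(cond_type, is_true, message=""):
        entry = {
            "type": cond_type,
            "status": "True" if is_true else "False",
            "lastTransitionTime": now,
        }
        if message:
            entry["message"] = message
        return entry

    return [
        cond("Available", available),
        cond("Progressing", progressing),
        # Degraded defaults to False and flips to True only on a detected error.
        cond("Degraded", error is not None, error or ""),
    ]


healthy = build_conditions(True, False, None, "2019-09-05T03:00:41Z")
failing = build_conditions(False, False, "example error", "2019-09-05T03:00:41Z")
```

A real implementation would also need to preserve lastTransitionTime when a condition's status has not changed; the sketch stamps every condition with the same time for simplicity.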

Comment 1 Gabe Montero 2019-09-05 20:11:55 UTC
Clayton - please confirm whether "If the co/samples object is deleted" means:

1) oc delete configs.samples cluster
2) oc delete clusteroperator openshift-samples

My initial interpretation is that it is 1), but I would like to be sure.

thanks

Comment 3 XiuJuan Wang 2019-09-10 10:02:31 UTC
Before the fix, after deleting config.samples cluster, the openshift-samples co reports:
status:
  conditions:
  - lastTransitionTime: "2019-09-10T09:35:14Z"
    status: "False"
    type: Available
  - lastTransitionTime: "2019-09-10T09:35:14Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-09-10T02:54:20Z"
    status: "False"
    type: Degraded

After including the fix in 4.2.0-0.nightly-2019-09-10-074025, after deleting config.samples cluster, the openshift-samples co reports:
status:
  conditions:
  - lastTransitionTime: "2019-09-10T09:37:55Z"
    status: "False"
    type: Available
  - lastTransitionTime: "2019-09-10T09:38:01Z"
    message: Samples processing to 4.2.0-0.nightly-2019-09-10-074025
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-09-10T09:37:55Z"
    status: "False"
    type: Degraded

Comment 4 errata-xmlrpc 2019-10-16 06:40:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Comment 5 Red Hat Bugzilla 2023-09-14 05:42:51 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days