Bug 1830095

Summary: Cluster overview control plane status reported as degraded when no components are degraded
Product: OpenShift Container Platform Reporter: Samuel Padgett <spadgett>
Component: Management Console Assignee: Rastislav Wagner <rawagner>
Status: CLOSED ERRATA QA Contact: Yadan Pei <yapei>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.5 CC: abraren, aos-bugs, jokerman, wking, yanpzhan
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Version: 4.5.0-0.nightly-2020-04-30-112808 Cluster ID: 13ace493-6d50-479a-aead-02258ba49019 Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:75.0) Gecko/20100101 Firefox/75.0
Last Closed: 2020-07-13 17:34:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description: Popover has no degraded components (Flags: none)

Description Samuel Padgett 2020-04-30 20:16:05 UTC
Created attachment 1683448 [details]
Popover has no degraded components

See screenshot. "API Request Success Rate" is "Not available," which seems to be reported as degraded. Two issues here:

1. We should determine why "API Request Success Rate" metrics aren't available.
2. I don't think we should report "degraded" when we can't fetch metrics. This should be a different status.

Comment 1 Rastislav Wagner 2020-05-04 19:09:07 UTC
Adding Andy.

1. Instead of "Not available", we could show something like "No data"?
2. Any ideas on what to show here?

Comment 2 Andy Braren 2020-05-04 20:15:20 UTC
I believe we use "Not available" for similar states elsewhere, so that text string in the popover is probably fine. I may be missing some technical details though.

The substatus string below "Control Plane" should probably become "1 component not available" instead of "1 component degraded," and the icon should probably become the unknown icon, unless we think that's too harsh. That approach would align best with similar unknown/unavailable statuses in the Status card, like the Storage status seen in the screenshot with the unknown icon and a status of "Not available".

Does that approach sound good?
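
A minimal sketch of the status mapping discussed above, written in TypeScript. All names here (`ComponentState`, `stateFromSuccessRate`, `summarize`), the 0.99 threshold, and the "Healthy" string are hypothetical and are not taken from the console source. The sketch only illustrates the idea that a missing metric maps to an "unknown" state instead of "degraded", and that the substatus reads "1 component not available" in that case.

```typescript
// Hypothetical types; not the console's actual API.
type ComponentState = 'ok' | 'degraded' | 'unknown';

interface ComponentHealth {
  name: string;
  state: ComponentState;
}

interface Summary {
  state: ComponentState;
  message: string;
}

// Map a metric value to a state. `undefined` (or NaN) stands for
// "the query returned no data" and is treated as unknown, not degraded.
// The 0.99 threshold is an illustrative assumption.
const stateFromSuccessRate = (
  rate: number | undefined,
  threshold: number = 0.99,
): ComponentState => {
  if (rate === undefined || isNaN(rate)) {
    return 'unknown';
  }
  return rate >= threshold ? 'ok' : 'degraded';
};

const plural = (n: number): string => `${n} component${n === 1 ? '' : 's'}`;

// Roll component states up into one status line.
// Degraded takes priority over unknown; unknown takes priority over ok.
const summarize = (components: ComponentHealth[]): Summary => {
  const degraded = components.filter((c) => c.state === 'degraded').length;
  const unknown = components.filter((c) => c.state === 'unknown').length;
  if (degraded > 0) {
    return { state: 'degraded', message: `${plural(degraded)} degraded` };
  }
  if (unknown > 0) {
    return { state: 'unknown', message: `${plural(unknown)} not available` };
  }
  return { state: 'ok', message: 'Healthy' };
};
```

With one healthy component and one component whose metric is missing, `summarize` yields the `unknown` state with the message "1 component not available", so the card could show the unknown icon instead of the degraded one.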

Comment 6 Samuel Padgett 2020-05-12 21:22:16 UTC
Moving back to assigned because we still need to understand why "API Request Success Rate" is unknown and fix or remove that query.

Comment 8 Yanping Zhang 2020-05-14 08:48:02 UTC
Checked on an OCP 4.5 cluster with payload 4.5.0-0.nightly-2020-05-13-202437.
Went to the Overview page and checked "API Request Success Rate" under "Control Plane"; it shows 100% now. The bug is fixed.

Comment 9 errata-xmlrpc 2020-07-13 17:34:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409