1740365 – openshift api server operator reporting unavailable

Summary: openshift api server operator reporting unavailable

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	openshift-apiserver
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.3.0
Assignee:	Stefan Schimanski
QA Contact:	Xingxing Xia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-08-12 18:22 UTC by Ben Parees
Modified:	2020-01-08 11:40 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-11-12 09:14:57 UTC
Target Upstream Version:
Embargoed:

Description Ben Parees 2019-08-12 18:22:33 UTC

Description of problem:
63 clusters are reporting this operator as unavailable in telemeter, which makes it a top 5 unavailable operator. Please investigate why this count is so high.

In addition, the reasons being reported include "AvailableMultiple" and "AvailableNoApiServerPod". The former is very confusing/meaningless. The latter is confusing because it includes the prefix "Available", which I suspect is intended to mean "this is a reason for the Available condition", but it reads as "the reason this operator is unavailable is that it's available with no api server pod".

The reasons text should be cleaned up to make it obvious what the actual issue is.

Comment 2 Michal Fojtik 2019-08-19 08:04:31 UTC

Ben, there is only an Available condition, not an Unavailable one, right? So "Available=false, Reason=AvailableNoAPIServerPod" seems fine, as it literally says "cluster is not available because of no API server pod". We can change that to "AvailableNoDaemonSetPod" or something. It is not great but not terrible :-)

The source of this condition is the DaemonSet, which is reporting NumberAvailable=0 in its status. This is usually caused by nodes not being available to schedule pods, or by the scheduler being down.
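
As a rough illustration of the mapping described above, here is a minimal sketch in Python. It is not the operator's actual code: the function name, the `Condition` class, the plain-dict status shape, and the reason strings "NoAPIServerPod" and "AsExpected" are all assumptions made for the example. Only the `numberAvailable` status field comes from the DaemonSet itself.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    # Hypothetical stand-in for an operator status condition.
    type: str
    status: str  # "True" or "False"
    reason: str
    message: str


def available_condition(daemonset_status: dict) -> Condition:
    """Derive an Available condition from a DaemonSet status dict (illustrative only)."""
    # Treat a missing numberAvailable the same as zero available pods.
    available = daemonset_status.get("numberAvailable", 0)
    if available == 0:
        # Reason string is an assumption; the thread is still debating the name.
        return Condition(
            "Available",
            "False",
            "NoAPIServerPod",
            "no apiserver pods are available on any node",
        )
    return Condition(
        "Available",
        "True",
        "AsExpected",
        f"{available} apiserver pod(s) available",
    )
```

With this sketch, a status of `{"numberAvailable": 0}` (or a status missing the field entirely) yields Available=False, which is the case that unschedulable nodes or a down scheduler would produce.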

Comment 4 Ben Parees 2019-08-19 14:00:57 UTC

> Ben, there is only an Available condition, not an Unavailable one, right? So "Available=false, Reason=AvailableNoAPIServerPod" seems fine, as it literally says "cluster is not available because of no API server pod". We can change that to "AvailableNoDaemonSetPod" or something. It is not great but not terrible :-)


You should remove the Available prefix from all these reasons. What is it telling me? I already know this is a reason for the Available condition (which I assume is why that prefix is there: to indicate this is an "Available condition reason"). So the reason it's unavailable is "NoAPIServerPod", not "AvailableNoAPIServerPod".

For an end user, seeing "Available" in a reason for why something is Available=false is confusing and not helpful.

Also, you addressed AvailableNoAPIServerPod above (I don't care if you change it to "NoAPIServerPod", "NoDaemonSetPod", or even just "NoPodsAvailable"), but what about "AvailableMultiple"? What is that telling me? Again, if we assume "Available" is a meaningless prefix, then the reason is just "Multiple". Available=false, reason "Multiple"??
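
To make the naming complaint concrete, here is a small Python sketch of how a reason like "AvailableMultiple" could come about, next to a prefix-free alternative. This is an assumption for illustration only: the thread does not say how the operator builds its reasons, both function names are hypothetical, and "NoSchedulableNodes" is an invented second reason.

```python
from __future__ import annotations


def prefixed_reason(condition_type: str, reasons: list[str]) -> str:
    """Hypothetical scheme: prefix with the condition type and
    collapse several underlying reasons into the word 'Multiple'."""
    if not reasons:
        return ""
    detail = reasons[0] if len(reasons) == 1 else "Multiple"
    return condition_type + detail


def descriptive_reason(reasons: list[str]) -> str:
    """Alternative sketch: no condition-type prefix, and every
    distinct underlying reason stays visible in the result."""
    return "_".join(sorted(set(reasons)))


# One underlying reason: the prefix is the only difference.
print(prefixed_reason("Available", ["NoAPIServerPod"]))
print(descriptive_reason(["NoAPIServerPod"]))

# Two underlying reasons: the prefixed scheme hides both of them.
print(prefixed_reason("Available", ["NoAPIServerPod", "NoSchedulableNodes"]))
print(descriptive_reason(["NoAPIServerPod", "NoSchedulableNodes"]))
```

Under this assumed scheme, the prefixed form loses all detail as soon as there is more than one cause, while the alternative still names each cause.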
