Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1798450

Summary:	kube-aggregator: unavailableGauge is wrong
Product:	OpenShift Container Platform	Reporter:	Lukasz Szaszkiewicz <lszaszki>
Component:	kube-apiserver	Assignee:	Lukasz Szaszkiewicz <lszaszki>
Status:	CLOSED ERRATA	QA Contact:	Ke Wang <kewang>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	4.4	CC:	aos-bugs, mfojtik, sttts, xxia
Target Milestone:	---
Target Release:	4.4.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Release Note
Doc Text:	Fixed aggregator_unavailable_apiservice metric to have correct value.	Story Points:	---
Clone Of:
Clones:	1798461		Environment:
Last Closed:	2020-05-04 11:33:43 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1798461

Description Lukasz Szaszkiewicz 2020-02-05 11:52:56 UTC

The unavailableGauge metric is not always set and might report incorrect values, especially in an HA setup. There, one instance could observe a service and mark it as unavailable, while another instance observes it as available and updates its status accordingly. The first instance would then never update its metric to reflect the new state, because it would not observe any status change itself.
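The HA scenario above can be illustrated with a minimal, self-contained Go sketch. It assumes a simplified model: each apiserver instance keeps its own local gauge, the APIService status is shared between instances, and a status update is issued only when the status actually changes. The types and functions (`cluster`, `instance`, `sync`, `run`) and the `alwaysSetGauge` switch are hypothetical and do not mirror the real kube-aggregator code; the switch contrasts setting the gauge only when this instance changes the status with setting it on every sync.

```go
package main

import "fmt"

// Hypothetical shared APIService status (stored centrally, seen by all instances).
type status struct{ available bool }

type cluster struct{ stored status }

// Hypothetical apiserver instance with its own local gauge
// (1 = unavailable, 0 = available).
type instance struct {
	gauge          float64
	alwaysSetGauge bool // false: gauge set only on status change; true: set on every sync
}

func gaugeValue(available bool) float64 {
	if available {
		return 0
	}
	return 1
}

// sync records what this instance observed and reconciles the shared status.
func (in *instance) sync(c *cluster, observedAvailable bool) {
	desired := status{available: observedAvailable}
	if in.alwaysSetGauge {
		// Always reflect the observed state, even if no update is issued.
		in.gauge = gaugeValue(desired.available)
	}
	if c.stored == desired {
		// Status unchanged: no update is issued.
		return
	}
	c.stored = desired
	if !in.alwaysSetGauge {
		// Gauge touched only when this instance itself changes the status.
		in.gauge = gaugeValue(desired.available)
	}
}

// run plays the HA scenario and returns the final gauges of instances A and B.
func run(alwaysSet bool) (float64, float64) {
	c := &cluster{stored: status{available: true}}
	a := &instance{alwaysSetGauge: alwaysSet}
	b := &instance{alwaysSetGauge: alwaysSet}
	a.sync(c, false) // A cannot reach the backend and marks it unavailable
	b.sync(c, true)  // B reaches it and flips the shared status back to available
	a.sync(c, true)  // A now sees it available, but the shared status is unchanged
	return a.gauge, b.gauge
}

func main() {
	ba, bb := run(false)
	fa, fb := run(true)
	fmt.Printf("only-on-change: A=%v B=%v\n", ba, bb) // prints "only-on-change: A=1 B=0"
	fmt.Printf("every-sync:     A=%v B=%v\n", fa, fb) // prints "every-sync:     A=0 B=0"
}
```

In the only-on-change variant, instance A keeps reporting the service as unavailable after it has recovered; setting the gauge on every sync lets each instance report what it currently observes, independent of which instance wrote the shared status.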

Comment 4 Ke Wang 2020-02-21 15:46:56 UTC

Verified with OCP build:
$ oc version
Client Version: v4.4.0
Server Version: 4.4.0-0.nightly-2020-02-20-203407
Kubernetes Version: v1.17.1

Verification steps:
1. Make an APIService unavailable, e.g. by removing the openshift-apiserver:
 $ oc patch openshiftapiserver cluster --type=json -p '[{"op": "replace", "path": "/spec/managementState", "value": "Removed"}]'
 
2. Open the Prometheus UI from the OCP web console, enter the query 'aggregator_unavailable_apiservice_count', click 'Execute', and switch to the Console tab. The unavailable APIServices are listed with their names and counts.

Alternatively, reboot one master node from a terminal with the following commands:
$ master=$(oc get node | grep master | awk '{print $1}' | head -1)
$ oc debug no/$master -- chroot /host shutdown -r now

While the node is restarting, repeat the query above by clicking 'Execute' in the Prometheus UI; the result changes accordingly.

The fix works as expected in an OCP build that includes the merged PR.

Comment 6 errata-xmlrpc 2020-05-04 11:33:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581