+++ This bug was initially created as a clone of Bug #1798450 +++

The unavailableGauge metric is not always set and might report incorrect values, especially in an HA setup where one instance could observe and mark a service as unavailable whereas another instance might observe it as available. That would prevent the first instance from reflecting the new state, since it would not observe any change.
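To illustrate the failure mode described above, here is a minimal simulation. It is a sketch under assumptions, not the aggregator's actual code: the class names `DeltaGauge` and `StateGauge`, the `sync` signature, and the example apiservice name are all hypothetical. It contrasts a gauge that is written only when an instance itself changes the stored availability with one that is set from the probed state on every sync.

```python
class DeltaGauge:
    """Hypothetical: the gauge is touched only when this instance changes the stored status."""

    def __init__(self):
        self.unavailable = {}  # apiservice name -> 0 (available) or 1 (unavailable)

    def sync(self, name, stored_available, probed_available):
        # Another instance already wrote the same status: no change is
        # observed, so the local gauge is left as it was (possibly stale or unset).
        if stored_available == probed_available:
            return stored_available
        self.unavailable[name] = 0 if probed_available else 1
        return probed_available


class StateGauge:
    """Hypothetical: the gauge is set from the probed state on every sync."""

    def __init__(self):
        self.unavailable = {}

    def sync(self, name, stored_available, probed_available):
        self.unavailable[name] = 0 if probed_available else 1
        return probed_available


def simulate(gauge_cls, name="v1.apps.openshift.io"):
    """Two HA instances share one stored status; returns both local gauges."""
    a, b = gauge_cls(), gauge_cls()
    stored = True
    # Outage: instance A notices first and writes the status, then B syncs.
    stored = a.sync(name, stored, False)
    stored = b.sync(name, stored, False)
    # Recovery: instance B notices first and writes the status, then A syncs.
    stored = b.sync(name, stored, True)
    stored = a.sync(name, stored, True)
    return a.unavailable, b.unavailable


if __name__ == "__main__":
    print("delta-based:", simulate(DeltaGauge))
    print("state-based:", simulate(StateGauge))
```

In the delta-based run, instance A keeps reporting the service as unavailable after recovery because instance B wrote the recovered status first. In the state-based run both instances end at 0.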
Verified with OCP build 4.3.0-0.nightly-2020-05-07-171148.

Verification steps:
1. Make some apiservice fail, e.g. remove openshift-apiserver:
$ oc patch openshiftapiserver cluster --type=json -p '[{"op": "replace", "path": "/spec/managementState", "value": "Removed"}]'
2. Wait for a while (about half an hour), then recover the apiservice:
$ oc patch openshiftapiserver cluster --type=json -p '[{"op": "replace", "path": "/spec/managementState", "value": "Managed"}]'
3. Open the Prometheus UI from the OCP web console, enter the keyword 'aggregator_unavailable_apiservice_count', click 'Execute', and navigate to the Console tab. Some unavailable apiservices will be displayed with name and count.
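Step 3 could also be checked outside the UI through the Prometheus HTTP query API (`/api/v1/query`). The sketch below only builds the query URL and parses an instant-vector response; the base URL, the helper names `build_query_url` and `parse_vector`, and the sample payload are assumptions made up for illustration. Authentication against the cluster's Prometheus route is not shown.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def parse_vector(payload):
    """Return (apiservice name, value) pairs from an instant-vector response."""
    data = json.loads(payload)
    if data.get("status") != "success":
        raise ValueError("query failed: %r" % data.get("error"))
    pairs = []
    for sample in data["data"]["result"]:
        # Each sample carries its labels under "metric" and [timestamp, "value"] under "value".
        name = sample["metric"].get("name", "")
        pairs.append((name, float(sample["value"][1])))
    return pairs


if __name__ == "__main__":
    # Hypothetical host; substitute the Prometheus route of the cluster under test.
    print(build_query_url("https://prometheus.example",
                          "aggregator_unavailable_apiservice_count"))
    # Made-up response in the documented instant-vector shape, not real cluster output.
    sample = json.dumps({
        "status": "success",
        "data": {"resultType": "vector", "result": [
            {"metric": {"name": "v1.apps.openshift.io"}, "value": [0, "3"]},
        ]},
    })
    print(parse_vector(sample))
```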
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2129