The unavailableGauge metric is not always set and can report incorrect values, especially in an HA setup. One instance may observe a service as unavailable and mark it as such, while another instance observes it as available. The first instance would then never reflect the available state, because it would not observe any change itself.
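To illustrate the failure mode, here is a minimal Go sketch. It is a simulation, not the kube-aggregator code: the types `apiService` and `instance`, the functions `syncOnChange`, `syncAlways` and `scenario`, and the apiservice name are all hypothetical, and the map stands in for the real gauge. It assumes the gauge is updated only when an instance itself changes the shared status, and contrasts that with one possible remedy, setting the gauge from the observed state on every sync.

```go
package main

import "fmt"

// apiService is a hypothetical stand-in for the shared APIService status
// that every aggregator instance reads and writes.
type apiService struct {
	name      string
	available bool
}

// instance is a hypothetical aggregator replica with its own local gauge.
type instance struct {
	gauge map[string]float64 // stand-in for unavailableGauge (1 = unavailable)
}

func unavailableValue(available bool) float64 {
	if available {
		return 0
	}
	return 1
}

// syncOnChange mimics the assumed problematic pattern: the gauge is touched
// only when this instance itself flips the shared status.
func (i *instance) syncOnChange(svc *apiService, observed bool) {
	if svc.available == observed {
		return // nothing to change, so the local gauge keeps its old value
	}
	svc.available = observed
	i.gauge[svc.name] = unavailableValue(observed)
}

// syncAlways sets the gauge from the observed state on every sync.
func (i *instance) syncAlways(svc *apiService, observed bool) {
	svc.available = observed
	i.gauge[svc.name] = unavailableValue(observed)
}

// scenario replays: A sees the service down, B sees it up again, then A
// also sees it up. It returns the final value of A's gauge.
func scenario(alwaysSet bool) float64 {
	svc := &apiService{name: "v1.apps.openshift.io", available: true}
	a := &instance{gauge: map[string]float64{}}
	b := &instance{gauge: map[string]float64{}}
	sync := func(i *instance, observed bool) {
		if alwaysSet {
			i.syncAlways(svc, observed)
		} else {
			i.syncOnChange(svc, observed)
		}
	}
	sync(a, false) // A marks the service unavailable
	sync(b, true)  // B marks it available again
	sync(a, true)  // A observes available, but the status already says so
	return a.gauge[svc.name]
}

func main() {
	fmt.Println("gauge on A, set only on change:", scenario(false))
	fmt.Println("gauge on A, set on every sync: ", scenario(true))
}
```

In the first run, instance A keeps reporting the service as unavailable even though the shared status has recovered, which matches the stale value described above.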
Verified with OCP build:

$ oc version
Client Version: v4.4.0
Server Version: 4.4.0-0.nightly-2020-02-20-203407
Kubernetes Version: v1.17.1

Verification steps:

1. Make an apiservice fail, e.g. remove openshift-apiserver:
   $ oc patch openshiftapiserver cluster --type=json -p '[{"op": "replace", "path": "/spec/managementState", "value": "Removed"}]'
2. Open the Prometheus UI from the OCP web console, enter the keyword 'aggregator_unavailable_apiservice_count', click 'Execute', and switch to the Console tab. The unavailable apiservices are displayed with name and count.

Alternatively, reboot one master node from a terminal:

$ master=$(oc get node | grep master | awk '{print $1}' | head -1)
$ oc debug no/$master -- chroot /host shutdown -r now

While the node is restarting, repeat step 2 and click 'Execute' again in the Prometheus UI; the result changes.

The feature works as expected on the OCP build with the PR merged.
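The UI check can also be scripted. The Go sketch below assumes the standard Prometheus HTTP API (instant query at `/api/v1/query`) and that the metric carries a `name` label, as the UI output suggests. The base URL and the JSON payload in `main` are made-up placeholders, not output from a real cluster, and the helpers `queryURL` and `unavailableServices` are hypothetical. Fetching the response (including the bearer token an OCP Prometheus route typically requires) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

const metricName = "aggregator_unavailable_apiservice_count"

// queryURL builds an instant-query URL for the Prometheus HTTP API.
func queryURL(base string) string {
	q := url.Values{}
	q.Set("query", metricName)
	return base + "/api/v1/query?" + q.Encode()
}

// queryResponse covers only the fields of an instant-query response
// that this sketch needs.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  []interface{}     `json:"value"` // [timestamp, "value"]
		} `json:"result"`
	} `json:"data"`
}

// unavailableServices maps each apiservice name label to its sample value.
func unavailableServices(body []byte) (map[string]string, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("query status %q", resp.Status)
	}
	out := map[string]string{}
	for _, r := range resp.Data.Result {
		if len(r.Value) != 2 {
			continue // malformed sample, skip it
		}
		v, ok := r.Value[1].(string)
		if !ok {
			continue
		}
		out[r.Metric["name"]] = v
	}
	return out, nil
}

func main() {
	// Placeholder host; substitute the route of the cluster's Prometheus.
	fmt.Println(queryURL("https://prometheus.example"))

	// Illustrative payload in the instant-query format, not real data.
	sample := []byte(`{"status":"success","data":{"resultType":"vector",
		"result":[{"metric":{"name":"v1.apps.openshift.io"},"value":[1582200000,"3"]}]}}`)
	services, err := unavailableServices(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for name, count := range services {
		fmt.Printf("%s: %s\n", name, count)
	}
}
```

Running the parser before and during the node reboot would show the same change that clicking 'Execute' shows in the UI.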
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581