Bug 1817446 - multus-admission-controller-monitor-service alert with TargetDown is firing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Douglas Smith
QA Contact: Weibin Liang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-26 11:20 UTC by Lili Cosic
Modified: 2020-07-13 17:24 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:23:46 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:2409  0        None      None    None     2020-07-13 17:24:15 UTC

Description Lili Cosic 2020-03-26 11:20:47 UTC
Description of problem:
TargetDown alert is firing in 4.5 clusters and in CI.

{"metric":{"alertname":"TargetDown","alertstate":"firing","job":"multus-admission-controller-monitor-service","namespace":"openshift-multus","service":"multus-admission-controller-monitor-service","severity":"warning"},"value":[1585130732.425,"34"]}

Version-Release number of selected component (if applicable):
current 4.5 CI and 4.5.0-0.nightly-2020-03-26-031938

Expected results:
Alert should not be firing.

Comment 1 W. Trevor King 2020-03-26 20:58:19 UTC
This is killing CI:

$ curl -s 'https://search.svc.ci.openshift.org/search?search=promQL+query:+count_over_time.*reported+incorrect+results&type=build-log&maxAge=24h&context=0' | jq -r '. | to_entries[].value | to_entries[].value[].context[]' | sed -n 's/.*incorrect results:\\n\(.*\)",$/\1/p' | sed 's|\\||g' | jq -r '.[] | select(.metric.alertname == "TargetDown").metric | .namespace + " " + .job' | sort | uniq -c | sort -n | tail -n5
      1 openshift-service-catalog-apiserver-operator metrics
      1 openshift-service-catalog-controller-manager-operator metrics
      2 openshift-authentication-operator metrics
     13 openshift-console-operator metrics
    175 openshift-multus multus-admission-controller-monitor-service
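
For anyone who wants to do the same tally without the sed/jq chain, here is a minimal Python sketch of the counting step only. It assumes the failing samples have already been extracted into a JSON list shaped like the alert sample in the description; the `SAMPLES` data and the `tally_target_down` helper are hypothetical and are not taken from CI.

```python
import json
from collections import Counter

# Hypothetical sample data, shaped like the alert sample in the description.
# Real input would come from the CI search results, which are not fetched here.
SAMPLES = json.loads("""
[
  {"metric": {"alertname": "TargetDown", "namespace": "openshift-multus",
              "job": "multus-admission-controller-monitor-service"},
   "value": [1585130732.425, "34"]},
  {"metric": {"alertname": "TargetDown", "namespace": "openshift-multus",
              "job": "multus-admission-controller-monitor-service"},
   "value": [1585130792.425, "34"]},
  {"metric": {"alertname": "TargetDown", "namespace": "openshift-console-operator",
              "job": "metrics"},
   "value": [1585130732.425, "12"]},
  {"metric": {"alertname": "Watchdog", "namespace": "openshift-monitoring"},
   "value": [1585130732.425, "1"]}
]
""")


def tally_target_down(samples):
    """Count TargetDown samples per (namespace, job) pair."""
    counts = Counter()
    for sample in samples:
        metric = sample.get("metric", {})
        if metric.get("alertname") == "TargetDown":
            key = (metric.get("namespace", ""), metric.get("job", ""))
            counts[key] += 1
    return counts


# Print in ascending order of count, similar to `sort | uniq -c | sort -n`.
for (namespace, job), n in sorted(tally_target_down(SAMPLES).items(),
                                  key=lambda item: item[1]):
    print(f"{n:7d} {namespace} {job}")
```

Samples for other alerts (the hypothetical Watchdog entry here) are ignored, which mirrors the `select(.metric.alertname == "TargetDown")` filter in the jq expression.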

Comment 4 Weibin Liang 2020-04-22 18:19:42 UTC
@aputtur, could you include your PR in this bug? Thanks!

Comment 6 Weibin Liang 2020-04-22 22:22:34 UTC
Tested and verified in 4.5.0-0.nightly-2020-04-21-103613

[weliang@weliang networking]$ token=`oc -n openshift-monitoring sa get-token prometheus-k8s`
[weliang@weliang networking]$ oc get routes -A  | grep prometheus
openshift-monitoring       prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.qe-weliangsdn2.qe.devcluster.openshift.com             prometheus-k8s      web     reencrypt/Redirect     None
[weliang@weliang networking]$ curl -k -H "Authorization: Bearer $token"  https://prometheus-k8s-openshift-monitoring.apps.qe-weliangsdn2.qe.devcluster.openshift.com/api/v1/alerts | grep TargetDown 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8128    0  8128    0     0  20125      0 --:--:-- --:--:-- --:--:-- 20118
[weliang@weliang networking]$ curl -k -H "Authorization: Bearer $token"  https://prometheus-k8s-openshift-monitoring.apps.qe-weliangsdn2.qe.devcluster.openshift.com/api/v1/alerts | grep TargetDown | grep multus
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8124    0  8124    0     0  17594      0 --:--:-- --:--:-- --:--:-- 17584
[weliang@weliang networking]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-21-103613   True        False         5h32m   Cluster version is 4.5.0-0.nightly-2020-04-21-103613
[weliang@weliang networking]$
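
The grep checks above return nothing, which is what verification relies on. A possibly more robust way to make the same check is to parse the JSON instead of grepping it. The sketch below assumes the usual shape of the Prometheus `/api/v1/alerts` response (`data.alerts[]` with `labels` and `state`); the `RESPONSE` payload and the `firing_alerts` helper are hypothetical, and no request is made to the cluster.

```python
import json

# Hypothetical response body, shaped like the Prometheus /api/v1/alerts output.
# In practice this would be the body returned by the curl command above.
RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "alerts": [
      {"labels": {"alertname": "Watchdog", "severity": "none"},
       "annotations": {},
       "state": "firing",
       "value": "1"}
    ]
  }
}
""")


def firing_alerts(response, alertname, namespace=None):
    """Return firing alerts matching alertname and, optionally, namespace."""
    matches = []
    for alert in response.get("data", {}).get("alerts", []):
        labels = alert.get("labels", {})
        if alert.get("state") != "firing":
            continue
        if labels.get("alertname") != alertname:
            continue
        if namespace is not None and labels.get("namespace") != namespace:
            continue
        matches.append(alert)
    return matches


remaining = firing_alerts(RESPONSE, "TargetDown", namespace="openshift-multus")
print("TargetDown still firing for openshift-multus:", bool(remaining))
```

Matching on labels avoids false hits when the string "TargetDown" or "multus" appears elsewhere in the payload, for example in an annotation.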

Comment 9 errata-xmlrpc 2020-07-13 17:23:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

