Bug 2028928 - TargetDown alerts (false positives) on deleted prometheus-adapter pods
Summary: TargetDown alerts (false positives) on deleted prometheus-adapter pods
Keywords:
Status: CLOSED DUPLICATE of bug 1943860
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Jan Fajerski
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-03 18:26 UTC by ncarmich
Modified: 2022-11-17 14:18 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-22 13:13:37 UTC
Target Upstream Version:
Embargoed:
sthaha: needinfo-


Description ncarmich 2021-12-03 18:26:55 UTC
Description of problem:

[SF case #: 03094729]

The information in Prometheus is inconsistent with the object state reported by the oc command. Prometheus still holds information about pods that have been removed, so it monitors two additional targets. This triggers false-positive "TargetDown" alerts. I attached screenshots from the Prometheus UI.


```sh
sd-df2a-ef7d: ~  namicg39021p/openshift-monitoring $ oc get po -l name=prometheus-adapter -n openshift-monitoring 
NAME                                  READY   STATUS    RESTARTS   AGE
prometheus-adapter-6f74bb68fd-c9rvr   1/1     Running   0          5d14h
prometheus-adapter-6f74bb68fd-kmg8l   1/1     Running   0          5d14h


sd-df2a-ef7d: ~  namicg39021p/openshift-monitoring $ oc get po prometheus-adapter-6b765fc44b-kbhxm -n openshift-monitoring 
Error from server (NotFound): pods "prometheus-adapter-6b765fc44b-kbhxm" not found
sd-df2a-ef7d: ~  namicg39021p/openshift-monitoring $ oc get po prometheus-adapter-6b765fc44b-h7nx5 -n openshift-monitoring 
Error from server (NotFound): pods "prometheus-adapter-6b765fc44b-h7nx5" not found
```
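
The mismatch above can also be checked mechanically. The following is only a minimal sketch, assuming two plain-text files with one pod name per line: one listing the pods Prometheus still scrapes, the other listing the pods the API server reports. The function name `stale_targets` and both input files are hypothetical and not part of any OpenShift or Prometheus tooling.

```sh
#!/bin/sh
# stale_targets TARGETS_FILE LIVE_FILE
#
# Hypothetical helper: prints every pod name that appears in TARGETS_FILE
# (pods Prometheus still lists as targets) but not in LIVE_FILE (pods that
# currently exist in the cluster). Both files hold one pod name per line.
stale_targets() {
    targets_file=$1
    live_file=$2

    # -F: treat patterns as fixed strings, -x: match whole lines only,
    # -f: read patterns from LIVE_FILE, -v: keep the lines that do NOT match.
    grep -vxF -f "$live_file" "$targets_file"
    status=$?

    # grep exits 0 when stale names were printed and 1 when none were found;
    # only a status above 1 indicates a real error (e.g. unreadable file).
    [ "$status" -le 1 ]
}
```

With the four pod names from this report as the target list and the two running pods as the live list, the helper would print the two `6b765fc44b` pods. The live list could, for example, be produced with `oc get po -l name=prometheus-adapter -n openshift-monitoring -o name | sed 's|^pod/||'`; how the target list is exported from Prometheus is left open here.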

Where are you experiencing the behavior? What environment?
prod

When does the behavior occur? Frequency? Repeatedly? At certain times?
Random

What is the business impact? Please also provide timeframe information.
False-positive alerts are generated, which creates noise.


Additional info:

must-gather is located here: [SF case #: 03094729 - comment#-3]

Comment 6 ncarmich 2021-12-10 16:51:44 UTC
Hi Sunil,

I have verified it again and the alert disappeared from Prometheus, but *not* from Alertmanager. I attached two screenshots: one shows the alert in Alertmanager, the other shows the query in Prometheus used to trigger this alert. Please let me know if you need more info.

---

attached screenshots: Alertmanager1.PNG & Prometheus1.PNG

Comment 7 ncarmich 2021-12-10 16:53:04 UTC
Hi Sunil,

I have verified it again and the alert disappeared from Prometheus, but *not* from Alertmanager. I attached two screenshots: one shows the alert in Alertmanager, the other shows the query in Prometheus used to trigger this alert. Please let me know if you need more info.

---

attached screenshots: Alertmanager1.PNG & Prometheus1.PNG

Comment 12 Jan Fajerski 2021-12-22 13:13:37 UTC
Closing this as a duplicate, as mentioned above. Please feel free to reopen if either the workaround in the original bug doesn't work or anyone disagrees with the duplicate status.

*** This bug has been marked as a duplicate of bug 1943860 ***

Comment 15 ncarmich 2022-01-04 20:23:25 UTC
Hi again Team, just to close the loop, I wanted to post the latest update from the customer regarding the workaround (https://access.redhat.com/solutions/6604421):

"I confirm the workaround is working, I have tested it on another cluster where we had similar problem." I will therefore go ahead and mark the solution above as verified.

Thanks again for all your help!

