Description of problem:
[SF case #: 03094729]

We see inconsistent information between Prometheus and the object state reported by the oc command. Prometheus still has information about pods that were removed; as a result, it monitors two additional targets. This triggers false-positive "TargetDown" alerts. I attached screenshots from the Prometheus UI.

```sh
sd-df2a-ef7d: ~ namicg39021p/openshift-monitoring $ oc get po -l name=prometheus-adapter -n openshift-monitoring
NAME                                  READY   STATUS    RESTARTS   AGE
prometheus-adapter-6f74bb68fd-c9rvr   1/1     Running   0          5d14h
prometheus-adapter-6f74bb68fd-kmg8l   1/1     Running   0          5d14h
sd-df2a-ef7d: ~ namicg39021p/openshift-monitoring $ oc get po prometheus-adapter-6b765fc44b-kbhxm -n openshift-monitoring
Error from server (NotFound): pods "prometheus-adapter-6b765fc44b-kbhxm" not found
sd-df2a-ef7d: ~ namicg39021p/openshift-monitoring $ oc get po prometheus-adapter-6b765fc44b-h7nx5 -n openshift-monitoring
Error from server (NotFound): pods "prometheus-adapter-6b765fc44b-h7nx5" not found
```

Where are you experiencing the behavior? What environment?
prod

When does the behavior occur? Frequency? Repeatedly? At certain times?
Random

What is the business impact? Please also provide timeframe information.
False-positive alerts are generated, which creates noise.

Additional info:
must-gather is located here: [SF case #: 03094729 - comment#-3]
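The manual check in the description (comparing pods known to the cluster against pods Prometheus still scrapes) could be sketched as a small shell helper. This is only an illustration, not part of the original report: the function name `stale_targets` and the two input files are hypothetical, and it assumes you have already exported one pod name per line into each file.

```sh
# Hypothetical helper: list pod names that Prometheus still tracks as targets
# but that no longer exist in the cluster.
#   $1: file with live pod names, one per line (assumed to come from e.g.
#       `oc get po -n openshift-monitoring -o name` with the "pod/" prefix removed)
#   $2: file with pod names taken from the Prometheus targets (assumed to be
#       exported separately, e.g. from the Targets page or the targets API)
stale_targets() {
  live_file="$1"
  target_file="$2"
  # -F: fixed strings, -x: match whole lines, -v: invert the match,
  # -f: read patterns from the live pod list.
  # grep exits 1 when nothing is printed, so mask that with `|| true`.
  grep -Fxv -f "$live_file" "$target_file" || true
}
```

With the pod names from the description, the live list would contain the two `6f74bb68fd` pods and the target list would additionally contain the two `6b765fc44b` pods; the helper would then print only the latter two names.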
Hi Sunil, I have verified it again and the alert disappeared from Prometheus, but *not* from Alertmanager. I attached two screenshots: one shows the alert in Alertmanager, the other shows the query in Prometheus used to trigger this alert. Please let me know if you need more info. --- attached screenshots: Alertmanager1.PNG & Prometheus1.PNG
Closing this as a duplicate, as mentioned above. Please feel free to reopen if either the workaround in the original bug doesn't work or anyone disagrees with the duplicate status. *** This bug has been marked as a duplicate of bug 1943860 ***
Hi again Team - just to close the loop I wanted to post the latest update from the customer regarding the workaround: (https://access.redhat.com/solutions/6604421) - "I confirm the workaround is working, I have tested it on another cluster where we had similar problem." so I will go ahead and mark the solution above as verified. Thanks again for all your help!