Bug 1982238 - TargetDown alert due to deleted pod never clears
Summary: TargetDown alert due to deleted pod never clears
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Jayapriya Pai
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-14 13:39 UTC by Kevin Chung
Modified: 2024-10-01 18:58 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-30 19:22:09 UTC
Target Upstream Version:
Embargoed:
janantha: needinfo-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RFE-1923 | 0 | Medium | Rejected | Remove a deleted pod from list of scrapable endpoints upon deletion of the pod. | 2021-07-14 13:43:19 UTC

Description Kevin Chung 2021-07-14 13:39:56 UTC
Description of problem:

Our customer is observing that when a pod covered by the TargetDown alert is deleted and recreated, the alert fires but does not clear in a reasonable amount of time. They have observed this behavior with multiple targets (e.g. ElasticSearch, catalog-operator).


Version-Release number of selected component (if applicable):

OpenShift 4.6.26


How reproducible:

The customer can readily reproduce this issue in their environment, but I was unable to reproduce it in a separate environment.


Steps to Reproduce:
1. Delete the catalog-operator pod in the openshift-operator-lifecycle-manager namespace; the pod should be recreated automatically
2. Run this query in Prometheus to observe the target: up{service="catalog-operator-metrics"}
3. Optionally, observe the TargetDown alert is generated in the OpenShift web console dashboard
4. Observe that the alert does not clear in a reasonable amount of time
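
A minimal sketch of how step 2 could be scripted, for anyone who wants to poll the target instead of watching the console. It sends the query from step 2 to the standard Prometheus HTTP API endpoint /api/v1/query and lists every series whose `up` sample is 0. The base URL, the bearer token, and the SAMPLE payload (including its pod names) are assumptions made up for illustration; none of them come from the customer's environment.

```python
import json
import urllib.parse
import urllib.request

# Query taken from step 2 of this report.
QUERY = 'up{service="catalog-operator-metrics"}'

# Hypothetical instant-query response in the Prometheus HTTP API format.
# The pod names and values are invented to illustrate a stale target.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"pod": "catalog-operator-old"}, "value": [1626270000, "0"]},
            {"metric": {"pod": "catalog-operator-new"}, "value": [1626270000, "1"]},
        ],
    },
}


def build_query_url(base_url, query=QUERY):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": query})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def down_targets(payload):
    """Return the label sets of all series whose `up` sample is 0."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    return [
        series["metric"]
        for series in payload["data"]["result"]
        if float(series["value"][1]) == 0.0
    ]


def fetch(base_url, token=None):
    """Run the query against a live Prometheus (URL and token are assumptions)."""
    request = urllib.request.Request(build_query_url(base_url))
    if token:
        request.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Uses the canned payload so the sketch runs without cluster access.
    for labels in down_targets(SAMPLE):
        print("target still down:", labels)
```

Against a real cluster, the result of fetch() would be passed to down_targets() in place of SAMPLE. A series for the deleted pod that keeps reporting 0 after the replacement pod is Ready would match the behavior described in this report.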

Actual results:

The customer observed the TargetDown alert firing for several days after the ElasticSearch pods had been deleted, recreated, and returned to Ready state within minutes.


Expected results:

The TargetDown alert should clear quickly after a deleted pod is recreated

Additional info:

Comment 39 Jan Fajerski 2021-07-28 10:59:36 UTC
*** Bug 1943860 has been marked as a duplicate of this bug. ***

