Bug 1982238

Summary:	TargetDown alert due to deleted pod never clears
Product:	OpenShift Container Platform	Reporter:	Kevin Chung <kechung>
Component:	Monitoring	Assignee:	Jayapriya Pai <janantha>
Status:	CLOSED INSUFFICIENT_DATA	QA Contact:	Junqi Zhao <juzhao>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.6	CC:	alegrand, amuller, anpicker, aos-bugs, ccoleman, erooth, gparente, janantha, jfajersk, kakkoyun, kevin.chung, pgough, pkrupa, sthaha, wking
Target Milestone:	---	Flags:	janantha: needinfo-
Target Release:	4.9.0
Hardware:	Unspecified
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-30 19:22:09 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Kevin Chung 2021-07-14 13:39:56 UTC

Description of problem:

Our customer observes that when a pod covered by the TargetDown alert is deleted and recreated, the TargetDown alert fires but does not clear in a reasonable amount of time.  They have seen this behavior with multiple targets (e.g. Elasticsearch, catalog-operator).


Version-Release number of selected component (if applicable):

OpenShift 4.6.26


How reproducible:

The customer can readily reproduce this issue in their environment, but I was unable to reproduce it independently.


Steps to Reproduce:
1. Delete the catalog-operator pod from the openshift-operator-lifecycle-manager namespace; the pod should be recreated automatically
2. Run this query in Prometheus to observe the target: up{service="catalog-operator-metrics"}
3. Optionally, observe that the TargetDown alert fires in the OpenShift web console dashboard
4. Observe that the alert does not clear in a reasonable amount of time
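
When checking the result of the query in step 2, it can help to see which individual series still report 0. The following is a minimal sketch, not a diagnosis of this bug: it assumes the query result is available as the JSON returned by the Prometheus HTTP API for an instant query, and that each series carries a "pod" label. The function name, the sample payload, and the pod names are hypothetical and are not taken from the customer's cluster.

```python
import json


def down_targets(response_text):
    """Return (pods reporting up == 0, percentage of series that are down)."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    series = payload["data"]["result"]
    # Each instant-vector sample looks like:
    #   {"metric": {<labels>}, "value": [<timestamp>, "<value as string>"]}
    down = [
        s["metric"].get("pod", "<no pod label>")
        for s in series
        if float(s["value"][1]) == 0.0
    ]
    percent = 100.0 * len(down) / len(series) if series else 0.0
    return down, percent


# Hypothetical response for up{service="catalog-operator-metrics"}:
# one series for a pod that no longer answers scrapes, one for its replacement.
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"service": "catalog-operator-metrics",
                        "pod": "catalog-operator-aaaa"},
             "value": [1626270000, "0"]},
            {"metric": {"service": "catalog-operator-metrics",
                        "pod": "catalog-operator-bbbb"},
             "value": [1626270000, "1"]},
        ],
    },
})

if __name__ == "__main__":
    pods, percent = down_targets(SAMPLE)
    print(pods, percent)
```

If a series for the deleted pod is still listed with value 0 long after the replacement pod is Ready, that observation would be worth attaching to the bug, since it narrows down which target the alert is counting as down.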

Actual results:

The customer observed that the TargetDown alert kept firing for several days after the Elasticsearch pods were deleted and recreated, even though the new pods reached the Ready state within minutes


Expected results:

The TargetDown alert should clear quickly after a deleted pod is recreated

Additional info:

Comment 39 Jan Fajerski 2021-07-28 10:59:36 UTC

*** Bug 1943860 has been marked as a duplicate of this bug. ***