Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1982238

Summary: TargetDown alert due to deleted pod never clears
Product: OpenShift Container Platform
Reporter: Kevin Chung <kechung>
Component: Monitoring
Assignee: Jayapriya Pai <janantha>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.6
CC: alegrand, amuller, anpicker, aos-bugs, ccoleman, erooth, gparente, janantha, jfajersk, kakkoyun, kevin.chung, pgough, pkrupa, sthaha, wking
Target Milestone: ---
Flags: janantha: needinfo-
Target Release: 4.9.0
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-30 19:22:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Kevin Chung 2021-07-14 13:39:56 UTC
Description of problem:

Our customer observes that when a pod monitored by the TargetDown alert is deleted and recreated, the TargetDown alert fires but does not clear within a reasonable amount of time. They have observed this behavior with multiple targets (e.g. Elasticsearch, catalog-operator).


Version-Release number of selected component (if applicable):

OpenShift 4.6.26


How reproducible:

The customer can readily reproduce this issue in their environment, but I was unable to reproduce it in a separate environment.


Steps to Reproduce:
1. Delete the catalog-operator pod from the openshift-operator-lifecycle-manager namespace; the pod should be recreated automatically
2. Run this query in Prometheus to observe the target: up{service="catalog-operator-metrics"}
3. Optionally, observe that the TargetDown alert fires in the OpenShift web console dashboard
4. Observe that the alert does not clear in a reasonable amount of time
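Step 2 can also be scripted so the query result is easy to inspect repeatedly. The following is a minimal Python sketch, not part of the original report. It assumes a Prometheus-compatible query endpoint is reachable at a base URL you supply (for example, the cluster's Thanos Querier route) and that a bearer token (for example, from `oc whoami -t`) is authorized to query it. `build_query_url`, `down_targets`, and `fetch_up` are hypothetical helper names. The sketch sends the same `up` expression to the standard `/api/v1/query` endpoint and lists the label sets of any series whose value is 0.

```python
import json
import urllib.parse
import urllib.request

# The same PromQL expression used in step 2 above.
QUERY = 'up{service="catalog-operator-metrics"}'


def build_query_url(base_url: str, query: str = QUERY) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": query})


def down_targets(response: dict) -> list:
    """Return the label sets of every series in an instant-vector result whose value is 0."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    down = []
    for sample in response["data"]["result"]:
        # Each sample's "value" is a pair: [unix_timestamp, "value as a string"].
        if float(sample["value"][1]) == 0.0:
            down.append(sample["metric"])
    return down


def fetch_up(base_url: str, token: str) -> dict:
    """Run the query against base_url (an assumed endpoint) with an assumed bearer token."""
    request = urllib.request.Request(
        build_query_url(base_url),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

For example, `for labels in down_targets(fetch_up(url, token)): print(labels.get("pod"))` prints the `pod` label of each down series. You can compare that label with the name of the currently running catalog-operator pod to see whether the down target is the deleted pod or its replacement. If the route's certificate is not trusted locally, `urlopen` may need a suitable `ssl` context; this sketch does not handle that case.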

Actual results:

The customer observed the TargetDown alert firing for several days after the Elasticsearch pods were deleted and recreated, even though the new pods reached the Ready state within minutes.


Expected results:

The TargetDown alert should clear quickly after a deleted pod is recreated

Additional info:

Comment 39 Jan Fajerski 2021-07-28 10:59:36 UTC
*** Bug 1943860 has been marked as a duplicate of this bug. ***