Bug 1870354

Summary: An alert is not triggered when dns health checks fail to an upstream
Product: OpenShift Container Platform Reporter: Daneyon Hansen <dhansen>
Component: Networking Assignee: Daneyon Hansen <dhansen>
Networking sub component: DNS QA Contact: jechen <jechen>
Status: CLOSED DEFERRED Docs Contact:
Severity: low    
Priority: low CC: aos-bugs, hongli
Version: 4.6   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-06-11 16:33:36 UTC Type: Bug

Description Daneyon Hansen 2020-08-19 20:34:59 UTC
Description of problem:
This BZ follows from https://bugzilla.redhat.com/show_bug.cgi?id=1860142, where the `coredns_forward_healthcheck_broken_count_total` metric was added. An alert should fire when the health check failure metrics exceed a specified threshold.

Version-Release number of selected component (if applicable):
4.6

How reproducible:


Steps to Reproduce:
1. Create a cluster
2. Configure DNS forwarding.
3. Cause the upstream resolver to fail and observe the values of the `coredns_forward_healthcheck_broken_total` and `coredns_forward_healthcheck_failures_count_total` metrics increase.
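
One way to check step 3 without a dashboard is to compare two scrapes of the CoreDNS metrics endpoint. The sketch below is illustrative only: it assumes the metrics are available as Prometheus text-format output, uses the metric names exactly as written in step 3 (they may differ by CoreDNS version), and the threshold of 5 and the function names are hypothetical, not taken from the proposed alert.

```python
import re

# Metric names copied from step 3; adjust to whatever your CoreDNS version exports.
BROKEN = "coredns_forward_healthcheck_broken_total"
FAILURES = "coredns_forward_healthcheck_failures_count_total"

# One sample line of the Prometheus text format: name, optional {labels}, value.
# Simplification: label values containing "}" are not handled.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_counters(text, names):
    """Return {(name, labels): value} for the requested metric names."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group(1) in names:
            samples[(match.group(1), match.group(2) or "")] = float(match.group(3))
    return samples


def increases(before, after):
    """Per-series increase between two scrapes; a drop is treated as a counter reset."""
    result = {}
    for key, new in after.items():
        old = before.get(key, 0.0)
        result[key] = new - old if new >= old else new
    return result


def should_alert(before_text, after_text, threshold=5.0):
    """True if any watched counter grew by more than the (hypothetical) threshold."""
    names = {BROKEN, FAILURES}
    delta = increases(parse_counters(before_text, names),
                      parse_counters(after_text, names))
    return any(value > threshold for value in delta.values())
```

Feeding it the metrics text captured before and after breaking the upstream resolver gives a quick yes/no on whether the counters moved enough to warrant an alert.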

Actual results:
No alert is triggered.

Expected results:
An alert is triggered when health check failures exceed a specified threshold.

Additional info:
PR under review: https://github.com/povilasv/coredns-mixin/pull/6

Comment 3 Daneyon Hansen 2020-09-09 15:55:31 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 4 Daneyon Hansen 2020-10-01 16:24:11 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 5 Daneyon Hansen 2020-10-23 15:53:05 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 6 Daneyon Hansen 2020-11-12 16:53:28 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 7 Daneyon Hansen 2020-12-07 17:49:19 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 8 Daneyon Hansen 2021-02-25 23:12:06 UTC
I’m adding UpcomingSprint because https://github.com/povilasv/coredns-mixin/pull/6 is still waiting to merge.

Comment 9 Daneyon Hansen 2021-06-11 16:33:36 UTC
Closing since https://github.com/povilasv/coredns-mixin/ appears to no longer be maintained.