Bug 2094068

Summary: No runbook created for NorthboundStale alert
Product: OpenShift Container Platform
Reporter: Weibin Liang <weliang>
Component: Networking
Assignee: Martin Kennelly <mkennell>
Networking sub component: ovn-kubernetes
QA Contact: Weibin Liang <weliang>
Status: CLOSED ERRATA
Severity: low
Priority: low
CC: ffernand
Version: 4.11
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2023-01-17 19:49:52 UTC
Type: Bug

Description Weibin Liang 2022-06-06 18:05:09 UTC
Description of problem:
https://issues.redhat.com/browse/SDN-2736: "Create runbook and link SOP for NorthboundStale alert"

No runbook is linked for the NorthboundStale alert in the latest 4.11 build: its rule in the master-rules PrometheusRule has no runbook_url annotation.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-06-04-014713

How reproducible:
Always

Steps to Reproduce:
Inspect the NorthboundStale rule in the master-rules PrometheusRule and query its runbook_url annotation:
[weliang@weliang ~]$ oc -n openshift-ovn-kubernetes get PrometheusRule master-rules -o json
                    {
                        "alert": "NorthboundStale",
                        "annotations": {
                            "description": "Networking control plane is degraded. Networking configuration updates applied to the cluster will not be\nimplemented. Existing workloads should continue to have connectivity. OVN-Kubernetes control plane and/or\nOVN northbound database may not be functional.\n",
                            "summary": "ovn-kubernetes has not written anything to the northbound database for too long."
                        },
                        "expr": "time() - max(ovnkube_master_nb_e2e_timestamp) \u003e 120\n",
                        "for": "10m",
                        "labels": {
                            "severity": "warning"
                        }
                    },
[weliang@weliang ~]$ oc -n openshift-ovn-kubernetes get PrometheusRule master-rules -o jsonpath={.spec.groups[0].rules[3].annotations.runbook_url}

#### Compare with the NoOvnMasterLeader alert, which does have a runbook_url
[weliang@weliang ~]$ oc -n openshift-ovn-kubernetes get PrometheusRule master-rules -o jsonpath={.spec.groups[0].rules[2].annotations.runbook_url}
https://github.com/openshift/runbooks/blob/master/alerts/cluster-network-operator/NoOvnMasterLeader.md

[weliang@weliang ~]$ oc -n openshift-ovn-kubernetes get PrometheusRule master-rules -o json
                    {
                        "alert": "NoOvnMasterLeader",
                        "annotations": {
                            "description": "Networking control plane is degraded. Networking configuration updates applied to the cluster will not be\nimplemented while there is no OVN Kubernetes leader. Existing workloads should continue to have connectivity.\nOVN-Kubernetes control plane is not functional.\n",
                            "runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/cluster-network-operator/NoOvnMasterLeader.md",
                            "summary": "There is no ovn-kubernetes master leader."
                        },
                        "expr": "max(ovnkube_master_leader) == 0\n",
                        "for": "10m",
                        "labels": {
                            "severity": "critical"
                        }
                    },
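Querying rules[3] by index only works while NorthboundStale stays at that position in the rule group. A minimal sketch, assuming the JSON layout shown above (spec.groups[].rules[], each alerting rule carrying "alert" and "annotations" keys), that instead lists every alerting rule with no runbook_url. The function name alerts_missing_runbook and the file-argument handling are illustrative only, not part of any OpenShift tooling:

```python
import json
import sys
from typing import Any, Dict, List


def alerts_missing_runbook(rule_obj: Dict[str, Any]) -> List[str]:
    """Return the names of alerting rules that have no runbook_url annotation."""
    missing = []
    for group in rule_obj.get("spec", {}).get("groups", []):
        for rule in group.get("rules", []):
            if "alert" not in rule:
                continue  # recording rules use "record" instead of "alert"
            annotations = rule.get("annotations") or {}
            if not annotations.get("runbook_url"):
                missing.append(rule["alert"])
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Path to a file holding the output of:
    #   oc -n openshift-ovn-kubernetes get PrometheusRule master-rules -o json
    with open(sys.argv[1]) as f:
        rule_obj = json.load(f)
    for name in alerts_missing_runbook(rule_obj):
        print(name)
```

For example, save the `oc ... get PrometheusRule master-rules -o json` output to a file and pass its path as the first argument. On the 4.11 build above, NorthboundStale should appear in the list, while NoOvnMasterLeader should not, because it already carries a runbook_url.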

Actual results:
[weliang@weliang ~]$ oc -n openshift-ovn-kubernetes get PrometheusRule master-rules -o jsonpath={.spec.groups[0].rules[3].annotations.runbook_url}

Nothing is returned: the NorthboundStale rule has no runbook_url annotation.

Expected results:
[weliang@weliang ~]$ oc -n openshift-ovn-kubernetes get PrometheusRule master-rules -o jsonpath={.spec.groups[0].rules[3].annotations.runbook_url}
https://github.com/openshift/runbooks/blob/master/alerts/cluster-network-operator/NorthboundStale.md

Additional info:
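The NorthboundStale rule fires when the newest ovnkube_master_nb_e2e_timestamp is more than 120 seconds old and that condition has held continuously for 10 minutes ("for": "10m"). The sketch below is a simplified model of that pending/firing logic, assuming one (evaluation time, max timestamp) pair per rule evaluation. It is not Prometheus code: it does not model an absent metric series or the configured evaluation interval, and the function name northbound_stale_states is hypothetical.

```python
from typing import List, Optional, Tuple

STALE_THRESHOLD_S = 120  # from the rule's expr: time() - max(...) > 120
FOR_DURATION_S = 600     # from the rule's "for": "10m"


def northbound_stale_states(samples: List[Tuple[float, float]]) -> List[str]:
    """Return the modelled alert state at each rule evaluation.

    samples: (eval_time, max_nb_e2e_timestamp) pairs in seconds, ordered by
    eval_time; each pair stands for one evaluation of the rule's expr.
    """
    states = []
    pending_since: Optional[float] = None
    for now, newest_ts in samples:
        if now - newest_ts > STALE_THRESHOLD_S:
            if pending_since is None:
                pending_since = now  # condition just became true
            held = now - pending_since
            states.append("firing" if held >= FOR_DURATION_S else "pending")
        else:
            pending_since = None  # condition cleared, so the hold timer resets
            states.append("inactive")
    return states
```

With the last northbound write at t=1000 s and evaluations every 60 s, the condition first holds at t=1180 s (staleness 180 s), so the modelled alert goes pending there and fires at t=1780 s, roughly 13 minutes after the last write.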

Comment 2 Weibin Liang 2022-08-31 13:51:43 UTC
Test passed in 4.12.0-0.nightly-2022-08-31-064023

Comment 5 errata-xmlrpc 2023-01-17 19:49:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399