Bug 1907475 - Unable to estimate the error rate of ingress across the connected fleet
Summary: Unable to estimate the error rate of ingress across the connected fleet
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Clayton Coleman
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-14 15:15 UTC by Clayton Coleman
Modified: 2022-08-04 22:30 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:43:41 UTC
Target Upstream Version:
Embargoed:


Attachments
reference for Ingress metrics introduced with the added PR (720.78 KB, image/png)
2020-12-22 10:20 UTC, Arvind iyengar


Links
Github openshift cluster-monitoring-operator pull 1019 (status: closed; last updated 2021-01-19 04:24:28 UTC): Bug 1907475: Add recording rules for ingress traffic and error rate
Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:43:59 UTC)

Description Clayton Coleman 2020-12-14 15:15:18 UTC
Disruption of ingress workloads is one of the most accurate measures of end-user impact from our upgrades. While the exact fraction of failing requests cannot be alerted on in absolute terms, comparing its rolling average before, during, and after an upgrade is a key indicator of whether the upgrade succeeded.

Add a metric that reports the ingress request error fraction for both OpenShift and workload traffic, as measured at the frontends; add a rate calculation for total requests (so we can assess total ingress request traffic); and expose the ingress bandwidth in/out and active connection metrics to telemetry by default. This data will allow us to better estimate the health of connected clusters and highlight high-impact users.
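The two request-level quantities requested above (total request rate and error fraction) can be sketched from cumulative counters as follows. This is a minimal illustration under stated assumptions, not the recording rules added in the linked pull request: it assumes per-status-class request counters (keyed here as "2xx", "4xx", "5xx") sampled at a few points in time, and all function names, keys, and sample values are hypothetical. A drop in a counter is treated as a reset (for example, a router restart), similar in spirit to Prometheus counter handling but without its extrapolation.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: each sample is (unix_time_seconds, cumulative_count)
# for one counter series, ordered by time.
Sample = Tuple[float, float]


def counter_increase(samples: List[Sample]) -> float:
    """Increase of a cumulative counter over the sampled window.

    A decrease between consecutive samples is treated as a counter reset:
    the counter is assumed to have restarted from zero, so the post-reset
    value is counted as new requests. Unlike Prometheus increase(), no
    extrapolation to the window edges is done.
    """
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        total += curr - prev if curr >= prev else curr
    return total


def per_second_rate(samples: List[Sample]) -> float:
    """Average per-second rate of a counter between its first and last sample."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0


def total_request_rate(samples_by_class: Dict[str, List[Sample]]) -> float:
    """Total requests per second, summed over all (hypothetical) status classes."""
    return sum(per_second_rate(s) for s in samples_by_class.values())


def error_fraction(samples_by_class: Dict[str, List[Sample]]) -> float:
    """Fraction of requests answered with a 5xx status over the window."""
    total = total_request_rate(samples_by_class)
    if total == 0:
        return 0.0
    return per_second_rate(samples_by_class.get("5xx", [])) / total


# Hypothetical counter samples taken 30 s apart, keyed by status class.
# The "5xx" series drops from 40 to 30, i.e. it was reset mid-window.
EXAMPLE_SAMPLES: Dict[str, List[Sample]] = {
    "2xx": [(0.0, 1000.0), (30.0, 4000.0), (60.0, 6880.0)],
    "4xx": [(0.0, 50.0), (30.0, 80.0), (60.0, 110.0)],
    "5xx": [(0.0, 10.0), (30.0, 40.0), (60.0, 30.0)],
}

if __name__ == "__main__":
    rate = total_request_rate(EXAMPLE_SAMPLES)
    frac = error_fraction(EXAMPLE_SAMPLES)
    print(f"request rate: {rate:.1f} req/s, error fraction: {frac:.2%}")
    # prints: request rate: 100.0 req/s, error fraction: 1.00%
```

Summing per-series rates before dividing mirrors the common Prometheus pattern of aggregating `rate()` results in both numerator and denominator, rather than averaging per-series fractions, so busier series weigh proportionally more. Evaluating `error_fraction` separately over windows before, during, and after an upgrade gives the kind of comparison this bug describes.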

Comment 1 Arvind iyengar 2020-12-22 10:19:28 UTC
Tested in the "4.7.0-0.ci.test-2020-12-22-072011-ci-ln-wtikmit" release. With this payload, the newly added metrics appear to register readings and remain functional. See the attached Prometheus console screenshot for reference.

Comment 2 Arvind iyengar 2020-12-22 10:20:51 UTC
Created attachment 1741338
reference for Ingress metrics introduced with the added PR

Comment 6 errata-xmlrpc 2021-02-24 15:43:41 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

