Bug 1907475

Summary: Unable to estimate the error rate of ingress across the connected fleet
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Networking
Assignee: Clayton Coleman <ccoleman>
Networking sub component: router
QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: aiyengar, aos-bugs, hongli, mfisher
Version: 4.7   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-02-24 15:43:41 UTC
Type: Bug
Attachments:
reference for Ingress metrics introduced with the added PR (flags: none)

Description Clayton Coleman 2020-12-14 15:15:18 UTC
Disruption of ingress workloads is one of the most accurate measures of end-user impact from our upgrades. Although we cannot alert on the exact fraction of failing requests in absolute terms, its rolling average before, during, and after an upgrade is a key indicator of whether the upgrade succeeded.

Add a metric that reports the error fraction of OpenShift and workload ingress requests as measured at the frontends, add a rate calculation for total requests (so we can assess total ingress request traffic), and expose the ingress bandwidth in/out and active connection metrics to telemetry by default. This data will allow us to better estimate the health of connected clusters and highlight high-impact users.
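For illustration only, here is a minimal Python sketch of the two calculations requested above: an error fraction and a total request rate, each derived from two scrapes of cumulative frontend counters. The `Sample` shape, the field names, and the choice to count only server-side errors are assumptions made for this example; they are not the metric names or the rule definitions used by the actual fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One scrape of cumulative frontend counters (hypothetical shape)."""
    timestamp: float      # seconds since epoch
    total_requests: int   # cumulative responses of any status
    error_responses: int  # cumulative error responses (assumed: server errors)


def counter_delta(earlier: int, later: int) -> int:
    """Increase of a monotonic counter between two scrapes.

    If the counter went down, assume the process restarted and the counter
    began again from zero, so the later value is the whole increase.
    """
    return later - earlier if later >= earlier else later


def request_rate(a: Sample, b: Sample) -> float:
    """Average requests per second between scrape a and the later scrape b."""
    elapsed = b.timestamp - a.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return counter_delta(a.total_requests, b.total_requests) / elapsed


def error_fraction(a: Sample, b: Sample) -> float:
    """Fraction of requests between the two scrapes that ended in an error."""
    total = counter_delta(a.total_requests, b.total_requests)
    if total == 0:
        return 0.0  # no traffic in the window, so nothing failed
    errors = counter_delta(a.error_responses, b.error_responses)
    return errors / total


before = Sample(timestamp=0.0, total_requests=1000, error_responses=10)
after = Sample(timestamp=60.0, total_requests=1600, error_responses=40)
print(request_rate(before, after))    # requests per second over the window
print(error_fraction(before, after))  # share of those requests that failed
```

Comparing the error fraction over windows taken before, during, and after an upgrade gives the relative signal described above without requiring an absolute alerting threshold.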

Comment 1 Arvind iyengar 2020-12-22 10:19:28 UTC
Tested in release "4.7.0-0.ci.test-2020-12-22-072011-ci-ln-wtikmit". With this payload, the newly added metrics record values and remain functional. See the attached Prometheus console image for reference.

Comment 2 Arvind iyengar 2020-12-22 10:20:51 UTC
Created attachment 1741338
reference for Ingress metrics introduced with the added PR

Comment 6 errata-xmlrpc 2021-02-24 15:43:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633