Disruption of ingress workloads is one of the most accurate measures of end-user impact from our upgrades. While the exact fraction of failing requests cannot be alerted on in absolute terms, its rolling average before, during, and after an upgrade is a key indicator of whether the upgrade succeeded. Add a metric that reports the OpenShift and workload ingress request error fraction as measured at the frontends, add a rate calculation for total requests (so we can assess total ingress request traffic), and expose the ingress bandwidth in/out and active connection metrics to telemetry by default. This data will allow us to better estimate the health of connected clusters and highlight high-impact users.
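To make the two calculations above concrete, here is a minimal sketch of how an error fraction and a total request rate can be derived from two cumulative counter readings taken at the frontends. It is not the actual rule added by the PR: the Sample type, its field names, and the counter-reset handling are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of cumulative frontend counters (hypothetical shape)."""
    timestamp: float  # seconds
    total: float      # cumulative count of all ingress requests
    errors: float     # cumulative count of failed ingress requests


def counter_increase(first: float, last: float) -> float:
    """Increase of a cumulative counter between two readings.

    Assumption: if the counter went backwards, it was reset to zero
    between the readings, so the later value is taken as the increase.
    """
    return last - first if last >= first else last


def request_rate(start: Sample, end: Sample) -> float:
    """Average requests per second over the window [start, end]."""
    elapsed = end.timestamp - start.timestamp
    if elapsed <= 0:
        raise ValueError("end sample must be later than start sample")
    return counter_increase(start.total, end.total) / elapsed


def error_fraction(start: Sample, end: Sample) -> float:
    """Fraction of requests in the window that failed (0.0 if no traffic)."""
    total = counter_increase(start.total, end.total)
    if total == 0:
        return 0.0
    return counter_increase(start.errors, end.errors) / total


if __name__ == "__main__":
    # Two readings five minutes apart (made-up numbers).
    before = Sample(timestamp=0.0, total=1000.0, errors=10.0)
    after = Sample(timestamp=300.0, total=4000.0, errors=40.0)
    print(request_rate(before, after))
    print(error_fraction(before, after))
```

Comparing error_fraction over windows before, during, and after an upgrade gives the rolling view described above, while request_rate indicates how much traffic that fraction is based on.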
Tested in the "4.7.0-0.ci.test-2020-12-22-072011-ci-ln-wtikmit" release. With this payload, the newly added metrics appear to register readings and remain functional. See the attached Prometheus console image for reference.
Created attachment 1741338: reference for Ingress metrics introduced with the added PR
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633