Description of problem:
All operators based on library-go in OpenShift use the "events.Recorder" wrapper, which provides a convenient way to send events to Kubernetes. The wrapper uses the default Kubernetes event recorder and broadcaster, which queue events and, by default, correlate similar events into one event (so we don't create too many events by accident). However, library-go based operators send many more events than Kubernetes itself, because events provide a useful timeline of what is going on in the system. We need to tweak the correlator options to allow a higher QPS and BurstSize; raising these settings lets more events pass through the event correlator/aggregator. Additionally, upstream correlates events based on "reason" only; we also need to correlate based on "message", since we can have events with the same reason but different messages, and we don't want to lose those events.

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Make the operator produce a lot of events; the events should not be correlated. This can also be verified by comparing the 4.4 events.json available in CI artifacts against the 4.5 events.json. The number of events should be 30-40% higher.

Steps to Reproduce:
1.
2.
3.

Actual results:
Similar events are being correlated and lost. Only 30 events are allowed per minute, per component.

Expected results:
Similar events should not be correlated for operators based on reason alone. More than 30 events should be allowed per minute, per component.

Additional info:
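To illustrate why the aggregation key must include the message, here is a minimal self-contained Go sketch (the `Event` type and both functions are hypothetical stand-ins, not the actual client-go correlator): keying on reason alone collapses distinct events, while keying on reason plus message preserves them.

```go
package main

import "fmt"

// Event is a simplified stand-in for a Kubernetes event (hypothetical type).
type Event struct {
	Reason  string
	Message string
}

// aggregateByReason keys events on reason only, as upstream does:
// events with the same reason but different messages are merged.
func aggregateByReason(events []Event) map[string]int {
	counts := map[string]int{}
	for _, e := range events {
		counts[e.Reason]++
	}
	return counts
}

// aggregateByReasonAndMessage keys on reason+message, the behavior this
// bug asks for: events with distinct messages survive correlation.
func aggregateByReasonAndMessage(events []Event) map[string]int {
	counts := map[string]int{}
	for _, e := range events {
		counts[e.Reason+"/"+e.Message]++
	}
	return counts
}

func main() {
	events := []Event{
		{"OperatorStatusChanged", "Degraded changed from True to False"},
		{"OperatorStatusChanged", "Progressing changed from False to True"},
	}
	// Reason-only keying merges both events into one; reason+message keeps both.
	fmt.Println(len(aggregateByReason(events)))           // 1
	fmt.Println(len(aggregateByReasonAndMessage(events))) // 2
}
```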
library-go change: https://github.com/openshift/library-go/pull/777
Verified with build OCP 4.5.0-0.nightly-2020-05-14-231228.

Force a kube-apiserver pod redeployment:
$ oc patch kubeapiservers/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "just a forced test01" } ]'

Wait a while; after the related events have fired, count the openshift-kube-apiserver events:
$ oc get events -n openshift-kube-apiserver | awk '{ print $1}' | sort | uniq -c | sort -k 1
      1 17m
      1 2m
      1 2m12s
      1 2m16s
      1 2m2s
      1 3m54s
      1 4m18s
      1 4m28s
     16 28m
      1 91s
      1 93s
      1 LAST
      2 119s
      2 2m13s
     23 75m
      2 4m27s
     26 23m
     26 80m
      2 81m
     28 26m
     28 78m
      2 94s
     29 57m
      3 112s
     31 60m
      3 22m
      3 25m
      3 3m11s
      3 3m55s
      3 43s
      3 4m20s
      3 54m
      3 56m
      3 59m
      3 72m
      3 79m
     40 29m
      4 21m
      4 24m
      4 27m
      4 76m
     48 62m
      7 61m
      7 74m
      7 77m
      8 4m17s
      8 4m21s
      9 109s
      9 113s
From the output above we can see 31 events at the 60m mark, 40 events at 29m, and 48 events at 62m, so more than 30 events were allowed per minute for the kube-apiserver component. The result was as expected; moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409