Now that the changes to reduce unnecessary iptables syncs have landed, some alerts fire spuriously: kube-proxy no longer syncs every 30 seconds when there is nothing to do, but the existing alerts still assume that cadence. Fix those alerts, and figure out whether we can write alerts that still catch real staleness.
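For context, the breakage pattern looks roughly like the sketch below. This is illustrative only (the alert name ExampleProxyApplyStale is hypothetical and the exact pre-fix expression may differ): any rule that alerts on the absolute age of the last applied sync will fire once periodic no-op syncs are skipped, even though the rules in iptables are still correct.

    # Hypothetical sketch of the problematic pattern, not the shipped rule.
    # Fires whenever the last sync is "old", which is now normal behavior
    # when there has been no service/endpoint churn to apply.
    alert: ExampleProxyApplyStale
    expr: (time() - kubeproxy_sync_proxy_rules_last_timestamp_seconds) > 60
    for: 5m
    labels:
      severity: warning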
Filed https://github.com/kubernetes/kubernetes/pull/90175 to get the metrics we need into kube-proxy.
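Assuming the PR exposes a queued-timestamp metric alongside the existing applied-timestamp metric (as the alert definitions later in this bug suggest), staleness can be expressed as "work was queued but not applied" rather than "the last sync is old", so skipped no-op syncs no longer trip the alert. A minimal PromQL sketch under that assumption:

    # Positive and growing only when a sync has been requested (queued)
    # but kube-proxy has not yet applied it to iptables.
    kubeproxy_sync_proxy_rules_last_queued_timestamp_seconds
      - kubeproxy_sync_proxy_rules_last_timestamp_seconds > 30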
*** Bug 1832272 has been marked as a duplicate of this bug. ***
*** Bug 1830098 has been marked as a duplicate of this bug. ***
Next step: PR https://github.com/openshift/sdn/pull/138 to pull the upstream change into openshift/sdn.
Verified this bug on 4.5.0-0.nightly-2020-05-19-041951. Current alert definitions and their evaluation state:

    alert: ClusterProxyApplySlow
    expr: histogram_quantile(0.95, sum by(le) (rate(kubeproxy_sync_proxy_rules_duration_seconds_bucket[5m]))) > 10
    labels:
      severity: warning
    annotations:
      message: The cluster is taking too long, on average, to apply kubernetes service rules to iptables.
    State: OK (last evaluated 8.71s ago, 1.083ms)

    alert: NodeProxyApplyStale
    expr: (kubeproxy_sync_proxy_rules_last_queued_timestamp_seconds - kubeproxy_sync_proxy_rules_last_timestamp_seconds) * on(namespace, pod) group_right() topk by(namespace, pod) (1, kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"}) > 30
    for: 5m
    labels:
      severity: warning
    annotations:
      message: SDN pod {{ $labels.pod }} on node {{ $labels.node }} has stale kubernetes service rules in iptables.
    State: OK (last evaluated 8.709s ago, 424.1us)

    alert: SDNPodNotReady
    expr: kube_pod_status_ready{condition="true",namespace="openshift-sdn"} == 0
    for: 10m
    labels:
      severity: warning
    annotations:
      message: SDN pod {{ $labels.pod }} on node {{ $labels.node }} is not ready.
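As a side note, rule definitions like the above can be sanity-checked offline before they land in the cluster. A minimal sketch, assuming the alerts are saved into a standalone Prometheus rule file (the filename here is hypothetical):

    # Validate the syntax of the alerting rules file with promtool.
    promtool check rules sdn-proxy-alerts.yaml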
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409