Bug 1854434

Summary:

Traffic spikes every 10 minutes to master API

Product:

OpenShift Container Platform

Reporter:

Dirk Porter <dporter>

Component:

kube-controller-manager

Assignee:

Mike Dame <mdame>

Status:

CLOSED ERRATA

QA Contact:

RamaKasturi <knarra>

Severity:

high

Docs Contact:

Priority:

urgent

Version:

3.11.0

CC:

antgarci, aos-bugs, asoni, bbennett, bshirren, dmellado, gmarkley, jolee, knarra, maszulik, mdame, mfojtik, nchavan, nelluri, openshift-bugs-escalate, rcarrier, rkant, rvanderp, scuppett, suchaudh, svaughn, tsmetana, vlaad

Target Milestone:

---

Target Release:

4.6.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Cause: Various inconsistencies in storage and comparison of endpoints led to the endpoints controller incorrectly assuming there were many more updates that needed to be made to the cluster than there actually were. Consequence: Endpoints controller would send large amounts of API requests at every informer re-sync interval, causing degradation in large clusters with many endpoints. Fix: Update storage and comparison functions for endpoints to be more consistent. Result: Endpoints controller now only sends updates when necessary.

Story Points:

---

Clone Of:

Clones:

1870540 1882030 (view as bug list)

Environment:

Last Closed:

2020-10-27 16:12:46 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1870540, 1882030

Attachments:

Description	Flags
Requested increase log levels	none

Description Dirk Porter 2020-07-07 13:40:58 UTC

Description of problem:
Customer is reporting that their f5 vip is getting ~50G more traffic during a spike that occurs every 10 minutes. Looking at the pcaps, it appears all the traffic is, obviously, going to the api. 

Looking at the amount of nodes on the cluster, it appears there are over 800. 

A early observation was the output on the sosreport of the sdn. However, the sdn pods looked fine when I looked at them: 

"root      16910  0.0  0.0  13272  3044 ?        Ss   Jun17   1:45 /bin/bash -c #!/bin/bash set -euo pipefail  # cleanup old log files rm -f /var/log/openvswitch-old/ovsdb-server.log /var/log/openvswitch-old/ovs-vswitchd.log  mkdir -p /var/log/openvswitch  # if another process is listening on the cni-server socket, wait until it exits trap 'kill $(jobs -p); exit 0' TERM retries=0 while true; do   if 
 /usr/share/openvswitch/scripts/ovs-ctl status &>/dev/null; then     echo "warning: Another process is currently managing OVS, waiting 15s ..." 2>&1     sleep 15 & wait     (( retries += 1 ))   else     break   fi   if [[ "${retries}" -gt 40 ]]; then     echo "error: Another process is currently managing OVS, exiting" 2>&1     exit 1   fi done  # launch OVS function quit {     /usr/share/openvswitch/scripts/
 ovs-ctl stop     exit 0 } trap quit SIGTERM /usr/share/openvswitch/scripts/ovs-ctl start --no-ovs-vswitchd --system-id=random  # Restrict the number of pthreads ovs-vswitchd creates to reduce the # amount of RSS it uses on hosts with many cores # https://bugzilla.redhat.com/show_bug.cgi?id=1571379 # https://bugzilla.redhat.com/show_bug.cgi?id=1572797 if [[ `nproc` -gt 12 ]]; then     ovs-vsctl --no-wait 
 set Open_vSwitch . other_config:n-revalidator-threads=4     ovs-vsctl --no-wait set Open_vSwitch . other_config:n-handler-threads=10 fi /usr/share/openvswitch/scripts/ovs-ctl start --no-ovsdb-server --system-id=random  tail --follow=name /var/log/openvswitch/ovs-vswitchd.log /var/log/openvswitch/ovsdb-server.log & sleep 20 while true; do   if ! /usr/share/openvswitch/scripts/ovs-ctl status &>/dev/null; 
then     echo "OVS seems to have crashed, exiting"     quit   fi   sleep 15 done"

I have been trying to figure out what could be causing a spike every 10 minutes within a cluster, and cannot think of a process that reports every 10 minutes that would have this sort of impact. 

Looking for assistance on what to check next. I think this might be the consequence for the amount of nodes. What could be sending traffic to the API? 

Version-Release number of selected component (if applicable):
3.11

How reproducible:
I have not seen anything that would cause this in my looking 

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Stephen Cuppett 2020-07-07 14:38:53 UTC

*** Bug 1854435 has been marked as a duplicate of this bug. ***

Comment 34 rvanderp 2020-07-21 17:41:41 UTC

Previously mentioned data has been uploaded.

Comment 35 Ben Bennett 2020-07-24 15:24:36 UTC

Based on comment 6, the traffic is coming from the system:serviceaccount:kube-system:endpoint-controller which sounds like the kube controller manager... reassigning.

Comment 40 Dirk Porter 2020-08-14 20:06:07 UTC

Created attachment 1711463 [details]
Requested increase log levels

Comment 51 Mike Dame 2020-08-19 22:00:59 UTC

Upon thorough inspection of the code and debugging trials we found behavior similar to what was described still exists in the current code (4.6/1.19). It's obviously surprising that such an issue has gone undetected for so long, but we isolated the behavior to a key section of code that checks for changes in Services and Endpoints in the cluster. This code could result in a false-negative due to comparison of sorted vs unsorted lists.

I have opened a 3.11 PR to fix the bug at https://github.com/openshift/origin/pull/25424

I have also opened an upstream PR for the latest Kubernetes at https://github.com/kubernetes/kubernetes/pull/94112. The community maintainers will also provide review for these changes, and I expect that the change will be backported through supported versions.

Comment 53 Mike Dame 2020-08-20 13:51:50 UTC

The 3.11 PR has moved to https://github.com/openshift/origin/pull/25426 which includes more code changes that help with this

Comment 55 Tomas Smetana 2020-08-28 13:17:10 UTC

Update about the current progress: The proposed patch has been posted for an upstream review and it turned out the problem lies actually in the way kubernetes is hashing the endpoint structures: https://github.com/kubernetes/kubernetes/pull/94112#discussion_r478397569 and the proposed fix only mitigated the real problem. We need to fix the root issue first.

Comment 56 Maciej Szulik 2020-08-28 13:57:59 UTC

I'm in constant contact with all the people responsible for the endpoints controller, I provided them with possible solutions to the problem discovered by Jordan yesterday. As soon as we get final agreement on the approach I'll be backporting this to all necessary versions of OpenShift.

Comment 57 Tony Garcia 2020-09-04 18:41:19 UTC

Hello Tomas and Maciej,

Is there any type of workaround or temporary fix we can provide to a customer? My customer is experiencing major critical production issues across their environments. Their case was opened on 6/26 and they have been dealing with constant application downtime across their entire network.

Please let me know if there is anything we can do to help them work around this issue until a proper errata is released.

Regards,
Tony Garcia
Red Hat OpenShift TAM

Comment 58 Mike Dame 2020-09-04 19:11:16 UTC

Both the Upstream PR (https://github.com/kubernetes/kubernetes/pull/94112) and the 3.11 backport (https://github.com/openshift/origin/pull/25426) have been approved and are in the merge queue. These should merge later today if CI passes.

Comment 59 Mike Dame 2020-09-10 16:10:04 UTC

I'm adding UpcomingSprint, because the work for this bug is done and waiting to merge

Comment 61 RamaKasturi 2020-09-16 06:07:38 UTC

Moving the bug to verified state as i see that updates are less after the bug fix has happened. Below are the steps followed to verify the bug.
[ramakasturinarra@dhcp35-60 ~]$ oc version
Client Version: 4.6.0-202009150116.p0-3db46dc
Server Version: 4.6.0-0.nightly-2020-09-14-221526
Kubernetes Version: v1.19.0+35ab7c5


Before Fix:
===================
1) Install 4.5 cluster with nightly
2) set kubecontrollermanager logLevel to Trace so that it runs at -v=6
3) Now run the command below to see the frequency of updates happening i.e for every 10 mins.

I0916 05:28:23.909056       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/node-exporter, ready: 6 not ready: 0
I0916 05:28:23.909662       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/alertmanager-main, ready: 6 not ready: 0
I0916 05:28:23.914992       1 endpoints_controller.go:510] Update endpoints for openshift-machine-api/cluster-autoscaler-operator, ready: 2 not ready: 0
I0916 05:28:23.913246       1 endpoints_controller.go:510] Update endpoints for openshift-marketplace/marketplace-operator-metrics, ready: 2 not ready: 0
I0916 05:28:23.919947       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/prometheus-operator, ready: 1 not ready: 0
I0916 05:28:23.923106       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/openshift-state-metrics, ready: 2 not ready: 0
I0916 05:28:23.924974       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/prometheus-operated, ready: 4 not ready: 0
I0916 05:28:23.926701       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/alertmanager-operated, ready: 9 not ready: 0
I0916 05:28:23.930065       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/cluster-monitoring-operator, ready: 1 not ready: 0
I0916 05:28:23.930581       1 endpoints_controller.go:510] Update endpoints for openshift-image-registry/image-registry-operator, ready: 1 not ready: 0
I0916 05:28:23.934368       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/kube-state-metrics, ready: 2 not ready: 0
I0916 05:28:23.936261       1 endpoints_controller.go:510] Update endpoints for openshift-dns/dns-default, ready: 18 not ready: 0
I0916 05:28:23.939285       1 endpoints_controller.go:510] Update endpoints for openshift-sdn/sdn, ready: 6 not ready: 0
I0916 05:28:23.943335       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/telemeter-client, ready: 1 not ready: 0
I0916 05:28:23.945914       1 endpoints_controller.go:510] Update endpoints for openshift-cluster-machine-approver/machine-approver, ready: 1 not ready: 0
I0916 05:28:23.947878       1 endpoints_controller.go:510] Update endpoints for openshift-multus/multus-admission-controller, ready: 6 not ready: 0
I0916 05:28:23.951329       1 endpoints_controller.go:510] Update endpoints for openshift-ingress/router-internal-default, ready: 6 not ready: 0
I0916 05:28:23.951379       1 endpoints_controller.go:510] Update endpoints for openshift-cluster-samples-operator/metrics, ready: 1 not ready: 0
I0916 05:38:23.915181       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/kube-state-metrics, ready: 2 not ready: 0
I0916 05:38:23.916672       1 endpoints_controller.go:510] Update endpoints for openshift-dns/dns-default, ready: 18 not ready: 0
I0916 05:38:23.917024       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/telemeter-client, ready: 1 not ready: 0
I0916 05:38:23.918275       1 endpoints_controller.go:510] Update endpoints for openshift-sdn/sdn, ready: 6 not ready: 0
I0916 05:38:23.928198       1 endpoints_controller.go:510] Update endpoints for openshift-cluster-machine-approver/machine-approver, ready: 1 not ready: 0
I0916 05:38:23.932283       1 endpoints_controller.go:510] Update endpoints for openshift-multus/multus-admission-controller, ready: 6 not ready: 0
I0916 05:38:23.932546       1 endpoints_controller.go:510] Update endpoints for openshift-ingress/router-internal-default, ready: 6 not ready: 0
I0916 05:38:23.932961       1 endpoints_controller.go:510] Update endpoints for openshift-cluster-samples-operator/metrics, ready: 1 not ready: 0
I0916 05:38:23.934526       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/node-exporter, ready: 6 not ready: 0
I0916 05:38:23.938472       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/alertmanager-main, ready: 6 not ready: 0
I0916 05:38:23.943770       1 endpoints_controller.go:510] Update endpoints for openshift-marketplace/marketplace-operator-metrics, ready: 2 not ready: 0
I0916 05:38:23.946585       1 endpoints_controller.go:510] Update endpoints for openshift-machine-api/cluster-autoscaler-operator, ready: 2 not ready: 0
I0916 05:38:23.947588       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/openshift-state-metrics, ready: 2 not ready: 0
I0916 05:38:23.948295       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/prometheus-operated, ready: 4 not ready: 0
I0916 05:38:23.950412       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/alertmanager-operated, ready: 9 not ready: 0
I0916 05:38:23.953437       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/cluster-monitoring-operator, ready: 1 not ready: 0
I0916 05:38:23.956470       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/prometheus-operator, ready: 1 not ready: 0
I0916 05:38:23.960298       1 endpoints_controller.go:510] Update endpoints for openshift-image-registry/image-registry-operator, ready: 1 not ready: 0
I0916 05:48:23.922660       1 endpoints_controller.go:510] Update endpoints for openshift-sdn/sdn, ready: 6 not ready: 0
I0916 05:48:23.923202       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/telemeter-client, ready: 1 not ready: 0
I0916 05:48:23.924504       1 endpoints_controller.go:510] Update endpoints for openshift-cluster-machine-approver/machine-approver, ready: 1 not ready: 0
I0916 05:48:23.935735       1 endpoints_controller.go:510] Update endpoints for openshift-multus/multus-admission-controller, ready: 6 not ready: 0
I0916 05:48:23.935916       1 endpoints_controller.go:510] Update endpoints for openshift-ingress/router-internal-default, ready: 6 not ready: 0
I0916 05:48:23.935922       1 endpoints_controller.go:510] Update endpoints for openshift-cluster-samples-operator/metrics, ready: 1 not ready: 0
I0916 05:48:23.940551       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/node-exporter, ready: 6 not ready: 0
I0916 05:48:23.940787       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/alertmanager-main, ready: 6 not ready: 0
I0916 05:48:23.945877       1 endpoints_controller.go:510] Update endpoints for openshift-marketplace/marketplace-operator-metrics, ready: 2 not ready: 0
I0916 05:48:23.946141       1 endpoints_controller.go:510] Update endpoints for openshift-machine-api/cluster-autoscaler-operator, ready: 2 not ready: 0
I0916 05:48:23.946793       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/prometheus-operator, ready: 1 not ready: 0
I0916 05:48:23.951126       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/prometheus-operated, ready: 4 not ready: 0
I0916 05:48:23.951178       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/openshift-state-metrics, ready: 2 not ready: 0
I0916 05:48:23.954907       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/alertmanager-operated, ready: 9 not ready: 0
I0916 05:48:23.955575       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/cluster-monitoring-operator, ready: 1 not ready: 0
I0916 05:48:23.959774       1 endpoints_controller.go:510] Update endpoints for openshift-image-registry/image-registry-operator, ready: 1 not ready: 0
I0916 05:48:23.962264       1 endpoints_controller.go:510] Update endpoints for openshift-monitoring/kube-state-metrics, ready: 2 not ready: 0
I0916 05:48:23.962831       1 endpoints_controller.go:510] Update endpoints for openshift-dns/dns-default, ready: 18 not ready: 0

[ramakasturinarra@dhcp35-60 ~]$ oc get services --all-namespaces | wc -l
67

[ramakasturinarra@dhcp35-60 ~]$ oc logs pod/kube-controller-manager-ip-10-0-204-252.ap-northeast-1.compute.internal -n openshift-kube-controller-manager -c kube-controller-manager | grep endpoints_controller | grep -c "About to update endpoints"
244

[ramakasturinarra@dhcp35-60 ~]$ oc logs pod/kube-controller-manager-ip-10-0-204-252.ap-northeast-1.compute.internal -n openshift-kube-controller-manager -c kube-controller-manager | grep endpoints_controller | grep -c "Update endpoints for"
72

After the Fix:
===================
1) Install latest 4.6 cluster which has the fix
2) set kube-controller-manager loglevel to Trace so that it runs at -v=6
3) you can see that there are no updates in about 10 minutes.

[ramakasturinarra@dhcp35-60 ~]$ oc get services --all-namespaces | wc -l
76
[ramakasturinarra@dhcp35-60 ~]$ oc logs pod/kube-controller-manager-ip-10-0-211-68.us-east-2.compute.internal -n openshift-kube-controller-manager -c kube-controller-manager | grep endpoints_controller | grep -c "About to update endpoints"
213
[ramakasturinarra@dhcp35-60 ~]$ oc logs pod/kube-controller-manager-ip-10-0-211-68.us-east-2.compute.internal -n openshift-kube-controller-manager -c kube-controller-manager | grep endpoints_controller | grep -c "Update endpoints for"
0

 spec:
    logLevel: Trace
    
Based on the above moving bug to verified state.

Comment 74 errata-xmlrpc 2020-10-27 16:12:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196