Description of problem: kuryr-cni pods tend to use too much CPU on the nodes: 17 root 20 0 1347416 77012 7232 S 40.6 0.5 2082:26 kuryr-daemon: master process [/usr/local/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 42 root 20 0 543320 76612 8352 S 38.9 0.5 2060:28 kuryr-daemon: Prometheus Exporter worker(0) 37 root 20 0 777176 117460 9872 S 0.3 0.7 12:19.91 kuryr-daemon: health worker(0) 1 root 20 0 387148 83720 16348 S 0.0 0.5 0:22.80 kuryr-daemon: master process [/usr/local/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 27 root 20 0 475564 78412 8584 S 0.0 0.5 27:15.51 kuryr-daemon: watcher worker(0) 33 root 20 0 395352 73372 5804 S 0.0 0.4 1:03.99 kuryr-daemon: server worker(0) 197502 root 20 0 12024 3348 2804 S 0.0 0.0 0:00.03 bash 197587 root 20 0 51216 4032 3372 R 0.0 0.0 0:00.14 top -c Version-Release number of selected component (if applicable): How reproducible: Always? Steps to Reproduce: 1. Run installation. 2. Run `top` inside kuryr-cni pods. Actual results: Up to 40% usage in some spikes. Expected results: On a idle cluster usage should be negligible. Additional info:
Verified on 4.7.0-0.nightly-2021-01-27-055755 on OSP16.1 (RHOS-16.1-RHEL-8-20201214.n.3) with OVN-Octavia. UPI installation. Without the fix (4.7.0-0.nightly-2021-01-26-082025),Prometheus Exporter worker is consuming >30% of CPU on kuryr-cni for master-0 and worker-0: $ oc get pods -n openshift-kuryr -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kuryr-cni-4x7xm 1/1 Running 0 18h 10.196.2.151 ostest-kvc8v-worker-2 <none> <none> kuryr-cni-65rvl 1/1 Running 0 18h 10.196.1.228 ostest-kvc8v-worker-0 <none> <none> kuryr-cni-9twv5 1/1 Running 0 18h 10.196.2.213 ostest-kvc8v-worker-1 <none> <none> kuryr-cni-jzgnv 1/1 Running 0 18h 10.196.3.99 ostest-kvc8v-master-2 <none> <none> kuryr-cni-mlrd6 1/1 Running 0 18h 10.196.0.99 ostest-kvc8v-master-0 <none> <none> kuryr-cni-x878q 1/1 Running 0 18h 10.196.2.56 ostest-kvc8v-master-1 <none> <none> kuryr-controller-7cd7df9d65-wrh4v 1/1 Running 0 17h 10.196.0.99 ostest-kvc8v-master-0 <none> <none> $ oc rsh -n openshift-kuryr kuryr-cni-mlrd6 sh-4.4# top -n 1 -c -b top - 11:00:26 up 19:51, 0 users, load average: 3.77, 4.98, 4.87 Tasks: 8 total, 1 running, 7 sleeping, 0 stopped, 0 zombie %Cpu(s): 31.1 us, 14.8 sy, 0.0 ni, 45.9 id, 0.0 wa, 3.3 hi, 1.6 si, 3.3 st MiB Mem : 16024.2 total, 2207.8 free, 5860.3 used, 7956.0 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 9812.1 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 41 root 20 0 545144 74684 8352 S 33.3 0.5 397:01.57 kuryr-daemon: Prometheus Exporter worker(0) 17 root 20 0 611184 74180 7444 S 26.7 0.5 402:32.24 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 1 root 20 0 388972 79520 13864 S 0.0 0.5 0:06.52 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 27 root 20 0 476804 76376 9004 S 0.0 0.5 3:43.45 kuryr-daemon: watcher worker(0) 33 root 20 0 397432 71788 5912 S 0.0 0.4 0:10.50 kuryr-daemon: server worker(0) 36 root 20 0 700920 93228 10096 S 0.0 0.6 2:22.99 kuryr-daemon: health worker(0) 38621 root 20 0 12024 3248 2820 S 0.0 0.0 0:00.01 /bin/sh 38633 root 20 0 51056 3876 3324 R 0.0 0.0 0:00.00 top -n 1 -c -b sh-4.4# exit exit $ oc rsh -n openshift-kuryr kuryr-cni-65rvl sh-4.4# top -n 1 -c -b top - 11:00:48 up 19:34, 0 users, load average: 2.33, 1.74, 1.60 Tasks: 8 total, 1 running, 7 sleeping, 0 stopped, 0 zombie %Cpu(s): 21.1 us, 12.3 sy, 0.0 ni, 63.2 id, 0.0 wa, 1.8 hi, 1.8 si, 0.0 st MiB Mem : 16024.2 total, 8338.0 free, 3722.9 used, 3963.3 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 12596.1 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 17 root 20 0 536940 73464 7468 S 60.0 0.4 584:58.76 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 42 root 20 0 545144 74776 8444 S 46.7 0.5 574:22.76 kuryr-daemon: Prometheus Exporter worker(0) 1 root 20 0 388972 79232 13576 S 0.0 0.5 0:07.23 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 27 root 20 0 476808 76016 8876 S 0.0 0.5 1:09.75 kuryr-daemon: watcher worker(0) 32 root 20 0 397432 71624 5748 S 0.0 0.4 0:14.16 kuryr-daemon: server worker(0) 37 root 20 0 700920 94764 10080 S 0.0 0.6 2:19.23 kuryr-daemon: health worker(0) 38871 root 20 0 12024 3180 2752 S 0.0 0.0 0:00.00 /bin/sh 38882 root 20 0 51056 3732 3176 R 0.0 0.0 0:00.00 top -n 1 -c -b With the fix, Prometheus Exporter worker is consuming ~0.0% CPU on kuryr-cni from master-0 and worker-0: $ oc get pods -n openshift-kuryr -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kuryr-cni-25vdv 1/1 Running 0 21m 10.196.2.172 ostest-6tvgk-master-0 <none> <none> kuryr-cni-4fpd4 1/1 Running 0 112s 10.196.2.243 ostest-6tvgk-worker-0 <none> <none> kuryr-cni-9njsw 1/1 Running 0 115s 10.196.3.116 ostest-6tvgk-worker-2 <none> <none> kuryr-cni-gk72k 1/1 Running 0 21m 10.196.2.133 ostest-6tvgk-master-1 <none> <none> kuryr-cni-h4g6x 1/1 Running 0 21m 10.196.1.14 ostest-6tvgk-master-2 <none> <none> kuryr-cni-rsfhm 1/1 Running 0 108s 10.196.2.240 ostest-6tvgk-worker-1 <none> <none> kuryr-controller-7b5ccb8857-t7lj8 1/1 Running 2 21m 10.196.2.172 ostest-6tvgk-master-0 <none> <none> $ oc rsh -n openshift-kuryr kuryr-cni-25vdv sh-4.4# top -n 1 -c -b top - 10:58:08 up 25 min, 0 users, load average: 7.01, 5.36, 4.60 Tasks: 12 total, 3 running, 7 sleeping, 0 stopped, 2 zombie %Cpu(s): 56.7 us, 22.4 sy, 0.0 ni, 11.9 id, 0.0 wa, 4.5 hi, 4.5 si, 0.0 st MiB Mem : 16024.2 total, 3364.7 free, 5875.0 used, 6784.5 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 9785.6 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1747 root 20 0 0 0 0 Z 23.5 0.0 0:00.53 [kuryr-daemon: s] <defunct> 1 root 20 0 389080 79488 13832 R 0.0 0.5 0:02.72 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 17 root 20 0 889828 74536 7572 S 0.0 0.5 0:01.83 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 27 root 20 0 476932 75868 8684 S 0.0 0.5 0:02.50 kuryr-daemon: watcher worker(0) 33 root 20 0 397540 71804 5904 S 0.0 0.4 0:00.61 kuryr-daemon: server worker(0) 37 root 20 0 763492 83092 10000 S 0.0 0.5 0:03.45 kuryr-daemon: health worker(0) 41 root 20 0 471396 71508 5432 S 0.0 0.4 0:00.64 kuryr-daemon: Prometheus Exporter worker(0) 1748 root 20 0 12024 3172 2740 S 0.0 0.0 0:00.02 /bin/sh 1756 root 20 0 398108 70956 4700 S 0.0 0.4 0:00.03 kuryr-daemon: server worker(0) 1769 root 20 0 0 0 0 R 0.0 0.0 0:00.03 [kuryr-daemon: s] 1775 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 1781 root 20 0 51056 3708 3156 R 0.0 0.0 0:00.00 top -n 1 -c -b sh-4.4# exit exit $ oc rsh -n openshift-kuryr kuryr-cni-4fpd4 sh-4.4# top -n 1 -c -b top - 10:58:28 up 4 min, 0 users, load average: 3.37, 1.73, 0.71 Tasks: 18 total, 1 running, 9 sleeping, 0 stopped, 8 zombie %Cpu(s): 48.4 us, 26.6 sy, 0.0 ni, 18.8 id, 0.0 wa, 3.1 hi, 3.1 si, 0.0 st MiB Mem : 16024.2 total, 12376.1 free, 1186.7 used, 2461.4 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 14483.3 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 161 root 20 0 0 0 0 Z 6.2 0.0 0:00.36 [kuryr-daemon: s] <defunct> 1 root 20 0 389080 79392 13728 S 0.0 0.5 0:03.20 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 18 root 20 0 905452 73784 7652 S 0.0 0.4 0:00.16 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 28 root 20 0 476016 77008 8628 S 0.0 0.5 0:00.19 kuryr-daemon: watcher worker(0) 34 root 20 0 397540 71864 5964 S 0.0 0.4 0:00.11 kuryr-daemon: server worker(0) 38 root 20 0 696932 77108 9976 S 0.0 0.5 0:00.27 kuryr-daemon: health worker(0) 40 root 20 0 471264 73492 5524 S 0.0 0.4 0:00.07 kuryr-daemon: Prometheus Exporter worker(0) 157 root 20 0 397540 71032 5004 S 0.0 0.4 0:00.02 kuryr-daemon: server worker(0) 159 root 20 0 0 0 0 Z 0.0 0.0 0:00.37 [kuryr-daemon: s] <defunct> 169 root 20 0 397540 70724 4752 S 0.0 0.4 0:00.02 kuryr-daemon: server worker(0) 200 root 20 0 12024 3160 2736 S 0.0 0.0 0:00.02 /bin/sh 207 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 219 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 225 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 250 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 268 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 274 root 20 0 0 0 0 Z 0.0 0.0 0:00.01 [kuryr-daemon: s] <defunct> 280 root 20 0 51056 3768 3216 R 0.0 0.0 0:00.00 top -n 1 -c -b Furthermore, prometheus reporting is working as expected generating its alerts: $ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' "AlertmanagerReceiversNotConfigured" "Watchdog"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633