Bug 1920481
Summary: | kuryr-cni pods using unreasonable amount of CPU | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Michał Dulko <mdulko> |
Component: | Networking | Assignee: | Michał Dulko <mdulko> |
Networking sub component: | kuryr | QA Contact: | GenadiC <gcheresh> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | rheinzma, rlobillo |
Version: | 4.7 | ||
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | All | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:56:23 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1920995 |
Description
Michał Dulko
2021-01-26 11:47:30 UTC
Verified on 4.7.0-0.nightly-2021-01-27-055755 on OSP16.1 (RHOS-16.1-RHEL-8-20201214.n.3) with OVN-Octavia. UPI installation. Without the fix (4.7.0-0.nightly-2021-01-26-082025),Prometheus Exporter worker is consuming >30% of CPU on kuryr-cni for master-0 and worker-0: $ oc get pods -n openshift-kuryr -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kuryr-cni-4x7xm 1/1 Running 0 18h 10.196.2.151 ostest-kvc8v-worker-2 <none> <none> kuryr-cni-65rvl 1/1 Running 0 18h 10.196.1.228 ostest-kvc8v-worker-0 <none> <none> kuryr-cni-9twv5 1/1 Running 0 18h 10.196.2.213 ostest-kvc8v-worker-1 <none> <none> kuryr-cni-jzgnv 1/1 Running 0 18h 10.196.3.99 ostest-kvc8v-master-2 <none> <none> kuryr-cni-mlrd6 1/1 Running 0 18h 10.196.0.99 ostest-kvc8v-master-0 <none> <none> kuryr-cni-x878q 1/1 Running 0 18h 10.196.2.56 ostest-kvc8v-master-1 <none> <none> kuryr-controller-7cd7df9d65-wrh4v 1/1 Running 0 17h 10.196.0.99 ostest-kvc8v-master-0 <none> <none> $ oc rsh -n openshift-kuryr kuryr-cni-mlrd6 sh-4.4# top -n 1 -c -b top - 11:00:26 up 19:51, 0 users, load average: 3.77, 4.98, 4.87 Tasks: 8 total, 1 running, 7 sleeping, 0 stopped, 0 zombie %Cpu(s): 31.1 us, 14.8 sy, 0.0 ni, 45.9 id, 0.0 wa, 3.3 hi, 1.6 si, 3.3 st MiB Mem : 16024.2 total, 2207.8 free, 5860.3 used, 7956.0 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 9812.1 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 41 root 20 0 545144 74684 8352 S 33.3 0.5 397:01.57 kuryr-daemon: Prometheus Exporter worker(0) 17 root 20 0 611184 74180 7444 S 26.7 0.5 402:32.24 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 1 root 20 0 388972 79520 13864 S 0.0 0.5 0:06.52 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 27 root 20 0 476804 76376 9004 S 0.0 0.5 3:43.45 kuryr-daemon: watcher worker(0) 33 root 20 0 397432 71788 5912 S 0.0 0.4 0:10.50 kuryr-daemon: server worker(0) 36 root 20 0 700920 93228 10096 S 0.0 0.6 2:22.99 kuryr-daemon: health worker(0) 38621 root 20 0 12024 3248 2820 S 0.0 0.0 0:00.01 /bin/sh 38633 root 20 0 51056 3876 3324 R 0.0 0.0 0:00.00 top -n 1 -c -b sh-4.4# exit exit $ oc rsh -n openshift-kuryr kuryr-cni-65rvl sh-4.4# top -n 1 -c -b top - 11:00:48 up 19:34, 0 users, load average: 2.33, 1.74, 1.60 Tasks: 8 total, 1 running, 7 sleeping, 0 stopped, 0 zombie %Cpu(s): 21.1 us, 12.3 sy, 0.0 ni, 63.2 id, 0.0 wa, 1.8 hi, 1.8 si, 0.0 st MiB Mem : 16024.2 total, 8338.0 free, 3722.9 used, 3963.3 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 12596.1 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 17 root 20 0 536940 73464 7468 S 60.0 0.4 584:58.76 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 42 root 20 0 545144 74776 8444 S 46.7 0.5 574:22.76 kuryr-daemon: Prometheus Exporter worker(0) 1 root 20 0 388972 79232 13576 S 0.0 0.5 0:07.23 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 27 root 20 0 476808 76016 8876 S 0.0 0.5 1:09.75 kuryr-daemon: watcher worker(0) 32 root 20 0 397432 71624 5748 S 0.0 0.4 0:14.16 kuryr-daemon: server worker(0) 37 root 20 0 700920 94764 10080 S 0.0 0.6 2:19.23 kuryr-daemon: health worker(0) 38871 root 20 0 12024 3180 2752 S 0.0 0.0 0:00.00 /bin/sh 38882 root 20 0 51056 3732 3176 R 0.0 0.0 0:00.00 top -n 1 -c -b With the fix, Prometheus Exporter worker is consuming ~0.0% CPU on kuryr-cni from master-0 and worker-0: $ oc get pods -n openshift-kuryr -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kuryr-cni-25vdv 1/1 Running 0 21m 10.196.2.172 ostest-6tvgk-master-0 <none> <none> kuryr-cni-4fpd4 1/1 Running 0 112s 10.196.2.243 ostest-6tvgk-worker-0 <none> <none> kuryr-cni-9njsw 1/1 Running 0 115s 10.196.3.116 ostest-6tvgk-worker-2 <none> <none> kuryr-cni-gk72k 1/1 Running 0 21m 10.196.2.133 ostest-6tvgk-master-1 <none> <none> kuryr-cni-h4g6x 1/1 Running 0 21m 10.196.1.14 ostest-6tvgk-master-2 <none> <none> kuryr-cni-rsfhm 1/1 Running 0 108s 10.196.2.240 ostest-6tvgk-worker-1 <none> <none> kuryr-controller-7b5ccb8857-t7lj8 1/1 Running 2 21m 10.196.2.172 ostest-6tvgk-master-0 <none> <none> $ oc rsh -n openshift-kuryr kuryr-cni-25vdv sh-4.4# top -n 1 -c -b top - 10:58:08 up 25 min, 0 users, load average: 7.01, 5.36, 4.60 Tasks: 12 total, 3 running, 7 sleeping, 0 stopped, 2 zombie %Cpu(s): 56.7 us, 22.4 sy, 0.0 ni, 11.9 id, 0.0 wa, 4.5 hi, 4.5 si, 0.0 st MiB Mem : 16024.2 total, 3364.7 free, 5875.0 used, 6784.5 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 9785.6 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1747 root 20 0 0 0 0 Z 23.5 0.0 0:00.53 [kuryr-daemon: s] <defunct> 1 root 20 0 389080 79488 13832 R 0.0 0.5 0:02.72 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 17 root 20 0 889828 74536 7572 S 0.0 0.5 0:01.83 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 27 root 20 0 476932 75868 8684 S 0.0 0.5 0:02.50 kuryr-daemon: watcher worker(0) 33 root 20 0 397540 71804 5904 S 0.0 0.4 0:00.61 kuryr-daemon: server worker(0) 37 root 20 0 763492 83092 10000 S 0.0 0.5 0:03.45 kuryr-daemon: health worker(0) 41 root 20 0 471396 71508 5432 S 0.0 0.4 0:00.64 kuryr-daemon: Prometheus Exporter worker(0) 1748 root 20 0 12024 3172 2740 S 0.0 0.0 0:00.02 /bin/sh 1756 root 20 0 398108 70956 4700 S 0.0 0.4 0:00.03 kuryr-daemon: server worker(0) 1769 root 20 0 0 0 0 R 0.0 0.0 0:00.03 [kuryr-daemon: s] 1775 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 1781 root 20 0 51056 3708 3156 R 0.0 0.0 0:00.00 top -n 1 -c -b sh-4.4# exit exit $ oc rsh -n openshift-kuryr kuryr-cni-4fpd4 sh-4.4# top -n 1 -c -b top - 10:58:28 up 4 min, 0 users, load average: 3.37, 1.73, 0.71 Tasks: 18 total, 1 running, 9 sleeping, 0 stopped, 8 zombie %Cpu(s): 48.4 us, 26.6 sy, 0.0 ni, 18.8 id, 0.0 wa, 3.1 hi, 3.1 si, 0.0 st MiB Mem : 16024.2 total, 12376.1 free, 1186.7 used, 2461.4 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 14483.3 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 161 root 20 0 0 0 0 Z 6.2 0.0 0:00.36 [kuryr-daemon: s] <defunct> 1 root 20 0 389080 79392 13728 S 0.0 0.5 0:03.20 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 18 root 20 0 905452 73784 7652 S 0.0 0.4 0:00.16 kuryr-daemon: master process [/usr/bin/kuryr-daemon --config-file /etc/kuryr/kuryr.conf] 28 root 20 0 476016 77008 8628 S 0.0 0.5 0:00.19 kuryr-daemon: watcher worker(0) 34 root 20 0 397540 71864 5964 S 0.0 0.4 0:00.11 kuryr-daemon: server worker(0) 38 root 20 0 696932 77108 9976 S 0.0 0.5 0:00.27 kuryr-daemon: health worker(0) 40 root 20 0 471264 73492 5524 S 0.0 0.4 0:00.07 kuryr-daemon: Prometheus Exporter worker(0) 157 root 20 0 397540 71032 5004 S 0.0 0.4 0:00.02 kuryr-daemon: server worker(0) 159 root 20 0 0 0 0 Z 0.0 0.0 0:00.37 [kuryr-daemon: s] <defunct> 169 root 20 0 397540 70724 4752 S 0.0 0.4 0:00.02 kuryr-daemon: server worker(0) 200 root 20 0 12024 3160 2736 S 0.0 0.0 0:00.02 /bin/sh 207 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 219 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 225 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 250 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 268 root 20 0 0 0 0 Z 0.0 0.0 0:00.02 [kuryr-daemon: s] <defunct> 274 root 20 0 0 0 0 Z 0.0 0.0 0:00.01 [kuryr-daemon: s] <defunct> 280 root 20 0 51056 3768 3216 R 0.0 0.0 0:00.00 top -n 1 -c -b Furthermore, prometheus reporting is working as expected generating its alerts: $ curl -sk -H "Authorization: Bearer $token" 'https://prometheus-k8s-openshift-monitoring.apps.ostest.shiftstack.com/api/v1/alerts' | jq '.data.alerts[] | select(.labels.alertname) | .labels.alertname' "AlertmanagerReceiversNotConfigured" "Watchdog" Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633 |