Description of problem: pacemaker_remoted spams close() inside of openstack containers due to large file descriptor table size. In this bit from action_launch_child in services_linux.c: https://github.com/ClusterLabs/pacemaker/blob/81d51216d218e39051ba32a8d2f46a91e450217b/lib/services/services_linux.c#L478-L481 getdtablesize() is 2^20, or 1048576, meaning every time a child gets forked off we call close repeatedly on a million non-existent file descriptors. When doing some profiling with perf while running openstack tempest tests, I see that pacemaker_remoted is using about the same amount of cpu time as all of the rest of the control plane services combined, after taking into account the time wasted constantly jumping back and forth across the syscall boundary. I have a patch that I will post upstream shortly and will reference this BZ#. Version-Release number of selected component (if applicable): pacemaker-remote-2.0.1-4.el8_0.4.x86_64 How reproducible: Always Steps to Reproduce: With OSP15 deployment: [root@controller-0 ~]# ps -f 8557 UID PID PPID C STIME TTY STAT TIME CMD root 8557 8114 0 19:32 ? Ss 0:03 /usr/sbin/pacemaker_remoted [root@controller-0 ~]# perf stat -e 'syscalls:sys_enter_close' -p 8557 -- sleep 37 Performance counter stats for process id '8557': 3,147,367 syscalls:sys_enter_close 37.009166783 seconds time elapsed [root@controller-0 ~]# Expected results: Significantly fewer calls to close()
https://github.com/ClusterLabs/pacemaker/pull/1912
After the patch, calls to close() reduce from 3,147,367 above to 1,878 below: [root@controller-0 ~]# ps -f 8939 UID PID PPID C STIME TTY STAT TIME CMD root 8939 8623 0 21:39 ? Ss 0:01 /usr/sbin/pacemaker_remoted [root@controller-0 ~]# perf stat -e 'syscalls:sys_enter_close' -p 8939 -- sleep 37 Performance counter stats for process id '8939': 1,878 syscalls:sys_enter_close 37.007315989 seconds time elapsed [root@controller-0 ~]#
Fixed upstream as of commit 5a73027
Verified , root 46431 1 0 Nov04 ? 00:00:00 /usr/libexec/podman/conmon -s -c f2b67e100201c2580300befece3737f8 d3ce11732ac371712f1545015e339974 -u f2b67e100201c2580300befece3737f8d3ce11732ac371712f1545015e339974 -n redis-bundle-po dman-0 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/f2b67e100201c2580300befece3737f8d3ce11732ac37 1712f1545015e339974/userdata -p /var/run/containers/storage/overlay-containers/f2b67e100201c2580300befece3737f8d3ce1173 2ac371712f1545015e339974/userdata/pidfile --exit-dir /var/run/libpod/exits --exit-command /usr/bin/podman --exit-comman d-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/co ntainers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-com mand-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-com mand-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg container --exit-comman d-arg cleanup --exit-command-arg f2b67e100201c2580300befece3737f8d3ce11732ac371712f1545015e339974 --socket-dir-path /va r/run/libpod/socket -l journald: --log-level error root 46444 46431 0 Nov04 ? 00:00:00 dumb-init --single-child -- /bin/bash /usr/local/bin/kolla_star t root 46458 46444 0 Nov04 ? 00:00:40 /usr/sbin/pacemaker_remoted 42460 46796 46444 1 Nov04 ? 00:17:48 /usr/bin/redis-server 172.17.1.49:6379 [root@overcloud-controller-0 ~]# perf stat -e 'syscalls:sys_enter_close' -p 46458 -- sleep 37 Performance counter stats for process id '46458': 1,049,414 syscalls:sys_enter_close 37.001985306 seconds time elapsed dnf install -y http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/pacemaker/2.0.2/3.el8_1.2/x86_64/pacemaker-2.0.2-3.el8_1.2.x86_64.rpm http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/pacemaker/2.0.2/3.el8_1.2/x86_64/pacemaker-cli-2.0.2-3.el8_1.2.x86_64.rpm http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/pacemaker/2.0.2/3.el8_1.2/x86_64/pacemaker-cluster-libs-2.0.2-3.el8_1.2.x86_64.rpm http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/pacemaker/2.0.2/3.el8_1.2/x86_64/pacemaker-libs-2.0.2-3.el8_1.2.x86_64.rpm http://sts.lab.msp.redhat.com/dist/brewroot/repos/errata-rhel8.1.z/x86_64/pacemaker-schemas-2.0.2-3.el8_1.2.noarch.rpm http://sts.lab.msp.redhat.com/dist/brewroot/repos/errata-rhel8.1.z/x86_64/pacemaker-remote-2.0.2-3.el8_1.2.x86_64.rpm [root@overcloud-controller-0 ~]# rpm -q pacemaker-remote pacemaker-remote-2.0.2-3.el8_1.2.x86_64 [root@overcloud-controller-0 ~]# podman exec -it redis-bundle-podman-0 bash ()[root@overcloud-controller-0 /]# rpm -q pacemaker-remote pacemaker-remote-2.0.2-3.el8_1.2.x86_64 [root@overcloud-controller-0 ~]# ps -efH|grep -A3 redis-bundle-p[o] root 330769 1 0 11:11 ? 00:00:00 /usr/libexec/podman/conmon -s -c 2a54b643a1eb5cecea0058c5fcfbefa5d8900ac8b34f750c02adaaad7829f012 -u 2a54b643a1eb5cecea0058c5fcfbefa5d8900ac8b34f750c02adaaad7829f012 -n redis-bundle-podman-0 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/2a54b643a1eb5cecea0058c5fcfbefa5d8900ac8b34f750c02adaaad7829f012/userdata -p /var/run/containers/storage/overlay-containers/2a54b643a1eb5cecea0058c5fcfbefa5d8900ac8b34f750c02adaaad7829f012/userdata/pidfile --exit-dir /var/run/libpod/exits --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 2a54b643a1eb5cecea0058c5fcfbefa5d8900ac8b34f750c02adaaad7829f012 --socket-dir-path /var/run/libpod/socket -l journald: --log-level error root 330781 330769 3 11:11 ? 00:00:00 dumb-init --single-child -- /bin/bash /usr/local/bin/kolla_start root 330796 330781 2 11:11 ? 00:00:00 /usr/sbin/pacemaker_remoted [root@overcloud-controller-0 ~]# perf stat -e 'syscalls:sys_enter_close' -p 330796 -- sleep 37 Broadcast message from systemd-journald@overcloud-controller-0 (Tue 2019-11-05 11:12:29 UTC): haproxy[300409]: proxy redis has no server available! Performance counter stats for process id '330796': 6,880 syscalls:sys_enter_close 37.002650077 seconds time elapsed
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:1609