Bug 1764181

Summary: pacemaker_remoted spams close() with large file descriptor limit [rhel-8.1.0.z]
Product: Red Hat Enterprise Linux 8 Reporter: Oneata Mircea Teodor <toneata>
Component: pacemaker Assignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA QA Contact: pkomarov
Severity: high Docs Contact:
Priority: high    
Version: 8.0 CC: abeekhof, aherr, cfeist, cluster-maint, cluster-qe, jeckersb, kgaillot, michele, pkomarov
Target Milestone: rc Keywords: ZStream
Target Release: 8.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pacemaker-2.0.2-3.el8_1.2 Doc Type: Bug Fix
Doc Text:
Cause: When Pacemaker forked a child process to execute a command, it would close all possible file descriptors.
Consequence: The overhead of many unnecessary system calls degraded performance.
Fix: When it is possible to detect which file descriptors are open, Pacemaker closes only these in a newly forked child.
Result: Performance is improved.
Story Points: ---
Clone Of: 1762025 Environment:
Last Closed: 2019-12-17 10:46:36 UTC Type: Enhancement
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1762025    
Bug Blocks:    
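
For readers unfamiliar with the approach described in the Doc Text above, here is a minimal C sketch of the idea. It is not the code from the upstream commits: the name `close_fds_from` is hypothetical, and listing `/proc/self/fd` is assumed as the way to detect open descriptors (Linux-specific). When that listing is unavailable, the sketch falls back to trying every descriptor up to the soft limit, which is the behavior the Cause describes.

```c
#define _DEFAULT_SOURCE   /* exposes dirfd() on glibc under strict -std= modes */
#include <dirent.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper, not Pacemaker's actual function: close every open
 * descriptor numbered min_fd or higher, e.g. in a child right after fork(). */
static void close_fds_from(int min_fd)
{
    DIR *dir = opendir("/proc/self/fd");  /* assumption: Linux procfs is mounted */

    if (dir != NULL) {
        int dir_fd = dirfd(dir);          /* descriptor backing this listing */
        struct dirent *entry;

        while ((entry = readdir(dir)) != NULL) {
            char *end = NULL;
            long fd = strtol(entry->d_name, &end, 10);

            /* Skip "." and "..", descriptors below min_fd, and dir_fd itself */
            if (end == entry->d_name || *end != '\0'
                || fd < min_fd || fd == dir_fd) {
                continue;
            }
            close((int) fd);
        }
        closedir(dir);                    /* also releases dir_fd */
        return;
    }

    /* Fallback when open descriptors cannot be listed: try every possible
     * descriptor up to the soft limit (one close() call per candidate). */
    struct rlimit lim;
    long max_fd = 1024;                   /* arbitrary default if the limit is unknown */

    if (getrlimit(RLIMIT_NOFILE, &lim) == 0 && lim.rlim_cur != RLIM_INFINITY) {
        max_fd = (long) lim.rlim_cur;
    }
    for (long fd = min_fd; fd < max_fd; fd++) {
        close((int) fd);
    }
}
```

A forked child would typically call `close_fds_from(3)` before exec so that stdin, stdout, and stderr stay open. With the listing available, the number of close() calls tracks the number of descriptors actually open rather than the configured limit.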

Comment 2 Ken Gaillot 2019-10-22 17:30:03 UTC
Fixed upstream by commits 5a73027 and 68d6a69

Comment 7 pkomarov 2019-11-05 11:17:38 UTC
Verified.

root       46431       1  0 Nov04 ?        00:00:00   /usr/libexec/podman/conmon -s -c f2b67e100201c2580300befece3737f8d3ce11732ac371712f1545015e339974 -u f2b67e100201c2580300befece3737f8d3ce11732ac371712f1545015e339974 -n redis-bundle-podman-0 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/f2b67e100201c2580300befece3737f8d3ce11732ac371712f1545015e339974/userdata -p /var/run/containers/storage/overlay-containers/f2b67e100201c2580300befece3737f8d3ce11732ac371712f1545015e339974/userdata/pidfile --exit-dir /var/run/libpod/exits --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg container --exit-command-arg cleanup --exit-command-arg f2b67e100201c2580300befece3737f8d3ce11732ac371712f1545015e339974 --socket-dir-path /var/run/libpod/socket -l journald: --log-level error
root       46444   46431  0 Nov04 ?        00:00:00     dumb-init --single-child -- /bin/bash /usr/local/bin/kolla_start
root       46458   46444  0 Nov04 ?        00:00:40       /usr/sbin/pacemaker_remoted
42460      46796   46444  1 Nov04 ?        00:17:48       /usr/bin/redis-server 172.17.1.49:6379

[root@overcloud-controller-0 ~]# perf stat -e 'syscalls:sys_enter_close' -p 46458 -- sleep 37

 Performance counter stats for process id '46458':

         1,049,414      syscalls:sys_enter_close

      37.001985306 seconds time elapsed

dnf install -y http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/pacemaker/2.0.2/3.el8_1.2/x86_64/pacemaker-2.0.2-3.el8_1.2.x86_64.rpm http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/pacemaker/2.0.2/3.el8_1.2/x86_64/pacemaker-cli-2.0.2-3.el8_1.2.x86_64.rpm http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/pacemaker/2.0.2/3.el8_1.2/x86_64/pacemaker-cluster-libs-2.0.2-3.el8_1.2.x86_64.rpm http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/pacemaker/2.0.2/3.el8_1.2/x86_64/pacemaker-libs-2.0.2-3.el8_1.2.x86_64.rpm http://sts.lab.msp.redhat.com/dist/brewroot/repos/errata-rhel8.1.z/x86_64/pacemaker-schemas-2.0.2-3.el8_1.2.noarch.rpm http://sts.lab.msp.redhat.com/dist/brewroot/repos/errata-rhel8.1.z/x86_64/pacemaker-remote-2.0.2-3.el8_1.2.x86_64.rpm

[root@overcloud-controller-0 ~]# rpm -q pacemaker-remote
pacemaker-remote-2.0.2-3.el8_1.2.x86_64

[root@overcloud-controller-0 ~]# podman exec -it redis-bundle-podman-0 bash
()[root@overcloud-controller-0 /]# rpm -q pacemaker-remote
pacemaker-remote-2.0.2-3.el8_1.2.x86_64


[root@overcloud-controller-0 ~]# ps -efH|grep -A3 redis-bundle-p[o]
root      330769       1  0 11:11 ?        00:00:00   /usr/libexec/podman/conmon -s -c 2a54b643a1eb5cecea0058c5fcfbefa5d8900ac8b34f750c02adaaad7829f012 -u 2a54b643a1eb5cecea0058c5fcfbefa5d8900ac8b34f750c02adaaad7829f012 -n redis-bundle-podman-0 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/2a54b643a1eb5cecea0058c5fcfbefa5d8900ac8b34f750c02adaaad7829f012/userdata -p /var/run/containers/storage/overlay-containers/2a54b643a1eb5cecea0058c5fcfbefa5d8900ac8b34f750c02adaaad7829f012/userdata/pidfile --exit-dir /var/run/libpod/exits --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 2a54b643a1eb5cecea0058c5fcfbefa5d8900ac8b34f750c02adaaad7829f012 --socket-dir-path /var/run/libpod/socket -l journald: --log-level error
root      330781  330769  3 11:11 ?        00:00:00     dumb-init --single-child -- /bin/bash /usr/local/bin/kolla_start
root      330796  330781  2 11:11 ?        00:00:00       /usr/sbin/pacemaker_remoted
[root@overcloud-controller-0 ~]# perf stat -e 'syscalls:sys_enter_close' -p 330796 -- sleep 37

Broadcast message from systemd-journald@overcloud-controller-0 (Tue 2019-11-05 11:12:29 UTC):

haproxy[300409]: proxy redis has no server available!


 Performance counter stats for process id '330796':

             6,880      syscalls:sys_enter_close

      37.002650077 seconds time elapsed

Comment 9 errata-xmlrpc 2019-12-17 10:46:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4261