Description of problem:
keepalived generates a very high number of close() syscalls. A third-party agent (sysdig) never drops close events, and processing this volume of events causes high CPU usage and crashes the node.
Version-Release number of selected component (if applicable):
OpenShift 3.1.1.6
keepalived-1.2.13-7.el7.x86_64
How reproducible: Not yet reproduced; reproduction requires the 3rd party application (sysdig).
Steps to Reproduce:
1. Configure ipfailover on OpenShift
2. Configure the 3rd party sysdig agent on OpenShift
Actual results:
OpenShift node crashes
Expected results:
keepalived should not generate this many close() syscalls
Additional info:
OpenShift User list
http://lists.openshift.redhat.com/openshift-archives/users/2016-April/msg00045.html
Customer is running sysdig
http://www.sysdig.org/
http://www.keepalived.org/changelog.html
keepalived 1.2.20
"Optimise closure of fds before invoking scripts.
Every time before a script was invoked, closeall() was called,
which would spin through 1024 file descriptors closing them, even
though the vast majority were not open, resulting in 1024 system
calls. To avoid that, open all sockets and file descriptors
(except fd 0/1/2) with the CLOEXEC flag set, so that the fds will
be closed by the kernel when the script is exec'd."
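The mechanism in the changelog entry above can be illustrated with a minimal Python sketch. This is not keepalived's actual C code; the helper names (set_cloexec, is_cloexec, fd_survives_exec) are hypothetical and exist only for this illustration. It assumes a POSIX system. The sketch marks a descriptor close-on-exec and then checks from a child process whether the descriptor is still open after exec, which is what removes the need to loop over every possible fd and call close() on each.

```python
import fcntl
import os
import subprocess
import sys

# Child program: exit 0 if the fd number passed in argv is open, 1 otherwise.
CHILD_CHECK = """
import os, sys
try:
    os.fstat(int(sys.argv[1]))
except OSError:
    sys.exit(1)
"""


def set_cloexec(fd: int) -> None:
    """Set FD_CLOEXEC so the kernel closes fd when the process calls exec."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def is_cloexec(fd: int) -> bool:
    """Report whether FD_CLOEXEC is currently set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def fd_survives_exec(fd: int) -> bool:
    """Spawn a child without any manual fd cleanup and ask whether fd is open there."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CHECK, str(fd)],
        close_fds=False,  # rely on the per-fd flag only, no close loop
    )
    return result.returncode == 0


if __name__ == "__main__":
    read_end, write_end = os.pipe()

    # Clear close-on-exec: the descriptor leaks into the child.
    os.set_inheritable(read_end, True)
    print("inherited without CLOEXEC:", fd_survives_exec(read_end))

    # Set close-on-exec: the kernel closes it at exec time, no close() calls needed.
    set_cloexec(read_end)
    print("inherited with CLOEXEC:", fd_survives_exec(read_end))

    os.close(read_end)
    os.close(write_end)
```

With the flag set at open time (for example O_CLOEXEC in open(2)), the parent issues no extra close() calls before running a script, so a syscall tracer such as sysdig sees no burst of close events per script invocation.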
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:2169