Bug 1429880

Summary: keepalived high number of close syscalls
Product: Red Hat Enterprise Linux 7 Reporter: Jaroslav Reznik <jreznik>
Component: keepalived Assignee: Ryan O'Hara <rohara>
Status: CLOSED ERRATA QA Contact: Brandon Perkins <bperkins>
Severity: high Docs Contact:
Priority: high    
Version: 7.2 CC: aos-bugs, bmchugh, cluster-maint, csochin, dlbewley, erich, jruemker, mnavrati, nkim, rhowe, rmanes, rohara, yann.morice
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: keepalived-1.2.13-9.el7_3 Doc Type: Bug Fix
Doc Text:
Previously, the keepalived utility attempted to close a large number of file descriptors each time a notification script was invoked. As a consequence, keepalived generated many unnecessary close() system calls in an attempt to close file descriptors that were not open. This bug has been fixed by using the SOCK_CLOEXEC flag when opening all sockets, and the FD_CLOEXEC flag when opening all file descriptors. As a result, the number of close() system calls is no longer excessive.
Story Points: ---
Clone Of: 1324594 Environment:
Last Closed: 2017-05-25 15:37:28 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1324594    
Bug Blocks:    
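
Editor's sketch for the Doc Text above: a minimal C illustration of the two close-on-exec mechanisms it names, SOCK_CLOEXEC at socket creation and FD_CLOEXEC set through fcntl() on a descriptor that is already open. The function names are hypothetical and this is not the keepalived patch itself; the AF_UNIX datagram socket is chosen only to keep the example self-contained.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a socket that the kernel closes on exec(),
 * so no close() loop is needed before running a notification script. */
int open_cloexec_socket(void)
{
    return socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
}

/* Hypothetical helper: mark an already-open descriptor close-on-exec.
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: 1 if FD_CLOEXEC is set, 0 if not, -1 on error. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

With every descriptor flagged this way, a child that calls exec() inherits none of them, which is why the blanket close() loop described in the Doc Text becomes unnecessary.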

Description Jaroslav Reznik 2017-03-07 11:35:54 UTC
This bug has been copied from bug #1324594 and has been proposed
to be backported to 7.3 z-stream (EUS).

Comment 6 errata-xmlrpc 2017-05-25 15:37:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1305

Comment 7 yann.morice 2017-06-20 13:46:57 UTC
Hi,

It seems that this keepalived patch (1.2.13-9.el7_3) has introduced a regression. In an OpenStack cloud, the keepalived instances of HA routers began to create pipes until they hit the open-file limit, e.g. Keepalived_vrrp[xxx]: Netlink: Cannot open netlink socket : (Too many open files). This was not the case in the previous version (1.2.13-8.el7), which kept only two pipes.

I suspect this is caused by the SIGHUP signal sent on configuration changes (adding a floating IP), as a similar router with no configuration changes stays OK.

In fact, I think the bug was introduced in lib/signals.c. I don't see any mechanism that replaces signal_handler_destroy for closing these two pipes. This code is still in the trunk (and latest versions) of keepalived. I think these four lines should not have been deleted:

-       close(signal_pipe[1]);
-       close(signal_pipe[0]);
-       signal_pipe[1] = -1;
-       signal_pipe[0] = -1;

For information (not directly linked): the code under #ifdef HAVE_PIPE2 is never used, because the variable set by configure is not propagated to the Makefiles*. Indeed, running strings /usr/sbin/keepalived | grep pipe on the installed keepalived shows pipe (instead of pipe2). I suspect we need something like DEFS = @DFLAGS@ -D@SNMP_SUPPORT@ @DEFS@ in lib/Makefile.in to fix this.
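
Editor's sketch of the point made in this comment: a pipe that is re-created on every reload has to be closed first, otherwise each reload leaks two descriptors. The signal_pipe array and the HAVE_PIPE2 guard come from the comment; the sketch_* function names are hypothetical, and this is an illustration under those assumptions, not the code in lib/signals.c.

```c
#define _GNU_SOURCE   /* pipe2() on glibc; only used when HAVE_PIPE2 is defined */
#include <fcntl.h>
#include <unistd.h>

static int signal_pipe[2] = { -1, -1 };

/* Hypothetical helper: set close-on-exec and non-blocking on one fd. */
static int sketch_set_fd_flags(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    int flflags = fcntl(fd, F_GETFL);
    if (fdflags == -1 || flflags == -1)
        return -1;
    if (fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC) == -1)
        return -1;
    return fcntl(fd, F_SETFL, flflags | O_NONBLOCK);
}

/* Hypothetical helper: the four lines quoted above, guarded so that
 * calling it twice is harmless. */
void sketch_signal_pipe_close(void)
{
    if (signal_pipe[1] != -1)
        close(signal_pipe[1]);
    if (signal_pipe[0] != -1)
        close(signal_pipe[0]);
    signal_pipe[1] = -1;
    signal_pipe[0] = -1;
}

/* Hypothetical helper: create the pipe, using pipe2() when available. */
int sketch_signal_pipe_open(void)
{
#ifdef HAVE_PIPE2
    return pipe2(signal_pipe, O_CLOEXEC | O_NONBLOCK);
#else
    if (pipe(signal_pipe) == -1)
        return -1;
    if (sketch_set_fd_flags(signal_pipe[0]) == -1 ||
        sketch_set_fd_flags(signal_pipe[1]) == -1) {
        sketch_signal_pipe_close();
        return -1;
    }
    return 0;
#endif
}

/* Reload path: close the old pipe before opening a new one. */
int sketch_signal_pipe_reopen(void)
{
    sketch_signal_pipe_close();
    return sketch_signal_pipe_open();
}
```

Calling sketch_signal_pipe_open() on each reload without the close step would add two descriptors per call, which matches the growth pattern reported here.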

Comment 8 Ryan O'Hara 2017-06-21 15:41:06 UTC
(In reply to yann.morice from comment #7)

Can you please provide some details regarding how you are seeing an increased number of pipes? This will help reproduce and fix any regression that was introduced by the patch. Thanks.

Comment 9 yann.morice 2017-06-23 07:14:47 UTC
* If we run a simple configuration using VRRP only:

# keepalived -P -f /etc/keepalived/keepalived.conf

vrrp_instance VR_1 {
    state BACKUP
    interface eth0
    virtual_router_id 1
    priority 50
    garp_master_delay 60
    nopreempt
    advert_int 2
    track_interface {
        eth0
    }
    virtual_ipaddress {
        169.254.0.2/24 dev eth0
    }
}

* We then have two new processes:

# ps -eafwww |grep keepalived
root     13722     1  0 09:00 ?        00:00:00 keepalived -P -f /etc/keepalived/keepalived-simple.conf
root     13723 13722  0 09:00 ?        00:00:00 keepalived -P -f /etc/keepalived/keepalived-simple.conf

* The first one has two pipes:

# lsof |grep 13722|grep pipe|wc -l
2

* The second one has four pipes at the beginning:
# lsof |grep 13723|grep pipe|wc -l
4

* If we send SIGHUP to the main process to live-update the configuration:
# kill -HUP 13722

* and count again:
# lsof |grep 13723|grep pipe|wc -l
6

We now have six pipes (in fact, +2 pipes at each SIGHUP).

With version 1.2.13-8.el7, the same steps leave the second process at only two pipes across all SIGHUPs.

The problem is that OpenStack relies heavily on this mechanism to live-update router configuration (floating IPs, etc.), so the router goes offline once it hits too many open files.
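
Editor's sketch: the lsof | grep pipe | wc -l check used above can also be done programmatically by reading /proc/<pid>/fd on Linux. The function name count_pipe_fds is hypothetical. It counts open descriptors whose link target starts with "pipe:", so its result may differ from the lsof line count if lsof prints extra matching lines for the same process.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the pipe descriptors held by `pid`.
 * Returns -1 if /proc/<pid>/fd cannot be opened (no such process,
 * insufficient permissions, or /proc not mounted). */
int count_pipe_fds(pid_t pid)
{
    char dirpath[64];
    snprintf(dirpath, sizeof dirpath, "/proc/%ld/fd", (long)pid);

    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                     /* skip "." and ".." */

        char linkpath[384];
        char target[128];
        snprintf(linkpath, sizeof linkpath, "%s/%s", dirpath, ent->d_name);

        /* Each entry is a symlink; pipes read as "pipe:[inode]". */
        ssize_t n = readlink(linkpath, target, sizeof target - 1);
        if (n < 0)
            continue;                     /* fd closed while scanning */
        target[n] = '\0';

        if (strncmp(target, "pipe:", 5) == 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

Calling count_pipe_fds() on the VRRP child PID before and after each kill -HUP would show the same +2 growth as the lsof counts, assuming the caller has permission to read that process's /proc entry.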

Comment 10 Ryan O'Hara 2017-06-23 15:19:11 UTC
Thanks. Please open a new bugzilla for this issue.

Comment 11 yann.morice 2017-06-26 06:47:45 UTC
OK, done: Bug #1464869