Bug 1299971

Summary: keepalived child segfaults when unicast_src_ip option does not have value
Product: Red Hat Enterprise Linux 6 Reporter: Petr Barta <pbarta>
Component: keepalivedAssignee: Ryan O'Hara <rohara>
Status: CLOSED WONTFIX QA Contact: Brandon Perkins <bperkins>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.4CC: cluster-maint, cww, quentin, rohara, zpytela
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1435361 (view as bug list) Environment:
Last Closed: 2017-08-15 18:30:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1269194, 1435361    

Description Petr Barta 2016-01-19 16:01:56 UTC
Description of problem:

Keepalive segfaults when there is no IP value defined for unicast_src_ip option in keepalived.conf. No error message is logged


Version-Release number of selected component (if applicable):
keepalived-1.2.13-5.el6_6.x86_64


How reproducible:
Always


Steps to Reproduce:
1. Prepare basic keepalived configuration with unicast_src_ip option without value.
Example basic keepalived as follows:
vrrp_instance unify_lvs_service_private {
    priority 100
    advert_int 1
    authentication {
        auth_pass 1111
        auth_type PASS
    }
    state BACKUP
    unicast_peer {
    }
    unicast_src_ip
    interface eth1
    nopreempt
    virtual_ipaddress {
        192.168.10.3/24
    }
    virtual_router_id 52
}
vrrp_instance unify_lvs_service_public {
    priority 100
    advert_int 1
    authentication {
        auth_pass 1111
        auth_type PASS
    }
    unicast_peer {
        10.39.84.145
    }
    state BACKUP
    unicast_src_ip 10.39.84.150
    interface eth0
    nopreempt
    virtual_ipaddress {
        10.39.84.221/25
        10.39.84.220/25
    }
    virtual_router_id 51
}
vrrp_sync_group unify_lvs_service {
    group {
        unify_lvs_service_private
        unify_lvs_service_public
    }
    !group_name unify_lvs_service
}

2. start keepalived service
3. check /var/log messages for respawning child

Actual results:
Child respowning without providing any clue in /var/log/messages why.


Expected results:
Log in /var/log/messages providing information why parsing of keepalived.conf fails and child respawns


Additional info:

My reproducer shows following in /var/log/messages:
Jan 18 12:21:22 rhel6 Keepalived_vrrp[22726]: Netlink reflector reports IP 192.168.122.202 added
Jan 18 12:21:22 rhel6 Keepalived_vrrp[22726]: Netlink reflector reports IP fe80::5054:ff:fed7:7bf4 added
Jan 18 12:21:22 rhel6 Keepalived_vrrp[22726]: Netlink reflector reports IP fe80::5054:ff:feab:b9d9 added
Jan 18 12:21:22 rhel6 Keepalived_vrrp[22726]: Netlink reflector reports IP fe80::5054:ff:fe01:8bdf added
Jan 18 12:21:22 rhel6 Keepalived_vrrp[22726]: Registering Kernel netlink reflector
Jan 18 12:21:22 rhel6 Keepalived_vrrp[22726]: Registering Kernel netlink command channel
Jan 18 12:21:22 rhel6 Keepalived_vrrp[22726]: Registering gratuitous ARP shared channel
Jan 18 12:21:22 rhel6 Keepalived_vrrp[22726]: Opening file '/etc/keepalived/keepalived.conf'.


Related strace:
----
getuid()                                = 0
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 10
connect(10, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(10)                               = 0
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 10
connect(10, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(10)                               = 0
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 10
fstat(10, {st_mode=S_IFREG|0644, st_size=1865, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb1a40a2000
read(10, "root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\ndaemon:x:2:2:daemon:/sbin:/sbin/nologin\nadm:x:3:4:adm:/var/adm:/sbin/nologin\nlp:x:4:7:lp:/var/spool/lpd:/sbin/nologin\nsync:x:5:0:sync:/sbin:/bin/sync\nshutdown:x:6:0:shutdown:/sbin:/sbin/shutd"..., 4096) = 1865
close(10)                               = 0
munmap(0x7fb1a40a2000, 4096)            = 0
stat("/etc/keepalived/keepalived.conf", {st_mode=S_IFREG|0644, st_size=862, ...}) = 0
gettimeofday({1453116082, 925966}, NULL) = 0
sendto(3, "<142>Jan 18 12:21:22 Keepalived_vrrp[22726]: Opening file '/etc/keepalived/keepalived.conf'.", 92, MSG_NOSIGNAL, NULL, 0) = 92
open("/etc/keepalived/keepalived.conf", O_RDONLY) = 10
getcwd("/", 1024)                       = 2
chdir("/etc/keepalived")                = 0
fstat(10, {st_mode=S_IFREG|0644, st_size=862, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb1a40a2000
read(10, "vrrp_instance unify_lvs_service_private {\n    priority 100\n    advert_int 1\n    authentication {\n        auth_pass 1111\n        auth_type PASS\n    }\n    state BACKUP\n    unicast_peer {\n    }\n    unicast_src_ip\n    interface eth1\n    nopreempt\n    virtual_i"..., 4096) = 862
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x70695f637270} ---
+++ killed by SIGSEGV (core dumped) +++

Comment 2 Ryan O'Hara 2016-01-19 20:13:46 UTC
This is pretty much always the case with misconfigured keepalived.conf. The config file parser does not deal with errors very well, to say the least. I will look into this, but be advised that the problem here is more widespread than just 'unicast_src_ip' -- most parameters that require an value in keepalived.conf will cause a segfault if the value is missing. This amounts to a fair amount of work to address.

Comment 3 Petr Barta 2016-01-20 06:28:47 UTC
Hello Ryan,
  Thank for the comment. Yes, I thought so, forgot to add this to the description. I would guess that there should be some logging on all options requiring values, but not having set ones in conf file. I'm aware that this may be quite a big amount of work and time, but I think it's worth it.

BR,
Petr

Comment 6 Quentin Armitage 2016-07-29 19:31:06 UTC
I have it on my list of things to do with keepalived to resolve issues such as this with the parser. The only reason this is on my low priority list is that, so far as I know, any correct configuration is properly parsed.

The problems (that I am aware of) with the parser are:
i. It doesn't handle missing parameters (this problem)
ii. When it does find a problem it isn't helpful in identifing where the problem is in the config file
iii. The formatting for '{' and '}' is restrictive and inconsistent
iv. Keywords that are unique to the vrrp or checker child processes are reported as unknown keywords by the other child process. Before I changed the behaviour, unknown keywords were silently ignored, which made it a nightmare to track down typos in a config file.

I think the only solution is a major restructuring of the parser. Does anyone know of a parser in any other package that might be suitable to be adapted for keepalived? If so, it would certainly reduce the time to implementing a revised parser.

Comment 7 Ryan O'Hara 2016-07-29 20:14:29 UTC
(In reply to Quentin Armitage from comment #6)
> I think the only solution is a major restructuring of the parser. Does
> anyone know of a parser in any other package that might be suitable to be
> adapted for keepalived? If so, it would certainly reduce the time to
> implementing a revised parser.

Look at multipath-tools. The parser is the same code, but the dm-multipath developers have added to it and it seems much better at detecting and handling errors.

To be specific, the git repo for multipath-tools is here:

http://git.opensvc.com/multipath-tools/.git

In libmultipath directory there is a parser.[ch] files that will look very familiar.

Comment 10 Quentin Armitage 2017-08-15 18:43:26 UTC
The segfault problem with the parser was fixed in keepalived v1.3.0

Comment 11 Brandon Perkins 2017-08-15 20:50:54 UTC
(In reply to Quentin Armitage from comment #10)
> The segfault problem with the parser was fixed in keepalived v1.3.0

Yes, but RHEL6 ships with keepalived-1.2.13

From what I understand from Ryan, the cherry-pick from 1.3 to 1.2 for this change would not be trivial.  If that's not the case, we could re-evaluate, but it would still need a business justification to fix.