Bug 1692788
| Summary: | keepalived crashes in a loop when the vrrp interface does not exist | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Michele Baldessari <michele> |
| Component: | keepalived | Assignee: | Ryan O'Hara <rohara> |
| Status: | CLOSED DUPLICATE | QA Contact: | Brandon Perkins <bperkins> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 8.0 | CC: | cfeist, cluster-maint, lmiccini |
| Target Milestone: | rc | ||
| Target Release: | 8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-05-10 16:47:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I am unable to reproduce this.
# rpm -q keepalived
keepalived-2.0.7-2.el8.x86_64
# cat /etc/keepalived/keepalived.conf
global_defs {
router_id MESA-01
}
vrrp_instance VRRP-01 {
interface foo
priority 141
advert_int 1
state BACKUP
virtual_router_id 31
virtual_ipaddress {
10.15.85.31
}
}
# systemctl start keepalived
# journalctl -afu keepalived
May 10 12:34:55 mesa-virt-01_RHEL8 systemd[1]: Starting LVS and VRRP High Availability Monitor...
May 10 12:34:55 mesa-virt-01_RHEL8 systemd[1]: keepalived.service: Can't open PID file /var/run/keepalived.pid (yet?) after start: No such file or directory
May 10 12:34:55 mesa-virt-01_RHEL8 Keepalived[30315]: Starting VRRP child process, pid=30316
May 10 12:34:55 mesa-virt-01_RHEL8 Keepalived[30315]: Keepalived_vrrp exited with permanent error CONFIG. Terminating
May 10 12:34:55 mesa-virt-01_RHEL8 systemd[1]: Started LVS and VRRP High Availability Monitor.
No core dump for non-existent interface.
Seems more likely that you're hitting the double-free bug that was fixed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1693706
Agreed, this looks to be the same root cause. Will reopen if that is not the case.
*** This bug has been marked as a duplicate of bug 1693706 ***
OK, with a more recent version of keepalived I can recreate this problem.
# rpm -q keepalived
keepalived-2.0.10-1.el8.x86_64
Once started, keepalived will repeatedly die (coredump) and log the following:
May 10 13:09:08 mesa-virt-01_RHEL8 Keepalived[30730]: Starting VRRP child process, pid=31311
May 10 13:09:08 mesa-virt-01_RHEL8 Keepalived_vrrp[31311]: Registering Kernel netlink reflector
May 10 13:09:08 mesa-virt-01_RHEL8 Keepalived_vrrp[31311]: Registering Kernel netlink command channel
May 10 13:09:08 mesa-virt-01_RHEL8 Keepalived_vrrp[31311]: Opening file '/etc/keepalived/keepalived.conf'.
May 10 13:09:08 mesa-virt-01_RHEL8 Keepalived_vrrp[31311]: (Line 17) WARNING - interface foo for vrrp_instance VRRP-01 doesn't exist
May 10 13:09:08 mesa-virt-01_RHEL8 Keepalived_vrrp[31311]: Non-existent interface specified in configuration
May 10 13:09:08 mesa-virt-01_RHEL8 Keepalived[30730]: Keepalived_vrrp exited due to signal 6
May 10 13:09:08 mesa-virt-01_RHEL8 Keepalived[30730]: VRRP child process(31311) died: Respawning
Note that you can stop keepalived from respawning by using the -R option. However, the coredump has nothing to do with the non-existent interface. It is caused by the smtp configuration and a double free when keepalived stops. See rhbz#1693706.
I've tested with the latest build for 8.1 and it works as expected. Closing this as duplicate.
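As a stopgap, the -R option mentioned above could be passed through the service's options file. This is only a sketch: it assumes the RHEL packaging reads KEEPALIVED_OPTIONS from /etc/sysconfig/keepalived and that -D was the previous value. Neither is confirmed in this report, so check the unit with "systemctl cat keepalived" first.

```shell
# /etc/sysconfig/keepalived (assumed path; verify with: systemctl cat keepalived)
# -D : detailed log messages (assumed to be the existing value)
# -R : do not respawn child processes, as suggested in the comment above
KEEPALIVED_OPTIONS="-D -R"
```

Restart the service afterwards (systemctl restart keepalived). This only stops the respawn loop; it does not address the double free tracked in rhbz#1693706.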
Description of problem:
keepalived goes into a crash loop if the interface on which vrrp is configured does not exist (which might be triggered by an ovs restart, for example).

Version-Release number of selected component (if applicable):
keepalived-2.0.10-1.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Use the following keepalived.conf:
global_defs {
    notification_email {
        root
    }
    notification_email_from keepalived
    smtp_server localhost
    smtp_connect_timeout 30
    router_id undercloud-0
}
static_ipaddress {
}
vrrp_script haproxy {
    script "test -S /var/lib/haproxy/stats && echo "show info" | socat /var/lib/haproxy/stats stdio"
    interval 2
    weight 2
}
vrrp_instance 51 {
    virtual_router_id 51
    # Advert interval
    advert_int 1
    # for electing MASTER, highest priority wins.
    priority 101
    state MASTER
    interface br-ctlplane
    virtual_ipaddress {
        192.168.24.3 dev br-ctlplane
    }
    track_script {
        haproxy
    }
}
vrrp_instance 52 {
    virtual_router_id 52
    # Advert interval
    advert_int 1
    # for electing MASTER, highest priority wins.
    priority 101
    state MASTER
    interface br-ctlplane
    virtual_ipaddress {
        192.168.24.2 dev br-ctlplane
    }
    track_script {
        haproxy
    }
}
2. systemctl start keepalived
3. Observe the crash:
Mar 26 13:18:27 rhel8.int.rhx systemd-coredump[9615]: Process 9613 (keepalived) of user 0 dumped core.
Stack trace of thread 9613:
#0 0x00007f77d5f6593f raise (libc.so.6)
#1 0x00007f77d5f4fd5e abort (libc.so.6)
#2 0x00007f77d5fa8d57 __libc_message (libc.so.6)
#3 0x00007f77d5faf68c malloc_printerr (libc.so.6)
#4 0x00007f77d5fb1027 _int_free (libc.so.6)
#5 0x00005616b1e8bccd free_global_data (keepalived)
#6 0x00005616b1ea8155 vrrp_terminate_phase2 (keepalived)
#7 0x00005616b1ea8361 stop_vrrp (keepalived)
#8 0x00005616b1ea86ee stop_vrrp (keepalived)
#9 0x00005616b1ea8c5f start_vrrp_child (keepalived)
#10 0x00005616b1ea8cb6 vrrp_respawn_thread (keepalived)
#11 0x00005616b1ed7623 thread_call (keepalived)
#12 0x00005616b1e8ada6 keepalived_main (keepalived)
#13 0x00007f77d5f51813 __libc_start_main (libc.so.6)
#14 0x00005616b1e890ee _start (keepalived)

Note that the crash loops all the time:
[root@rhel8 var]# coredumpctl list |grep keepalived|wc -l
3223

(gdb) bt full
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
    set = {__val = {0, 18446744073709551615 <repeats 12 times>, 140152668463007, 0, 532575944823}}
    pid = <optimized out>
    tid = <optimized out>
    ret = <optimized out>
#1 0x00007f77d5f4fd5e in __GI_abort () at abort.c:100
    act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {18446744073709551615 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
    sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007f77d5fa8d57 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f77d60b6178 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
    ap = {{gp_offset = 24, fp_offset = 0, overflow_arg_area = 0x7fffb654ba70, reg_save_area = 0x7fffb654ba00}}
    fd = <optimized out>
    list = <optimized out>
    nlist = <optimized out>
    cp = <optimized out>
    written = <optimized out>
#3 0x00007f77d5faf68c in malloc_printerr (str=str@entry=0x7f77d60b7e10 "double free or corruption (fasttop)") at malloc.c:5364
No locals.
#4 0x00007f77d5fb1027 in _int_free (av=0x7f77d62ecc60 <main_arena>, p=0x5616b25588f0, have_lock=<optimized out>) at malloc.c:4244
    idx = 0
    old = <optimized out>
    old2 = <optimized out>
    size = <optimized out>
    fb = 0x7f77d62ecc70 <main_arena+16>
    nextchunk = <optimized out>
    nextsize = <optimized out>
    nextinuse = <optimized out>
    prevsize = <optimized out>
    bck = <optimized out>
    fwd = <optimized out>
    __PRETTY_FUNCTION__ = "_int_free"
#5 0x00005616b1e8bccd in free_global_data (data=0x5616b2553820) at global_data.c:325
No locals.
#6 0x00005616b1ea8155 in vrrp_terminate_phase2 (exit_status=exit_status@entry=3) at vrrp_daemon.c:261
    usage = {ru_utime = {tv_sec = 94655481232080, tv_usec = 94655474374192}, ru_stime = {tv_sec = 94655481220944, tv_usec = 94655474377042}, {ru_maxrss = 94655481258176, __ru_maxrss_word = 94655481258176}, {ru_ixrss = 94655474389450, __ru_ixrss_word = 94655474389450}, {ru_idrss = 94655481209472, __ru_idrss_word = 94655481209472}, {ru_isrss = 94655474398014, __ru_isrss_word = 94655474398014}, {ru_minflt = 0, __ru_minflt_word = 0}, {ru_majflt = 94655481268912, __ru_majflt_word = 94655481268912}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 94655481208496, __ru_inblock_word = 94655481208496}, {ru_oublock = 94655481220928, __ru_oublock_word = 94655481220928}, {ru_msgsnd = 94655474261594, __ru_msgsnd_word = 94655474261594}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 94655481192528, __ru_nsignals_word = 94655481192528}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {ru_nivcsw = 94655474262307, __ru_nivcsw_word = 94655474262307}}
#7 0x00005616b1ea8361 in stop_vrrp (status=status@entry=3) at vrrp_daemon.c:429
No locals.
#8 0x00005616b1ea86ee in stop_vrrp (status=3) at ../../lib/bitops.h:49
No locals.
#9 start_vrrp (old_global_data=old_global_data@entry=0x0) at vrrp_daemon.c:467
No locals.
#10 0x00005616b1ea8c5f in start_vrrp_child () at vrrp_daemon.c:1002
    pid = <optimized out>
    syslog_ident = <optimized out>
    pid = <optimized out>
    syslog_ident = <optimized out>
#11 0x00005616b1ea8cb6 in vrrp_respawn_thread (thread=<optimized out>) at vrrp_daemon.c:832
No locals.
#12 0x00005616b1ed7623 in thread_call (thread=0x5616b25590a0) at scheduler.c:1720
No locals.
#13 process_threads (m=0x5616b2558f40) at scheduler.c:1720
    thread = 0x5616b25590a0
    thread_list = <optimized out>
    thread_type = <optimized out>
#14 0x00005616b1ed7ff5 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:1815
No locals.
#15 0x00005616b1e8ada6 in keepalived_main (argc=2, argv=<optimized out>) at main.c:1897
    report_stopped = true
    uname_buf = {sysname = "Linux", '\000' <repeats 59 times>, nodename = "rhel8.int.rhx", '\000' <repeats 51 times>, release = "4.18.0-80.el8.x86_64", '\000' <repeats 44 times>, version = "#1 SMP Wed Mar 13 12:02:46 UTC 2019", '\000' <repeats 29 times>, machine = "x86_64", '\000' <repeats 58 times>, domainname = "(none)", '\000' <repeats 58 times>}
    end = 0x7fffb654be46 ".int.rhx"
#16 0x00007f77d5f51813 in __libc_start_main (main=0x5616b1e890b0 <main>, argc=2, argv=0x7fffb654c628, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffb654c618) at ../csu/libc-start.c:308
    self = <optimized out>
    result = <optimized out>
    unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, 4084918265210392554, 94655474077888, 140736252397088, 0, 0, 7737893304674471914, 7670267560097533930}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x7fffb654c640, 0x7f77d8540150}, data = {prev = 0x0, cleanup = 0x0, canceltype = -1235958208}}}
    not_first_call = <optimized out>
#17 0x00005616b1e890ee in _start ()
No symbol table info available.

I think it is fine if keepalived keeps retrying the vrrp (after all, we configured it for that interface), but it should not crash, as that really brings the whole system down because the coredump handling kicks in constantly.
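Since the report hinges on an interface that may disappear (for example after an ovs restart), a pre-flight check can show which configured interfaces are currently missing. The following is a minimal sketch, not part of keepalived: the function name missing_vrrp_interfaces is hypothetical, and it assumes each "interface <name>" statement sits on its own line and that existing interfaces appear under /sys/class/net. It does not look at "dev <name>" entries inside virtual_ipaddress blocks.

```shell
# Hypothetical helper: print every interface named in a keepalived.conf
# that has no entry in the sysfs network directory.
#   $1 = path to keepalived.conf
#   $2 = sysfs net directory (optional, defaults to /sys/class/net)
missing_vrrp_interfaces() {
    conf=$1
    netdir=${2:-/sys/class/net}
    # Select lines whose first word is exactly "interface" and take the name.
    awk '$1 == "interface" && NF >= 2 { print $2 }' "$conf" | sort -u |
    while read -r ifname; do
        # Report the name only when no matching sysfs entry exists.
        [ -e "$netdir/$ifname" ] || printf '%s\n' "$ifname"
    done
}
```

Running missing_vrrp_interfaces /etc/keepalived/keepalived.conf before systemctl start keepalived would print br-ctlplane on a host where that bridge is absent, and print nothing when every interface exists.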