Bug 1413320

Summary: keepalived starts too soon
Product: [Fedora] Fedora
Reporter: Steve Bennett <s.bennett>
Component: keepalived
Assignee: Ryan O'Hara <rohara>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 25
CC: athmanem, bperkins, matthias, rohara, ruben, s.bennett
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: keepalived-1.3.5-1.fc26 keepalived-1.3.5-1.fc24 keepalived-1.3.5-1.fc25
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-12 14:50:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1425828    
Attachments:
Description Flags
journal log showing that keepalived starts too soon none
keepalived config file used to reproduce the problem none

Description Steve Bennett 2017-01-14 23:59:19 UTC
Created attachment 1240831
journal log showing that keepalived starts too soon

Description of problem:
The systemd service dependencies of keepalived appear to be incorrect.
Keepalived attempts to start before network initialisation is complete, and in some circumstances this causes service startup to fail.

Version-Release number of selected component (if applicable):
1.3.2-1

How reproducible:
For me, it happens on every reboot (YMMV)

Steps to Reproduce:
1. Configure keepalived to use VRRP
2. Enable the keepalived service
3. Reboot

Actual results:
keepalived fails to start, reporting an error similar to:
  Keepalived_vrrp[642]: (VI_1): Cannot find an IP address to use for interface ens192

Expected results:
keepalived starts successfully

Additional info:
Manually starting the service works as expected.

Changing the systemd service dependency from 'network-online.target' to 'network.target' appears to fix the problem. I believe that the behaviour of 'network.target' vs 'network-online.target' is the opposite of the behaviour implied by the systemd documentation, but many other (working) network services also appear to use 'network.target' rather than 'network-online.target'.

In my environment (VMs + fast storage) I'm seeing reasonably quick boot times (<5s from kernel start to network available); perhaps that's a contributory factor in the manifestation of this problem.

I've attached a journal log showing that keepalived is being started before the network is ready (and also before any other network-dependent services).

Comment 1 Steve Bennett 2017-01-15 00:08:51 UTC
Created attachment 1240832
keepalived config file used to reproduce the problem

Comment 2 Ryan O'Hara 2017-02-07 20:37:43 UTC
Thank you for reporting this. I am now seeing similar errors with keepalived-1.3.2 in F25 virtual machines. I am working on verifying your suggested fix. Once confirmed I will get an update out ASAP.

Comment 3 Ryan O'Hara 2017-02-09 01:56:35 UTC
Hi, Steve.

I changed the keepalived.service file on my development machines to have "After=network.target" instead of "After=network-online.target" and it did not solve the problem. Are you using the LSB network service or NetworkManager? My machines were using NetworkManager.

I came across this link [1] and it seems quite useful. There are two approaches we can take. First, users could enable NetworkManager-wait-online.service or systemd-networkd-wait-online.service, depending on whether they use NetworkManager or systemd-networkd. The other solution is to add "Wants=network-online.target" just below the "After=network-online.target". I tried this and it appears to work.

Would you mind adding "Wants=network-online.target" to the keepalived.service file and testing? I want to make sure that this also works in your environment before doing a new build. Be sure to do 'systemctl daemon-reload' after you edit the service file. Thanks.

[1] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
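
For anyone testing this without editing the packaged unit file, the second approach can also be sketched as a systemd drop-in. This is only a minimal sketch under stated assumptions, not necessarily the change that ends up in the package: the drop-in path and file name are hypothetical, chosen for illustration, while the two directives are the ones discussed above.

```ini
# Hypothetical drop-in path (an assumption, not from the package):
#   /etc/systemd/system/keepalived.service.d/network-online.conf
[Unit]
# Order keepalived after network-online.target.
After=network-online.target
# Pull network-online.target into the boot transaction; without this,
# the After= ordering has no effect unless another unit requests the target.
Wants=network-online.target
```

As with a direct edit of keepalived.service, run 'systemctl daemon-reload' afterwards. Note that network-online.target only delays startup when a wait-online service (for example NetworkManager-wait-online.service) is available to hold it back until the network is configured.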

Comment 4 Steve Bennett 2017-03-14 12:49:44 UTC
Hi Ryan,

Adding "Wants=network-online.target" seems to work for me too, and both the 'failing' behaviour (before making that change) and the 'succeeding' behaviour (after the change) make sense: without that entry there's no guarantee that the network-online target will be pulled in.

It also explains why the problem is difficult to reproduce: you need a fast/simple service configuration (I only have keepalived and httpd), a 'slow' network configuration (e.g. I'm using DHCP) and no other services that request the network-online target.

Thanks!

Comment 5 Fedora Update System 2017-03-27 19:21:33 UTC
keepalived-1.3.5-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-a5c4484334

Comment 6 Fedora Update System 2017-03-28 01:49:26 UTC
keepalived-1.3.5-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-4f7cb7b55f

Comment 7 Fedora Update System 2017-03-28 08:51:21 UTC
keepalived-1.3.5-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-5337793e7c

Comment 8 Fedora Update System 2017-04-12 14:50:07 UTC
keepalived-1.3.5-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 9 Fedora Update System 2017-04-12 19:49:32 UTC
keepalived-1.3.5-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 10 Fedora Update System 2017-04-12 20:22:49 UTC
keepalived-1.3.5-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.