Bug 1413320 - keepalived starts too soon
Summary: keepalived starts too soon
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: keepalived
Version: 25
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ryan O'Hara
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1425828
 
Reported: 2017-01-14 23:59 UTC by Steve Bennett
Modified: 2017-04-12 20:22 UTC (History)

Fixed In Version: keepalived-1.3.5-1.fc26 keepalived-1.3.5-1.fc24 keepalived-1.3.5-1.fc25
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-12 14:50:07 UTC
Type: Bug


Attachments
journal log showing that keepalived starts too soon (149.74 KB, text/x-vhdl)
2017-01-14 23:59 UTC, Steve Bennett
keepalived config file used to reproduce the problem (602 bytes, text/plain)
2017-01-15 00:08 UTC, Steve Bennett

Description Steve Bennett 2017-01-14 23:59:19 UTC
Created attachment 1240831 [details]
journal log showing that keepalived starts too soon

Description of problem:
The systemd service dependency for keepalived appears to be incorrect.
Keepalived attempts to start before network initialisation is complete, and in some circumstances this causes service startup to fail.

Version-Release number of selected component (if applicable):
1.3.2-1

How reproducible:
For me, it happens on every reboot (YMMV)

Steps to Reproduce:
1. Configure keepalived to use VRRP
2. Enable keepalived
3. Reboot

Actual results:
keepalived fails to start, reporting an error similar to:
  Keepalived_vrrp[642]: (VI_1): Cannot find an IP address to use for interface ens192

Expected results:
keepalived starts successfully

Additional info:
Manually starting the service works as expected.

Changing the systemd service dependency from 'network-online.target' to 'network.target' appears to fix the problem. I believe that the behaviour of 'network.target' vs 'network-online.target' is the opposite of the behaviour implied by the systemd documentation, but many other (working) network services also appear to use 'network.target' rather than 'network-online.target'.

In my environment (VMs + fast storage) I'm seeing reasonably quick boot times (<5s from kernel start to network available); perhaps that is a contributing factor in the manifestation of this problem.

I've attached a journal log showing that keepalived is being started before the network is ready (and also before any other network-dependent services).

Comment 1 Steve Bennett 2017-01-15 00:08:51 UTC
Created attachment 1240832 [details]
keepalived config file used to reproduce the problem

Comment 2 Ryan O'Hara 2017-02-07 20:37:43 UTC
Thank you for reporting this. I am now seeing similar errors with keepalived-1.3.2 in F25 virtual machines. I am working on verifying your suggested fix. Once it is confirmed, I will get an update out ASAP.

Comment 3 Ryan O'Hara 2017-02-09 01:56:35 UTC
Hi, Steve.

I changed the keepalived.service file on my development machines to have "After=network.target" instead of "After=network-online.target" and it did not solve the problem. Are you using the LSB network service or NetworkManager? My machines were using NetworkManager.

I came across this link [1] and it seems quite useful. There are two approaches we can take. First, users could enable NetworkManager-wait-online.service or systemd-networkd-wait-online.service, depending on whether NetworkManager or systemd-networkd manages the network. The other solution is to add "Wants=network-online.target" just below the "After=network-online.target" line. I tried this and it appears to work.

Would you mind adding "Wants=network-online.target" to the keepalived.service file and testing? I want to make sure that this also works in your environment before doing a new build. Be sure to do 'systemctl daemon-reload' after you edit the service file. Thanks.

[1] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
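
For reference, a minimal sketch of the change discussed in this comment, written as a systemd drop-in rather than as an edit to the packaged keepalived.service. The drop-in path and file name are assumptions for illustration; only the two directives come from the comment above.

```ini
# Hypothetical drop-in: /etc/systemd/system/keepalived.service.d/network-online.conf
[Unit]
# Ordering only: start keepalived after the target, if the target is started at all.
After=network-online.target
# Pull the target into the boot transaction so the ordering has something to wait for.
Wants=network-online.target
```

As with a direct edit of the service file, 'systemctl daemon-reload' is needed before the change takes effect.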

Comment 4 Steve Bennett 2017-03-14 12:49:44 UTC
Hi Ryan,

Adding "Wants=network-online.target" seems to work for me too, and both the 'failing' behaviour (before making that change) and the 'succeeding' behaviour (after the change) make sense: without that entry there's no guarantee that the network-online target will be pulled in.

It also explains why the problem is difficult to reproduce: you need a fast/simple service configuration (I only have keepalived and httpd), a 'slow' network configuration (e.g. I'm using DHCP) and no other services that request the network-online target.

Thanks!
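
The reasoning in this comment (an After= ordering on a target that nothing pulls in has no effect) can be checked mechanically. The sketch below is illustrative only: the function names and the sample unit text are made up for this example and are not the packaged keepalived.service. It is also simplified in that it looks at a single unit, so it cannot tell whether some other enabled service already requests network-online.target, and it ignores systemd's rule that an empty assignment resets a list.

```python
def unit_directives(unit_text):
    """Collect [Unit] directives as {key: [values, ...]}; keys may repeat."""
    directives = {}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Unit" and "=" in line:
            key, _, value = line.partition("=")
            # Values are space-separated unit names.
            directives.setdefault(key.strip(), []).extend(value.split())
    return directives


def orders_after_without_pulling_in(unit_text, target="network-online.target"):
    """True if the unit is ordered After= the target but never Wants=/Requires= it."""
    d = unit_directives(unit_text)
    ordered = target in d.get("After", [])
    pulled_in = target in d.get("Wants", []) or target in d.get("Requires", [])
    return ordered and not pulled_in


# Illustrative unit text, not the real packaged file.
BEFORE = """\
[Unit]
Description=keepalived (illustrative)
After=network-online.target

[Service]
Type=forking
"""

AFTER = BEFORE.replace(
    "After=network-online.target\n",
    "After=network-online.target\nWants=network-online.target\n",
)

print(orders_after_without_pulling_in(BEFORE))  # True: ordering only, target not requested
print(orders_after_without_pulling_in(AFTER))   # False: Wants= pulls the target in
```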

Comment 5 Fedora Update System 2017-03-27 19:21:33 UTC
keepalived-1.3.5-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-a5c4484334

Comment 6 Fedora Update System 2017-03-28 01:49:26 UTC
keepalived-1.3.5-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-4f7cb7b55f

Comment 7 Fedora Update System 2017-03-28 08:51:21 UTC
keepalived-1.3.5-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-5337793e7c

Comment 8 Fedora Update System 2017-04-12 14:50:07 UTC
keepalived-1.3.5-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 9 Fedora Update System 2017-04-12 19:49:32 UTC
keepalived-1.3.5-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 10 Fedora Update System 2017-04-12 20:22:49 UTC
keepalived-1.3.5-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

