Description of problem:
Using OpenStack Neutron with highly available routers, configure three L3 agents and create an HA router. Go into the router namespace of the master and set the HA device to down. Observe the syslog of the other two nodes. On one node you will see, over and over again:

Transition to MASTER STATE
Entering MASTER STATE
Received lower prio advert in nopreempt mode
Entering BACKUP STATE

On the other node:

Transition to MASTER STATE
Entering MASTER STATE
Received higher prio advert
Entering BACKUP STATE

Version-Release number of selected component (if applicable):
Does not reproduce on 1.2.9-1.fc20, does reproduce on 1.2.15-2.fc20.

How reproducible:
100%

Steps to Reproduce:
Detailed in the problem description.

Actual results:
Nodes flap from standby to master and back.

Expected results:
One node should go to master, the other should remain in standby.

Additional information:
keepalived.conf on each node:
Node 1 - http://www.fpaste.org/198754/
Node 2 - http://www.fpaste.org/198756/
Node 3 - http://www.fpaste.org/198757/

'ip a' output in the namespace of the router:
Node 1 - http://www.fpaste.org/198758/
Node 2 - http://www.fpaste.org/198759/
Node 3 - http://www.fpaste.org/198761/

The syslog summary is in the problem description above.

To work around the issue I tried specifying unique priorities, specifying the source advertisement address (a unique address per router instance, the one you can see in the 'ip a' output above), and setting the initial state to EQUAL. I tried these in pretty much all permutations, and none of them had any effect. Turning pre-emption on eliminated the issue entirely, but no-preemption should work, and it is preferred for the Neutron use case: we don't want elections for thousands of routers when the faulty node comes back, since there is no need for another disruption in the data plane.
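The flapping described above can be spotted mechanically in a captured syslog. The following is only a minimal sketch, not part of keepalived or Neutron: it assumes the log lines contain the "Entering MASTER STATE" and "Entering BACKUP STATE" messages exactly as quoted in this report, and the function names and the threshold of 2 are hypothetical choices made for illustration.

```python
import re

# Matches the keepalived state messages quoted in this report.
# "Transition to MASTER STATE" is deliberately not matched, so each
# election is counted once (by its "Entering ..." line).
STATE_RE = re.compile(r"Entering (MASTER|BACKUP) STATE")


def state_sequence(lines):
    """Return the ordered list of states ("MASTER"/"BACKUP") seen in the log."""
    seq = []
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            seq.append(match.group(1))
    return seq


def is_flapping(lines, threshold=2):
    """Heuristic (assumption): a node that enters both MASTER and BACKUP
    at least `threshold` times in the captured window is flapping."""
    seq = state_sequence(lines)
    return seq.count("MASTER") >= threshold and seq.count("BACKUP") >= threshold
```

Feeding it the lines from one node's syslog (for example, `is_flapping(open(path))` with a path of your choosing) should return True for the repeated pattern shown above, and False for a node that goes to master once and stays there.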
More information: the following two patches were introduced in 1.2.14 and changed the behavior:
e18370cb165d21db954c08ddbce1b39d97858012
13693a2d1b834c749394ef0bdee6afe9eb1fad2d
Fixed upstream with this commit:
https://github.com/acassen/keepalived/commit/2bab517b2b50c1e784e79a082d971f4855e9e0ab
Should land in Fedora packages today.
keepalived-1.2.15-3.fc22 has been submitted as an update for Fedora 22. https://admin.fedoraproject.org/updates/keepalived-1.2.15-3.fc22
keepalived-1.2.15-3.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/keepalived-1.2.15-3.fc21
keepalived-1.2.15-3.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/keepalived-1.2.15-3.fc20
Package keepalived-1.2.15-3.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing keepalived-1.2.15-3.fc22'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-4177/keepalived-1.2.15-3.fc22
then log in and leave karma (feedback).
New RPM verified to fix the bug.
keepalived-1.2.15-3.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
keepalived-1.2.15-3.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
keepalived-1.2.15-3.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.