1202584 – Keepalived instances flapping to MASTER then back to STANDBY on failover with nopreempt

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	keepalived
Sub Component:
Version:	20
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Matthias Saou
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-03-16 23:43 UTC by Assaf Muller
Modified:	2015-03-29 04:34 UTC
CC List:	4 users
Fixed In Version:	keepalived-1.2.15-3.fc20
Clone Of:
Environment:
Last Closed:	2015-03-29 04:24:31 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Assaf Muller 2015-03-16 23:43:38 UTC

Description of problem:
Using OpenStack Neutron with highly available routers, configure three L3 agents and create an HA router. Go into the router namespace on the master and set the HA device to down. Then watch syslog on the other two nodes. On one of them you will see the following, over and over again:

Transition to MASTER STATE
Entering MASTER STATE
Received lower prio advert in nopreempt mode
Entering BACKUP STATE

On the other node:
Transition to MASTER STATE
Entering MASTER STATE
Received higher prio advert
Entering BACKUP STATE
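
To quantify the flapping from a captured syslog, something like the following could be used. It is a minimal sketch that assumes the log lines contain the exact "Entering MASTER STATE" and "Entering BACKUP STATE" messages quoted above; the function name `count_flaps` and the sample data are illustrative only.

```python
import re

# Matches the two state-entry messages quoted in the report.
STATE_RE = re.compile(r"Entering (MASTER|BACKUP) STATE")


def count_flaps(lines):
    """Count how many times an instance entered MASTER and then fell back to BACKUP."""
    flaps = 0
    previous = None
    for line in lines:
        match = STATE_RE.search(line)
        if match is None:
            continue  # ignore adverts, "Transition to ..." lines, and unrelated messages
        state = match.group(1)
        if previous == "MASTER" and state == "BACKUP":
            flaps += 1
        previous = state
    return flaps


# Sample built from the first excerpt above, repeated three times.
sample = [
    "Transition to MASTER STATE",
    "Entering MASTER STATE",
    "Received lower prio advert in nopreempt mode",
    "Entering BACKUP STATE",
] * 3

print(count_flaps(sample))
```

A healthy failover should produce at most one MASTER entry with no fallback on the node that takes over, so any count above zero on a surviving node points at the behavior described here.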

Version-Release number of selected component (if applicable):
Does not reproduce on 1.2.9-1.fc20, does reproduce on 1.2.15-2.fc20.
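
As a rough way to check whether an installed build falls in the affected range, the sketch below compares a version-release string against the "Fixed In Version" field (1.2.15-3). The lower bound of 1.2.14 is an assumption based on Comment 1, which says the behavior changed in that release; only 1.2.9-1 and 1.2.15-2 were actually tested in this report. The helper names are hypothetical.

```python
import re

FIXED = ((1, 2, 15), 3)      # keepalived-1.2.15-3, per "Fixed In Version"
FIRST_CHANGED = (1, 2, 14)   # assumed lower bound, per Comment 1


def parse_version_release(version_release):
    """Split e.g. '1.2.15-2.fc20' into ((1, 2, 15), 2)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)-(\d+)(?:\..*)?", version_release)
    if match is None:
        raise ValueError(f"unrecognized version-release: {version_release!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def probably_affected(version_release):
    """True if the build is at or after 1.2.14 and older than 1.2.15-3."""
    version, release = parse_version_release(version_release)
    return version >= FIRST_CHANGED and (version, release) < FIXED


for build in ("1.2.9-1.fc20", "1.2.15-2.fc20", "1.2.15-3.fc20"):
    print(build, probably_affected(build))
```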

How reproducible:
100%

Steps to Reproduce:
Detailed in problem description.

Actual results:
Nodes flapping from standby to master and back.

Expected results:
One node should go to master; the other should remain in standby.

Additional information:
On each node: keepalived.conf:
Node 1 - http://www.fpaste.org/198754/
Node 2 - http://www.fpaste.org/198756/
Node 3 - http://www.fpaste.org/198757/

'ip a' output in namespace of the router:
Node 1 - http://www.fpaste.org/198758/
Node 2 - http://www.fpaste.org/198759/
Node 3 - http://www.fpaste.org/198761/

A syslog summary is included in the problem description above.

To work around the issue I tried specifying unique priorities, specifying the source advertisement address (a unique address per router instance, the one you can see in the 'ip a' output above), and setting the initial state to EQUAL. I tried these in pretty much all permutations, and nothing seemed to have any effect. Working with preemption turned on eliminated the issue entirely, but nopreempt should work, and it is preferred for the Neutron use case: with thousands of routers, we don't want elections when the faulty node comes back up, because there's no need for another disruption in the data plane.
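
For readers without access to the pasted configurations, the sketch below renders the general shape of a `vrrp_instance` stanza using nopreempt with a per-node priority and an optional advertisement source address. It is not the reporter's configuration: the instance name, interface, router ID, addresses, and `advert_int` value are all placeholders. It uses `state BACKUP` because the keepalived.conf documentation requires that initial state for `nopreempt` to take effect.

```python
def render_vrrp_instance(name, interface, vrid, priority, vip, src_ip=None):
    """Return a keepalived vrrp_instance stanza with nopreempt enabled.

    All argument values are caller-supplied placeholders.
    """
    lines = [
        f"vrrp_instance {name} {{",
        "    state BACKUP",            # nopreempt requires an initial state of BACKUP
        f"    interface {interface}",
        f"    virtual_router_id {vrid}",
        f"    priority {priority}",    # unique per node, one of the attempted workarounds
        "    nopreempt",
        "    advert_int 2",            # placeholder advertisement interval
    ]
    if src_ip is not None:
        # Source address for adverts, another of the attempted workarounds.
        lines.append(f"    mcast_src_ip {src_ip}")
    lines += [
        "    virtual_ipaddress {",
        f"        {vip}",
        "    }",
        "}",
    ]
    return "\n".join(lines)


# Hypothetical values for one node.
print(render_vrrp_instance("VR_1", "ha-example", 1, 50,
                           "192.0.2.10/24", src_ip="198.51.100.2"))
```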

Comment 1 Assaf Muller 2015-03-17 01:28:07 UTC

More information: the following two patches were introduced in 1.2.14 and changed the behavior:
e18370cb165d21db954c08ddbce1b39d97858012
13693a2d1b834c749394ef0bdee6afe9eb1fad2d

Comment 2 Ryan O'Hara 2015-03-18 13:55:44 UTC

Fixed upstream with this commit:

https://github.com/acassen/keepalived/commit/2bab517b2b50c1e784e79a082d971f4855e9e0ab

Should land in Fedora packages today.

Comment 3 Fedora Update System 2015-03-18 14:56:43 UTC

keepalived-1.2.15-3.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/keepalived-1.2.15-3.fc22

Comment 4 Fedora Update System 2015-03-18 14:56:49 UTC

keepalived-1.2.15-3.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/keepalived-1.2.15-3.fc21

Comment 5 Fedora Update System 2015-03-18 14:56:54 UTC

keepalived-1.2.15-3.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/keepalived-1.2.15-3.fc20

Comment 6 Fedora Update System 2015-03-19 18:41:47 UTC

Package keepalived-1.2.15-3.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing keepalived-1.2.15-3.fc22'
as soon as you are able to.
Please go to the following URL:
https://admin.fedoraproject.org/updates/FEDORA-2015-4177/keepalived-1.2.15-3.fc22
then log in and leave karma (feedback).

Comment 7 Assaf Muller 2015-03-20 17:58:09 UTC

New RPM verified to fix the bug.

Comment 8 Fedora Update System 2015-03-29 04:24:31 UTC

keepalived-1.2.15-3.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 9 Fedora Update System 2015-03-29 04:34:37 UTC

keepalived-1.2.15-3.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 10 Fedora Update System 2015-03-29 04:34:44 UTC

keepalived-1.2.15-3.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.
