Bug 1110363

Summary: Keepalived in the base repository has a serious bug
Product: Red Hat Enterprise Linux 6
Component: keepalived
Version: 6.5
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Reporter: roxyrob <roberto.cazzato>
Assignee: Ryan O'Hara <rohara>
QA Contact: Cluster QE <mspqa-list>
CC: bperkins, cluster-maint
Target Milestone: rc
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-06-17 15:48:48 UTC

Attachments:
Keepalived nodes /var/log/messages with comment (flags: none)

Description roxyrob 2014-06-17 13:47:49 UTC
Created attachment 909587 [details]
Keepalived nodes /var/log/messages with comment

Description of problem:
We have a serious problem with the Keepalived package distributed in the Red Hat/CentOS base repository.

We installed and configured Keepalived on two HA firewall virtual machines (VMware ESXi infrastructure). From time to time, probably because of a brief network hiccup, the BACKUP (secondary) Keepalived instance logs "Transition to MASTER STATE", immediately sees the MASTER again ("Received higher prio advert") and goes back to "Entering BACKUP STATE". After this momentary transition the VIPs remain only on the MASTER, yet communication on the networks managed by the MASTER is lost, and nothing passes until we restart the Keepalived service on the MASTER. Restarting the service manually works fine, but surviving this very short BACKUP fluctuation does not. In addition, gratuitous ARP advertising does not work correctly in a virtualized environment with this Keepalived version (the keepalived developers have released many fixes for these problems).

On the Keepalived site there is at least one ChangeLog entry matching the first problem: 2014-01-03, release 1.2.10, "vrrp: fix/extend gratuitous ARP handling... In some cases those virtualized env offer a virtualized layer2...". Other ChangeLog entries list various fixes for gratuitous ARP behavior.

Can you upgrade keepalived in the base repo so that we benefit from these fixes without compiling it manually and breaking future repo-managed upgrades?
Thank you very much.


Version-Release number of selected component (if applicable):
We use Red Hat/CentOS 6.5 with Keepalived v1.2.7 (02/21,2013) installed from the base repository.


How reproducible:
The first issue can normally occur without any specific action, probably because of how the virtualized network behaves. If needed, you can reproduce it by building two virtual machines running Keepalived in a BACKUP-BACKUP, no-preempt configuration (a minimal configuration sketch follows) and taking a VM snapshot: the brief unresponsiveness of the machine reveals the issue (a snapshot is just one trigger; a short network interruption between the two Keepalived nodes probably works as well).
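For reference, a minimal sketch of the vrrp_instance block we mean, assuming eth0; the virtual_router_id, priorities and VIP are placeholders, not our real values, and the peer node uses the same block with a lower priority:

    vrrp_instance VI_1 {
        state BACKUP              # both nodes start as BACKUP
        nopreempt                 # a recovered node never takes the VIP back
        interface eth0
        virtual_router_id 51      # must match on both nodes
        priority 101              # peer uses a lower value, e.g. 100
        advert_int 1
        virtual_ipaddress {
            192.0.2.10/24
        }
    }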

The second issue is simpler. In a virtualized environment (we use VMware vSphere ESXi 5.5), gratuitous ARP does not work correctly; we added a garp command to our Keepalived notify scripts to correct this behavior (Keepalived works at layer 2).
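As an illustration of that workaround, this is the kind of script we hook in via "notify_master /etc/keepalived/garp.sh" in the vrrp_instance block; the path, interface and VIP list are placeholders, and we use arping from iputils (-U sends an unsolicited/gratuitous ARP) rather than a dedicated garp binary:

    #!/bin/sh
    # /etc/keepalived/garp.sh - re-announce the VIPs after a MASTER
    # transition so the virtual switch / ESXi layer 2 updates its ARP tables.
    for vip in 192.0.2.10 192.0.2.11; do
        arping -U -I eth0 -c 3 "$vip"
    done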

Steps to Reproduce:
First issue
1. Build two virtual machines with Keepalived (BACKUP-BACKUP with no-preempt)
2. Induce a brief network interruption between the nodes

Second issue
1. Build two virtual machines with Keepalived (BACKUP-BACKUP with no-preempt)
2. Manually restart the service on the active node (service keepalived restart)


Actual results:
First issue
You can see the incomplete transition in /var/log/messages on each node. The active node stops passing packets for some hosts on the connected networks for the VIPs it owns, and only a manual restart of keepalived makes everything work again.

Second issue
The active node does not pass packets for some hosts on the connected networks for the VIPs it owns. You have to add a garp command to the Keepalived notify scripts (as sketched above).

Expected results:
Keepalived must automatically handle gratuitous ARP and recover on its own from short network problems, brief split-brain situations, or momentary node unresponsiveness, returning to the normal state without manual intervention.

Additional info:

Comment 2 Ryan O'Hara 2014-06-17 15:48:48 UTC

*** This bug has been marked as a duplicate of bug 1077201 ***