Bug 140414
Summary: | Piranha/LVS of RH Cluster Suite doesn't re-arp vip after a split-brain scenario | |
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Lon Hohberger <lhh>
Component: | piranha | Assignee: | Lon Hohberger <lhh>
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list>
Severity: | medium | Priority: | medium
Version: | 3 | CC: | cluster-maint, mikem, van.okamura
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2004-11-23 14:56:13 UTC
Description
Lon Hohberger
2004-11-22 20:27:46 UTC
Created attachment 107236 [details]
Patch fixing gratuitous ARP problem after scenario
This patch fixes piranha so that gratuitous ARPs are sent after a split brain.
There is also a race condition, even with the above patch applied. It can be visualized in the following manner:

Working case:

    Master                                    Backup
    active ------>                     <------ inact
    active ------>                     <------ inact
    active -> Z                           Z <- inact
    active -> Z                           Z <- inact
    active -> Z   (activate / send arps)  Z <- active
    active -> Z   arp...               <------ active
    active ------>   (shuts down)      <------ inact

Non-working case: the master doesn't send a new gratuitous ARP after the split-brain:

    Master                                    Backup
    active ------>                     <------ inact
    active ------>                     <------ inact
    active -> Z                           Z <- inact
    active -> Z                           Z <- inact
    active -> Z   (activate / send arps)  Z <- active
    active -> Z                           Z <- active
    active ------>   (shuts down)      <------ inact

So, we need to ensure that the backup always notifies the master that it is or was the master so that the master sends gratuitous arps for the virtual IP address.

This can be done by using a "dying breath" heartbeat packet. When backup receives a heartbeat from the master, it can send one last ACTIVE packet prior to shutting down LVS services. This can cause the master to send gratuitous arps twice in some cases, but this should not cause problems:

    Master                                    Backup
                                       <------ inact
    active ------>                     <------ inact
    active -> Z                           Z <- inact
    active -> Z                           Z <- inact
    active -> Z   (activate / send arps)  Z <- active
    active -> Z   arp...               <------ active
    active ------>                                         }
                       <------ dying "ACTIVE" packet       } arp sent a second time
    arp...           (shuts down)                          }
                                       <------ inact
    active ------>

Created attachment 107238 [details]
Patch fixing race condition
The patch ensures that, in the non-working case, at least one gratuitous ARP is sent for the virtual IP address; the fact that two are sent in the working case is a side effect. There are other ways to go about this; however, this is the simplest.

One way to reproduce the original split-brain is to simply 'killall -STOP pulse' on the master while monitoring /var/log/messages on the backup. When the backup takes over and sends ARPs, type 'killall -CONT pulse' on the master. Sometimes it will send arps; other times it won't.

Correction: The previous comment is how to recreate the race condition with relative ease.

This will go out as an errata in the next update.
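
For readers who want to see why the "dying breath" packet closes the race, the exchange in the diagrams above can be sketched as a toy simulation. This is not piranha or pulse source code. It assumes that "Z" in the diagrams means a lost heartbeat, that the backup takes over after three missed master heartbeats (the hypothetical DEAD_AFTER constant), and that the master sends one gratuitous ARP for every ACTIVE heartbeat it receives from the backup. The names simulate, WORKING and NON_WORKING are made up for this illustration.

```python
# Toy model of the master/backup heartbeat exchange; not piranha/pulse code.
DEAD_AFTER = 3  # assumed: missed master heartbeats before the backup takes over


def simulate(ticks, dying_breath):
    """Return how many gratuitous ARPs the master sends.

    ticks: list of (master_to_backup_ok, backup_to_master_ok) booleans,
           one pair per heartbeat interval.
    dying_breath: if True, the backup sends one last ACTIVE packet
                  before shutting down its LVS services.
    """
    backup_active = False
    missed = 0
    master_arps = 0
    for m2b_ok, b2m_ok in ticks:
        outgoing = []  # packets the backup sends during this interval
        if m2b_ok:
            # The backup hears the master again.
            missed = 0
            if backup_active:
                if dying_breath:
                    outgoing.append("ACTIVE")  # the "dying breath" packet
                backup_active = False  # shut down LVS services
        else:
            missed += 1
            if missed >= DEAD_AFTER:
                backup_active = True  # activate / send arps
        # Regular heartbeat carrying the backup's current state.
        outgoing.append("ACTIVE" if backup_active else "INACTIVE")
        if b2m_ok:
            # Assumed master behavior: one gratuitous ARP per ACTIVE packet seen.
            master_arps += outgoing.count("ACTIVE")
    return master_arps


OK = (True, True)
LOST = (False, False)
# The backup's ACTIVE heartbeat gets through once before the master's does.
WORKING = [OK, OK, LOST, LOST, LOST, (False, True), OK]
# Every ACTIVE heartbeat from the backup is lost.
NON_WORKING = [OK, OK, LOST, LOST, LOST, LOST, OK]

for name, ticks in (("working", WORKING), ("non-working", NON_WORKING)):
    for dying_breath in (False, True):
        print(name, "dying_breath =", dying_breath,
              "master ARPs:", simulate(ticks, dying_breath))
```

Under these assumptions the non-working schedule yields zero master ARPs without the dying breath packet and one with it, while the working schedule goes from one to two, which matches the "arp sent a second time" side effect described above.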