Bug 164576 - Piranha does not detect link-down events which would prevent proper operation
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: piranha
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Duplicates: 160103
Blocks: 160101 162166
Reported: 2005-07-28 16:31 EDT by Lon Hohberger
Modified: 2010-10-21 23:12 EDT

Fixed In Version: RHBA-2005-689
Doc Type: Bug Fix
Last Closed: 2005-09-30 10:55:54 EDT

Attachments
Implementation for 0.7.10 (RHCS3) (17.33 KB, patch)
2005-07-28 16:31 EDT, Lon Hohberger
Implementation for 0.8.0 (RHCS4) (12.80 KB, patch)
2005-07-28 16:32 EDT, Lon Hohberger

Description Lon Hohberger 2005-07-28 16:31:29 EDT
Description of problem:

When piranha is used with a public + private NIC, any loss of a network path on
the master director constitutes a service outage.

For example, consider a normal NAT piranha cluster.  The NICs on the left side
are on the public network, and those on the right side are on the private
network.  The primary node is active:

                        +--> rs0
   <---> primary <----> |
          |   |         +--> rs1
   <---> backup  <----> |
                        +--> rs2

If we disconnect the public NIC on the primary node, heartbeats are still being
received over the private network, and thus no failover occurs.  This causes an
outage:

                        +--> rs0
   X---X primary <----> |
          X   |         +--> rs1
   <---> backup  <----> |
                        +--> rs2

The same thing happens if we disconnect the private NIC: heartbeats are still
being sent over the public interface, so no failover occurs.  

                        +--> rs0
   <---> primary X----X |
          |   X         +--> rs1
   <---> backup  <----> |
                        +--> rs2

This reduces the availability of the piranha director cluster.
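
The behavior described above can be modeled in a few lines.  This is only a
sketch of the decision as the report describes it, not piranha's actual code;
the function names and the dictionary of link states are hypothetical, and
monitor_links stands for the option discussed in the comments below.

```python
from typing import Dict


def heartbeat_alive(link_up: Dict[str, bool]) -> bool:
    """Heartbeats reach the backup as long as at least one path is up."""
    return any(link_up.values())


def backup_takes_over(link_up: Dict[str, bool], monitor_links: bool) -> bool:
    """Hypothetical model of the failover decision (not piranha source)."""
    if not heartbeat_alive(link_up):
        # The master is fully disconnected, so heartbeats stop arriving.
        return True
    # Heartbeats still arrive over the surviving path.  Only when link
    # monitoring is enabled does a single down link trigger failover.
    return monitor_links and not all(link_up.values())


if __name__ == "__main__":
    public_down = {"public": False, "private": True}
    print(backup_takes_over(public_down, monitor_links=False))
    print(backup_takes_over(public_down, monitor_links=True))
```

Without link monitoring, the model reports no failover for a single
disconnected NIC, which is the outage this bug describes.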

Version-Release number of selected component (if applicable): 0.7.10, 0.8.0

How reproducible: 100%

Steps to Reproduce:
1. Create a piranha cluster with a backup.  Configure the IPs for the public
NICs and private NICs.
2. Start piranha on both nodes.
3. Unplug either the public or private NIC on the master server.

Actual results:
No failover.  Any virtual services will malfunction.

Expected results:
Failover.  Virtual services should be made available.

Additional info:
Comment 1 Lon Hohberger 2005-07-28 16:31:29 EDT
Created attachment 117241
Implementation for 0.7.10 (RHCS3)
Comment 2 Lon Hohberger 2005-07-28 16:32:23 EDT
Created attachment 117242
Implementation for 0.8.0 (RHCS4)
Comment 5 Lon Hohberger 2005-08-11 17:19:47 EDT
QA: Testing this option requires configuring piranha with two networks (one
public, one private).

- Configure a 2-node piranha cluster with 1 public and 1 private NIC each.  Both
sides must be connected to switches or hubs.  In the redundancy portion of the
GUI, do NOT check the "monitor links" box.  Note: You must manually synchronize
the configuration file.

- Start pulse on both nodes.  Wait for LVS services to come up on the master.

- Unplug the master server's public interface and wait 60 seconds.  Nothing
should happen on the backup server.

- Plug in the master server's public interface.  Unplug its private interface
and wait 60 seconds.  Again, nothing should happen on the backup server.

- Unplug the public interface on the master as well and wait 60 seconds.
Because the master is now fully disconnected, the backup will take over LVS
services.

- Reconnect both of the master server's interfaces.  The backup will surrender
and LVS will move back to the master.

- Check the "monitor links" box in the redundancy tab of the piranha GUI.

- Stop pulse on both nodes and copy the configuration so that both nodes have
the correct config (with monitor_links enabled).

- Start pulse on both nodes.

- Unplugging either the public or private NIC on the master server should cause
the backup to take over now.
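
While running the steps above, it may help to confirm that the kernel on the
master actually sees the link as down after a cable is pulled.  The sketch
below assumes a Linux kernel that exposes the carrier attribute under
/sys/class/net; it is a testing aid only and is not how piranha itself detects
link state.  The function name link_is_up and the interface names eth0 and
eth1 are hypothetical.

```python
from pathlib import Path


def link_is_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if the kernel reports carrier on the given interface.

    Assumption: the kernel exposes <sysfs_root>/<iface>/carrier, which
    contains "1" when a link is detected and "0" when it is not.
    """
    carrier = Path(sysfs_root) / iface / "carrier"
    try:
        return carrier.read_text().strip() == "1"
    except OSError:
        # The interface does not exist, or it is administratively down
        # (in which case reading the carrier attribute fails).
        return False


if __name__ == "__main__":
    # Hypothetical interface names; substitute the master's real NICs.
    for name in ("eth0", "eth1"):
        print(name, "up" if link_is_up(name) else "down")
```

Run it on the master before and after unplugging a NIC to check that the
physical disconnect is visible to the operating system.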

Comment 6 Red Hat Bugzilla 2005-09-30 10:55:55 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

Comment 7 Lon Hohberger 2005-09-30 11:10:49 EDT
*** Bug 160103 has been marked as a duplicate of this bug. ***
