Description of problem:

When piranha is used with a public and a private NIC, the loss of either network path on the master director causes a service outage, because no failover occurs.

For example, consider a normal NAT piranha cluster. The NICs on the left are on the public network, and those on the right are on the private network. The primary node is active:

                          +--> rs0
  <---> primary <---->    |
    |              |      +--> rs1
  <---> backup  <---->    |
                          +--> rs2

If we disconnect the public NIC on the primary node, heartbeats are still being received over the private network, and thus no failover occurs. This causes an outage.

                          +--> rs0
  X---X primary <---->    |
    X              |      +--> rs1
  <---> backup  <---->    |
                          +--> rs2

The same thing happens if we disconnect the private NIC: heartbeats are still being sent over the public interface, so no failover occurs.

                          +--> rs0
  <---> primary X----X    |
    |              X      +--> rs1
  <---> backup  <---->    |
                          +--> rs2

This reduces the availability of the piranha director cluster.

Version-Release number of selected component (if applicable):
0.7.10, 0.8.0

How reproducible:
100%

Steps to Reproduce:
1. Create a piranha cluster with a backup. Configure the IPs for the public NICs and private NICs.
2. Start piranha on both nodes.
3. Unplug either the public or private NIC on the master server.

Actual results:
No failover. Any virtual services will malfunction.

Expected results:
Failover. Virtual services should remain available.

Additional info:
Created attachment 117241: Implementation for 0.7.10 (RHCS3)
Created attachment 117242: Implementation for 0.8.0 (RHCS4)
QA: Testing this option requires configuring piranha with two networks (one public, one private).

- Configure a 2-node piranha cluster with 1 public and 1 private NIC on each node. Both sides must be connected to switches or hubs. In the redundancy portion of the GUI, do NOT check the "monitor links" box. Note: You must manually synchronize the configuration file.
- Start pulse on both nodes. Wait for LVS services to come up on the master.
- Unplug the master server's public interface and wait 60 seconds. Nothing should happen on the backup server.
- Plug the master server's public interface back in. Unplug its private interface and wait 60 seconds. Again, nothing should happen on the backup server.
- With the private interface still unplugged, unplug the master server's public interface as well and wait 60 seconds. Because the master is now fully disconnected, the backup will take over LVS services.
- Reconnect both of the master server's interfaces. The backup will surrender and LVS will move back to the master.
- Check the "monitor links" box in the redundancy tab of the piranha GUI.
- Stop pulse on both nodes and copy the configuration so that both nodes have the correct config (with monitor_links enabled).
- Start pulse on both nodes.
- Unplugging either the public or the private NIC on the master server should now cause the backup to take over.
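The expected outcomes in the QA steps above can be summarized as a small truth table. The following Python sketch only models the behavior the steps describe; it is not pulse's actual code, and the function name `backup_takes_over` and its parameters are hypothetical names chosen for illustration.

```python
def backup_takes_over(public_up: bool, private_up: bool, monitor_links: bool) -> bool:
    """Expected failover outcome per the QA steps (a model, not pulse's code).

    public_up / private_up: link state of the master's two NICs.
    monitor_links: whether the "monitor links" option is enabled.
    """
    if monitor_links:
        # With link monitoring, losing either NIC on the master
        # should cause the backup to take over.
        return not (public_up and private_up)
    # Without it, heartbeats still arrive over whichever path remains,
    # so the backup only takes over once the master is fully disconnected.
    return not (public_up or private_up)


if __name__ == "__main__":
    states = ((True, True), (False, True), (True, False), (False, False))
    for monitor_links in (False, True):
        for public_up, private_up in states:
            result = backup_takes_over(public_up, private_up, monitor_links)
            print(f"monitor_links={monitor_links} public_up={public_up} "
                  f"private_up={private_up} -> failover={result}")
```

Running the script prints one line per combination, which can be compared against what is observed on the backup server at each step.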
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-689.html
*** Bug 160103 has been marked as a duplicate of this bug. ***