Description of problem:

This is related to bug 123342. The patch for that bug added two identical guards in activateFOSMonitors and sendLvsArps:

+      if (!config->failoverServices[j].isActive)
+        continue;
+

This causes us to skip inactive services so that we do not incorrectly think their VIPs are already active if they are used by other services.

A similar problem exists in deactivateLvs:

  /* deactivate the interfaces */
  for (i = 0; i < config->numVirtServers; i++)
    {
      if (config->virtServers[i].failover_service)
        {
          piranha_log (flags, (char *) "Warning; skipping failover service");
          continue;   /* This should not be possbile anymore */
        }

      for (j = 0; j < i; j++)
        {
          if (!memcmp (&config->virtServers[i].virtualAddress,
                       &config->virtServers[j].virtualAddress,
                       sizeof (config->virtServers[i].virtualAddress)))
            break;
        }

      if (j == i)
        disableInterface (config->virtServers[i].virtualDevice, flags);
    }

In the inner loop, we will incorrectly break and avoid deactivating the interface in the case that virtServers[j] is inactive but its virtualAddress matches the other service. This needs another check to see if virtServers[j] is inactive and continue if that is the case.

This problem causes the VIP to remain active on the LVS router that is shutting down, leading to it being active on both the primary and backup router in the case of a failover.

Version-Release number of selected component (if applicable):
piranha-0.8.3-1

How reproducible:
100%

Steps to Reproduce:
1. Create an LVS configuration with at least two virtual servers sharing a single VIP
2. Disable the first service by setting "active = 0" in lvs.cf
3. Start the pulse service on both primary & backup routers
4. VIP should start correctly on primary
5. Stop pulse on the primary router
6. Confirm that VIP has been failed over to the backup router

Actual results:
The VIP is active on both primary and backup LVS routers

Expected results:
The VIP is active only on one router at a time (the backup router in this example).

Additional info: The same effect is seen when failing back to the primary by re-starting pulse on the primary router then stopping pulse on the backup router.
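
For illustration, here is a minimal standalone sketch of the inner-loop check described above. It is not the attached patch and not the real pulse code: struct virt_server, disable_interface_stub, deactivate_vips and the skip_inactive flag are hypothetical names, the VIP is simplified to a 32-bit value, and the isActive field on the virtual server entry is an assumption modelled on the failoverServices guard from bug 123342. The failover_service check from the outer loop is omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a piranha virtual server entry. */
struct virt_server {
    uint32_t virtualAddress;   /* VIP, simplified to a 32-bit value */
    const char *virtualDevice; /* e.g. "eth0:1" */
    int isActive;              /* assumed to mirror "active = 0/1" in lvs.cf */
};

/* Records the last device passed in instead of touching the network. */
static const char *last_disabled;

/* Stub standing in for disableInterface(). */
static void disable_interface_stub(const char *dev)
{
    last_disabled = dev;
    printf("disabling %s\n", dev);
}

/*
 * Sketch of the deactivation loop. With skip_inactive == 0 it behaves like
 * the loop quoted in the report; with skip_inactive != 0 an inactive earlier
 * entry no longer counts as a duplicate of the VIP.
 * Returns the number of interfaces that were disabled.
 */
static int deactivate_vips(const struct virt_server *vs, int n, int skip_inactive)
{
    int i, j, count = 0;

    assert(vs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++) {
            /* Proposed check: ignore earlier entries that are inactive. */
            if (skip_inactive && !vs[j].isActive)
                continue;
            if (!memcmp(&vs[i].virtualAddress, &vs[j].virtualAddress,
                        sizeof(vs[i].virtualAddress)))
                break; /* an earlier entry already covers this VIP */
        }
        if (j == i) {
            disable_interface_stub(vs[i].virtualDevice);
            count++;
        }
    }
    return count;
}
```

With two entries sharing one VIP, the first inactive on eth0:1 and the second active on eth0:2, the sketch without the check never disables eth0:2, which matches the symptom reported here. With the check enabled the second entry's device is disabled as well.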
Created attachment 148004: Add check for inactive services to pulse's deactivateLvs
Created attachment 148008: example lvs.cf that reproduces the problem
Reassigning to component owner
The patch has been committed to the RHEL4 branch in CVS.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0794.html