In some cases, nanny will fail a connect attempt to a real server that is actually alive and responsive. The server is removed from the LVS tables, then added right back in as soon as the re-entry timeout is reached. Syslog shows messages such as:

Jan 21 04:02:11 lvs nanny[30387]: shutting down 192.168.1.1:80 due to connection failure
Jan 21 04:02:11 lvs nanny[30387]: running command "/usr/sbin/ipvsadm" "-d" "-t" "192.168.1.1:80" "-r" "192.168.1.1"
Jan 21 04:03:12 lvs nanny[30295]: making 192.168.1.1:80 available
Jan 21 04:03:12 lvs nanny[30295]: running command "/usr/sbin/ipvsadm" "-a" "-t" "192.168.1.1:80" "-r" "192.168.1.1" "-m" "-w" "100"

The cause: nanny creates a socket descriptor that it uses to communicate with the real server it is supposed to be monitoring. An fcntl() (on line 357 of the 0.4.17-7 package source that I have) attempts to clear the O_NONBLOCK flag set when the socket is created, which should wait until all I/O has completed on that descriptor so it can be re-used. In at least some cases, however, it does not actually wait until the socket is finished being used, and the connect() immediately following fails with an EISCONN error, which nanny reports as a connection failure to the real server, removing it from the IPVS tables.

Turning on verbose logging in nanny seems to give the socket just enough extra time to clear while the piranha_log() functions are being called that the failures do not happen (or at least happen an extremely small fraction of the number of times they do without verbose logging enabled). However, on an LVS server with more than a couple of dozen nanny processes watching real servers/services, nanny's verbose logging generates a tremendous amount of logging data, which causes syslog to consume most of the available CPU time and disk space.
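To illustrate the point about EISCONN: a minimal sketch (my own code, not nanny's actual source) of how a non-blocking connect can be checked without misreading EISCONN as a failure. EISCONN from a second connect() on the same descriptor means "already connected", i.e. the check actually succeeded; the reliable test is to wait for writability and read SO_ERROR. The function name check_real_server and the 5-second timeout are my inventions for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if a TCP connection to ip:port succeeds, -1 otherwise. */
int check_real_server(const char *ip, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Start the connect in non-blocking mode. */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    inet_pton(AF_INET, ip, &sin.sin_addr);

    if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 &&
        errno != EINPROGRESS) {
        close(fd);              /* immediate, genuine failure */
        return -1;
    }

    /* Wait for the handshake to finish instead of clearing O_NONBLOCK
     * and calling connect() again right away.  A second connect() on a
     * connected socket would return -1 with errno == EISCONN, which
     * means success, not the failure nanny reports. */
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { 5, 0 };       /* 5-second timeout */
    if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0) {
        close(fd);              /* timed out or select failed */
        return -1;
    }

    /* SO_ERROR holds the real result of the async connect. */
    int err = 0;
    socklen_t len = sizeof(err);
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
    close(fd);
    return err == 0 ? 0 : -1;
}
```

The key difference from the behavior described above is that the completion of the connect is observed via select()/SO_ERROR rather than inferred from the errno of a repeated connect() call.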
I am going to look into actually fixing the problem rather than playing games with the timeout value.
Had a similar problem. I updated to the latest non-beta LVS set of RPMs and the problem went away and has not returned. I also had to apply the kernel patch for RH 6.2.