Both the piranha-0.4.16-3 and 0.4.17-2 packages have file handle leaks. When pulse is started, the number of allocated/used file handles reported in /proc/sys/fs/file-nr jumps to a huge value; the more services you have in /etc/lvs.cf, the more file handles get allocated. On our production LVS systems the handles may slowly "drain out" over the course of a few days, but the initial jump can crash the machine immediately if file-max is not set very large. In our LVS test setup, both LVS boxes crashed over the weekend after hitting the file-max limit while running (the number of file handles used/allocated _increased_ over the weekend during normal testing use). Piranha 0.4.16-7 did not appear to have this leak, but it refused to start up the nat-router alias/VIP automatically.
I'm in the process of testing the updates for 0.4.17-3, and I hope to have them up tonight in the experimental ftp area, as getting a release into production takes about a week of jumping up and down on the HA frame with various tests. The main change is that you can now stipulate the netmasks used on the external VIP and the internal NAT device. You should be able to find it shortly at:

ftp://people.redhat.com/kbarrett/HA/experimental/

I had a quick look at the file handles problem you mentioned, but wasn't able to get anything beyond the following when I placed the system under extreme load (basically saturating the full 100Mbit in duplex mode):

[root@ha6 piranha]# cat /proc/sys/fs/file-nr
763 271 4096

Phil =--=
Note that these symptoms only occur for us when there are active primary and backup LVS machines set up using "piranha".

This site has 27 active "virtual" tags in its /etc/lvs.cf and was started fresh this afternoon with RedHat 6.2 and the HA packages applied, with piranha-0.4.17-2 (and it's mostly idle):

# cat /proc/sys/fs/file-nr
18818 15088 262144

Our testbed servers, with identical software installs but only two active services, were rebooted this afternoon and show:

# cat /proc/sys/fs/file-nr
832 0 4096

(We've just moved into new facilities, so we don't have addressing/services set up quite yet for me to pound on our test servers the way we do on the production servers. I'd like to dump a bunch of services onto them as well and see what the behaviour is.)

I *am* a little confused about why the two behave so differently. Our production server jumps into the tens of thousands of handles immediately upon starting 'pulse', and those handles then slowly "drain away". Our test servers instead slowly increase the number of file handles they use at any given time until they feel like crashing, but they seem to de-allocate them successfully when pulse/nanny is done with them (as long as they don't crash from trying to allocate more than file-max).

Both piranha-0.4.16-3 and 0.4.17-2 have exactly the same symptoms. 0.4.16-7 didn't seem to have any problems with file-handle leaks or inordinate usage, but wouldn't bring up the nat-router VIP for us.
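For anyone wanting to compare readings like the ones quoted above, here is a minimal sketch of parsing a file-nr line and working out how far a box is from its file-max limit. It assumes the three fields are allocated handles, allocated-but-unused handles, and file-max, which is how current kernel documentation describes them; treat the meaning of the second field as an assumption for the kernels discussed in this thread. The names `FileNr`, `parse_file_nr`, `headroom` and `read_file_nr` are made up for illustration and are not part of piranha.

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    allocated: int  # first field: file handles currently allocated
    free: int       # second field: ASSUMED to be allocated-but-unused handles
    maximum: int    # third field: the file-max limit


def parse_file_nr(text: str) -> FileNr:
    """Parse the three whitespace-separated integers of a file-nr line."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {text!r}")
    allocated, free, maximum = (int(field) for field in fields)
    return FileNr(allocated, free, maximum)


def headroom(sample: FileNr) -> int:
    """Handles that can still be allocated before file-max is reached."""
    return sample.maximum - sample.allocated


def read_file_nr(path: str = "/proc/sys/fs/file-nr") -> FileNr:
    """Read the live counters (Linux only)."""
    with open(path) as handle:
        return parse_file_nr(handle.read())
```

Fed the two readings quoted above, `headroom(parse_file_nr("18818 15088 262144"))` gives 243326 for the production box and `headroom(parse_file_nr("832 0 4096"))` gives 3264 for the testbed, which shows how much less margin the testbed has with the default file-max. Calling `read_file_nr()` periodically and logging the result would be one way to see whether the count drains or keeps growing.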
OK, that's the latest RPMs up for all architectures. Let's see how that behaves. I've started putting extra servers on our HA rack system here to see if I can duplicate your problem. Let's see what happens. Phil =--=
I forgot to mention that the previous version didn't bring up the other interfaces because a NULL was being passed into execve(), which caused ifconfig not to be called. Apologies. Please try the 0.4.17-3 rpm and let me know if this is good for you or if further problems still exist. ftp://people.redhat.com/kbarrett/HA/experimental/ Thanks, Phil =--=
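The NULL-into-execve() failure described above is the kind of bug that a check on the argument vector can catch before the exec is attempted. The following is only an illustrative sketch, not piranha's code: the thread does not say which argument was NULL, and the helper name `build_ifconfig_argv`, the `/sbin/ifconfig` path and the example device and addresses are all assumptions.

```python
from typing import List, Optional


def build_ifconfig_argv(device: Optional[str],
                        address: Optional[str],
                        netmask: Optional[str]) -> List[str]:
    """Build an argv for ifconfig, refusing missing or empty values.

    Raising here makes a missing value visible, instead of letting it
    reach the exec call and leaving the interface silently unconfigured.
    """
    fields = {"device": device, "address": address, "netmask": netmask}
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError("missing ifconfig arguments: " + ", ".join(missing))
    # The /sbin/ifconfig location is an assumption; adjust for the system.
    return ["/sbin/ifconfig", device, address, "netmask", netmask, "up"]
```

The returned list could then be handed to `os.execve(argv[0], argv, os.environ)` in a forked child. A call such as `build_ifconfig_argv("eth0:1", None, "255.255.255.0")` raises `ValueError` naming `address` rather than proceeding.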
0.4.17-3 seems to be much better behaved than 0.4.17-2 was, more like 0.4.16-7's behaviour. With plenty of lvs virtual services and about a hundred portfw rules installed:

# cat /proc/sys/fs/file-nr
855 24 262144

However, 0.4.17-2 had some sort of catastrophic meltdown last night that caused the VM system to start randomly killing processes on both boxes, so both the primary and backup machines were down when we came in this morning. I'll have to keep an eye on this latest code for the next couple of days to see if it has any problems like that.
Please let us know what happens. We have not seen this behavior.
piranha-0.4.17-4 is available to play with. I can't reproduce the file handles problem with it, and you get a few extra features you'd probably want. ftp://people.redhat.com/kbarrett/HA/experimental/ Phil =--=