Hi, I have a request for an enhancement to nanny. I want to be able to specify a number, say n, so that a real server is made unavailable only after it has missed n consecutive polls instead of just one. Thanks, Tinus
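For illustration, the requested behaviour can be sketched as a small counter of consecutive missed polls. This is a minimal sketch, not nanny's code: the class name `FailureThreshold`, the field `max_missed` (the "n" above) and the method `record_poll` are hypothetical, and the sketch assumes a successful poll restores the server at once, ignoring any re-entry time.

```python
from dataclasses import dataclass


@dataclass
class FailureThreshold:
    """Hypothetical tracker: a server goes down only after n consecutive misses."""

    max_missed: int          # the "n" from the request (assumed >= 1)
    missed: int = 0          # consecutive missed polls seen so far
    available: bool = True   # current state of the real server

    def record_poll(self, ok: bool) -> bool:
        """Record one poll result and return the resulting availability."""
        if ok:
            # Any successful poll resets the streak. Re-entry delay is
            # deliberately not modelled in this sketch.
            self.missed = 0
            self.available = True
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.available = False
        return self.available


if __name__ == "__main__":
    tracker = FailureThreshold(max_missed=2)
    for result in (False, True, False, False):
        print(result, tracker.record_poll(result))
```

With `max_missed=1` this reduces to the existing behaviour, where a single missed poll removes the server.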
Are you asking this for FOS, LVS, or both? And why isn't a single failure acceptable? It would be easy to use this feature as a method of hiding a different problem.
Hi,

I was using LVS. I had real servers that would accept connections on the port of the service even though they could not really service the requests. The send/expect string functionality of nanny was of no use in this situation. For this reason I set the re-entry time to something like 20 minutes.

BUT... I had a problem (at the real servers) where nanny sometimes declared a real server dead when, in fact, it was not. The server then had to wait 20 minutes before being made available again. I thought the probability of missing, say, two consecutive polls would be low enough to solve my problem.

Eventually I "managed" the problem by running an additional service on the real servers to test their states. It is not elegant, but it is sufficient. When the servers start, there is a delay after the actual service starts and before the additional service starts. The re-entry time is thus back to a couple of seconds.

I had some other issues with this setup for which I needed the fwmark functionality, which piranha did not support at that time. I decided to use heartbeat plus ldirectord...

So basically I had some issues and I thought this "enhancement" might help. Currently I'm not using piranha, as mentioned above.

Cheers
Tinus
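The reasoning about two consecutive missed polls can be made concrete with a little arithmetic. The sketch below assumes that spurious misses are independent of one another and happen with a fixed probability per poll. Neither assumption is established in this thread, and the function name and the sample value are purely illustrative.

```python
def false_removal_probability(p_miss: float, n: int) -> float:
    """Chance that a healthy server misses n polls in a row.

    Assumes each poll is missed independently with probability p_miss.
    """
    if not 0.0 <= p_miss <= 1.0:
        raise ValueError("p_miss must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return p_miss ** n


if __name__ == "__main__":
    # Hypothetical per-poll miss rate, chosen only for illustration.
    p = 0.01
    for n in (1, 2, 3):
        print(n, false_removal_probability(p, n))
```

If the misses are correlated, for example because a server is briefly overloaded, the real figure will be higher than this estimate.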
> I was using LVS. I had real servers that will accept connection on the port of
> the service even though they cannot really service the requests.

Isn't this the same as saying the service was overloaded, and that declaring it dead for further connection attempts would be a good thing? Why wouldn't the correct "fix" be to add additional servers to respond to the load?

> The send/expect string functionality of nanny was not of use in this situation.
> For this reason I set the re-entry time to something like 20 minutes.

Why not just omit the send/expect string? If a successful connection was the best validation test, then you could have limited nanny to doing just that.

> Eventually I "managed" the problem by running an additional service
> on the real servers to test their states.

This actually sounds like a better solution than -- you are solving a response problem by distributing the load.