I have a request for an enhancement to nanny.
I want to be able to specify a number, say n, so that a real server is made unavailable
only after it missed n polls instead of just one.
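
To make the request concrete, here is a minimal sketch of the requested behaviour. It is not nanny's actual code: the class name `MissCounter` and the parameter `max_misses` (the "n" above) are hypothetical, and the re-entry delay is deliberately left out, so a single successful poll makes the server available again.

```python
class MissCounter:
    """Tracks consecutive missed polls for one real server (hypothetical helper)."""

    def __init__(self, max_misses):
        if max_misses < 1:
            raise ValueError("max_misses must be at least 1")
        self.max_misses = max_misses  # the "n" from the request
        self.misses = 0               # consecutive misses seen so far
        self.available = True

    def record(self, poll_ok):
        """Record one poll result and return whether the server is available."""
        if poll_ok:
            # Any successful poll resets the count (re-entry delay not modelled).
            self.misses = 0
            self.available = True
        else:
            self.misses += 1
            # Only n consecutive misses take the server out of service.
            if self.misses >= self.max_misses:
                self.available = False
        return self.available
```

With `max_misses=1` this reduces to the current behaviour, where one missed poll is enough.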
Are you asking this for FOS, LVS, or both? And why isn't a single failure
acceptable? It would be easy to use this feature as a method of hiding a
I was using LVS. I had real servers that would accept connections on the port of the
service even though they could not really service the requests. The send/expect string
functionality of nanny was not of use in this situation. For this reason I set the re-entry
time to something like 20 minutes.
BUT. . .
I had a problem (at the real servers) where nanny sometimes declared a real server
dead when it, in fact, was not. It then had to wait 20 minutes before being made available again.
I thought the probability of missing, say, two consecutive polls would be low enough
to solve my problem. Eventually I "managed" the problem by running an additional service
on the real servers to test their states. It is not elegant but it is sufficient. When the
servers start there is a delay after the actual service starts and before the additional service
starts. The re-entry time is thus a couple of seconds again.
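
The reasoning about two consecutive missed polls can be illustrated with a small calculation. This is only a sketch: it assumes each poll is missed independently with the same probability, which may not hold if misses are caused by sustained load, and the function name and the numbers in the usage note are illustrative, not measurements from this setup.

```python
def false_down_probability(p_miss, n):
    """Probability of n consecutive missed polls, assuming independent misses."""
    if not 0.0 <= p_miss <= 1.0:
        raise ValueError("p_miss must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Independent events multiply: p_miss * p_miss * ... (n times).
    return p_miss ** n
```

For example, if a healthy server missed 1 poll in 100, `false_down_probability(0.01, 2)` would put the chance of two misses in a row at about 1 in 10,000 under these assumptions.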
I had some other issues with this setup for which I needed the fwmark functionality, which
piranha did not have support for at that time. I decided to use heartbeat plus ldirectord. . .
So basically I had some issues and I thought this "enhancement" might help.
Currently I'm not using piranha, as mentioned above.
>I was using LVS. I had real servers that would accept connections on the port of
>the service even though they could not really service the requests.
Isn't this the same as saying the service was overloaded, and that declaring it dead
for further connection attempts would be a good thing? Why wouldn't the correct
"fix" be to add additional servers to respond to the load?
>The send/expect string functionality of nanny was not of use in this
>situation. For this reason I set the re-entry time to something like
>20 minutes.
Why not simply leave out the send/expect string? If a successful connection was the
best validation test, then you could have limited nanny to just doing that.
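
For reference, a connect-only check of the kind suggested here might look like the following sketch. It is not nanny's implementation; the function name and the default timeout are assumptions. Note that it only verifies that the port accepts a TCP connection, so it would still report success for a server that accepts connections but cannot actually service requests.

```python
import socket


def port_accepts_connection(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```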
>Eventually I "managed" the problem by running an additional service
>on the real servers to test their states.
This actually sounds like a better solution than -- you are solving a response
problem by distributing the load.