Description of problem:
I've installed a load balancer using lvs. My problem is when i am running
"ipvsadm -lcn", I can see a lot of connections with the CLOSE (or others states)
state going from 00:59 to 00:01, and then going back to 00:59. In other words
these connections should be dropped after they timed out but the counter is
reseted to 60. I wanted to compare these entries on my real servers with netstat
and I can say that these connections are not on my real servers and they should
be dropped from ip_vs_conn entries.
My connection table is growing and I'm wondering if this connections table will
not be too huge after a long time.
I use LVS with a heartbeat configuration and ldirectord and I don't use
Version-Release number of selected component:
I use the latest kernel :2.6.9-42.0.3.ELsmp
My ipvsadm version is : ipvsadm v1.24 2003/06/07 (compiled with popt and IPVS
Install ipvsadm and use wlc as distribution algorithm
Steps to Reproduce:
1."ipvsadm -lcn" and choose an entry with a low counter
2. try again "ipvsadm -lcn" and compare the counter for the same entry
Counters from "ipvsadm -lcn" go back to 60.
The entries that are not on my real servers should be removed after 60sec
There are a couple of reasons in the kernel function which would extend the
timeout beyond 60 seconds; in all cases, none of them are bugs in ipvsadm.
The most common reasons connection times are extended is:
- firewall marks
- special multi-port protocols (e.g. ftp)
I don't see anything at all in the ipvs (kernel) or ipvsadm which would explain
the behavior if none of the above are used.
I didn't see anything in the WLC scheduler, either; whatever you're seeing
shouldn't be specific to WLC.
I would add that I use LVS with direct routing.
So there is is no firewall mark nor persistence connexions and LVS is used with
smtp, http, https or pop/imap connexions (no multi-ports)
In addition, before I use a cluster lvs, I had a lvs box (on Red Hat Linux
release 7.3 with a 2.4.20-27 kernel) and there was not this kind of problem.
That's why I tought it was a bug with LVS on kernel 2.6.9
There is a thread on lvs mailing (
list but there isn't a response for this problem.
I hope someone will help me.
Good reference. FYI, I think this is a kernel bug. Here's the scoop... There
are only two ways that I can *see* that a connection would not get expired normally:
(a) The connection is a "controlling" connection. This means that it has
another associated connection with it.
(b) Unhashing fails.
(c) Reference count is not 1 (something else is reading / writing it at the time).
Since you're not using firewall marks or persistence, n_control should be 0 - so
it should not be (a). (b) shouldn't happen, so it looks like it would be (c)
being caused by a refcount leak somewhere, unless for some reason, the n_control
field isn't getting properly initialized in non-persistent cases...
fwiw, n_control is set to 0 in ip_vs_conn_new(); so that's not the problem.
Ok so if I understand, it seems there is something which reactivate the timer of
the connection (maybe this funtion: ip_vs_conn_put).
But I forgot to say that: in addition of ldirectord I use ipvsadm daemon to have
an active/passive cluster and synchronize ipvs_conn table between the 2 nodes of
my cluster. So, if there is no bug with LVS, maybe there is something with
ldirectord or ipvsadm daemon which reactivate my connection timer.
To be more precise, my direct routing's configuration is :
on the lvs active node (vip are /32 addresses) and on my real servers I use
arptables_jf solution with vip/32 addresses to.
I hope this add will help you.
Created attachment 145159 [details]
I think I may found a part of the solution regarding this thread :
The user seems to have the same problem and it solved it by using a kernel patch
provided by another guy. This patch suppress the call to __ip_vs_conn_put(cp) in
the function ip_vs_icmp_xmit of ip_vs_xmit.c file. I compared kernel source from
rhel 4.4 with sources from 188.8.131.52 fedora core's kernel. You will see that this
call has been suppressed.
So if it is the solution, is there any chance to see it in the next rhel4's kernel ?
*** This bug has been marked as a duplicate of 167398 ***