Description of problem:
I've installed a load balancer using LVS. When I run "ipvsadm -lcn", I see many connections in the CLOSE state (and in other states) whose counter counts down from 00:59 to 00:01 and then goes back to 00:59. In other words, these connections should be dropped once they time out, but instead the counter is reset to 60. I compared these entries with netstat on my real servers: the connections do not exist there, so they should be dropped from the ip_vs_conn entries. My connection table keeps growing, and I'm worried it will become too large over time. I use LVS with a heartbeat configuration and ldirectord, and I don't use persistent connections.

Version-Release number of selected component:
I use the latest kernel: 2.6.9-42.0.3.ELsmp
My ipvsadm version is: ipvsadm v1.24 2003/06/07 (compiled with popt and IPVS v1.2.0)

How reproducible:
Install ipvsadm and use wlc as the scheduling algorithm.

Steps to Reproduce:
1. Run "ipvsadm -lcn" and choose an entry with a low counter.
2. Run "ipvsadm -lcn" again and compare the counter for the same entry.

Actual results:
Counters from "ipvsadm -lcn" go back to 60.

Expected results:
Entries that do not exist on my real servers should be removed after 60 seconds.
There are a few conditions in the kernel that extend the timeout beyond 60 seconds; none of them is a bug in ipvsadm. The most common reasons connection times are extended are:
- firewall marks
- persistence
- special multi-port protocols (e.g. ftp)

I don't see anything at all in ipvs (kernel) or ipvsadm that would explain the behavior if none of the above are used. I didn't see anything in the WLC scheduler, either; whatever you're seeing shouldn't be specific to WLC.
I would add that I use LVS with direct routing. So there are no firewall marks and no persistent connections, and LVS is used with smtp, http, https and pop/imap connections (no multi-port protocols).

In addition, before I set up an LVS cluster, I had a single LVS box (on Red Hat Linux release 7.3 with a 2.4.20-27 kernel) and it did not have this kind of problem. That's why I thought it was a bug with LVS on kernel 2.6.9.

There is a thread on the LVS mailing list ( http://marc.theaimsgroup.com/?l=linux-virtual-server&m=116476566020553&w=2 ), but there is no response to this problem. I hope someone can help me. Thanks.
Good reference. FYI, I think this is a kernel bug. Here's the scoop...

There are only three ways that I can *see* that a connection would not get expired normally, in net/ipv4/ipvs/ip_vs_conn.c:ip_vs_conn_expire:
(a) The connection is a "controlling" connection. This means that it has another connection associated with it.
(b) Unhashing fails.
(c) The reference count is not 1 (something else is reading / writing it at the time).

Since you're not using firewall marks or persistence, n_control should be 0, so it should not be (a). (b) shouldn't happen, so it looks like it would be (c), caused by a refcount leak somewhere, unless for some reason the n_control field isn't getting properly initialized in non-persistent cases...
fwiw, n_control is set to 0 in ip_vs_conn_new(); so that's not the problem.
OK, so if I understand correctly, something seems to re-arm the connection's timer (maybe this function: ip_vs_conn_put). I forgot to mention that, in addition to ldirectord, I use the ipvsadm sync daemon to run an active/passive cluster and synchronize the ip_vs_conn table between the 2 nodes of my cluster. So, if there is no bug in LVS, maybe something in ldirectord or the ipvsadm daemon re-arms my connection timers.

To be more precise, my direct routing configuration is as follows: on the active LVS node the VIPs are /32 addresses, and on my real servers I use the arptables_jf solution with VIP/32 addresses too. I hope this additional information helps.
Created attachment 145159 [details] remove __ip_vs_conn_put(cp)
I think I may have found part of the solution in this thread: http://marc.theaimsgroup.com/?l=linux-virtual-server&m=111494344303632&w=2

The user seems to have had the same problem and solved it with a kernel patch provided by someone else. The patch removes the call to __ip_vs_conn_put(cp) in the function ip_vs_icmp_xmit in ip_vs_xmit.c. I compared the kernel source from RHEL 4.4 with the source of the 2.6.9.15 Fedora Core kernel, and you will see that this call has been removed there. So if this is the solution, is there any chance of seeing it in the next RHEL4 kernel? Thank you.
*** This bug has been marked as a duplicate of 167398 ***