Bug 220149
Summary: | ipvs connections entries not dropped
---|---
Product: | Red Hat Enterprise Linux 4
Reporter: | saveline <aveseb>
Component: | kernel
Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED DUPLICATE
Severity: | high
Priority: | medium
Version: | 4.4
Hardware: | All
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2007-02-01 15:06:22 UTC
Description
saveline
2006-12-19 09:34:21 UTC
There are a couple of reasons the kernel function would extend the timeout beyond 60 seconds; none of them is a bug in ipvsadm. The most common reasons connection times are extended are:
- firewall marks
- persistence
- special multi-port protocols (e.g. ftp)

I don't see anything at all in ipvs (kernel) or ipvsadm that would explain the behavior if none of the above is used. I didn't see anything in the WLC scheduler either; whatever you're seeing shouldn't be specific to WLC.

I would add that I use LVS with direct routing, so there are no firewall marks or persistent connections, and LVS is used with smtp, http, https or pop/imap connections (no multi-port protocols). In addition, before I used an LVS cluster, I had a single LVS box (Red Hat Linux release 7.3 with a 2.4.20-27 kernel) and it did not have this kind of problem. That's why I thought it was a bug with LVS on kernel 2.6.9. There is a thread on the LVS mailing list ( http://marc.theaimsgroup.com/?l=linux-virtual-server&m=116476566020553&w=2 ) but there has been no response to this problem. I hope someone will help me. Thanks.

Good reference. FYI, I think this is a kernel bug. Here's the scoop...

There are only three ways that I can *see* that a connection would not get expired normally (net/ipv4/ipvs/ip_vs_conn.c:ip_vs_conn_expire):
(a) The connection is a "controlling" connection. This means that it has another connection associated with it.
(b) Unhashing fails.
(c) The reference count is not 1 (something else is reading / writing it at the time).

Since you're not using firewall marks or persistence, n_control should be 0 - so it should not be (a). (b) shouldn't happen, so it looks like (c), caused by a refcount leak somewhere, unless for some reason the n_control field isn't getting properly initialized in non-persistent cases...

fwiw, n_control is set to 0 in ip_vs_conn_new(), so that's not the problem.
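The conditions (a), (b) and (c) above can be sketched as a small user-space model in C. This is only an illustration of the decision described in the comment, not the kernel's code: `struct conn_model`, `model_expire` and the `expire_result` values are hypothetical names, and the assumption that an idle entry holds exactly one reference (the table's) is part of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct ip_vs_conn. */
struct conn_model {
    int  refcnt;     /* references currently held on the entry */
    int  n_control;  /* number of connections this entry controls */
    bool hashed;     /* true while the entry sits in the connection table */
};

enum expire_result {
    EXPIRED,             /* entry would be freed */
    KEPT_CONTROLLING,    /* (a) entry still controls other connections */
    KEPT_UNHASH_FAILED,  /* (b) entry could not be removed from the table */
    KEPT_REFCNT          /* (c) the reference count is not what is expected */
};

/* Models one firing of the expiry timer for a single entry. */
static enum expire_result model_expire(struct conn_model *cp)
{
    assert(cp != NULL);

    cp->refcnt++;                 /* the handler takes its own reference */

    if (cp->n_control != 0) {     /* (a) */
        cp->refcnt--;             /* drop the handler's reference, retry later */
        return KEPT_CONTROLLING;
    }

    if (!cp->hashed) {            /* (b) */
        cp->refcnt--;
        return KEPT_UNHASH_FAILED;
    }
    cp->hashed = false;           /* unhashing drops the table's reference */
    cp->refcnt--;

    if (cp->refcnt == 1)          /* only the handler is left: safe to free */
        return EXPIRED;

    cp->hashed = true;            /* (c) put the entry back in the table ... */
    cp->refcnt++;
    cp->refcnt--;                 /* ... and drop the handler's reference */
    return KEPT_REFCNT;
}
```

In this model an idle entry (`refcnt` 1, `n_control` 0, hashed) expires, while an entry whose count is off by one in either direction is put back in the table and kept, which matches case (c).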
OK, so if I understand correctly, something seems to re-arm the connection's timer (maybe this function: ip_vs_conn_put). I forgot to mention that, in addition to ldirectord, I use the ipvsadm daemon to run an active/passive cluster and synchronize the ipvs_conn table between the 2 nodes of my cluster. So, if there is no bug in LVS, maybe something in ldirectord or the ipvsadm daemon re-arms my connection timer. To be more precise, my direct routing configuration is: on the active LVS node the VIPs are /32 addresses, and on my real servers I use the arptables_jf solution, also with /32 VIP addresses. I hope this additional information helps.

Created attachment 145159: remove __ip_vs_conn_put(cp)
I think I may have found part of the solution in this thread ( http://marc.theaimsgroup.com/?l=linux-virtual-server&m=111494344303632&w=2 ). The user seems to have had the same problem and solved it with a kernel patch provided by someone else. The patch removes the call to __ip_vs_conn_put(cp) in the function ip_vs_icmp_xmit in ip_vs_xmit.c. I compared the kernel source from RHEL 4.4 with the source of the 2.6.9.15 Fedora Core kernel; there, this call has been removed. So if this is the solution, is there any chance of seeing it in the next RHEL 4 kernel? Thank you.
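To illustrate why one unmatched put could keep an entry alive indefinitely, here is a minimal user-space sketch in C. It assumes, as the patch suggests but this report does not confirm, that the removed __ip_vs_conn_put(cp) call was a put with no matching get. All names (`conn_model`, `model_conn_get`, `model_conn_put_raw`, `model_conn_put`, `model_expire`) are hypothetical stand-ins, and times are plain seconds rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the kernel's struct ip_vs_conn. */
struct conn_model {
    int  refcnt;   /* 1 while idle: the table's reference (model assumption) */
    long expires;  /* absolute time, in seconds, at which the timer fires */
    long timeout;  /* timer period in seconds */
};

/* Models a lookup that hands the caller a reference. */
static void model_conn_get(struct conn_model *cp) { cp->refcnt++; }

/* Models a bare put: drop one reference, leave the timer alone. */
static void model_conn_put_raw(struct conn_model *cp) { cp->refcnt--; }

/* Models a full put: re-arm the timer, then drop one reference. */
static void model_conn_put(struct conn_model *cp, long now)
{
    assert(cp != NULL && cp->timeout > 0);
    cp->expires = now + cp->timeout;
    model_conn_put_raw(cp);
}

/* Models the timer handler; returns true if the entry would be freed. */
static bool model_expire(struct conn_model *cp, long now)
{
    cp->timeout = 60;      /* a retry, if needed, happens 60 seconds later */
    cp->refcnt++;          /* the handler's own reference */
    cp->refcnt--;          /* unhashing drops the table's reference */
    if (cp->refcnt == 1)
        return true;       /* balanced count: the entry is freed */
    cp->refcnt++;          /* otherwise rehash the entry ... */
    model_conn_put(cp, now); /* ... and re-arm the timer */
    return false;
}
```

With a balanced get/put pair the count returns to 1 and `model_expire` frees the entry. After one extra `model_conn_put_raw`, the count stays one too low, the `refcnt == 1` test never succeeds, and every firing re-arms the timer for another 60 seconds, which is consistent with entries that are never dropped.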