Bug 1040405
Summary: | Error: keepalived + ipvs with persistence: doesn't balance to other node if initial persistent node fails.
---|---
Product: | [Fedora] Fedora
Component: | keepalived
Version: | 19
Status: | CLOSED NOTABUG
Severity: | high
Priority: | unspecified
Hardware: | x86_64
OS: | Linux
Reporter: | Jose Luis Godoy <joseluis.gms>
Assignee: | Ryan O'Hara <rohara>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | b38617, matthias, rohara
Keywords: | Reopened
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2013-12-11 20:44:00 UTC

Did you ask upstream? This seems like it might be a problem with ipvs instead of keepalived.

Yes, I did: https://sourceforge.net/p/keepalived/bugs/11/
For now we don't have any answer. We think it is a problem with ipvs module and/or ipvsadm. Should we open the bug in bugzilla.kernel.org?
Thanks.
Jose Luis

(In reply to Jose Luis Godoy from comment #2)
> Yes, I did: https://sourceforge.net/p/keepalived/bugs/11/

OK.

> For now we don't have any answer. We think it is a problem with ipvs module
> and/or ipvsadm. Should we open the bug in bugzilla.kernel.org?

Before you do that you might want to also ask on the lvs-devel mailing list. In the meantime I will try to recreate this bug and see if I can figure out what is going on.

Can you remove "nb_get_retry" and "delay_before_retry" from your keepalived.conf? This is a long shot, but those options aren't valid for TCP_CHECK and sometimes keepalived does strange things when it encounters bogus options.

OK. It is the "inhibit_on_failure" option. Since this sets your real servers to be quiescent, the persistence rule remains intact when the real server fails (weight is set to zero). You have two options:

1. Remove "inhibit_on_failure" from keepalived.conf.
2. Set "expire_quiescent_template" to 1.

    % echo 1 > /proc/sys/net/ipv4/vs/expire_quiescent_template

Of course this can be set permanently in your sysctl config file. This sysctl option will cause your persistence templates to be expired when the server becomes quiescent. I've tested both and they work.

Hi Ryan,
You are right, it works perfectly. Many thanks.
Jose Luis

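The explanation above (a quiescent real server keeps its persistence template unless "expire_quiescent_template" is set) can be illustrated with a small toy model. This is only a sketch of the behaviour described in this thread, not the kernel's IPVS code: the class names, the `demo` helper, and the client address 10.0.0.1 are hypothetical, and the persistence timeout is not modelled at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RealServer:
    name: str
    weight: int = 1  # weight 0 = quiescent (what inhibit_on_failure produces)


@dataclass
class ToyVirtualService:
    """Toy persistent virtual service; a simplification, NOT the real IPVS logic."""
    servers: List[RealServer]
    expire_quiescent_template: bool = False
    templates: Dict[str, RealServer] = field(default_factory=dict)
    _next: int = 0

    def _schedule(self) -> Optional[RealServer]:
        # Round-robin over real servers with a non-zero weight only.
        usable = [s for s in self.servers if s.weight > 0]
        if not usable:
            return None
        server = usable[self._next % len(usable)]
        self._next += 1
        return server

    def connect(self, client_ip: str) -> Optional[RealServer]:
        template = self.templates.get(client_ip)
        if template is not None:
            if template.weight > 0 or not self.expire_quiescent_template:
                # The template wins, even when its server is quiescent.
                return template
            # Template points at a quiescent server and expiry is enabled.
            del self.templates[client_ip]
        server = self._schedule()
        if server is not None:
            self.templates[client_ip] = server
        return server


def demo(expire: bool) -> str:
    """Connect once, make the chosen server quiescent, then connect again."""
    a = RealServer("192.168.58.201")
    b = RealServer("192.168.58.202")
    svc = ToyVirtualService([a, b], expire_quiescent_template=expire)
    svc.connect("10.0.0.1")  # hypothetical client; creates a template for server a
    a.weight = 0             # health check fails, inhibit_on_failure sets weight 0
    return svc.connect("10.0.0.1").name
```

In this model `demo(False)` keeps returning 192.168.58.201 (the behaviour reported in this bug), while `demo(True)` moves the client to 192.168.58.202, which matches what was observed after setting the sysctl.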
Created attachment 835232: keepalived.conf

Description of problem:
Configuring keepalived + ipvs with persistence: doesn't balance to other node if initial persistent node fails.

Version-Release number of selected component (if applicable):
keepalived-1.2.9-1.fc19.x86_64
ipvsadm-1.27-1.fc19.x86_64
/lib/modules/3.11.9-200.fc19.x86_64/kernel/net/netfilter/ipvs/ip_vs.ko

How reproducible:
Always.

Steps to Reproduce:

1. /etc/keepalived/keepalived.conf -> this is part of the configuration file:

    ...
    virtual_server 192.168.58.10 443 {
        delay_loop 5
        lb_algo rr
        lb_kind DR
        persistence_timeout 3600
        persistence_granularity 255.255.255.255
        protocol TCP
        sorry_server 192.168.58.200 443

        real_server 192.168.58.201 443 {
            weight 1
            inhibit_on_failure
            TCP_CHECK {
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 1
            }
        }

        real_server 192.168.58.202 443 {
            weight 1
            inhibit_on_failure
            TCP_CHECK {
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 1
            }
        }
    ...

2. systemctl restart keepalived

    # ipvsadm -L -n --persistent-conn
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port            Weight    PersistConn ActiveConn InActConn
      -> RemoteAddress:Port
    TCP  192.168.58.10:80 rr
      -> 192.168.58.201:80            1         0           0          0
      -> 192.168.58.202:80            1         0           0          0
    TCP  192.168.58.10:443 rr persistent 3600
      -> 192.168.58.201:443           1         0           0          0
      -> 192.168.58.202:443           1         0           0          0

3. At first we connect to 192.168.58.10:443 four times from the same source address.

    # ipvsadm -L -n --persistent-conn
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port            Weight    PersistConn ActiveConn InActConn
      -> RemoteAddress:Port
    TCP  192.168.58.10:80 rr
      -> 192.168.58.201:80            1         0           0          0
      -> 192.168.58.202:80            1         0           0          0
    TCP  192.168.58.10:443 rr persistent 3600
      -> 192.168.58.201:443           1         1           0          4
      -> 192.168.58.202:443           1         0           0          0

4. We stop the persistent node 192.168.58.201:

    # ipvsadm -L -n --persistent-conn
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port            Weight    PersistConn ActiveConn InActConn
      -> RemoteAddress:Port
    TCP  192.168.58.10:80 rr
      -> 192.168.58.201:80            1         0           0          0
      -> 192.168.58.202:80            1         0           0          0
    TCP  192.168.58.10:443 rr persistent 3600
      -> 192.168.58.201:443           0         1           0          4
      -> 192.168.58.202:443           1         0           0          0

5. We connect again to 192.168.58.10:443 four times from the initial source address.

    # ipvsadm -L -n --persistent-conn
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port            Weight    PersistConn ActiveConn InActConn
      -> RemoteAddress:Port
    TCP  192.168.58.10:80 rr
      -> 192.168.58.201:80            1         0           0          0
      -> 192.168.58.202:80            1         0           0          0
    TCP  192.168.58.10:443 rr persistent 3600
      -> 192.168.58.201:443           0         1           0          4
      -> 192.168.58.202:443           1         0           0          0    <<<--- There are no connections

Expected results:

    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port            Weight    PersistConn ActiveConn InActConn
      -> RemoteAddress:Port
    TCP  192.168.58.10:80 rr
      -> 192.168.58.201:80            1         0           0          0
      -> 192.168.58.202:80            1         0           0          0
    TCP  192.168.58.10:443 rr persistent 3600
      -> 192.168.58.201:443           0         1           0          4
      -> 192.168.58.202:443           1         0           0          4    <<<--- We must get four connections

Additional info:
When the first node is stopped, IPVS doesn't balance to the other web server node 192.168.58.202:443 and waits 3600 seconds (in this case) before balancing to it. We don't know the reason for this; in our opinion it must balance to the other web server node, so we think it could be a bug.

Can anyone explain this behaviour, or is it really a bug?

Thanks!
Jose Luis
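
The condition shown in the listings above (a real server with weight 0 that still holds a persistent connection) can be spotted automatically. The following is a minimal sketch, assuming the `ipvsadm -L -n --persistent-conn` column layout exactly as pasted in this report (Weight, PersistConn, ActiveConn, InActConn); the function name `stuck_persistent_servers` and the `SAMPLE` text are illustrative and not part of ipvsadm or keepalived.

```python
from typing import List, Tuple

# Trimmed copy of the listing from this report, after 192.168.58.201 was stopped.
SAMPLE = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Weight PersistConn ActiveConn InActConn
  -> RemoteAddress:Port
TCP 192.168.58.10:443 rr persistent 3600
  -> 192.168.58.201:443 0 1 0 4
  -> 192.168.58.202:443 1 0 0 0
"""


def stuck_persistent_servers(ipvsadm_output: str) -> List[Tuple[str, str, int]]:
    """Return (virtual service, real server, PersistConn) for every real
    server that has weight 0 but still owns persistent connections."""
    stuck = []
    service = None
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP") and len(fields) >= 2:
            # Virtual service line, e.g. "TCP 192.168.58.10:443 rr persistent 3600".
            service = fields[1]
        elif fields[0] == "->" and len(fields) >= 6 and service is not None:
            # Real server line: -> address weight persist active inactive.
            addr, weight, persist = fields[1], fields[2], fields[3]
            if weight.isdigit() and persist.isdigit():
                if int(weight) == 0 and int(persist) > 0:
                    stuck.append((service, addr, int(persist)))
    return stuck
```

Calling `stuck_persistent_servers(SAMPLE)` reports 192.168.58.201:443 under 192.168.58.10:443 with one persistent connection, which is the state in which clients from that source address are not rebalanced.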