Bug 1040405

Summary: Error: keepalived + ipvs with persistence: doesn't balance to other node if initial persistent node fails.
Product: Fedora
Reporter: Jose Luis Godoy <joseluis.gms>
Component: keepalived
Assignee: Ryan O'Hara <rohara>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 19
CC: b38617, matthias, rohara
Keywords: Reopened
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-12-11 20:44:00 UTC

Attachments:
keepalived.conf

Description Jose Luis Godoy 2013-12-11 11:26:10 UTC
Created attachment 835232 [details]
keepalived.conf

Description of problem:

When keepalived + IPVS is configured with persistence, traffic is not balanced to the other real server if the initial persistent node fails.

Version-Release number of selected component (if applicable):

keepalived-1.2.9-1.fc19.x86_64
ipvsadm-1.27-1.fc19.x86_64
/lib/modules/3.11.9-200.fc19.x86_64/kernel/net/netfilter/ipvs/ip_vs.ko

How reproducible:
Always.

Steps to Reproduce:
1. Configure /etc/keepalived/keepalived.conf. This is the relevant part of the configuration file:
...
virtual_server 192.168.58.10 443 {
delay_loop 5
lb_algo rr
lb_kind DR
persistence_timeout 3600
persistence_granularity 255.255.255.255
protocol TCP

sorry_server 192.168.58.200 443

real_server 192.168.58.201 443 {
    weight 1
    inhibit_on_failure
    TCP_CHECK {
        connect_timeout 3
        nb_get_retry 3
        delay_before_retry 1
    }
}
real_server 192.168.58.202 443 {
    weight 1
    inhibit_on_failure
    TCP_CHECK {
        connect_timeout 3
        nb_get_retry 3
        delay_before_retry 1
    }
}
...

2. systemctl restart keepalived

# ipvsadm -L -n --persistent-conn

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Weight PersistConn ActiveConn InActConn
-> RemoteAddress:Port
TCP 192.168.58.10:80 rr
-> 192.168.58.201:80 1 0 0 0
-> 192.168.58.202:80 1 0 0 0
TCP 192.168.58.10:443 rr persistent 3600
-> 192.168.58.201:443 1 0 0 0
-> 192.168.58.202:443 1 0 0 0

3. First we connect to 192.168.58.10:443 four times from the same source address.
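One way to generate these connections (curl is only an example; any TCP client run four times from the same source address will do):

# curl is an assumption here; -k skips certificate verification on the test VIP
for i in 1 2 3 4; do curl -k https://192.168.58.10/ > /dev/null; done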

# ipvsadm -L -n --persistent-conn

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Weight PersistConn ActiveConn InActConn
  -> RemoteAddress:Port
TCP 192.168.58.10:80 rr
  -> 192.168.58.201:80 1 0 0 0
  -> 192.168.58.202:80 1 0 0 0
TCP 192.168.58.10:443 rr persistent 3600
  -> 192.168.58.201:443 1 1 0 4
  -> 192.168.58.202:443 1 0 0 0

4. We stop the persistent node 192.168.58.201:

# ipvsadm -L -n --persistent-conn

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Weight PersistConn ActiveConn InActConn
  -> RemoteAddress:Port
TCP 192.168.58.10:80 rr
  -> 192.168.58.201:80 1 0 0 0
  -> 192.168.58.202:80 1 0 0 0
TCP 192.168.58.10:443 rr persistent 3600
  -> 192.168.58.201:443 0 1 0 4
  -> 192.168.58.202:443 1 0 0 0

5. We connect again to 192.168.58.10:443 four times from the initial source address.

# ipvsadm -L -n --persistent-conn

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Weight PersistConn ActiveConn InActConn
  -> RemoteAddress:Port
TCP 192.168.58.10:80 rr
  -> 192.168.58.201:80 1 0 0 0
  -> 192.168.58.202:80 1 0 0 0
TCP 192.168.58.10:443 rr persistent 3600
  -> 192.168.58.201:443 0 1 0 4
  -> 192.168.58.202:443 1 0 0 0 <<<--- There are no connections here

Expected results:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Weight PersistConn ActiveConn InActConn
  -> RemoteAddress:Port
TCP 192.168.58.10:80 rr
  -> 192.168.58.201:80 1 0 0 0
  -> 192.168.58.202:80 1 0 0 0
TCP 192.168.58.10:443 rr persistent 3600
  -> 192.168.58.201:443 0 1 0 4
  -> 192.168.58.202:443 1 0 0 4 <<<--- We should get four connections here


Additional info:

When the first node is stopped, IPVS doesn't balance to the other web server node 192.168.58.202:443 and instead waits 3600 seconds (in this case) before balancing to it. We don't know the reason for this behaviour; in our opinion it should balance to the other web server node, so we think this could be a bug.

Can anyone explain this behaviour, or is it really a bug?

Thanks!

Jose Luis

Comment 1 Ryan O'Hara 2013-12-11 14:26:24 UTC
Did you ask upstream? This seems like it might be a problem with ipvs instead of keepalived.

Comment 2 Jose Luis Godoy 2013-12-11 14:41:21 UTC
Yes, I did: https://sourceforge.net/p/keepalived/bugs/11/

So far we haven't received any answer. We think it is a problem with the ipvs module and/or ipvsadm. Should we open the bug in bugzilla.kernel.org?

Thanks.

Jose Luis

Comment 3 Ryan O'Hara 2013-12-11 15:13:32 UTC
(In reply to Jose Luis Godoy from comment #2)
> Yes, I did: https://sourceforge.net/p/keepalived/bugs/11/

OK.

> For now we don't have any answer. We think it is a problem with ipvs module
> and/or ipvsadm. Should we open the bug in bugzilla.kernel.org?

Before you do that you might want to also ask on the lvs-devel mailing list. In the meantime I will try to recreate this bug and see if I can figure out what is going on.

Comment 4 Ryan O'Hara 2013-12-11 15:35:50 UTC
Can you remove "nb_get_retry" and "delay_before_retry" from your keepalived.conf? This is a long shot, but those options aren't valid for TCP_CHECK and keepalived sometimes does strange things when it encounters bogus options.
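For reference, a minimal sketch of what one of the real_server blocks would look like with those two options removed (values taken from the attached keepalived.conf):

real_server 192.168.58.201 443 {
    weight 1
    inhibit_on_failure
    TCP_CHECK {
        connect_timeout 3
    }
}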

Comment 5 Ryan O'Hara 2013-12-11 16:06:15 UTC
OK. It is the "inhibit_on_failure" option. Since this option makes a failed real server quiescent (its weight is set to zero) instead of removing it, the persistence template remains intact when the real server fails. You have two options:

1. Remove "inhibit_on_failure" from keepalived.conf.

2. Set "expire_quiescent_template" to 1.

% echo 1 > /proc/sys/net/ipv4/vs/expire_quiescent_template

Of course this can be set permanently in your sysctl config file. This sysctl option causes the persistence templates to expire when the real server becomes quiescent.
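For example, a minimal persistent-setting sketch (the file name under /etc/sysctl.d/ is only an example):

# /etc/sysctl.d/90-ipvs.conf
net.ipv4.vs.expire_quiescent_template = 1

Then reload the settings with "sysctl --system" (or reboot).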

I've tested both and they work.

Comment 6 Jose Luis Godoy 2013-12-11 16:24:20 UTC
Hi Ryan,

You are right, it works perfectly.

Many thanks.

Jose Luis