Bug 1389720 - keepalived ipfailover cannot monitor service
Summary: keepalived ipfailover cannot monitor service
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Phil Cameron
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-28 10:09 UTC by Meng Bo
Modified: 2022-08-04 22:20 UTC
CC: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 12:47:40 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0066 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.4 RPM Release Advisory 2017-01-18 17:23:26 UTC

Description Meng Bo 2016-10-28 10:09:35 UTC
Description of problem:
Tried to deploy the keepalived ipfailover to monitor a service and make the service highly available by following the guide at https://docs.openshift.org/latest/admin_guide/high_availability.html#ip-failover

The ipfailover fails to monitor the service port, since the container port is not bound to a host port for the containers.

Version-Release number of selected component (if applicable):
openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
docker-common-1.12.1-7.el7.x86_64
openshift3/ose-keepalived-ipfailover                                       v3.4.0.16           fe424a5e49e2

How reproducible:
always

Steps to Reproduce:
1. Create HA network service by following the guide in https://docs.openshift.org/latest/admin_guide/high_availability.html#ip-failover
2. Check the VIPs on the node
3. Check the listening ports on the node

Actual results:
2. No VIP is assigned to the nodes.
3. The hostPort specified in the pod json file is not bound on the host.


Expected results:
The ipfailover should work for the service.

Additional info:

My steps:
# oadm policy add-scc-to-user privileged -z default
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/ha-network-service.json
# oadm policy add-scc-to-user privileged -z ipfailover
# oadm ipfailover ipf --create --selector=ha-service=ha --virtual-ips=10.66.140.105-106 --watch-port=9736 --replicas=2 --service-account=ipfailover



Check the ip and port on the node:
# ip a s eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:b9:4e:04 brd ff:ff:ff:ff:ff:ff
    inet 10.66.140.165/23 brd 10.66.141.255 scope global dynamic eth0
       valid_lft 73729sec preferred_lft 73729sec
    inet6 fe80::5054:ff:feb9:4e04/64 scope link 
       valid_lft forever preferred_lft forever
# netstat -vantp | grep 9736
#
# iptables -nL -t nat | grep 9736
KUBE-HP-DGSQR537BJBDRN72  tcp  --  0.0.0.0/0            0.0.0.0/0            /* ha-service-cjb5c_default hostport 9736 */ tcp dpt:9736
KUBE-MARK-MASQ  all  --  10.128.0.9           0.0.0.0/0            /* ha-service-cjb5c_default hostport 9736 */
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            /* ha-service-cjb5c_default hostport 9736 */ tcp to:10.128.0.9:8080
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/ha-service: */ tcp to:10.129.0.15:9736
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/ha-service: */ tcp to:10.128.0.9:9736
KUBE-SVC-34IH27QAS73674C2  tcp  --  0.0.0.0/0            172.30.126.234       /* default/ha-service: cluster IP */ tcp dpt:9736



Check the ipfailover pod:
# oc logs ipf-1-5eaag
  - Loading ip_vs module ...
  - Checking if ip_vs module is available ...
ip_vs_rr               12600  0
ip_vs                 141092  2 ip_vs_rr
  - Module ip_vs is loaded.
  - check for iptables rule for keepalived multicast (224.0.0.18) ...
  - Generating and writing config to /etc/keepalived/keepalived.conf
  - Starting failover services ...
Starting Healthcheck child process, pid=142
Starting VRRP child process, pid=143
Initializing ipvs 2.6
Netlink reflector reports IP 10.66.140.165 added
Netlink reflector reports IP 10.66.140.165 added
Netlink reflector reports IP 10.128.0.1 added
Netlink reflector reports IP fe80::5054:ff:feb9:4e04 added
Netlink reflector reports IP fe80::20bf:dbff:fedb:a496 added
Netlink reflector reports IP fe80::428:68ff:fec2:ef6b added
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Registering gratuitous ARP shared channel
Opening file '/etc/keepalived/keepalived.conf'.
Netlink reflector reports IP 10.128.0.1 added
Netlink reflector reports IP fe80::5054:ff:feb9:4e04 added
Netlink reflector reports IP fe80::20bf:dbff:fedb:a496 added
Netlink reflector reports IP fe80::428:68ff:fec2:ef6b added
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Configuration is using : 73196 Bytes
Using LinkWatch kernel netlink reflector...
Opening file '/etc/keepalived/keepalived.conf'.
VRRP_Instance(ipf_VIP_1) Entering BACKUP STATE
VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(9,10)]
Configuration is using : 8409 Bytes
Using LinkWatch kernel netlink reflector...
VRRP_Instance(ipf_VIP_2) Now in FAULT state
Process [144] didn't respond to SIGTERM
VRRP_Instance(ipf_VIP_1) Now in FAULT state
VRRP_Group(group_ipf) Syncing instances to FAULT state
Process [148] didn't respond to SIGTERM
Process [154] didn't respond to SIGTERM
Process [171] didn't respond to SIGTERM
Process [207] didn't respond to SIGTERM

# oc exec ipf-1-5eaag -- bash -c "ps -ef"
UID         PID   PPID  C STIME TTY          TIME CMD
root          1      0  0 05:51 ?        00:00:00 /bin/bash /var/lib/ipfailover/keepalived/monitor.sh
root        141      1  0 05:51 ?        00:00:00 /usr/sbin/keepalived -D -n --log-console
root        142    141  0 05:51 ?        00:00:00 /usr/sbin/keepalived -D -n --log-console
root        143    141  0 05:51 ?        00:00:00 /usr/sbin/keepalived -D -n --log-console
root        431    143  6 05:55 ?        00:00:00 /usr/sbin/keepalived -D -n --log-console
root        432    431  0 05:55 ?        00:00:00 sh -c </dev/tcp/10.66.140.165/9736
root        433      0  0 05:55 ?        00:00:00 ps -ef

Comment 1 Phil Cameron 2016-10-28 18:58:37 UTC
See openshift-docs PR 3051 for a more recent description of high availability.

From what you show, this appears to be working properly.
There should be two nodes involved here, but you only report on one of them.
The ipfailover log indicates a FAULT state, which means the watch port (9736) is not open: the </dev/tcp/10.66.140.165/9736 check fails.
There is no VIP set up on eth0, which is expected while in FAULT state. This assumes eth0 is the host NIC.
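For reference, the check keepalived runs (visible in the ps output in the description above as `sh -c </dev/tcp/10.66.140.165/9736`) can be reproduced by hand on the node. A minimal sketch, using the node IP and watch port from this report:

# bash -c '</dev/tcp/10.66.140.165/9736' && echo "9736 open" || echo "9736 closed"

If this prints "9736 closed", keepalived keeps the VRRP instance in the FAULT state.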

The two VIPs (10.66.140.105-106) may be served on the same node or on different nodes.

The VIP will be assigned on the host interface (not in the pod, since these are externally visible IP addresses).

For example, I have a test setup with 2 nodes (netdev28 and netdev35) serving 2 VIPs (10.250.2.101-102).
In my case, both are currently on the same node:

netdev28:
# ip a s em3
2: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether ec:f4:bb:dc:03:74 brd ff:ff:ff:ff:ff:ff
    inet 10.19.17.22/24 brd 10.19.17.255 scope global dynamic em3
       valid_lft 77908sec preferred_lft 77908sec
    inet 10.250.2.101/32 scope global em3             <<<< one VIP
       valid_lft forever preferred_lft forever
    inet 10.250.2.102/32 scope global em3             <<<< other VIP
       valid_lft forever preferred_lft forever

netdev35:
# ip a s em3
2: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether ec:f4:bb:db:fc:14 brd ff:ff:ff:ff:ff:ff
    inet 10.19.17.36/24 brd 10.19.17.255 scope global dynamic em3
       valid_lft 76620sec preferred_lft 76620sec

==============
 

Please run and send the results:
# oc rsh ipf-1-5eaag cat /etc/keepalived/keepalived.conf
Also run it on the other node.

Please provide the log for the other node.

Please also run:
# ip a
on both nodes.

Please verify that rc/ha-service has 2 pods up and running. The pods need to be running on at least 1 of the ipfailover nodes.

Comment 2 Meng Bo 2016-10-31 07:10:29 UTC
Tried again with openshift build v3.4.0.17 and cannot reproduce the bug.

Not sure what happened last week. Closing the bug for now; I will reopen it if I can reproduce it next time.

Comment 3 Meng Bo 2016-11-02 11:30:57 UTC
@Phil 
I found some more things on this bug.

I am using the following script to test this.

$ cat ipfailover.sh
#!/bin/bash
node1=ose-node1.bmeng.local
node2=ose-node2.bmeng.local
ip="10.66.140.105-106"

# make sure the VIPs are not already in use
for i in 105 106 ; do ping -c1 10.66.140.$i ; if [ $? -ne 1 ] ; then exit ; fi ; done

# add labels to node
oc label node $node1 ha-service=ha --overwrite
oc label node $node2 ha-service=ha --overwrite

# create the HA network service and pods
oadm policy add-scc-to-user privileged -z default
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/ha-network-service.json

# wait until the service pods are running
while [ `oc get pod | grep ha | grep Running | wc -l` -lt 2 ] ; do sleep 5 ; done

for i in $node1 $node2 ;do curl $i:9736 ; done

# create the ipfailover deployment for the labeled nodes
oadm policy add-scc-to-user privileged -z ipfailover
oadm ipfailover ipf --create --selector=ha-service=ha --virtual-ips=${ip} --watch-port=9736 --replicas=2 --service-account=ipfailover --interface=eth0

# wait until the keepalived pods are running
while [ `oc get pod | grep ipf | grep -v deploy | grep Running | wc -l` -lt 2 ] ; do sleep 5 ; done

for i in $node1 $node2 ;do curl $i:9736 ; done





If I access the pods via the host port before I create the ipfailover, then it works well. But if I skip the first `for i in $node1 $node2 ;do curl $i:9736 ; done`, then the last line also fails.
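To see what changes after that first curl, it may help to re-run the checks from the description on each node, both before and after creating the ipfailover (a sketch; 9736 is the hostPort from the example json):

# netstat -vantp | grep 9736
# iptables -nL -t nat | grep 9736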

Comment 4 Phil Cameron 2016-11-03 19:36:32 UTC
@Meng

Some thoughts here. I am not entirely sure about what is going on. 

The json seems to create a service on port 9736, backed by hello-openshift pods.

The ipfailover creates the VIPs 10.66.140.105-106 and watches port 9736. The node IPs should be different.

You can look at the ipfailover pod logs and see which nodes are master for the two VIPs. "ip a" should show the VIP on the desired interface on each node. The logs you initially reported showed both VIPs in FAULT state, which means they didn't have port 9736 open.

Once "ip a" shows the VIP on an interface, the curl should work.
I think the curl should target 10.66.140.105:9736 and 10.66.140.106:9736

The service should have externalIPs set to the VIPs as well. This may be the problem. There is also the nodePort, which may need to be set. See
http://kubernetes.io/docs/user-guide/services/#ips-and-vips

I am going to experiment with this and get back to you later.

Comment 5 Phil Cameron 2016-11-08 19:13:02 UTC
@meng The service must have  externalIPs set to the VIPs for this to work.

keepalived is not finding an open port on the VIPs, so the VIPs are in a FAULT state.

phil

Comment 6 Ben Bennett 2016-11-08 21:11:41 UTC
Phil, the problem is probably that the target port needs to be the nodePort... the externalIP only works if the target is actually that IP address, and this doesn't do that.

I think I cc'ed you on an email about this recently.  We need to work out a clean way to do this without requiring each service to have a failover configured directly.  Ideally we just test that the node is still live and rely on the service proxy to do the rest... not sure how to do that yet.

Comment 7 Phil Cameron 2016-11-08 21:46:58 UTC
Ben, I didn't read the kubernetes doc that way. I think there needs to be an externalIPs: <list of VIPs>
in the service's definition.

When the user wants ipfailover to front a service, the VIPs must be set in the ipfailover configuration and in the service's externalIPs: list. The ipfailover watch port must be the same as the service's spec.port. That's how they match up.

The doc says that spec.port is exposed on the clusterIP and on the externalIPs. This is what we need here. The keepalived watch port is the service's spec.port, and keepalived checks that spec.port is open.

In Meng's case the port is not exposed on the VIPs and keepalived is in FAULT state.

The node port is a different way of doing this. There are caveats about assigning the node port. Both ways should be tested.

Look at the updated docs PR 3051; I just wrote about this.
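As a sketch of the externalIPs approach described above, the VIPs can be added to the service with a patch like the following (this assumes the service is named ha-service in the default project, as the "default/ha-service" iptables comments earlier in this report suggest):

# oc patch svc ha-service -p '{"spec": {"externalIPs": ["10.66.140.105", "10.66.140.106"]}}'

The service's spec.port (9736) should then be exposed on the VIPs, which is what the keepalived watch port check needs to find open.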

Comment 8 Meng Bo 2016-11-09 02:46:58 UTC
The example file https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/ha-network-service.json that I used in my steps has the same contents as the geo-cache.json in the ipfailover doc, with just the image name replaced.

If it requires a nodePort or externalIP to be set, we should update the example we provide.

But the same json file worked well in previous testing, before 3.4.

Comment 9 Ben Bennett 2016-11-09 17:54:07 UTC
I don't see an update to https://github.com/openshift/openshift-docs/pull/3051

Comment 10 Phil Cameron 2016-11-10 14:19:13 UTC
Ben, w.r.t. the doc update, I made a mistake with github and will fix it.

Ben and I are discussing this and I am reproducing the test on my setup. It looks like the documentation needs to address how to do this and I am working on the doc changes. At present, I don't think there is a bug here. I think the test/setup needs to be changed. I also think we need to document how to properly set this up to get it working.

More later...

Comment 11 Ben Bennett 2016-11-10 21:30:14 UTC
Phil is right: there needs to be either an externalIP or a NodePort on the service definition. If you do not set one of those, traffic cannot get in to the service.

Phil has a docs PR open to get that clarified now.
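A quick way to confirm which of the two, if either, is set on the service (a sketch, again assuming the example service name ha-service):

# oc get svc ha-service -o yaml | grep -E 'type:|externalIPs|nodePort'

If neither externalIPs nor a nodePort shows up, the service itself exposes nothing outside the cluster, which is what is described above.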

Comment 12 Weibin Liang 2016-11-11 15:35:46 UTC
When I follow Mengbo's steps above, I get the exact same results as he did.

In the current ipfailover doc,
https://docs.openshift.org/latest/admin_guide/high_availability.html#ip-failover,
it looks like we support two different ways to use ipfailover: the first case is a routing service, which needs a router installed; the second case is a network service, which uses a hostPort setup.

I think the geo-cache.json from the doc, which Mengbo also used, uses hostPort, which Kubernetes recommends against using:
http://kubernetes.io/docs/user-guide/config-best-practices/
http://kubernetes.io/docs/user-guide/application-troubleshooting/

Mengbo's steps above make the hostPort work only when he runs "for i in $node1 $node2 ;do curl $i:9736 ; done" before he creates the ipfailover instances. We may need to decide whether this is a bug or whether we should not support hostPort, per the Kubernetes recommendations.

For https://github.com/openshift/openshift-docs/pull/3051, I am doing some sanity testing now:

Case 1: routing service, which needs a router installed: passed
Case 2: network service with hostPort setup: failed
Case 3: ipfailover with manual externalIP: testing now
Case 4: ipfailover with Maul externalIP: will test
Case 5: ipfailover with nodePort: will test

Comment 13 Phil Cameron 2016-11-11 18:34:11 UTC
The new docs PR 3051 goes into more detail. PTAL

Comment 14 Weibin Liang 2016-11-11 19:58:13 UTC
@bmeng, if you change --watch-port=9736 to --watch-port=80 and deploy three router replicas (which you did not have in your steps), then your test case should pass even when curl uses the hostPort.

Here are my passing test results:

ip="10.18.41.250-252"
oc new-project pro-ipfailover 
oc create serviceaccount harp -n pro-ipfailover
oadm policy add-scc-to-user privileged system:serviceaccount:pro-ipfailover:harp
oadm router ha-router --replicas=3 --selector="infra=ha-router" --labels="infra=ha-router" --service-account=harp

oadm ipfailover ipf-har --replicas=3 --watch-port=80 --selector="infra=ha-router" --virtual-ips="10.18.41.250-252" --service-account=harp 

oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OSE3.3/ipfailover-hostport.json

[root@dhcp-41-65 byo]# for i in 250 251 252 ; do curl 10.18.41.$i:9736 ; done
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!

Comment 15 Meng Bo 2016-11-14 07:43:06 UTC
@weliang 
Your steps up to `oadm ipfailover` should work, since the ipfailover is watching the router pods, which are created with host networking, and port 80 is listening on the nodes at that point.
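A quick way to confirm that on a labeled node (a sketch, reusing the netstat check from the original description):

# netstat -vantp | grep ':80 '

This should show the router process listening on port 80 on the host before the ipfailover is created.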

As for the last step to create the list, I have never tested it like that, and I am not sure if it is related to the problem I met.

Comment 16 Meng Bo 2016-11-16 12:04:33 UTC
Tested with the new steps:
1. Create the service and pod
$ oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/ha-network-service.json

2. Modify the service to type NodePort and change the targetPort to 8080, which is the pod's containerPort (see the sketch after these steps)

3. Get the assigned node port of the service, e.g. 30547

4. Create ipfailover on the node port
# oadm ipfailover ipf --create --selector=ha-service=ha --virtual-ips=10.66.140.105-106 --watch-port=30547 --replicas=2 --service-account=ipfailover --interface=eth0

5. Access the node port via VIPs
$ curl 10.66.140.105:30547
$ curl 10.66.140.106:30547


The above steps passed for me.
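For reference, a sketch of steps 2 and 3 as commands (assuming the service from the json is named ha-service; the assigned node port will differ per cluster):

# oc patch svc ha-service -p '{"spec": {"type": "NodePort", "ports": [{"port": 9736, "targetPort": 8080}]}}'
# oc get svc ha-service -o yaml | grep nodePort

The nodePort value reported by the second command is the one to pass to --watch-port and to curl against the VIPs.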

@Phil Could you help review if my steps are correct? Thanks.

Comment 17 Phil Cameron 2016-11-16 15:55:23 UTC
@Meng - The steps look correct to me.

Comment 18 Meng Bo 2016-11-17 02:58:43 UTC
Thanks. I am going to mark the bug as VERIFIED.

Comment 20 errata-xmlrpc 2017-01-18 12:47:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

