Description of problem:
Tried to deploy the keepalived ipfailover to monitor a service and make it highly available, following the guide at https://docs.openshift.org/latest/admin_guide/high_availability.html#ip-failover. The ipfailover fails to monitor the service port because the container port is not bound to the host port for the containers.

Version-Release number of selected component (if applicable):
openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
docker-common-1.12.1-7.el7.x86_64
openshift3/ose-keepalived-ipfailover v3.4.0.16 fe424a5e49e2

How reproducible:
Always

Steps to Reproduce:
1. Create an HA network service by following the guide at https://docs.openshift.org/latest/admin_guide/high_availability.html#ip-failover
2. Check the VIPs on the node
3. Check the listening ports on the node

Actual results:
2. No VIP is assigned to the nodes.
3. The hostPort specified in the pod json file is not bound on the host.

Expected results:
The ipfailover should work for the service.

Additional info:
My steps:
# oadm policy add-scc-to-user privileged -z default
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/ha-network-service.json
# oadm policy add-scc-to-user privileged -z ipfailover
# oadm ipfailover ipf --create --selector=ha-service=ha --virtual-ips=10.66.140.105-106 --watch-port=9736 --replicas=2 --service-account=ipfailover

Check the IP and port on the node:
# ip a s eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:b9:4e:04 brd ff:ff:ff:ff:ff:ff
    inet 10.66.140.165/23 brd 10.66.141.255 scope global dynamic eth0
       valid_lft 73729sec preferred_lft 73729sec
    inet6 fe80::5054:ff:feb9:4e04/64 scope link
       valid_lft forever preferred_lft forever

# netstat -vantp | grep 9736
#

# iptables -nL -t nat | grep 9736
KUBE-HP-DGSQR537BJBDRN72   tcp  --  0.0.0.0/0   0.0.0.0/0       /* ha-service-cjb5c_default hostport 9736 */ tcp dpt:9736
KUBE-MARK-MASQ             all  --  10.128.0.9  0.0.0.0/0       /* ha-service-cjb5c_default hostport 9736 */
DNAT                       tcp  --  0.0.0.0/0   0.0.0.0/0       /* ha-service-cjb5c_default hostport 9736 */ tcp to:10.128.0.9:8080
DNAT                       tcp  --  0.0.0.0/0   0.0.0.0/0       /* default/ha-service: */ tcp to:10.129.0.15:9736
DNAT                       tcp  --  0.0.0.0/0   0.0.0.0/0       /* default/ha-service: */ tcp to:10.128.0.9:9736
KUBE-SVC-34IH27QAS73674C2  tcp  --  0.0.0.0/0   172.30.126.234  /* default/ha-service: cluster IP */ tcp dpt:9736

Check the ipfailover pod:
# oc logs ipf-1-5eaag
  - Loading ip_vs module ...
  - Checking if ip_vs module is available ...
ip_vs_rr               12600  0
ip_vs                 141092  2 ip_vs_rr
  - Module ip_vs is loaded.
  - check for iptables rule for keepalived multicast (224.0.0.18) ...
  - Generating and writing config to /etc/keepalived/keepalived.conf
  - Starting failover services ...
Starting Healthcheck child process, pid=142
Starting VRRP child process, pid=143
Initializing ipvs 2.6
Netlink reflector reports IP 10.66.140.165 added
Netlink reflector reports IP 10.66.140.165 added
Netlink reflector reports IP 10.128.0.1 added
Netlink reflector reports IP fe80::5054:ff:feb9:4e04 added
Netlink reflector reports IP fe80::20bf:dbff:fedb:a496 added
Netlink reflector reports IP fe80::428:68ff:fec2:ef6b added
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Registering gratuitous ARP shared channel
Opening file '/etc/keepalived/keepalived.conf'.
Netlink reflector reports IP 10.128.0.1 added
Netlink reflector reports IP fe80::5054:ff:feb9:4e04 added
Netlink reflector reports IP fe80::20bf:dbff:fedb:a496 added
Netlink reflector reports IP fe80::428:68ff:fec2:ef6b added
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Configuration is using : 73196 Bytes
Using LinkWatch kernel netlink reflector...
Opening file '/etc/keepalived/keepalived.conf'.
VRRP_Instance(ipf_VIP_1) Entering BACKUP STATE
VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(9,10)]
Configuration is using : 8409 Bytes
Using LinkWatch kernel netlink reflector...
VRRP_Instance(ipf_VIP_2) Now in FAULT state
Process [144] didn't respond to SIGTERM
VRRP_Instance(ipf_VIP_1) Now in FAULT state
VRRP_Group(group_ipf) Syncing instances to FAULT state
Process [148] didn't respond to SIGTERM
Process [154] didn't respond to SIGTERM
Process [171] didn't respond to SIGTERM
Process [207] didn't respond to SIGTERM

# oc exec ipf-1-5eaag -- bash -c "ps -ef"
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 05:51 ?        00:00:00 /bin/bash /var/lib/ipfailover/keepalived/monitor.sh
root       141     1  0 05:51 ?        00:00:00 /usr/sbin/keepalived -D -n --log-console
root       142   141  0 05:51 ?        00:00:00 /usr/sbin/keepalived -D -n --log-console
root       143   141  0 05:51 ?        00:00:00 /usr/sbin/keepalived -D -n --log-console
root       431   143  6 05:55 ?        00:00:00 /usr/sbin/keepalived -D -n --log-console
root       432   431  0 05:55 ?        00:00:00 sh -c </dev/tcp/10.66.140.165/9736
root       433     0  0 05:55 ?        00:00:00 ps -ef
See openshift-docs PR 3051 for a more recent description of high availability.

From what you show, this appears to be working properly. There should be two nodes involved here, but you only report on one of them.

The ipfailover log indicates a FAULT state, which means the watch port (9736) is not open: the </dev/tcp/10.66.140.165/9736 check fails. Given that, it is expected that no VIP is set up on eth0. This assumes eth0 is the host NIC.

The two VIPs 10.66.140.105-106 may be served on the same or on different nodes. The VIP is assigned on the host interface (not in the pod, since these are externally visible IP addresses).

For example, I have a test setup with 2 nodes (netdev28 and netdev35) serving 2 VIPs 10.250.2.101-102. In my case, at present, both are on the same node:

netdev28:
# ip a s em3
2: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether ec:f4:bb:dc:03:74 brd ff:ff:ff:ff:ff:ff
    inet 10.19.17.22/24 brd 10.19.17.255 scope global dynamic em3
       valid_lft 77908sec preferred_lft 77908sec
    inet 10.250.2.101/32 scope global em3            <<<< one VIP
       valid_lft forever preferred_lft forever
    inet 10.250.2.102/32 scope global em3            <<<< other VIP
       valid_lft forever preferred_lft forever

netdev35:
# ip a s em3
2: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether ec:f4:bb:db:fc:14 brd ff:ff:ff:ff:ff:ff
    inet 10.19.17.36/24 brd 10.19.17.255 scope global dynamic em3
       valid_lft 76620sec preferred_lft 76620sec

==============
Please run and send the results:
# oc rsh ipf-1-5eaag cat /etc/keepalived/keepalived.conf
Also run it on the other node and provide the log for the other node.

Please run:
# ip a
on both nodes.

Please verify that the rc/ha-service does have 2 pods up and running. The pods need to be running on at least one of the ipfailover nodes.
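For reference, the failing check in the ps output above is just a bash TCP probe against the node IP and the watch port. A minimal sketch of what keepalived effectively runs (the IP and port are the ones from this report; the script itself is an illustration, not the exact generated config):

#!/bin/bash
# Illustrative reconstruction of the watch-port check seen in the ps output
# (sh -c </dev/tcp/10.66.140.165/9736): try to open a TCP connection to the
# watch port. If nothing is listening, the redirect fails, the script exits
# non-zero, and keepalived moves the VRRP instance to FAULT.
HOST=10.66.140.165   # node IP from the report
PORT=9736            # --watch-port value
if (: </dev/tcp/$HOST/$PORT) 2>/dev/null ; then
    echo "port $PORT on $HOST is open"
else
    echo "port $PORT on $HOST is closed -> keepalived reports FAULT"
    exit 1
fi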
Tried again with openshift build v3.4.0.17 and cannot reproduce the bug. Not sure what happened last week. Closing the bug for now; I will reopen it if I can reproduce it again.
@Phil I found some more things on this bug. I am using the following script to test this.

$ cat ipfailover.sh
#!/bin/bash
node1=ose-node1.bmeng.local
node2=ose-node2.bmeng.local
ip="10.66.140.105-106"

# make sure the VIPs are not already in use
for i in 105 106 ; do
  ping -c1 10.66.140.$i
  if [ $? -ne 1 ] ; then exit ; fi
done

# add labels to the nodes
oc label node $node1 ha-service=ha --overwrite
oc label node $node2 ha-service=ha --overwrite

# create the service and pods on each node
oadm policy add-scc-to-user privileged -z default
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/ha-network-service.json

# wait until the pods are running
while [ `oc get pod | grep ha | grep Running | wc -l` -lt 2 ] ; do sleep 5 ; done
for i in $node1 $node2 ; do curl $i:9736 ; done

# create the ipfailover for each node
oadm policy add-scc-to-user privileged -z ipfailover
oadm ipfailover ipf --create --selector=ha-service=ha --virtual-ips=${ip} --watch-port=9736 --replicas=2 --service-account=ipfailover --interface=eth0

# wait until the keepaliveds are running
while [ `oc get pod | grep ipf | grep -v deploy | grep Running | wc -l` -lt 2 ] ; do sleep 5 ; done
for i in $node1 $node2 ; do curl $i:9736 ; done

If I access the pods via the host port before I create the ipfailover, everything works well. But if I skip the first `for i in $node1 $node2 ; do curl $i:9736 ; done`, then the last line also fails.
@Meng Some thoughts here. I am not entirely sure about what is going on.

The json seems to create a service with port 9736 backed by hello-openshift pods. The ipfailover creates the VIPs 10.66.140.105-106 and watches port 9736. The node IPs should be different. You can look at the ipfailover pod logs and see which nodes are master for the two VIPs. "ip a" should show the VIP on the desired interface on each node.

The logs you initially reported showed both VIPs in FAULT state, which means keepalived did not find port 9736 open. Once "ip a" shows the VIP on an interface, the curl should work. I think the curl should target 10.66.140.105:9736 and 10.66.140.106:9736.

The service should have externalIPs set for the VIPs as well. This may be the problem. There is also the nodePort, which may need to be set. See http://kubernetes.io/docs/user-guide/services/#ips-and-vips

I am going to experiment with this and get back to you later.
@meng The service must have externalIPs set to the VIPs for this to work. keepalived is not finding an open port on the VIPs, so the VIPs are in a FAULT state. phil
Phil, the problem is probably that the target port needs to be the nodePort... the external IP only works if the target is actually the IP address, and this doesn't do that. I think I cc'ed you on an email about this recently. We need to work out a clean way to do this without requiring each service to have a failover configured directly. Ideally we just test that the node is still live and rely on the service proxy to do the rest... not sure how to do that yet.
Ben, I didn't read the kubernetes doc that way. I think there needs to be an externalIPs: <list of VIPs> entry in the service's definition.

When the user wants ipfailover to front a service, the VIPs must be set in the ipfailover configuration and in the service's externalIPs: list. The ipfailover watch port must be the same as the service's spec.port; that's how they match up. The doc says that the spec.port is exposed on the clusterIP and the externalIPs, which is what we need here: the keepalived watch port is the service's spec.port, and keepalived checks that the spec.port is open. In Meng's case the port is not exposed on the VIPs, so keepalived is in FAULT state.

The node port is a different way of doing this. There are caveats about assigning the node port. Both ways should be tested.

Look at the updated docs PR 3051; I just wrote about this.
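A minimal sketch of what that could look like for the service in this report (assuming the service is named ha-service, as the iptables comments earlier suggest, and that its spec.port is 9736):

# Hypothetical example: expose the service's spec.port (9736) on the VIPs by
# adding them to the service's externalIPs list, so the keepalived watch-port
# check has something to connect to.
oc patch svc ha-service -p '{"spec":{"externalIPs":["10.66.140.105","10.66.140.106"]}}'
# Confirm the result:
oc get svc ha-service -o jsonpath='{.spec.externalIPs}'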
The example file https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/ha-network-service.json I used in my steps has the same contents as the geo-cache.json in the ipfailover doc, with only the image name replaced. If it requires a nodePort or externalIP to be set, we should update the example we provide. But the same json file worked well in testing before 3.4.
I don't see an update to https://github.com/openshift/openshift-docs/pull/3051
Ben, w.r.t. the doc update, I made a mistake with github and will fix it. Ben and I are discussing this and I am reproducing the test on my setup. It looks like the documentation needs to address how to do this and I am working on the doc changes. At present, I don't think there is a bug here. I think the test/setup needs to be changed. I also think we need to document how to properly set this up to get it working. More later...
Phil is right: there either needs to be an externalIP or a nodePort on the service definition. If you do not set one of those, traffic cannot get in to the service. Phil has a docs PR open to get that clarified now.
When I follow Mengbo's steps above, I get exactly the same results as he did.

In the current ipfailover doc https://docs.openshift.org/latest/admin_guide/high_availability.html#ip-failover, it looks like we support two different ways to use ipfailover: the first case is a routing service, which needs a router installed; the second case is a network service, which uses a hostPort set up in the service. I think the geo-cache.json from the doc, which Mengbo also used, is using hostPort, which Kubernetes recommends not to use:
http://kubernetes.io/docs/user-guide/config-best-practices/
http://kubernetes.io/docs/user-guide/application-troubleshooting/

With Mengbo's steps above, the hostPort only works when he runs "for i in $node1 $node2 ; do curl $i:9736 ; done" before he creates the ipfailover instances. We may need to decide whether this is a bug or whether we should not support hostPort, per the Kubernetes recommendations. (A quick way to check whether the pods actually request a hostPort is sketched after the case list below.)

For https://github.com/openshift/openshift-docs/pull/3051, I am doing some sanity testing now:
Case 1: routing service, which needs a router installed: passed
Case 2: network service with hostPort setup: failed
Case 3: ipfailover with manual externalIP: testing now
Case 4: ipfailover with manual externalIP: will test
Case 5: ipfailover with nodePort: will test
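One way to confirm the hostPort usage (illustrative; the rc name comes from earlier comments on this bug):

# Hypothetical check: print the container port definitions from the
# replication controller created by ha-network-service.json. If the output
# includes a hostPort entry (e.g. hostPort 9736), the pods rely on hostPort.
oc get rc ha-service -o jsonpath='{.spec.template.spec.containers[*].ports}'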
The new docs PR 3051 goes into more detail. PTAL
@bmeng, if you change --watch-port=9736 to --watch-port=80 and run three router replicas (which you did not have in your steps), then your test case should pass even when curl uses the hostPort. Here are my passed testing results:

ip="10.18.41.250-252"
oc new-project pro-ipfailover
oc create serviceaccount harp -n pro-ipfailover
oadm policy add-scc-to-user privileged system:serviceaccount:pro-ipfailover:harp
oadm router ha-router --replicas=3 --selector="infra=ha-router" --labels="infra=ha-router" --service-account=harp
oadm ipfailover ipf-har --replicas=3 --watch-port=80 --selector="infra=ha-router" --virtual-ips="10.18.41.250-252" --service-account=harp
oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OSE3.3/ipfailover-hostport.json

[root@dhcp-41-65 byo]# for i in 250 251 252 ; do curl 10.18.41.$i:9736 ; done
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
@weliang Your steps up to the `oadm ipfailover` should work, since the ipfailover is watching the router pods, which are created with host networking, so port 80 is already listening on the nodes at that point. As for the last step that creates the list, I have never tested it like that, and I am not sure whether it is related to the problem I hit.
Tested with the new steps:

1. Create the service and pod
$ oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/ha-network-service.json

2. Modify the service to type NodePort and change the targetPort to 8080, which is the pod's containerPort (a rough sketch of steps 2 and 3 is shown after the steps)

3. Get the node port assigned to the service, e.g. 30547

4. Create the ipfailover on the node port
# oadm ipfailover ipf --create --selector=ha-service=ha --virtual-ips=10.66.140.105-106 --watch-port=30547 --replicas=2 --service-account=ipfailover --interface=eth0

5. Access the node port via the VIPs
$ curl 10.66.140.105:30547
$ curl 10.66.140.106:30547

The above steps passed for me. @Phil, could you help review whether my steps are correct? Thanks.
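For reference, a minimal sketch of steps 2 and 3 (assuming the service is named ha-service and keeps its spec.port of 9736; 30547 is just whatever node port happens to be assigned):

# Hypothetical sketch: switch the service to NodePort and point targetPort at
# the pod's containerPort (8080). The strategic merge patch keys service ports
# by "port", so this updates the existing 9736 entry in place.
oc patch svc ha-service -p '{"spec":{"type":"NodePort","ports":[{"port":9736,"targetPort":8080}]}}'
# Read back the node port that was assigned (e.g. 30547) and use it as the
# --watch-port for oadm ipfailover.
oc get svc ha-service -o jsonpath='{.spec.ports[0].nodePort}'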
@Meng - The steps look correct to me.
Thanks. I am going to mark the bug as VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066