Bug 1365176

Summary: Duplicate addresses shown under oc describe endpoints when ipfailover is configured
Product: OpenShift Container Platform Reporter: Weibin Liang <weliang>
Component: Networking Assignee: Maru Newby <mnewby>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: aos-bugs, bbennett, hongli, mnewby, tdawson
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: When ipfailover was configured for the router, keepalived pods were being labeled with the selector of the router service.
Consequence: The router service was selecting both router pods and keepalived pods. Since both types of pods use host networking by default, their IP addresses would be the same if deployed to the same hosts and the service would appear to be selecting duplicate endpoints.
Fix: The keepalived pods are now given a label that is distinct from that applied to the router pods.
Result: The router service no longer displays duplicate IP addresses when ipfailover is configured.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-27 09:42:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Weibin Liang 2016-08-08 14:23:16 UTC
Description of problem:
Two addresses (10.18.41.62 and 10.18.41.70) are each listed twice in the output of oc describe endpoints when ipfailover is configured.

Addresses:		10.18.41.142,10.18.41.61,10.18.41.62,10.18.41.62,10.18.41.70,10.18.41.70

Version-Release number of selected component (if applicable):
[root@dhcp-41-74 ~]# oc version
oc v3.2.1.9-1-g2265530
kubernetes v1.2.0-36-g4a3f9c5
[root@dhcp-41-74 ~]# cat /etc/system-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@dhcp-41-74 ~]# 

How reproducible:
Easy to reproduce; follow the steps below.

Steps to Reproduce:
oc label nodes dhcp-41-142.bos.redhat.com "infra=ha-router"
oc label nodes dhcp-41-70.bos.redhat.com "infra=ha-router"
oc label nodes dhcp-41-62.bos.redhat.com "infra=ha-router"
oc label nodes dhcp-41-61.bos.redhat.com "infra=ha-router"
oc label nodes dhcp-41-74.bos.redhat.com "infra=ha-router"
oc get nodes --selector="infra=ha-router"

oc delete project pro-ipfailover
oc project default
sleep 20
oc new-project pro-ipfailover
	
oc create serviceaccount harp -n pro-ipfailover
oadm policy add-scc-to-user privileged system:serviceaccount:pro-ipfailover:harp

oadm router ha-router --replicas=2 --selector="infra=ha-router" --labels="infra=ha-router" \
--service-account=harp

[root@dhcp-41-74 ~]# oc get nodes
NAME                         STATUS                     AGE
dhcp-41-142.bos.redhat.com   Ready                      2d
dhcp-41-61.bos.redhat.com    Ready                      2d
dhcp-41-62.bos.redhat.com    Ready                      2d
dhcp-41-70.bos.redhat.com    Ready                      2d
dhcp-41-74.bos.redhat.com    Ready,SchedulingDisabled   2d

oadm ipfailover ipf-har --replicas=4 --watch-port=80 --selector="infra=ha-router" \
--virtual-ips="10.245.2.201-205" --credentials=/etc/origin/master/openshift-router.kubeconfig --service-account=harp --create

[root@dhcp-41-74 ~]# oc get pods
NAME                READY     STATUS    RESTARTS   AGE
ha-router-1-uf6vo   1/1       Running   0          6m
ha-router-1-vmdf4   1/1       Running   0          6m
ipf-har-1-8eb5y     1/1       Running   0          1m
ipf-har-1-h05i1     1/1       Running   0          1m
ipf-har-1-hn6kh     1/1       Running   0          1m
ipf-har-1-x7qn2     1/1       Running   0          1m
[root@dhcp-41-74 ~]# oc get svc
NAME        CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
ha-router   172.30.135.52   <none>        80/TCP,443/TCP,1936/TCP   7m
[root@dhcp-41-74 ~]# oc get endpoints
NAME        ENDPOINTS                                                    AGE
ha-router   10.18.41.142:80,10.18.41.61:80,10.18.41.62:80 + 15 more...   7m
[root@dhcp-41-74 ~]# oc describe endpoints ha-router
Name:		ha-router
Namespace:	pro-ipfailover
Labels:		infra=ha-router
Subsets:
  Addresses:		10.18.41.142,10.18.41.61,10.18.41.62,10.18.41.62,10.18.41.70,10.18.41.70
  NotReadyAddresses:	<none>
  Ports:
    Name	Port	Protocol
    ----	----	--------
    80-tcp	80	TCP
    443-tcp	443	TCP
    1936-tcp	1936	TCP

No events.

[root@dhcp-41-74 ~]# 

Actual results:
Addresses:		10.18.41.142,10.18.41.61,10.18.41.62,10.18.41.62,10.18.41.70,10.18.41.70

Expected results:
Addresses:		10.18.41.142,10.18.41.61,10.18.41.62,10.18.41.70
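
A quick way to confirm the symptom is to count how often each address occurs in the Addresses line. The following is a minimal Python sketch, not part of the original report; the function name duplicate_addresses is hypothetical, and the input is assumed to be a single "Addresses:" line copied from the oc describe endpoints output above.

```python
from collections import Counter


def duplicate_addresses(addresses_line):
    """Return the addresses that occur more than once in an 'Addresses:' line.

    Hypothetical helper: expects one line in the form printed by
    `oc describe endpoints`, e.g. "Addresses:\t\t10.0.0.1,10.0.0.2".
    """
    # Drop the "Addresses:" label, then split the comma-separated list.
    _, _, value = addresses_line.partition(":")
    addresses = [a.strip() for a in value.split(",") if a.strip()]
    counts = Counter(addresses)
    # Counter keeps first-seen order, so duplicates come out in listing order.
    return [addr for addr, n in counts.items() if n > 1]


# The Addresses line from the "Actual results" section of this report.
reported = (
    "Addresses:\t\t10.18.41.142,10.18.41.61,10.18.41.62,"
    "10.18.41.62,10.18.41.70,10.18.41.70"
)
print(duplicate_addresses(reported))
```

An empty list means every address is listed only once.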

Additional info:

Comment 1 Maru Newby 2016-08-11 21:58:24 UTC
The keepalived pods are being labeled with the value supplied as the selector, which means they are selected by the router service (infra=ha-router in this case) for inclusion in the endpoints that it is targeting.  I can't see a good reason for the keepalived pods being labeled in this way.
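
The effect described above can be modeled outside the cluster. The following Python sketch is illustrative only and is not the Origin code: it treats a service selector as a set of key/value pairs that a pod's labels must contain, and it assumes host networking so that a pod's endpoint address is its node's IP. The pod names come from the report, but the pod-to-node placement and the replacement label ipfailover=ipf-har are assumptions made for the example.

```python
def select_endpoints(pods, selector):
    """Return the host IPs of all pods whose labels match every selector pair."""
    return sorted(
        pod["host_ip"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    )


def build_pods(keepalived_labels):
    """Build the six pods from the report with the given keepalived labels.

    The node placement is assumed; the report does not say which pod
    runs on which node.
    """
    routers = [
        {"name": "ha-router-1-uf6vo", "host_ip": "10.18.41.62"},
        {"name": "ha-router-1-vmdf4", "host_ip": "10.18.41.70"},
    ]
    keepalived = [
        {"name": "ipf-har-1-8eb5y", "host_ip": "10.18.41.142"},
        {"name": "ipf-har-1-h05i1", "host_ip": "10.18.41.61"},
        {"name": "ipf-har-1-hn6kh", "host_ip": "10.18.41.62"},
        {"name": "ipf-har-1-x7qn2", "host_ip": "10.18.41.70"},
    ]
    for pod in routers:
        pod["labels"] = {"infra": "ha-router"}
    for pod in keepalived:
        pod["labels"] = dict(keepalived_labels)
    return routers + keepalived


# Selector of the router service in this report.
SERVICE_SELECTOR = {"infra": "ha-router"}

# Before the fix: keepalived pods carry the same label as the router pods.
before = select_endpoints(build_pods({"infra": "ha-router"}), SERVICE_SELECTOR)

# After the fix: keepalived pods carry a distinct label (hypothetical name).
after = select_endpoints(build_pods({"ipfailover": "ipf-har"}), SERVICE_SELECTOR)

print(before)
print(after)
```

Under these assumptions, the first list reproduces the six addresses from the report, with 10.18.41.62 and 10.18.41.70 each appearing twice, while the second list contains only the two hosts that run router pods. That matches the shape of the verified output in Comment 6, where two router pods yield two addresses.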

Comment 2 Maru Newby 2016-08-12 18:01:15 UTC
Submitted a fix on GitHub.

Comment 3 Ben Bennett 2016-08-16 15:23:27 UTC
PR merged to Origin.

Comment 4 Troy Dawson 2016-08-18 20:13:16 UTC
This has been merged into OSE and is in OSE v3.3.0.22 or newer.

Comment 6 Meng Bo 2016-08-22 03:22:33 UTC
Checked on ose v3.3.0.23.

Issue has been fixed.

[root@ose-master ~]# oc get po
NAME                READY     STATUS    RESTARTS   AGE
ha-router-1-ocaop   1/1       Running   0          2m
ha-router-1-xnd6p   1/1       Running   0          2m
ipf-red-1-awwg4     1/1       Running   0          1m
ipf-red-1-cf5u9     1/1       Running   0          1m
[root@ose-master ~]# oc get svc
NAME         CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
ha-router    172.30.212.199   <none>        80/TCP,443/TCP,1936/TCP   2m
kubernetes   172.30.0.1       <none>        443/TCP,53/UDP,53/TCP     3m
[root@ose-master ~]# oc get endpoints 
NAME         ENDPOINTS                                                           AGE
ha-router    10.66.140.165:443,10.66.141.94:443,10.66.140.165:1936 + 3 more...   2m
kubernetes   10.66.140.11:8443,10.66.140.11:8053,10.66.140.11:8053               3m
[root@ose-master ~]# oc describe endpoints ha-router
Name:           ha-router
Namespace:      default
Labels:         router=ha-router
Subsets:
  Addresses:            10.66.140.165,10.66.141.94
  NotReadyAddresses:    <none>
  Ports:
    Name        Port    Protocol
    ----        ----    --------
    443-tcp     443     TCP
    1936-tcp    1936    TCP
    80-tcp      80      TCP

No events.

Comment 8 errata-xmlrpc 2016-09-27 09:42:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933