Bug 1575995 - LoadBalancer kind service floating IP is not reflected in openshift/K8s service
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 3.10.z
Assignee: Tzu-Mainn Chen
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On: 1593662
Blocks:
Reported: 2018-05-08 13:49 UTC by Jon Uriarte
Modified: 2018-10-08 11:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-08 11:42:38 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Launchpad 1733576 0 None None None 2018-05-08 13:49:05 UTC

Description Jon Uriarte 2018-05-08 13:49:05 UTC
Description of problem:

After service creation in openshift/K8s a floating IP is assigned to the load balancer and it is reachable, but the value is not reflected in the ExternalIPs field of the openshift/K8s service metadata.

Version-Release number of selected component (if applicable):
openstack-kuryr-kubernetes-controller-0.4.3-1.el7ost.noarch

How reproducible: Always

Steps to Reproduce:
1. Make sure the external_svc_net parameter is set to the external network ID in the kuryr-config configmap (oc -n openshift-infra edit cm kuryr-config).
2. oc new-project test
3. oc run --image kuryr/demo demo
4. oc scale dc/demo --replicas=2
5. oc expose dc/demo --port 80 --target-port 8080 --type LoadBalancer
    service "demo" exposed
6. oc get svc
    NAME      TYPE           CLUSTER-IP      EXTERNAL-IP                     PORT(S)        AGE
    demo      LoadBalancer   172.30.230.80   172.29.117.222,172.29.117.222   80:32440/TCP   19s
7. Check that a floating IP has been assigned to the load balancer:
    (overcloud) [root@undercloud stack]# openstack floating ip list | grep 172.30.230.80
    | 03e9b5db-1792-4e57-9ece-f79cf8b5ad0c | 172.20.0.218        | 172.30.230.80    | 9d604dbe-e7e0-4e57-a85e-9dd0b4c9d49c | dbba197f-d28e-49be-9905-fde1fa67cd52 | 6c07532860e641989bacc5583275080a |          
8. Check the floating IP is reachable:
   (overcloud) [root@undercloud stack]# curl  172.20.0.218
   demo-1-5pn86: HELLO! I AM ALIVE!!!
9. Check externalIPs field in service metadata:
   [openshift@master-0 ~]$ oc get svc demo -o yaml
   apiVersion: v1
   kind: Service
   metadata:
     annotations:
       openstack.org/kuryr-lbaas-spec: '{"versioned_object.data": {"ip": "172.30.230.80",
         "lb_ip": null, "ports": [{"versioned_object.data": {"name": null, "port": 80,
         "protocol": "TCP"}, "versioned_object.name": "LBaaSPortSpec", "versioned_object.namespace":
         "kuryr_kubernetes", "versioned_object.version": "1.0"}], "project_id": "6c07532860e641989bacc5583275080a",
         "security_groups_ids": ["41d0c3ad-ebcf-4c14-939f-651ec2c50fcf"], "subnet_id":
         "d0755040-7349-4966-8f7f-24989b0e2d56", "type": "LoadBalancer"}, "versioned_object.name":
         "LBaaSServiceSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version":
         "1.0"}'
     creationTimestamp: 2018-05-07T12:51:45Z
     labels:
       run: demo
     name: demo
     namespace: test
     resourceVersion: "442095"
     selfLink: /api/v1/namespaces/test/services/demo
     uid: 663a45f1-51f5-11e8-bca5-fa163ec71097
   spec:
     clusterIP: 172.30.230.80
     externalIPs:
     - 172.29.117.222
     - 172.29.49.250
     externalTrafficPolicy: Cluster
     ports:
     - nodePort: 32440
       port: 80
       protocol: TCP
       targetPort: 8080
     selector:
       run: demo
     sessionAffinity: None
     type: LoadBalancer
   status:
     loadBalancer:
       ingress:
       - ip: 172.29.49.250

Actual results:
The service FIP (172.20.0.218) is not reflected in externalIPs.
The EXTERNAL-IP column of 'oc get svc' shows the same IP address twice:
oc get svc
    NAME      TYPE           CLUSTER-IP      EXTERNAL-IP                     PORT(S)        AGE
    demo      LoadBalancer   172.30.230.80   172.29.117.222,172.29.117.222   80:32440/TCP   19s

Expected results:
The service FIP (172.20.0.218) should be listed in externalIPs.
The EXTERNAL-IP column of 'oc get svc' should not show the same IP address twice.
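
The two expectations above can be checked mechanically. The following is only a minimal sketch in Python, not part of Kuryr or OpenShift: the helper name check_external_ips is hypothetical, and it assumes the EXTERNAL-IP column is passed in as the comma-separated string printed by 'oc get svc'.

```python
from collections import Counter


def check_external_ips(external_ip_column: str, expected_fip: str) -> dict:
    """Report whether the FIP is listed and which addresses are repeated.

    external_ip_column is the comma-separated EXTERNAL-IP value as printed
    by 'oc get svc'; expected_fip is the floating IP reported by OpenStack.
    """
    ips = [ip.strip() for ip in external_ip_column.split(",") if ip.strip()]
    counts = Counter(ips)
    return {
        "fip_listed": expected_fip in counts,
        "duplicates": sorted(ip for ip, n in counts.items() if n > 1),
    }


# Values taken from the reproduction steps in this report.
result = check_external_ips("172.29.117.222,172.29.117.222", "172.20.0.218")
print(result)
```

With the values from this report, the check flags both symptoms: the FIP is missing and 172.29.117.222 is repeated.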

Additional info:
Upstream bug: https://bugs.launchpad.net/kuryr-kubernetes/+bug/1733576

Comment 1 Yossi Boaron 2018-05-09 11:32:53 UTC
It seems that the Kuryr controller is doing its job:
1. Creates the LB.
2. Allocates a FIP from the external network.
3. Attaches the FIP to the LB VIP.
4. Updates the service with the FIP under the service object as follows:
status:
  loadBalancer:
    ingress:
    - ip: 172.24.4.13

So the issue is that OpenShift also allocates an external IP (from the default pool 172.29.xx.xx) and overwrites the Kuryr details under status/loadBalancer/ingress/ip.


As a workaround, we can get the LB FIP from the endpoints annotation as follows:

# Create a LoadBalancer type service
 oc run --image kuryr/demo test1
 oc scale dc/test1 --replicas=2
 oc expose dc/test1 --port 80 --target-port 8080 --type LoadBalancer

# The FIP can be retrieved from the annotation as follows:
 oc get ep test1 -o yaml | grep service_pub_ip_info -A1
      "kuryr_kubernetes", "versioned_object.version": "1.0"}], "service_pub_ip_info":
      {"versioned_object.data": {"alloc_method": "pool", "ip_addr": "172.20.0.219",

# In this example the FIP is 172.20.0.219
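
The grep above depends on how the YAML happens to be wrapped. As a rough alternative, the annotation value can be parsed as JSON. This is only a sketch under assumptions: the helper names find_key and fip_from_annotation are hypothetical, the sample value is a trimmed-down stand-in for the real annotation, and the only structure relied on is what the output above shows (a "service_pub_ip_info" object whose "versioned_object.data" contains "ip_addr").

```python
import json
from typing import Any, Optional


def find_key(node: Any, key: str) -> Optional[Any]:
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def fip_from_annotation(annotation_json: str) -> Optional[str]:
    """Return the floating IP recorded in the annotation, or None."""
    state = json.loads(annotation_json)
    pub_ip_info = find_key(state, "service_pub_ip_info")
    if pub_ip_info is None:
        return None
    return find_key(pub_ip_info, "ip_addr")


# Hypothetical, trimmed-down annotation value; the real one has more fields.
sample = json.dumps({
    "versioned_object.data": {
        "service_pub_ip_info": {
            "versioned_object.data": {
                "alloc_method": "pool",
                "ip_addr": "172.20.0.219",
            }
        }
    }
})
print(fip_from_annotation(sample))
```

The recursive search is used so that the sketch does not depend on the exact nesting of the annotation, which is not fully shown in this report.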


I don't think that it should be a blocker.

We'll continue to investigate.

Comment 2 Yossi Boaron 2018-05-14 07:13:29 UTC
More details after further investigation:

A. The OpenShift external/ingress IP CIDR for services of type LoadBalancer is defined by the 'ingressIPNetworkCIDR' field in the 'networkConfig' section of master-config.yaml, as follows:

"
networkConfig:
 clusterNetworkCIDR: 10.0.0.64/26
 clusterNetworks:
 - cidr: 10.0.0.64/26
   hostSubnetLength: 9
 externalIPNetworkCIDRs: null
 hostSubnetLength: 9
 ingressIPNetworkCIDR: 172.29.0.0/16
 networkPluginName: ""
 serviceNetworkCIDR: 10.0.0.128/26
"

B. If 'ingressIPNetworkCIDR' is not defined, OpenShift uses 172.29.0.0/16 as the default CIDR.

C. From the OpenShift logs, it seems that OpenShift periodically verifies that the external IP is in the 'ingressIPNetworkCIDR' range and, if it isn't, allocates a new IP.

The relevant part of the OpenShift logs appears below.

D. At some point (I assume OpenShift sets this when the maximum number of external IPs is reached), updating 'LoadBalancerStatus' becomes forbidden.

The relevant section of the OpenShift logs:

May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.808014   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.812957   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.818129   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.

E. Bottom line: this doesn't seem to be a Kuryr bug. We need to find a way to configure OpenShift not to allocate external IPs for services of type LoadBalancer,
and to apply this configuration when SDN=KURYR.
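
To illustrate points B and C, the range check that the log messages describe can be reproduced with Python's standard ipaddress module. This is only an illustration of the check, not the OpenShift controller code; the function name in_ingress_range is hypothetical, and the default CIDR is the one given in point B.

```python
import ipaddress

# Default ingress range according to point B above.
DEFAULT_INGRESS_CIDR = "172.29.0.0/16"


def in_ingress_range(ip: str, cidr: str = DEFAULT_INGRESS_CIDR) -> bool:
    """Return True if `ip` falls inside the ingress CIDR."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


# FIP set by Kuryr in the logs: outside the range, so it gets replaced.
print(in_ingress_range("172.24.4.13"))    # False
# Address allocated by OpenShift in the original description: inside the range.
print(in_ingress_range("172.29.49.250"))  # True
```

Any FIP that Kuryr takes from the OpenStack external network will fail this check unless that network happens to overlap with the ingress CIDR, which matches the repeated "not in the ingress range" messages.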


OpenShift logs:
----------------
May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.829383   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.830212   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.850656   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.851521   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.891905   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.892907   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.973134   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:26 gtfgfg openshift[18652]: E0514 06:07:26.974343   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:27 gtfgfg openshift[18652]: E0514 06:07:27.134590   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:27 gtfgfg openshift[18652]: E0514 06:07:27.135865   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:27 gtfgfg openshift[18652]: E0514 06:07:27.456056   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:27 gtfgfg openshift[18652]: E0514 06:07:27.457246   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:28 gtfgfg openshift[18652]: E0514 06:07:28.097357   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:28 gtfgfg openshift[18652]: E0514 06:07:28.098611   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:29 gtfgfg openshift[18652]: E0514 06:07:29.378809   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:29 gtfgfg openshift[18652]: E0514 06:07:29.380048   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:31 gtfgfg openshift[18652]: E0514 06:07:31.940191   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:31 gtfgfg openshift[18652]: E0514 06:07:31.941803   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:37 gtfgfg openshift[18652]: E0514 06:07:37.061999   18652 service_ingressip_controller.go:580] The ingress ip 172.24.4.13 for service default/test21 is not in the ingress range.  A new ip will be allocated.
May 14 06:07:37 gtfgfg openshift[18652]: E0514 06:07:37.066239   18652 service_ingressip_controller.go:385] error syncing service, it will be retried: Failed to persist updated LoadBalancerStatus to service 'default/test21': Service "test21" is invalid: spec.externalIPs: Forbidden: externalIPs have been disabled
May 14 06:07:37 gtfgfg openshift[18652]: E0514 06:07:37.232991   18652 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted
May 14 06:07:37 gtfgfg openshift[18652]: W0514 06:07:37.233252   18652 reflector.go:341] github.com/openshift/origin/vendor/k8s.io/client-go/informers/factory.go:86: watch of *v1beta1.DaemonSet ended with: The resourceVersion for the provided watch is too old.
May 14 06:07:42 gtfgfg openshift[18652]: I0514 06:07:42.444996   18652 trace.go:76] Trace[1514250673]: "GuaranteedUpdate etcd3: *core.Endpoints" (started: 2018-05-14 06:07:41.348810335 +0000 UTC m=+62733.023374464) (total time: 1.096155925s):
May 14 06:07:42 gtfgfg openshift[18652]: Trace[1514250673]: [1.096082375s] [1.095176139s] Transaction committed
May 14 06:07:42 gtfgfg openshift[18652]: I0514 06:07:42.445072   18652 trace.go:76] Trace[1398341270]: "Get /api/v1/namespaces/kube-system/configmaps/kube-scheduler" (started: 2018-05-14 06:07:41.7873538 +0000 UTC m=+62733.461917921) (total time: 657.69886ms):
May 14 06:07:42 gtfgfg openshift[18652]: Trace[1398341270]: [657.637651ms] [657.631403ms] About to write a response

Comment 3 Yossi Boaron 2018-06-21 10:59:47 UTC
When the OpenShift cloud provider is set to "OpenStack", OpenShift
shouldn't allocate an external IP for services of type LoadBalancer.

There's an open bug for this issue [1].

So, when [1] is resolved, the service's external IP should appear under the service's status/ingress/.. , and we should be able to access the service.

[1] : https://bugzilla.redhat.com/show_bug.cgi?id=1593662

Comment 7 Tzu-Mainn Chen 2018-08-02 13:58:59 UTC
We'll add a doc note to the openshift-ansible documentation that using kuryr also requires the openstack cloud provider to be specified.

Comment 8 Tzu-Mainn Chen 2018-08-02 16:45:54 UTC
Merged upstream and backported in https://github.com/openshift/openshift-ansible/pull/9409

Comment 9 GenadiC 2018-08-08 14:43:45 UTC
test

Comment 10 Scott Dodson 2018-08-14 21:40:04 UTC
Should be in openshift-ansible-3.10.28-1

Comment 11 Jon Uriarte 2018-09-28 11:52:25 UTC
Verified in openshift-ansible-3.10.51-1.git.0.44a646c.el7.noarch.

/usr/share/ansible/openshift-ansible/playbooks/openstack/configuration.md file includes:

Finally, you *must* set up an OpenStack cloud provider as specified in
 [OpenStack Cloud Provider Configuration](#openstack-cloud-provider-configuration).

