Bug 1995507 - Kuryr controller error prevents OCP service of type LoadBalancer in Obtaining externalIP/FIP.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Michał Dulko
QA Contact: Itzik Brown
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-19 09:47 UTC by Mohammad
Modified: 2021-09-15 19:21 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-15 19:21:16 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-ansible pull 12344 0 None None None 2021-08-27 13:21:38 UTC
Red Hat Product Errata RHBA-2021:3424 0 None None None 2021-09-15 19:21:18 UTC

Description Mohammad 2021-08-19 09:47:40 UTC
Description of problem: A Kuryr controller error prevents an OCP service of type LoadBalancer from obtaining an external IP/FIP.


Version-Release number of selected component (if applicable): 3.11.452


How reproducible:

1- Create service of type loadbalancer:

[openshift@master-2 mowork]$ cat lb-momohttpd-40.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
  name: lb-momohttpd-40
  namespace: momo
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 30040
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: https
    nodePort: 31040
    port: 443
    protocol: TCP
    targetPort: 8443
  selector:
    k8s-app: momohttpd-40
  sessionAffinity: None
  type: LoadBalancer

2- Monitor service creation:
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
lb-momohttpd-40   LoadBalancer   XX.XX.26.44   <pending>     80:30040/TCP,443:31040/TCP   0s
#####Thu Aug 19 19:01:20 AEST 2021####
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
lb-momohttpd-40   LoadBalancer   XX.XX.26.44   <pending>     80:30040/TCP,443:31040/TCP   1m
#####Thu Aug 19 19:02:21 AEST 2021####
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
lb-momohttpd-40   LoadBalancer   XX.XX.26.44   <pending>     80:30040/TCP,443:31040/TCP   2m
#####Thu Aug 19 19:03:22 AEST 2021####
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
lb-momohttpd-40   LoadBalancer   XX.XX.26.44   <pending>     80:30040/TCP,443:31040/TCP   3m
#####Thu Aug 19 19:04:23 AEST 2021####

3- Wait until this error appears in the controller log (kuryr-controller-5c96d54d78-7vkbz_logs_while_FIP_pending.txt):

2021-08-19 09:02:30.047 1 ERROR kuryr_kubernetes.controller.drivers.lbaasv2 [-] Error when creating loadbalancer: {"debuginfo": null, "faultcode": "Server", "faultstring": "Provider 'amphora' reports error: IP address XX.XX.26.44 already allocated in subnet 0ada9494-f2fc-4a7a-b7df-8926209f4cc7\nNeutron server returns request_ids: ['req-98c7eb09-6cf2-4d91-9724-2037040c3ffe']"}


4- Restart Kuryr controller

Once the controller restarts, the FIP gets assigned. The logs taken after the restart (kuryr-controller-5c96d54d78-f9rx8_logs_after_restarting_kuryr.txt) do not show any issues.

5- Check service creation:
#####Thu Aug 19 19:04:23 AEST 2021####
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
lb-momohttpd-40   LoadBalancer   XX.XX.26.44   XXX.ZZZ.129.142   80:30040/TCP,443:31040/TCP   3m
#####Thu Aug 19 19:04:34 AEST 2021####

Actual results:
The FIP/external IP is not assigned until the Kuryr controller is manually restarted.

Expected results: 
The FIP/external IP should be assigned without restarting the Kuryr controller.
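
The monitoring output in step 2 can also be checked programmatically. The following is only a minimal sketch, assuming the text comes from the same tabular service listing shown in steps 2 and 5 (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE); the helper names external_ip and is_pending are hypothetical and not part of any OpenShift tooling.

```python
def external_ip(table_text, service_name):
    """Return the EXTERNAL-IP column of the last row for service_name, or None."""
    found = None
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows are: NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
        if len(fields) >= 4 and fields[0] == service_name:
            found = fields[3]
    return found


def is_pending(table_text, service_name):
    """True while the service still shows <pending> as its external IP."""
    return external_ip(table_text, service_name) == "<pending>"


# Sample taken from the monitoring output in this report.
sample = """\
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
lb-momohttpd-40   LoadBalancer   XX.XX.26.44   <pending>     80:30040/TCP,443:31040/TCP   3m
"""
print(is_pending(sample, "lb-momohttpd-40"))  # prints True
```

A wrapper that re-runs the listing every minute and calls is_pending would reproduce the manual monitoring loop above.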

Comment 4 Mohammad 2021-08-19 10:00:34 UTC
 $ openstack loadbalancer list |grep 26.44
| 3fe16450-8034-4ee6-bdab-da0c2b123c62 | momo/lb-momohttpd-40                                     | f7b96553d2fd4e26a05beb87c85c67c9 | XXX.XXX.26.44   | ACTIVE              | amphora  |

Comment 13 Michał Dulko 2021-08-27 14:28:00 UTC
A possible fix for the problem has been merged. You can try it on a live cluster by checking which section of the kuryr-config ConfigMap (kuryr namespace) contains lbaas_activation_timeout; with the fix, the option is expected to be in the [neutron_defaults] section. If that doesn't help (or the affected cluster already has lbaas_activation_timeout in the [neutron_defaults] section), please reopen the bug and we'll continue to investigate.
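
The section check described above could be scripted roughly as follows. This is a sketch under assumptions: the kuryr.conf text has already been extracted from the ConfigMap into a string, the helper name sections_containing is hypothetical, and the section name some_other_section in the sample is a placeholder, since this report does not say where the option was misplaced.

```python
import configparser


def sections_containing(conf_text, option="lbaas_activation_timeout"):
    """List the INI sections of conf_text that define the given option."""
    parser = configparser.ConfigParser(
        interpolation=None,          # treat '%' in values literally
        strict=False,                # tolerate duplicate sections/options
        default_section="__unused__",  # so [DEFAULT] is not inherited by every section
    )
    parser.read_string(conf_text)
    return [name for name in parser.sections() if option in parser[name]]


# Hypothetical kuryr.conf fragment; only the option name and the value 1200
# come from this report.
sample = """\
[DEFAULT]
debug = false

[some_other_section]
lbaas_activation_timeout = 1200

[neutron_defaults]
"""
print(sections_containing(sample))  # prints ['some_other_section']
```

A result other than ['neutron_defaults'] would suggest the cluster still has the option in the wrong section.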

Comment 16 Itzik Brown 2021-09-01 06:07:28 UTC
Version: v3.11.515

Verified that lbaas_activation_timeout = 1200 appears under the neutron_defaults section (kuryr.conf) by running: oc get cm -n kuryr -o yaml | grep -A 20 neutron_defaults
The kuryr_tempest_plugin.tests.scenario.test_service.TestLoadBalancerServiceScenario tests passed.

Comment 19 errata-xmlrpc 2021-09-15 19:21:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.521 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3424

