Bug 1949551 - kuryr-controller restarting after 3 days cluster running - pools without members
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.z
Assignee: Michał Dulko
QA Contact: rlobillo
URL:
Whiteboard:
Depends On: 1936342
Blocks: 1958093
 
Reported: 2021-04-14 14:18 UTC by OpenShift BugZilla Robot
Modified: 2021-05-19 15:16 UTC
CC List: 2 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 15:16:13 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 502 0 None open [release-4.7] Bug 1949551: kuryr-controller restarting after 3 days cluster running - pools without members 2021-04-20 01:00:32 UTC
Red Hat Product Errata RHBA-2021:1550 0 None None None 2021-05-19 15:16:36 UTC

Comment 4 rlobillo 2021-05-12 09:54:41 UTC
Verified on 4.7.0-0.nightly-2021-05-12-004740 over OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0) with OVN-Octavia.


Create a project with 3 pods and a svc, so that the following members are created on the load balancer:

$ openstack loadbalancer member list demo/demo:TCP:80                                                                                                     
+--------------------------------------+---------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name                            | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+---------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| 4e6ff253-6212-4bb0-bd4c-fd01c7c8b0ff | demo/demo-7897db69cc-6hwj5:8080 | 46badff2f4ab4be487355645cba97276 | ACTIVE              | 10.128.124.248 |          8080 | NO_MONITOR       |      1 |
| 7cc23df9-e4ed-4a7a-b153-079a49799821 | demo/demo-7897db69cc-mw6hz:8080 | 46badff2f4ab4be487355645cba97276 | ACTIVE              | 10.128.124.254 |          8080 | NO_MONITOR       |      1 |
| 32505c2d-eecf-40b3-ad3d-f93b62592f7a | demo/demo-7897db69cc-xzz7b:8080 | 46badff2f4ab4be487355645cba97276 | ACTIVE              | 10.128.125.144 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+---------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+

$ for i in `openstack loadbalancer member list demo/demo:TCP:80 -c name -f value`; do openstack loadbalancer member delete demo/demo:TCP:80 $i; done      
$ oc edit klb/demo -n demo
(# ^ Remove only the members section of the status, including the key itself, to trigger kuryr-controller reconciliation. If the whole status section is removed, you will hit https://bugzilla.redhat.com/show_bug.cgi?id=1933880.)
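Rather than re-running the member list by hand, the recreation can be awaited with a small polling helper. This is a generic sketch, not part of the original verification: `wait_for_count` is a hypothetical name, and the command in the usage comment is the same member list used above.

```shell
# Hypothetical helper: re-run a command until it prints the expected
# number of lines, polling every 5 seconds.
wait_for_count() {
  local cmd=$1 expected=$2
  until [ "$($cmd | wc -l)" -eq "$expected" ]; do
    sleep 5
  done
}

# Usage against the pool from this verification (requires the OSP environment):
# wait_for_count "openstack loadbalancer member list demo/demo:TCP:80 -c id -f value" 3
```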

Wait a few seconds:

$ openstack loadbalancer member list demo/demo:TCP:80                                                                                                     
+--------------------------------------+---------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name                            | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+---------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| 44caa0b3-9475-4cc5-868c-c970cc1b829a | demo/demo-7897db69cc-6hwj5:8080 | 46badff2f4ab4be487355645cba97276 | ACTIVE              | 10.128.124.248 |          8080 | NO_MONITOR       |      1 |
| 40308958-b187-42d2-83d9-f584f6eb4233 | demo/demo-7897db69cc-mw6hz:8080 | 46badff2f4ab4be487355645cba97276 | ACTIVE              | 10.128.124.254 |          8080 | NO_MONITOR       |      1 |
| 7516bbb5-3573-4394-bffb-9433a949af94 | demo/demo-7897db69cc-xzz7b:8080 | 46badff2f4ab4be487355645cba97276 | ACTIVE              | 10.128.125.144 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+---------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+

Kuryr-controller remained stable:

$ oc get pods -n openshift-kuryr
NAME                                READY   STATUS    RESTARTS   AGE
kuryr-cni-7wlf4                     1/1     Running   0          35m
kuryr-cni-g8d6j                     1/1     Running   0          36m
kuryr-cni-hslx5                     1/1     Running   0          34m
kuryr-cni-lrfwp                     1/1     Running   0          36m
kuryr-cni-lxw6x                     1/1     Running   0          34m
kuryr-cni-t9snt                     1/1     Running   0          35m
kuryr-controller-5d499d6c4c-q7lc2   1/1     Running   0          7m23s

$ oc rsh pod/demo-7897db69cc-6hwj5 curl 172.30.76.248
demo-7897db69cc-6hwj5: HELLO! I AM ALIVE!!!
$ oc rsh pod/demo-7897db69cc-6hwj5 curl 172.30.76.248
demo-7897db69cc-mw6hz: HELLO! I AM ALIVE!!!
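The two curl calls above already show two distinct backends answering the service VIP. As a sketch, the spread across members can be counted instead of read by eye; `count_distinct` is a hypothetical helper, and the responder prefix (the pod name before the first `:`) matches the reply format shown above.

```shell
# Hypothetical helper: run a command N times and count how many distinct
# responders (the text before the first ':' in each reply) answered.
count_distinct() {
  local cmd=$1 runs=$2
  for _ in $(seq "$runs"); do $cmd; done | cut -d: -f1 | sort -u | wc -l
}

# Usage in the environment above (with all three members healthy, enough
# runs should eventually report 3):
# count_distinct "oc rsh pod/demo-7897db69cc-6hwj5 curl -s 172.30.76.248" 10
```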

Comment 6 errata-xmlrpc 2021-05-19 15:16:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1550

