Bug 1897142 - When scaling replicas to zero, Octavia loadbalancer pool members are not updated accordingly
Summary: When scaling replicas to zero, Octavia loadbalancer pool members are not updated accordingly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Luis Tomas Bolivar
QA Contact: GenadiC
URL:
Whiteboard:
Depends On:
Blocks: 1898906 1898950
 
Reported: 2020-11-12 12:29 UTC by Mohammad
Modified: 2021-02-24 15:33 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1898906
Environment:
Last Closed: 2021-02-24 15:32:41 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 405 0 None closed Bug 1897142: Ensure members are deleted from pools when there is no endpoints 2021-01-08 07:33:17 UTC
OpenStack gerrit 762807 0 None MERGED Ensure members are deleted from pools when there is no endpoints 2021-01-08 07:33:18 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:33:13 UTC

Description Mohammad 2020-11-12 12:29:35 UTC
Description of problem: When scaling replicas to zero, Octavia loadbalancer pool members are not updated accordingly.


Version-Release number of selected component (if applicable): OpenShift 3.11.306 with Kuryr on OpenStack 13


How reproducible: Create a deployment with a service and pod, scale the pods to 5 (for example), then scale down to zero, and check members of the pool for that loadbalancer.


Steps to Reproduce:
1. Create deployment
2. Scale up to 5
3. Scale down to zero

Actual results: The member list still contains the five stale members.

Expected results: The member list should be empty.


Additional info:

Comment 1 Mohammad 2020-11-12 12:34:02 UTC
After scaling a deployment from 5 replicas to zero, there are no pods left:

$ oc get all
NAME              TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)   AGE
service/echo-01   ClusterIP   XXX.XXX.157.245   <none>        80/TCP    5h
service/echo-02   ClusterIP   XXX.XXX.146.71    <none>        80/TCP    5h

NAME                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/echo-01   0         0         0            0           5h
deployment.apps/echo-02   0         0         0            0           5h

NAME                                 DESIRED   CURRENT   READY     AGE
replicaset.apps/echo-01-5c8c6d56c8   0         0         0         5h
replicaset.apps/echo-02-77b5b75d95   0         0         0         5h


Checking the OpenStack side of things:

$ openstack loadbalancer member list f8745fbd-c001-4d60-90ba-6ff9b68cf8ce
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name                               | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| 51afb91a-7f73-4d63-8c8e-4ca8125577f4 | momo/echo-02-77b5b75d95-rv5m6:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XXX.XXX.19.132 |          8080 | NO_MONITOR       |      1 |
| 82aa7394-0fc8-479e-8a56-1817bdf083c5 | momo/echo-02-77b5b75d95-tslsb:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XXX.XXX.19.136 |          8080 | NO_MONITOR       |      1 |
| 38f3debf-96aa-4750-a1be-d540bf3839ce | momo/echo-02-77b5b75d95-xpmd8:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XXX.XXX.19.143 |          8080 | NO_MONITOR       |      1 |
| 819a8254-8fb1-44f1-8c82-47bc5860f82a | momo/echo-02-77b5b75d95-zzqrr:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XXX.XXX.19.144 |          8080 | NO_MONITOR       |      1 |
| 1aaf9dcd-1a9b-4d6e-be2b-8a17b88843cd | momo/echo-02-77b5b75d95-6z69w:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XXX.XXX.19.152 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
$ openstack loadbalancer member list 0a3f85b6-941e-4ba4-9c76-b9dcc9da64d0
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name                               | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| e73d6142-7e5f-429c-a6c6-169a0a944f4f | momo/echo-01-5c8c6d56c8-s8mfq:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XXX.XXX.19.131 |          8080 | NO_MONITOR       |      1 |
| 0bb14dd4-1dba-418d-9e46-c4cfbdb3bde7 | momo/echo-01-5c8c6d56c8-kc5z7:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XXX.XXX.19.139 |          8080 | NO_MONITOR       |      1 |
| efe190c6-8dd9-406f-ac73-02e766afa375 | momo/echo-01-5c8c6d56c8-hn6td:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XXX.XXX.19.142 |          8080 | NO_MONITOR       |      1 |
| fa63bd37-8178-46e4-9b3b-0e7fb62f6e71 | momo/echo-01-5c8c6d56c8-zfxfr:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XXX.XXX.19.145 |          8080 | NO_MONITOR       |      1 |
| 9e369b9e-52f5-4ec5-b9a8-07f873425a02 | momo/echo-01-5c8c6d56c8-crp6c:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XXX.XXX.19.153 |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+

Comment 3 Brendan Shephard 2020-11-13 02:37:26 UTC
Hey Mo,

Is there any evidence that Kuryr has been able to delete any resources at all? There seems to be a common issue here: leftover ports in Neutron and leftover pool members in Octavia. I wonder whether Kuryr actually has the permissions it needs to delete objects.

You could find the cloud credentials used by Kuryr-CNI and try to delete one of the stale pool members, or ports, using those same credentials. That would rule out permission issues.

Also, do you see any logs from Kuryr about updating Octavia pool members? We could cross-reference those messages and timestamps with the Octavia API logs to see whether any errors were raised. The same applies to the Neutron ports in your other case.

Comment 4 Mohammad 2020-11-15 22:15:09 UTC
Hi Brendan,

So far it seems this issue only appears when scaling down to zero.

The problem we are seeing in production is that some services are scaled down to zero, but the member list is not updated. Meanwhile, other pods (for different services) in the same namespace are created and end up being assigned one of the previously used IP addresses, which still exist in the stale member list. This has been observed twice so far.

Whenever we scale to a value above zero, there are no issues. See below.

Mohammad

----------------------------------------------------------

[openshift@master-2 ~]$ oc project momo
Already on project "momo" on server "https://XX.XX.128.1:8443".
[openshift@master-2 ~]$ oc get pods
No resources found.
[openshift@master-2 ~]$ oc get all
NAME              TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)   AGE
service/echo-01   ClusterIP   XX.XX.157.245   <none>        80/TCP    3d
service/echo-02   ClusterIP   XX.XX.146.71    <none>        80/TCP    3d

NAME                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/echo-01   0         0         0            0           3d
deployment.apps/echo-02   0         0         0            0           3d

NAME                                 DESIRED   CURRENT   READY     AGE
replicaset.apps/echo-01-5c8c6d56c8   0         0         0         3d
replicaset.apps/echo-02-77b5b75d95   0         0         0         3d



dev1/mydc $ openstack loadbalancer list |grep momo
| 53e2f3e1-94c1-4871-90ed-30bdf9e619f0 | momo/echo-02                                                       | 5499a185863a469ba0f8d724e886184f | XX.XX.146.71  | ACTIVE              | octavia  |
| 4b93af87-da44-4749-b035-6a8f81d8c121 | momo/echo-01                                                       | 5499a185863a469ba0f8d724e886184f | XX.XX.157.245 | ACTIVE              | octavia  |

dev1/mydc $ openstack loadbalancer pool list --loadbalancer 4b93af87-da44-4749-b035-6a8f81d8c121
+--------------------------------------+---------------------+----------------------------------+---------------------+----------+--------------+----------------+
| id                                   | name                | project_id                       | provisioning_status | protocol | lb_algorithm | admin_state_up |
+--------------------------------------+---------------------+----------------------------------+---------------------+----------+--------------+----------------+
| 0a3f85b6-941e-4ba4-9c76-b9dcc9da64d0 | momo/echo-01:TCP:80 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | TCP      | ROUND_ROBIN  | True           |
+--------------------------------------+---------------------+----------------------------------+---------------------+----------+--------------+----------------+
dev1/mydc $ openstack loadbalancer pool list --loadbalancer 53e2f3e1-94c1-4871-90ed-30bdf9e619f0
+--------------------------------------+---------------------+----------------------------------+---------------------+----------+--------------+----------------+
| id                                   | name                | project_id                       | provisioning_status | protocol | lb_algorithm | admin_state_up |
+--------------------------------------+---------------------+----------------------------------+---------------------+----------+--------------+----------------+
| f8745fbd-c001-4d60-90ba-6ff9b68cf8ce | momo/echo-02:TCP:80 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | TCP      | ROUND_ROBIN  | True           |
+--------------------------------------+---------------------+----------------------------------+---------------------+----------+--------------+----------------+

dev1/mydc $  openstack loadbalancer member list 0a3f85b6-941e-4ba4-9c76-b9dcc9da64d0
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name                               | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| e73d6142-7e5f-429c-a6c6-169a0a944f4f | momo/echo-01-5c8c6d56c8-s8mfq:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.131   |          8080 | NO_MONITOR       |      1 |
| 0bb14dd4-1dba-418d-9e46-c4cfbdb3bde7 | momo/echo-01-5c8c6d56c8-kc5z7:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.139   |          8080 | NO_MONITOR       |      1 |
| efe190c6-8dd9-406f-ac73-02e766afa375 | momo/echo-01-5c8c6d56c8-hn6td:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.142   |          8080 | NO_MONITOR       |      1 |
| fa63bd37-8178-46e4-9b3b-0e7fb62f6e71 | momo/echo-01-5c8c6d56c8-zfxfr:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.145   |          8080 | NO_MONITOR       |      1 |
| 9e369b9e-52f5-4ec5-b9a8-07f873425a02 | momo/echo-01-5c8c6d56c8-crp6c:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.153   |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
dev1/mydc $  openstack loadbalancer member list f8745fbd-c001-4d60-90ba-6ff9b68cf8ce
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name                               | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| 51afb91a-7f73-4d63-8c8e-4ca8125577f4 | momo/echo-02-77b5b75d95-rv5m6:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.132   |          8080 | NO_MONITOR       |      1 |
| 82aa7394-0fc8-479e-8a56-1817bdf083c5 | momo/echo-02-77b5b75d95-tslsb:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.136   |          8080 | NO_MONITOR       |      1 |
| 38f3debf-96aa-4750-a1be-d540bf3839ce | momo/echo-02-77b5b75d95-xpmd8:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.143   |          8080 | NO_MONITOR       |      1 |
| 819a8254-8fb1-44f1-8c82-47bc5860f82a | momo/echo-02-77b5b75d95-zzqrr:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.144   |          8080 | NO_MONITOR       |      1 |
| 1aaf9dcd-1a9b-4d6e-be2b-8a17b88843cd | momo/echo-02-77b5b75d95-6z69w:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.152   |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+


[openshift@master-2 ~]$ oc scale --replicas=1 deployment.apps/echo-01
deployment.apps/echo-01 scaled
[openshift@master-2 ~]$ oc scale --replicas=1 deployment.apps/echo-02
deployment.apps/echo-02 scaled

dev1/mydc $  openstack loadbalancer member list f8745fbd-c001-4d60-90ba-6ff9b68cf8ce
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name                               | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| f2b2b209-20e9-41ff-b6f6-584ddc893e4d | momo/echo-02-77b5b75d95-vsg7k:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.141   |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
dev1/mydc $  openstack loadbalancer member list 0a3f85b6-941e-4ba4-9c76-b9dcc9da64d0
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name                               | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| 06009a72-529c-484b-86be-63e23b58d532 | momo/echo-01-5c8c6d56c8-tzr4v:8080 | 5499a185863a469ba0f8d724e886184f | ACTIVE              | XX.XX.19.133   |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+------------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+

Comment 5 Brendan Shephard 2020-11-15 23:32:24 UTC
Hey Mo,

Yeah, I believe the issue here has been identified as some missing logic to handle the scale-to-zero case:
https://github.com/openshift/kuryr-kubernetes/blob/release-3.11/kuryr_kubernetes/controller/handlers/lbaas.py#L237-L244

I also believe that the Kuryr engineering team is now working on applying this logic to resolve the issue.
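To illustrate the bug: the member-sync step must still compute and delete stale members when the Endpoints object has no addresses at all, rather than short-circuiting on an empty endpoints list. A minimal sketch of that reconciliation (hypothetical names, not the actual kuryr-kubernetes code):

```python
def members_to_remove(pool_members, endpoint_addresses):
    """Return pool members whose address no longer backs any endpoint.

    pool_members: list of (member_id, address) tuples from Octavia.
    endpoint_addresses: set of pod IPs currently in the Kubernetes
    Endpoints object; this set is EMPTY when a deployment is scaled
    to zero replicas.
    """
    # The essence of the fix: this pruning must run even when
    # endpoint_addresses is empty, so that on scale-to-zero every
    # existing member is selected for deletion instead of the sync
    # being skipped and the stale members being left behind.
    return [(member_id, address)
            for member_id, address in pool_members
            if address not in endpoint_addresses]
```

With five members and an empty endpoint set, all five are returned for deletion, which matches the expected result above (an empty member list after scaling to zero).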

Comment 7 rlobillo 2020-11-20 10:51:09 UTC
Verified on OCP4.7.0-0.nightly-2020-11-18-203317 over OSP16.1 with OVN-Octavia (RHOS-16.1-RHEL-8-20201110.n.1)

Creating a deployment with 3 replicas and a service using the files below:

$ cat demo_deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
  labels:
    app: demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: kuryr/demo
        ports:
        - containerPort: 8080

$ cat demo_svc.yaml 
apiVersion: v1
kind: Service
metadata:
  name: demo
  labels:
    app: demo
spec:
  selector:
    app: demo
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080

The result:

$ oc get all
NAME                       READY   STATUS    RESTARTS   AGE
pod/demo-66cdc7b66-558q4   1/1     Running   0          35s
pod/demo-66cdc7b66-6r6xz   1/1     Running   0          35s
pod/demo-66cdc7b66-lqgwm   1/1     Running   0          35s

NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/demo   ClusterIP   172.30.253.185   <none>        80/TCP    20m

NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/demo   3/3     3            3           21m

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/demo-66cdc7b66   3         3         3       21m

and 

$ openstack loadbalancer show test/demo
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2020-11-20T10:29:10                  |
| description         |                                      |
| flavor_id           | None                                 |
| id                  | ea3e7a80-c506-4470-8f03-ffbb54c3582d |
| listeners           | a06a2ac5-29d9-4662-8bfc-ae8efede9dec |
| name                | test/demo                            |
| operating_status    | ONLINE                               |
| pools               | 7445a7f9-3828-4683-9578-8282b60c98bf |
| project_id          | 09384e0f276445b8b369945abd83baf0     |
| provider            | ovn                                  |
| provisioning_status | ACTIVE                               |
| updated_at          | 2020-11-20T10:47:25                  |
| vip_address         | 172.30.253.185                       |
| vip_network_id      | 707947c5-b9ef-416d-a50b-610b8d0c9288 |
| vip_port_id         | 114b59bb-cc40-4ed6-b3da-befd30767725 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | a41d615c-c5e7-4ae4-9b54-139262d060c2 |
+---------------------+--------------------------------------+


$ openstack loadbalancer member list 7445a7f9-3828-4683-9578-8282b60c98bf                                                                                  
+--------------------------------------+--------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| id                                   | name                           | project_id                       | provisioning_status | address        | protocol_port | operating_status | weight |
+--------------------------------------+--------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+
| 8aeca25e-0aed-4047-836e-5ffb08ea6cfc | test/demo-66cdc7b66-6r6xz:8080 | 09384e0f276445b8b369945abd83baf0 | ACTIVE              | 10.128.118.121 |          8080 | NO_MONITOR       |      1 |
| 699f911d-040d-4413-a1c0-78ce6d9127c2 | test/demo-66cdc7b66-558q4:8080 | 09384e0f276445b8b369945abd83baf0 | ACTIVE              | 10.128.119.199 |          8080 | NO_MONITOR       |      1 |
| 79917927-1c2a-452d-a34e-58b8d4bb721e | test/demo-66cdc7b66-lqgwm:8080 | 09384e0f276445b8b369945abd83baf0 | ACTIVE              | 10.128.118.53  |          8080 | NO_MONITOR       |      1 |
+--------------------------------------+--------------------------------+----------------------------------+---------------------+----------------+---------------+------------------+--------+

Now, scaling to 0 with the command below:

$ oc scale --replicas=0 deployment.apps/demo
deployment.apps/demo scaled


After a while, all the members are removed from the pool:

$ openstack loadbalancer member list 7445a7f9-3828-4683-9578-8282b60c98bf

$

Comment 10 errata-xmlrpc 2021-02-24 15:32:41 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

