2040691 – Octavia fails to delete Load Balancer, and sqlalchemy fails to mark Load Balancer to ERROR

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-octavia
Sub Component:
Version:	16.2 (Train)
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	z2
Target Release:	16.2 (Train on RHEL 8.4)
Assignee:	Gregory Thiemonge
QA Contact:	Bruna Bonguardo
Docs Contact:
URL:
Whiteboard:
Depends On:	2001120
Blocks:	2040697

Reported:	2022-01-14 14:23 UTC by Gregory Thiemonge
Modified:	2022-03-23 22:13 UTC
CC List:	9 users
Fixed In Version:	openstack-octavia-5.1.3-2.20220107174846.51c6d8b.el8ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	2001120
Clones:	2040697
Environment:
Last Closed:	2022-03-23 22:13:07 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
OpenStack Storyboard	2009652	None	None	None	2022-01-14 14:26:54 UTC
OpenStack gerrit	818093	None	MERGED	Fix LB set in ERROR too early in the revert flow	2022-01-14 14:26:54 UTC
OpenStack gerrit	820350	None	MERGED	Fix LB set in ERROR too early in MapLoadbalancerToAmphora	2022-01-14 14:26:54 UTC
Red Hat Issue Tracker	OSP-12136	None	None	None	2022-01-14 14:34:18 UTC
Red Hat Product Errata	RHBA-2022:1001	None	None	None	2022-03-23 22:13:34 UTC

Description Gregory Thiemonge 2022-01-14 14:23:31 UTC

+++ This bug was initially created as a clone of Bug #2001120 +++

Description of problem:

OSP 13 with ML2/OVS
OCP 3.11 with Kuryr

Please note that this is NOT a Kuryr bug. We are already implementing a workaround in Kuryr by adding retries to the DELETE requests sent to Octavia: https://github.com/openshift/kuryr-kubernetes/pull/548
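
For illustration only, a retry of this kind could look like the sketch below. This is not the code from the linked pull request: `delete_lb` and `get_status` are hypothetical callables standing in for the Octavia API calls, and the attempt count and delay are arbitrary assumptions.

```python
import time


class DeleteFailed(Exception):
    """Raised when the load balancer still exists after all attempts."""


def delete_with_retries(delete_lb, get_status, lb_id, attempts=3, delay=1.0,
                        sleep=time.sleep):
    """Re-send DELETE until the load balancer is gone or attempts run out.

    delete_lb and get_status are hypothetical wrappers around the Octavia
    API. get_status is assumed to return the provisioning_status string,
    or None once the load balancer no longer exists.
    """
    status = None
    for attempt in range(1, attempts + 1):
        delete_lb(lb_id)
        sleep(delay)  # give the worker time to process the request
        status = get_status(lb_id)
        if status in (None, "DELETED"):
            return attempt  # number of DELETE requests that were needed
    raise DeleteFailed(
        "%s still in status %s after %d attempts" % (lb_id, status, attempts))
```

The retry hides the symptom from the client, but it does not explain why the first DELETE left the load balancer in ERROR, which is what the rest of this report looks at.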

It is still worth mentioning that sometimes, when we delete an LB in Octavia, the LB is not properly marked as DELETED.

In Kuryr logs:
~~~
2021-09-03 07:42:35.297 22599 WARNING kuryr_kubernetes.controller.drivers.lbaasv2 [-] Releasing loadbalancer a708a225-86b8-4a18-9ff1-1d6405e90454 with ERROR status
~~~

In Octavia worker.log:
~~~
2021-09-03 07:42:35.554 25 INFO octavia.controller.queue.v1.endpoints [-] Deleting load balancer 'a708a225-86b8-4a18-9ff1-1d6405e90454'...
~~~

We can see the request to update the DB provisioning_status to DELETED:
~~~
2021-09-03 07:42:38.832 25 DEBUG octavia.controller.worker.v1.tasks.database_tasks [req-41433bc3-9b2b-413d-a15b-549984d27533 - dc58480f8d864ea9b00ffef263be3819 - - -] Mark DELETED in DB for load balancer id: a708a225-86b8-4a18-9ff1-1d6405e90454 execute /usr/lib/python2.7/site-packages/octavia/controller/worker/v1/tasks/database_tasks.py:1125
~~~

But looking at the DB, the provisioning_status is ERROR:
~~~
MariaDB [octavia]> select * from load_balancer where id='a708a225-86b8-4a18-9ff1-1d6405e90454' \G;
*************************** 1. row ***************************
         project_id: dc58480f8d864ea9b00ffef263be3819
                 id: a708a225-86b8-4a18-9ff1-1d6405e90454
               name: momo1/lb-momohttpd-02
        description: NULL
provisioning_status: ERROR
   operating_status: OFFLINE
            enabled: 1
           topology: SINGLE
    server_group_id: NULL
         created_at: 2021-09-03 11:42:25
         updated_at: 2021-09-03 11:42:39
           provider: amphora
          flavor_id: NULL
1 row in set (0.00 sec)
~~~
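
The same cross-check between worker.log and the DB can be scripted when many load balancers are involved. The following is a minimal sketch, assuming the log message format shown above; `db_status` is a hypothetical dict that the caller fills from the load_balancer table, and the helper names are made up for this example.

```python
import re

# Matches the DEBUG message emitted by the "Mark DELETED" task shown above.
MARK_DELETED = re.compile(
    r"Mark DELETED in DB for load balancer id: (?P<lb_id>[0-9a-f-]{36})"
)


def ids_marked_deleted(log_lines):
    """Return the load balancer IDs the worker logged as marked DELETED."""
    ids = []
    for line in log_lines:
        match = MARK_DELETED.search(line)
        if match and match.group("lb_id") not in ids:
            ids.append(match.group("lb_id"))
    return ids


def inconsistent(log_lines, db_status):
    """IDs logged as marked DELETED whose DB status says otherwise.

    db_status is a hypothetical mapping {lb_id: provisioning_status}.
    A missing key is treated as "row already purged" and not reported.
    """
    return [lb_id for lb_id in ids_marked_deleted(log_lines)
            if db_status.get(lb_id) not in (None, "DELETED")]
```

With the log line and DB row from this report, `inconsistent` would report a708a225-86b8-4a18-9ff1-1d6405e90454, because the DB status is ERROR.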

Interestingly, it seems that MarkLBDeletedInDB.revert is never executed:
~~~
class MarkLBDeletedInDB(BaseDatabaseTask):
    """Mark the load balancer deleted in the DB.

    Since sqlalchemy will likely retry by itself always revert if it fails
    """

    def execute(self, loadbalancer):
        """Mark the load balancer as deleted in DB.

        :param loadbalancer: Load balancer object to be updated
        :returns: None
        """

        LOG.debug("Mark DELETED in DB for load balancer id: %s",
                  loadbalancer.id)
        self.loadbalancer_repo.update(db_apis.get_session(),
                                      loadbalancer.id,
                                      provisioning_status=constants.DELETED)

    def revert(self, loadbalancer, *args, **kwargs):
        """Mark the load balancer as broken and ready to be cleaned up.

        :param loadbalancer: Load balancer object that failed to update
        :returns: None
        """

        LOG.warning("Reverting mark load balancer deleted in DB "
                    "for load balancer id %s", loadbalancer.id)
        self.task_utils.mark_loadbalancer_prov_status_error(loadbalancer.id)

~~~

Looking at all the Octavia logs:
~~~
()[octavia@osp13-controller0 /]$ grep "Reverting mark load balancer deleted" /var/log/octavia/*
()[octavia@osp13-controller0 /]$ grep "Failed to update load balancer" *
()[octavia@osp13-controller0 /]$
~~~
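
Since neither message appears in the logs, the ERROR status was apparently not written by MarkLBDeletedInDB.revert. The titles of the linked gerrit changes ("Fix LB set in ERROR too early in the revert flow") suggest one candidate: another flow on the same load balancer that exposes ERROR before its revert has finished and writes ERROR again afterwards. That reading is an assumption, not something this report confirms. The toy replay below only illustrates such an interleaving; it uses a plain dict instead of the database, no Octavia or taskflow code, and the step names are hypothetical.

```python
def replay(events):
    """Apply (actor, status) writes in order.

    Returns the final provisioning_status and the list of writes applied.
    """
    db = {"provisioning_status": "PENDING_CREATE"}
    history = []
    for actor, status in events:
        db["provisioning_status"] = status
        history.append((actor, status))
    return db["provisioning_status"], history


# Hypothetical ordering: the reverting flow exposes ERROR before it is done.
early_error = [
    ("revert-step-1", "ERROR"),          # ERROR visible while revert still runs
    ("delete-flow", "PENDING_DELETE"),   # client reacts to ERROR with a DELETE
    ("delete-flow", "DELETED"),
    ("revert-step-2", "ERROR"),          # late revert write overwrites DELETED
]

# Ordering in which ERROR is written only by the last revert step.
error_last = [
    ("revert-final-step", "ERROR"),
    ("delete-flow", "PENDING_DELETE"),
    ("delete-flow", "DELETED"),
]
```

In this model `replay(early_error)` ends in ERROR while `replay(error_last)` ends in DELETED, which matches the symptom of a logged "Mark DELETED" followed by an ERROR row.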


Octavia failed to remove the LB properly.

I tried tuning the DB a little with the following settings. The situation improved, but the problem still happened (once in 8 hours).

~~~
[mysqld]
innodb_buffer_pool_instances = 2
innodb_buffer_pool_size = 5G
innodb_lock_wait_timeout = 120
net_write_timeout = 120
net_read_timeout = 120
connect_timeout = 120
max_connections = 8192
~~~

Comment 12 errata-xmlrpc 2022-03-23 22:13:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.2), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1001
