+++ This bug was initially created as a clone of Bug #2237225 +++

Description of problem:
The customer is building an active/standby load balancer with 2 amphoras. He then simulates a disaster/outage by shutting down both amphoras (openstack server stop) and observes the recovery. He noticed that after the rebuild of the master, the backup amphora goes into ERROR state:

openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status    | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| 9007868a-ab80-43f8-aa80-96a6ccff1e9e | c10cf3e5-8655-41ce-8e53-9cf7243fea62 | ERROR     | BACKUP | 172.21.2.97   | 10.0.0.214 |
| f3b02d22-0573-468f-9518-5db89b3471b5 | c10cf3e5-8655-41ce-8e53-9cf7243fea62 | ALLOCATED | MASTER | 172.21.2.55   | 10.0.0.214 |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+

The backup amphora is ultimately rebuilt, but it took about 40 minutes to complete. We need your help to understand why it goes into ERROR state and why it takes so long to recover. A sosreport with octavia in debug mode is attached to the case.

Version-Release number of selected component (if applicable):
OSP 16.2.4
puppet-octavia-15.5.1-2.20220821005128.a56b33a.el8ost.noarch
openstack-octavia-common-5.1.3-2.20220927125110.57a6265.el8ost.noarch

How reproducible:
100%, can be reproduced at will

Steps to Reproduce:
1. Create an LB with the active/standby topology
2. Shut down both amphora instances
3. The standby instance goes into ERROR state and recovers later, up to 40 minutes afterwards.

Actual results:
Long recovery of the amphoras during a disaster situation.

Expected results:
Very quick recovery

Additional info:
sosreport with octavia in debug mode

--- Additional comment from Gregory Thiemonge on 2023-09-04 12:18:29 UTC ---

There are 2 issues; I created 2 Launchpad bugs:

- failover of ACTIVE_STANDBY LBs can take a lot of time in amphorav1
  https://bugs.launchpad.net/octavia/+bug/2033894

- a failover of an ACTIVE_STANDBY LB recreates only one amphora when both amps are failing
  https://bugs.launchpad.net/octavia/+bug/2033734

Note: the amphora in ERROR status can be recreated manually with:

openstack loadbalancer amphora failover <amp_id>

(a load balancer failover can also fix it)
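
As a possible aid for the manual workaround noted above, here is a minimal sketch that reads the table printed by "openstack loadbalancer amphora list" and prints one failover command per amphora in ERROR status. The function name error_amphora_ids and the SAMPLE variable are hypothetical helpers introduced for illustration; the sample text is the table from this report. The script only prints commands and does not call the OpenStack CLI itself.

```python
def error_amphora_ids(table_text):
    """Return the IDs of amphoras whose status column reads ERROR.

    table_text is the default table output of
    'openstack loadbalancer amphora list' (assumed format, as in this report).
    """
    ids = []
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data line holds the column names
            continue
        row = dict(zip(header, cells))
        if row.get("status") == "ERROR":
            ids.append(row["id"])
    return ids


# Sample input copied from the amphora list in this bug report.
SAMPLE = """
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status    | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| 9007868a-ab80-43f8-aa80-96a6ccff1e9e | c10cf3e5-8655-41ce-8e53-9cf7243fea62 | ERROR     | BACKUP | 172.21.2.97   | 10.0.0.214 |
| f3b02d22-0573-468f-9518-5db89b3471b5 | c10cf3e5-8655-41ce-8e53-9cf7243fea62 | ALLOCATED | MASTER | 172.21.2.55   | 10.0.0.214 |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
"""

# Print the manual failover command for each amphora found in ERROR.
for amp_id in error_amphora_ids(SAMPLE):
    print(f"openstack loadbalancer amphora failover {amp_id}")
```

With the table from this report, the only amphora selected is the BACKUP one (9007868a-ab80-43f8-aa80-96a6ccff1e9e). The printed command can be reviewed and then run by hand.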
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 17.1.2 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:0209