Bug 2239459

Summary: Octavia Active Standby Amphora goes to error after failure
Product: Red Hat OpenStack Reporter: Gregory Thiemonge <gthiemon>
Component: openstack-octavia Assignee: Gregory Thiemonge <gthiemon>
Status: CLOSED ERRATA QA Contact: Bruna Bonguardo <bbonguar>
Severity: medium Docs Contact: Greg Rakauskas <gregraka>
Priority: medium    
Version: 17.1 (Wallaby) CC: bbonguar, ggrimaux, gregraka, gthiemon, tweining
Target Milestone: z2 Keywords: Triaged
Target Release: 17.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-octavia-8.0.2-17.1.20231016170842.907b8b1.el9ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2237225 Environment:
Last Closed: 2024-01-16 14:31:01 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2237225    
Bug Blocks:    

Description Gregory Thiemonge 2023-09-18 13:41:19 UTC
+++ This bug was initially created as a clone of Bug #2237225 +++

Description of problem:
The customer is building an active/standby LoadBalancer with 2 amphoras.
He then simulates a disaster/outage by shutting down both amphoras (openstack server stop) and looks at the recovery.

He noticed that after the rebuild of the master, the backup amphora goes into error state:
openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status    | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| 9007868a-ab80-43f8-aa80-96a6ccff1e9e | c10cf3e5-8655-41ce-8e53-9cf7243fea62 | ERROR     | BACKUP | 172.21.2.97   | 10.0.0.214 |
| f3b02d22-0573-468f-9518-5db89b3471b5 | c10cf3e5-8655-41ce-8e53-9cf7243fea62 | ALLOCATED | MASTER | 172.21.2.55   | 10.0.0.214 |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+

The backup amphora is ultimately rebuilt, but the rebuild takes about 40 minutes to complete.
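
To spot the failed amphora without reading the table by eye, the listing above can be parsed with a few lines of Python. This is only a minimal sketch: it assumes the default ASCII table layout shown above, and the names parse_amphora_table, amphorae_in_error and SAMPLE are illustrative, not part of Octavia or the OpenStack client.

```python
def parse_amphora_table(text):
    """Parse the ASCII table printed by `openstack loadbalancer amphora list`.

    Returns one dict per data row, keyed by the column names in the header row.
    """
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +---+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first "|" line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def amphorae_in_error(rows):
    """Keep only the rows whose status column is ERROR."""
    return [row for row in rows if row["status"] == "ERROR"]


# Output copied from this bug report.
SAMPLE = """\
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status    | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| 9007868a-ab80-43f8-aa80-96a6ccff1e9e | c10cf3e5-8655-41ce-8e53-9cf7243fea62 | ERROR     | BACKUP | 172.21.2.97   | 10.0.0.214 |
| f3b02d22-0573-468f-9518-5db89b3471b5 | c10cf3e5-8655-41ce-8e53-9cf7243fea62 | ALLOCATED | MASTER | 172.21.2.55   | 10.0.0.214 |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
"""

if __name__ == "__main__":
    for amp in amphorae_in_error(parse_amphora_table(SAMPLE)):
        print(amp["id"], amp["role"])
```

With the sample above, only the BACKUP amphora is reported.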

We need your help to understand why it goes to error state and why it takes so long to recover.

A sosreport with octavia in debug mode is attached to the case.

Version-Release number of selected component (if applicable):
OSP 16.2.4
puppet-octavia-15.5.1-2.20220821005128.a56b33a.el8ost.noarch 
openstack-octavia-common-5.1.3-2.20220927125110.57a6265.el8ost.noarch

How reproducible:
100% can be reproduced at will

Steps to Reproduce:
1. Create an LB with active/standby topology
2. Shut down both amphora instances
3. The standby instance goes into error and recovers only later, up to 40 minutes afterwards.
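
To put a number on the recovery time seen in step 3, a small polling helper could be used. The sketch below is a hypothetical helper, not an Octavia tool: fetch_statuses is a caller-supplied function (for example, one that wraps openstack loadbalancer amphora list) returning the current amphora status strings, and the timeout and interval defaults are arbitrary.

```python
import time


def wait_for_recovery(fetch_statuses, timeout=3600.0, interval=10.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until every amphora is ALLOCATED and return the elapsed seconds.

    fetch_statuses: hypothetical callable returning a list of status strings.
    clock/sleep are injectable so the helper can be exercised without waiting.
    Raises TimeoutError if recovery does not finish within `timeout` seconds.
    """
    start = clock()
    while True:
        statuses = fetch_statuses()
        # An empty list means no amphora was found, which is not a recovery.
        if statuses and all(status == "ALLOCATED" for status in statuses):
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError(f"amphorae still not recovered: {statuses}")
        sleep(interval)
```

Calling wait_for_recovery right after shutting down both instances gives a rough measure of how long the load balancer stays degraded.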

Actual results:
Long recovery of the amphoras during a disaster situation.

Expected results:
Very quick recovery

Additional info:
sosreport with octavia in debug

--- Additional comment from Gregory Thiemonge on 2023-09-04 12:18:29 UTC ---

There are 2 issues, I created 2 launchpad bugs:

- failover of ACTIVE_STANDBY LBs can take a lot of time in amphorav1 https://bugs.launchpad.net/octavia/+bug/2033894
- a failover of an ACTIVE_STANDBY LB recreates only one amphora when both amps are failing https://bugs.launchpad.net/octavia/+bug/2033734

Note: the amphora in ERROR status can be recreated manually with: openstack loadbalancer amphora failover <amp_id> (a loadbalancer failover can also fix it)
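
The manual workaround in the note can be scripted. The following is a rough sketch rather than a supported procedure: it only builds the command lines, one per amphora in ERROR, using the failover command quoted in the note. The function name failover_commands and the input format (a list of dicts with "id" and "status" keys) are assumptions made for illustration.

```python
import shlex


def failover_commands(amphorae):
    """Return one `openstack loadbalancer amphora failover` argv per ERROR amphora.

    amphorae: list of dicts with at least "id" and "status" keys
    (an assumed input shape, e.g. rows parsed from the amphora list output).
    """
    return [
        ["openstack", "loadbalancer", "amphora", "failover", amp["id"]]
        for amp in amphorae
        if amp["status"] == "ERROR"
    ]


if __name__ == "__main__":
    # Amphora IDs and statuses taken from the listing in this bug report.
    amphorae = [
        {"id": "9007868a-ab80-43f8-aa80-96a6ccff1e9e", "status": "ERROR"},
        {"id": "f3b02d22-0573-468f-9518-5db89b3471b5", "status": "ALLOCATED"},
    ]
    for argv in failover_commands(amphorae):
        # Print for review; pass argv to subprocess.run(argv, check=True) to execute.
        print(shlex.join(argv))
```

Printing the commands before running them keeps the operator in control of which amphora is failed over.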

Comment 14 errata-xmlrpc 2024-01-16 14:31:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 17.1.2 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0209