Bug 1577976 - Disabling heartbeat interface for failover, LB goes into error state, New backup amphora is not created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z5
Target Release: 13.0 (Queens)
Assignee: Carlos Goncalves
QA Contact: Alexander Stafeyev
URL:
Whiteboard:
Duplicates: 1576434 1641827
Depends On: 1655431
Blocks: 1547043 1698576
 
Reported: 2018-05-14 14:25 UTC by Alexander Stafeyev
Modified: 2022-03-13 14:59 UTC (History)

Fixed In Version: openstack-tripleo-common-8.6.6-10.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1655431 (view as bug list)
Environment:
Last Closed: 2019-03-14 13:54:50 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1788571 0 None None None 2018-08-23 08:13:01 UTC
OpenStack gerrit 596373 0 'None' MERGED Move Octavia config opts to common config directory 2021-02-14 23:41:05 UTC
OpenStack gerrit 624305 0 'None' MERGED Move Octavia config opts to common config directory 2021-02-14 23:41:05 UTC
Red Hat Product Errata RHBA-2019:0448 0 None None None 2019-03-14 13:54:58 UTC
Storyboard 2003052 0 None None None 2018-08-23 07:57:29 UTC

Description Alexander Stafeyev 2018-05-14 14:25:29 UTC
On an active-standby configuration:
We disabled the MASTER management port in order to block heartbeat messages and trigger a failover.

The LB goes into error state, no new backup amphora is created, and the remaining backup amphora (which is now functional) is still shown as BACKUP.

(overcloud) [stack@undercloud-0 ~]$  openstack loadbalancer amphora list | grep 192.168.1.16
| 5f48afa4-8d58-4960-998e-c714d10ab21f | 7bf8921e-6d37-4667-82c3-6b3ae3410af8 | ALLOCATED | MASTER | 192.168.199.51 | 192.168.1.16 |
| e232e520-4bd5-403c-880a-361a54990df8 | 7bf8921e-6d37-4667-82c3-6b3ae3410af8 | ALLOCATED | BACKUP | 192.168.199.56 | 192.168.1.16 |

(overcloud) [stack@undercloud-0 ~]$ openstack port list | grep 192.168.199.51
| 1671769a-a790-4c56-8d97-d102d2630052 |                                                             | fa:16:3e:62:c7:32 | ip_address='192.168.199.51', subnet_id='06096d99-a0bc-4533-a6c8-f8824dddbf2e' | ACTIVE |


(overcloud) [stack@undercloud-0 ~]$ openstack port set 1671769a-a790-4c56-8d97-d102d2630052 --disable 


After disabling the port:

(overcloud) [stack@undercloud-0 ~]$  openstack loadbalancer amphora list | grep 192.168.1.16
| e232e520-4bd5-403c-880a-361a54990df8 | 7bf8921e-6d37-4667-82c3-6b3ae3410af8 | ALLOCATED | BACKUP | 192.168.199.56 | 192.168.1.16 |




2018-05-14 14:11:51.038 22 ERROR octavia.controller.worker.controller_worker [req-94d8360b-097b-44d0-a9aa-22fb363a353a - 64ba63c12a9a46288fd4623295d81bc0 - - -] Failover exception: failed to detect a valid IP address from None: AddrFormatError: failed to detect a valid IP address from None
2018-05-14 14:11:51.060 22 INFO octavia.controller.healthmanager.health_manager [-] Attempted 1 failovers of amphora
2018-05-14 14:11:51.060 22 INFO octavia.controller.healthmanager.health_manager [-] Failed at 1 failovers of amphora
2018-05-14 14:11:51.061 22 INFO octavia.controller.healthmanager.health_manager [-] Cancelled 0 failovers of amphora
2018-05-14 14:11:51.061 22 INFO octavia.controller.healthmanager.health_manager [-] Successfully completed 0 failovers of amphora



******* Traffic is still handled properly.

Comment 2 Carlos Goncalves 2018-08-23 07:57:29 UTC
This issue turned out to be caused by a misconfiguration of Octavia by the deployment tool (TripleO), which sets [controller_worker]/amp_boot_network_list only for the worker, whereas it should also have been set for the health manager. An amphora instance (Nova instance) was created but did not get a Neutron port created and attached to the lb-mgmt-net. A fix is required in tripleo-common.

Still, Octavia should validate and error on configuration parameters with no default values to prevent cases like this.
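To illustrate the kind of check meant here, the following is a minimal standalone sketch that reports required options that are missing or empty in a configuration file. It uses only the Python standard library rather than oslo.config, and it is not the upstream Octavia change; the function name `missing_options` and the `REQUIRED` table are hypothetical and exist only for this example.

```python
import configparser

# Options without a usable default that a service needs before it can boot
# an amphora. This table is an assumption made for illustration only.
REQUIRED = {
    "controller_worker": ["amp_boot_network_list"],
}


def missing_options(conf_text, required=REQUIRED):
    """Return "section/option" names that are absent or empty in conf_text."""
    parser = configparser.ConfigParser(
        strict=False,                  # tolerate repeated sections/options
        interpolation=None,            # treat '%' in values literally
        default_section="__unused__",  # do not inherit values from [DEFAULT]
    )
    parser.read_string(conf_text)
    missing = []
    for section, options in required.items():
        for option in options:
            # fallback covers both a missing section and a missing option
            value = parser.get(section, option, fallback="")
            if not value.strip():
                missing.append(f"{section}/{option}")
    return missing
```

A service could run such a check at startup and refuse to start if the returned list is non-empty, instead of failing later during a failover.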

Comment 3 Carlos Goncalves 2018-08-23 12:29:44 UTC
(In reply to Carlos Goncalves from comment #2)
> Still, Octavia should validate and error on configuration parameters with no
> default values to prevent cases like this.

https://review.openstack.org/#/c/595578/

Comment 5 Carlos Goncalves 2018-11-07 15:51:30 UTC
*** Bug 1641827 has been marked as a duplicate of this bug. ***

Comment 6 Pratik Bandarkar 2019-01-15 19:12:54 UTC
Is there a workaround until the issue is fixed? Maybe some manual configuration of the health manager (octavia-health-manager/manager-post-deploy.conf) on all controllers?

Comment 7 Carlos Goncalves 2019-01-16 11:56:52 UTC
Set [controller_worker]/amp_boot_network_list and [controller_worker]/amp_secgroup_list in the configuration file of the health manager on all controllers and restart the container.
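As a rough sketch of that change, the health manager's configuration would gain the two options under the [controller_worker] section. The values below are placeholders, not real IDs; presumably they should match what the worker's configuration already uses, since TripleO sets at least amp_boot_network_list there (see comment 2).

```ini
[controller_worker]
# Placeholder values: substitute the IDs used in your deployment.
amp_boot_network_list = <lb-mgmt-net network ID>
amp_secgroup_list = <amphora management security group ID>
```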

Comment 8 Pratik Bandarkar 2019-01-17 09:19:12 UTC
(In reply to Carlos Goncalves from comment #7)
> Set [controller_worker]/amp_boot_network_list and
> [controller_worker]/amp_secgroup_list in the configuration file of the
> health manager on all controllers and restart the container.

Thanks, Carlos. I have made the changes. Unfortunately, I am now facing a different issue while re-creating amphora instances. I have created a support case with Red Hat.

Comment 31 errata-xmlrpc 2019-03-14 13:54:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0448

Comment 32 Carlos Goncalves 2019-10-01 12:42:36 UTC
*** Bug 1576434 has been marked as a duplicate of this bug. ***

