1577976 – Disabling heartbeat interface for failover, LB goes into error state, New backup amphora is not created


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-common
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	z5
Target Release:	13.0 (Queens)
Assignee:	Carlos Goncalves
QA Contact:	Alexander Stafeyev
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1576434 1641827
Depends On:	1655431
Blocks:	1547043 1698576

Reported:	2018-05-14 14:25 UTC by Alexander Stafeyev
Modified:	2022-03-13 14:59 UTC
CC List:	13 users
Fixed In Version:	openstack-tripleo-common-8.6.6-10.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1655431
Environment:
Last Closed:	2019-03-14 13:54:50 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1788571	None	None	None	2018-08-23 08:13:01 UTC
OpenStack gerrit	596373	'None'	MERGED	Move Octavia config opts to common config directory	2021-02-14 23:41:05 UTC
OpenStack gerrit	624305	'None'	MERGED	Move Octavia config opts to common config directory	2021-02-14 23:41:05 UTC
Red Hat Product Errata	RHBA-2019:0448	None	None	None	2019-03-14 13:54:58 UTC
Storyboard	2003052	None	None	None	2018-08-23 07:57:29 UTC

Description Alexander Stafeyev 2018-05-14 14:25:29 UTC

In an active/standby configuration:
We disabled the management port of the MASTER amphora in order to block heartbeat messages and trigger a failover.

The LB goes into error state and no new backup amphora is created. The remaining amphora (which is now the functional one) is still shown as BACKUP.

(overcloud) [stack@undercloud-0 ~]$  openstack loadbalancer amphora list | grep 192.168.1.16
| 5f48afa4-8d58-4960-998e-c714d10ab21f | 7bf8921e-6d37-4667-82c3-6b3ae3410af8 | ALLOCATED | MASTER | 192.168.199.51 | 192.168.1.16 |
| e232e520-4bd5-403c-880a-361a54990df8 | 7bf8921e-6d37-4667-82c3-6b3ae3410af8 | ALLOCATED | BACKUP | 192.168.199.56 | 192.168.1.16 |

(overcloud) [stack@undercloud-0 ~]$ openstack port list | grep 192.168.199.51
| 1671769a-a790-4c56-8d97-d102d2630052 |                                                             | fa:16:3e:62:c7:32 | ip_address='192.168.199.51', subnet_id='06096d99-a0bc-4533-a6c8-f8824dddbf2e' | ACTIVE |


(overcloud) [stack@undercloud-0 ~]$ openstack port set 1671769a-a790-4c56-8d97-d102d2630052 --disable 


After disabling the port:

(overcloud) [stack@undercloud-0 ~]$  openstack loadbalancer amphora list | grep 192.168.1.16
| e232e520-4bd5-403c-880a-361a54990df8 | 7bf8921e-6d37-4667-82c3-6b3ae3410af8 | ALLOCATED | BACKUP | 192.168.199.56 | 192.168.1.16 |




2018-05-14 14:11:51.038 22 ERROR octavia.controller.worker.controller_worker [req-94d8360b-097b-44d0-a9aa-22fb363a353a - 64ba63c12a9a46288fd4623295d81bc0 - - -] Failover exception: failed to detect a valid IP address from None: AddrFormatError: failed to detect a valid IP address from None
2018-05-14 14:11:51.060 22 INFO octavia.controller.healthmanager.health_manager [-] Attempted 1 failovers of amphora
2018-05-14 14:11:51.060 22 INFO octavia.controller.healthmanager.health_manager [-] Failed at 1 failovers of amphora
2018-05-14 14:11:51.061 22 INFO octavia.controller.healthmanager.health_manager [-] Cancelled 0 failovers of amphora
2018-05-14 14:11:51.061 22 INFO octavia.controller.healthmanager.health_manager [-] Successfully completed 0 failovers of amphora



Note: traffic is still handled properly.

Comment 2 Carlos Goncalves 2018-08-23 07:57:29 UTC

This issue turned out to be caused by a misconfiguration of Octavia by the deployment tool (TripleO), which sets [controller_worker]/amp_boot_network_list only for the worker, whereas it should also be set for the health manager. An amphora instance (Nova instance) was created, but no Neutron port was created for it and attached to the lb-mgmt-net. A fix is required in tripleo-common.

Still, Octavia should validate and error on configuration parameters with no default values to prevent cases like this.
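
As an illustration of that kind of check, here is a minimal Python sketch (not Octavia's actual validation code) that reports required options that are absent or empty in a configuration file. The helper name find_missing_options and the REQUIRED_OPTIONS list are hypothetical; the two option names are the ones discussed in this bug.

```python
import configparser

# (section, option) pairs that must be set explicitly. The option names come
# from this bug report; the list itself is an assumption for illustration.
REQUIRED_OPTIONS = [
    ("controller_worker", "amp_boot_network_list"),
    ("controller_worker", "amp_secgroup_list"),
]


def find_missing_options(config_text, required=REQUIRED_OPTIONS):
    """Return "[section]/option" names that are absent or set to an empty value."""
    # strict=False tolerates repeated options; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    missing = []
    for section, option in required:
        # fallback covers both a missing section and a missing option.
        value = parser.get(section, option, fallback="")
        if not value.strip():
            missing.append(f"[{section}]/{option}")
    return missing


if __name__ == "__main__":
    sample = "[controller_worker]\namp_boot_network_list =\n"
    for name in find_missing_options(sample):
        print(f"required option not set: {name}")
```

A service could run such a check at startup and refuse to start when the returned list is not empty, instead of failing later during a failover.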

Comment 3 Carlos Goncalves 2018-08-23 12:29:44 UTC

(In reply to Carlos Goncalves from comment #2)
> Still, Octavia should validate and error on configuration parameters with no
> default values to prevent cases like this.

https://review.openstack.org/#/c/595578/

Comment 5 Carlos Goncalves 2018-11-07 15:51:30 UTC

*** Bug 1641827 has been marked as a duplicate of this bug. ***

Comment 6 Pratik Bandarkar 2019-01-15 19:12:54 UTC

Is there a workaround until the issue is fixed? Maybe some manual configuration of the health manager (octavia-health-manager/manager-post-deploy.conf) on all controllers?

Comment 7 Carlos Goncalves 2019-01-16 11:56:52 UTC

Set [controller_worker]/amp_boot_network_list and [controller_worker]/amp_secgroup_list in the configuration file of the health manager on all controllers and restart the container.
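
A sketch of what the resulting section might look like in the health manager's configuration file. The values are placeholders, not real IDs: the assumption is that they are copied from the same options in the worker's configuration on that controller.

```ini
[controller_worker]
# Placeholders only: reuse the values already present in the worker's
# configuration (assumed to be correct there).
amp_boot_network_list = <value from the worker configuration>
amp_secgroup_list = <value from the worker configuration>
```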

Comment 8 Pratik Bandarkar 2019-01-17 09:19:12 UTC

(In reply to Carlos Goncalves from comment #7)
> Set [controller_worker]/amp_boot_network_list and
> [controller_worker]/amp_secgroup_list in the configuration file of the
> health manager on all controllers and restart the container.

Thanks, Carlos. I have made the changes. Unfortunately, I am now facing a different issue while re-creating amphora instances. I have opened a support case with Red Hat.

Comment 31 errata-xmlrpc 2019-03-14 13:54:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0448

Comment 32 Carlos Goncalves 2019-10-01 12:42:36 UTC

*** Bug 1576434 has been marked as a duplicate of this bug. ***
