
Bug 1577976

Summary: Disabling heartbeat interface for failover, LB goes into error state, New backup amphora is not created
Product: Red Hat OpenStack
Reporter: Alexander Stafeyev <astafeye>
Component: openstack-tripleo-common
Assignee: Carlos Goncalves <cgoncalves>
Status: CLOSED ERRATA
QA Contact: Alexander Stafeyev <astafeye>
Severity: high
Docs Contact:
Priority: high
Version: 13.0 (Queens)
CC: amuller, bcafarel, ccopello, cgoncalves, ebarrera, ihrachys, lpeer, majopela, mburns, mflusche, nmagnezi, pratik.bandarkar, slinaber
Target Milestone: z5
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-common-8.6.6-10.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1655431
Environment:
Last Closed: 2019-03-14 13:54:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1655431
Bug Blocks: 1547043, 1698576

Description Alexander Stafeyev 2018-05-14 14:25:29 UTC
In an active/standby configuration, we disabled the management port of the MASTER amphora in order to block heartbeat messages and trigger a failover.

The LB goes into error state, no new backup amphora is created, and the original backup amphora (which is now functional) is still shown as BACKUP.

(overcloud) [stack@undercloud-0 ~]$  openstack loadbalancer amphora list | grep 192.168.1.16
| 5f48afa4-8d58-4960-998e-c714d10ab21f | 7bf8921e-6d37-4667-82c3-6b3ae3410af8 | ALLOCATED | MASTER | 192.168.199.51 | 192.168.1.16 |
| e232e520-4bd5-403c-880a-361a54990df8 | 7bf8921e-6d37-4667-82c3-6b3ae3410af8 | ALLOCATED | BACKUP | 192.168.199.56 | 192.168.1.16 |

(overcloud) [stack@undercloud-0 ~]$ openstack port list | grep 192.168.199.51
| 1671769a-a790-4c56-8d97-d102d2630052 |                                                             | fa:16:3e:62:c7:32 | ip_address='192.168.199.51', subnet_id='06096d99-a0bc-4533-a6c8-f8824dddbf2e' | ACTIVE |


(overcloud) [stack@undercloud-0 ~]$ openstack port set 1671769a-a790-4c56-8d97-d102d2630052 --disable 


After disabling the port:

(overcloud) [stack@undercloud-0 ~]$  openstack loadbalancer amphora list | grep 192.168.1.16
| e232e520-4bd5-403c-880a-361a54990df8 | 7bf8921e-6d37-4667-82c3-6b3ae3410af8 | ALLOCATED | BACKUP | 192.168.199.56 | 192.168.1.16 |




2018-05-14 14:11:51.038 22 ERROR octavia.controller.worker.controller_worker [req-94d8360b-097b-44d0-a9aa-22fb363a353a - 64ba63c12a9a46288fd4623295d81bc0 - - -] Failover exception: failed to detect a valid IP address from None: AddrFormatError: failed to detect a valid IP address from None
2018-05-14 14:11:51.060 22 INFO octavia.controller.healthmanager.health_manager [-] Attempted 1 failovers of amphora
2018-05-14 14:11:51.060 22 INFO octavia.controller.healthmanager.health_manager [-] Failed at 1 failovers of amphora
2018-05-14 14:11:51.061 22 INFO octavia.controller.healthmanager.health_manager [-] Cancelled 0 failovers of amphora
2018-05-14 14:11:51.061 22 INFO octavia.controller.healthmanager.health_manager [-] Successfully completed 0 failovers of amphora



******* Traffic is still handled properly.

Comment 2 Carlos Goncalves 2018-08-23 07:57:29 UTC
This issue turned out to be caused by a misconfiguration of Octavia by the deployment tool (TripleO), which sets [controller_worker]/amp_boot_network_list only for the worker, whereas it should also have been set for the health manager. As a result, an amphora instance (Nova instance) was created, but no Neutron port was created for it and attached to the lb-mgmt-net. A fix is required in tripleo-common.

Still, Octavia should validate and error on configuration parameters with no default values to prevent cases like this.
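
To illustrate the kind of check suggested above, here is a minimal sketch in Python. It is not the validation implemented in Octavia; the function name missing_options and the REQUIRED table are hypothetical, and only the two option names come from this bug. It reads a configuration file's text with the standard configparser module and reports options that are absent or empty.

```python
import configparser

# Options that, per this bug, every service booting amphorae needs.
# The table itself is an assumption made for illustration.
REQUIRED = {"controller_worker": ["amp_boot_network_list", "amp_secgroup_list"]}


def missing_options(conf_text, required=REQUIRED):
    """Return 'section/option' names that are absent or empty in conf_text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    missing = []
    for section, options in required.items():
        for option in options:
            # fallback="" also covers the case where the whole section is absent.
            value = parser.get(section, option, fallback="").strip()
            if not value:
                missing.append(f"{section}/{option}")
    return missing
```

Running such a check against the health manager's configuration at startup would have reported both options as missing instead of failing later during failover.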

Comment 3 Carlos Goncalves 2018-08-23 12:29:44 UTC
(In reply to Carlos Goncalves from comment #2)
> Still, Octavia should validate and error on configuration parameters with no
> default values to prevent cases like this.

https://review.openstack.org/#/c/595578/

Comment 5 Carlos Goncalves 2018-11-07 15:51:30 UTC
*** Bug 1641827 has been marked as a duplicate of this bug. ***

Comment 6 Pratik Bandarkar 2019-01-15 19:12:54 UTC
Is there a workaround until the issue is fixed? Maybe some manual configuration of the health manager (octavia-health-manager/manager-post-deploy.conf) on all controllers?

Comment 7 Carlos Goncalves 2019-01-16 11:56:52 UTC
Set [controller_worker]/amp_boot_network_list and [controller_worker]/amp_secgroup_list in the configuration file of the health manager on all controllers and restart the container.
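
The workaround above could be scripted roughly as follows. This is a sketch under assumptions, not a supported procedure: the function name copy_worker_options is hypothetical, the configuration texts are passed in as strings because the file locations differ per deployment, and the values are assumed to be taken from the worker's configuration, where TripleO already sets them.

```python
import configparser
import io

# The two options named in the workaround.
OPTIONS = ("amp_boot_network_list", "amp_secgroup_list")


def copy_worker_options(worker_text, health_manager_text, section="controller_worker"):
    """Return health manager config text with OPTIONS copied from the worker config."""
    worker = configparser.ConfigParser(interpolation=None, strict=False)
    worker.read_string(worker_text)
    hm = configparser.ConfigParser(interpolation=None, strict=False)
    hm.read_string(health_manager_text)
    if not hm.has_section(section):
        hm.add_section(section)
    for option in OPTIONS:
        # Raises NoSectionError/NoOptionError if the worker lacks the option,
        # so a bad source config fails loudly instead of writing an empty value.
        hm.set(section, option, worker.get(section, option))
    out = io.StringIO()
    hm.write(out)
    return out.getvalue()
```

Note that configparser drops comments when it rewrites a file, so back up the original first. The health manager container still has to be restarted on every controller for the change to take effect.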

Comment 8 Pratik Bandarkar 2019-01-17 09:19:12 UTC
(In reply to Carlos Goncalves from comment #7)
> Set [controller_worker]/amp_boot_network_list and
> [controller_worker]/amp_secgroup_list in the configuration file of the
> health manager on all controllers and restart the container.

Thanks, Carlos. I have made the changes. Unfortunately, I am now facing a different issue while re-creating the amphora instances. I have created a support case with Red Hat.

Comment 31 errata-xmlrpc 2019-03-14 13:54:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0448

Comment 32 Carlos Goncalves 2019-10-01 12:42:36 UTC
*** Bug 1576434 has been marked as a duplicate of this bug. ***