Bug 1576434

Summary: Amphorae in ACTIVE_STANDBY topology fail to recover when the amphora-agent stops working
Product: Red Hat OpenStack
Reporter: Nir Magnezi <nmagnezi>
Component: openstack-octavia
Assignee: Carlos Goncalves <cgoncalves>
Status: CLOSED DUPLICATE
QA Contact: Bruna Bonguardo <bbonguar>
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: amuller, cgoncalves, fiezzi, ihrachys, lpeer, majopela
Target Milestone: z8
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-10-01 12:42:36 UTC
Type: Bug
Bug Blocks: 1698576    
Attachments:
Description              Flags
Failover attempt logs    none
lb deletion fails        none

Description Nir Magnezi 2018-05-09 12:57:24 UTC
Created attachment 1433832
Failover attempt logs

Description of problem:
=======================
Amphorae in ACTIVE_STANDBY topology fail to recover when the amphora-agent stops working.
Tested with a single-controller topology.


Version-Release number of selected component (if applicable):
=============================================================
OSP13
openstack-octavia-common-2.0.1-4.el7ost.noarch
openstack-octavia-health-manager-2.0.1-4.el7ost.noarch
python-octavia-2.0.1-4.el7ost.noarch
openstack-octavia-api-2.0.1-4.el7ost.noarch
openstack-octavia-housekeeping-2.0.1-4.el7ost.noarch
openstack-octavia-worker-2.0.1-4.el7ost.noarch

Steps to Reproduce:
===================
1. Change amphora topology to ACTIVE_STANDBY
2. Restart Octavia services
3. Create a loadbalancer
4. Switch off the amphora-agent on the MASTER amphora
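
For step 1, a minimal sketch of switching the topology in an Octavia configuration file. It assumes the option is loadbalancer_topology in the [controller_worker] section, as in upstream Octavia; the helper name set_topology is hypothetical, and the location of the file in a given deployment is not covered here. Note that configparser drops comments when it rewrites the text.

```python
import configparser
import io


def set_topology(conf_text, topology="ACTIVE_STANDBY"):
    """Return conf_text with [controller_worker] loadbalancer_topology set.

    Hypothetical helper: it works on the file contents only and does not
    restart any Octavia service (step 2 still has to be done separately).
    """
    # interpolation=None keeps literal '%' characters in other values intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("controller_worker"):
        parser.add_section("controller_worker")
    parser.set("controller_worker", "loadbalancer_topology", topology)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[controller_worker]\nloadbalancer_topology = SINGLE\n"
    print(set_topology(sample))
```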

Actual results:
===============
The load balancer ends up in an ERROR state.

Expected results:
=================
The load balancer should fail over to the BACKUP amphora and spawn a new amphora as BACKUP.
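
The expectation above can be written down as a toy model. This only encodes the outcome the report expects, not Octavia's actual failover flow; the Amphora class, the expected_failover function, and the amphora names are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Amphora:
    name: str
    role: str                 # "MASTER" or "BACKUP"
    agent_alive: bool = True  # False once the amphora-agent is switched off


def expected_failover(pair, new_name="amp-new"):
    """Return the amphora pair the report expects after the MASTER agent dies."""
    master = next(a for a in pair if a.role == "MASTER")
    backup = next(a for a in pair if a.role == "BACKUP")
    if master.agent_alive:
        return list(pair)  # MASTER is healthy, nothing changes
    # The old BACKUP takes over, and a fresh amphora fills the BACKUP slot.
    promoted = replace(backup, role="MASTER")
    return [promoted, Amphora(new_name, "BACKUP")]


if __name__ == "__main__":
    pair = [Amphora("amp-1", "MASTER", agent_alive=False),
            Amphora("amp-2", "BACKUP")]
    for amp in expected_failover(pair):
        print(amp.name, amp.role)
```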

Additional info:
================
Attaching logs.

Comment 1 Nir Magnezi 2018-05-09 12:59:16 UTC
The end result:
$ openstack loadbalancer show nir_ha | grep provisioning_status
| provisioning_status | ERROR
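
For scripted checks, a small sketch that pulls a field out of the table printed by "openstack loadbalancer show". The function name parse_show_table is hypothetical, and the parser assumes the usual "| field | value |" row layout.

```python
def parse_show_table(text):
    """Parse rows of the form '| field | value |' into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' borders and anything else
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) >= 2 and cells[0] and cells[0] != "Field":
            fields[cells[0]] = cells[1]
    return fields


if __name__ == "__main__":
    row = "| provisioning_status | ERROR"
    status = parse_show_table(row).get("provisioning_status")
    print("load balancer is in ERROR" if status == "ERROR" else status)
```

The CLI's own output formatter (-f value -c provisioning_status) may be a simpler choice when only one field is needed.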

Comment 2 Nir Magnezi 2018-05-09 13:01:19 UTC
Created attachment 1433833
lb deletion fails

Deleting the load balancer in the ERROR state also fails.

Comment 6 Carlos Goncalves 2019-10-01 12:19:54 UTC
(In reply to Nir Magnezi from comment #2)
> Created attachment 1433833 [details]
> lb deletion fails
> 
> Also fails to delete the ERROR state loadbalancer

Fixed in https://review.opendev.org/#/c/574215/

Comment 7 Carlos Goncalves 2019-10-01 12:42:36 UTC
Looking at the log in comment #0, I see this:

2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker Traceback (most recent call last):
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     result = task.execute(**arguments)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/controller/worker/tasks/amphora_driver_tasks.py", line 219, in execute
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     amphora, loadbalancer, amphorae_network_config)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 137, in post_vip_plug
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     net_info)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 388, in plug_vip
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     json=net_info)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 255, in request
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     _url = self._base_url(amp.lb_network_ip) + path
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 241, in _base_url
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     if utils.is_ipv6_lla(ip):
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/common/utils.py", line 64, in is_ipv6_lla
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     ip = netaddr.IPAddress(ip_address)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/netaddr/ip/__init__.py", line 306, in __init__
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     'address from %r' % addr)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker AddrFormatError: failed to detect a valid IP address from None

I believe this was fixed in BZ #1577976.
Closing as duplicate.
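
For readers following the traceback: the failure is an IP parser being handed None because the amphora has no lb_network_ip. The sketch below reproduces that failure mode and shows one possible early guard. It uses the standard-library ipaddress module as a stand-in for netaddr, and base_url, its port default, and the LookupError are illustrative assumptions, not the fix that went into Octavia.

```python
import ipaddress


def is_ipv6_lla(ip_address):
    """Stand-in for the link-local check seen in the traceback."""
    # ipaddress.ip_address(None) raises ValueError, much as
    # netaddr.IPAddress(None) raised AddrFormatError in the log.
    ip = ipaddress.ip_address(ip_address)
    return ip.version == 6 and ip.is_link_local


def base_url(lb_network_ip, port=9443):
    """Build an agent URL, failing early when the amphora has no address.

    Hypothetical helper; the port default is an assumption.
    """
    if lb_network_ip is None:
        raise LookupError("amphora has no lb_network_ip; cannot reach its agent")
    if ipaddress.ip_address(lb_network_ip).version == 6:
        return "https://[{}]:{}/".format(lb_network_ip, port)
    return "https://{}:{}/".format(lb_network_ip, port)


if __name__ == "__main__":
    try:
        is_ipv6_lla(None)
    except ValueError as exc:
        print("unguarded call fails:", exc)
    try:
        base_url(None)
    except LookupError as exc:
        print("guarded call fails with a clearer error:", exc)
```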

*** This bug has been marked as a duplicate of bug 1577976 ***