Bug 1576434 - Amphorae ACTIVE_STANDBY topology fail to recover when the amphora-agent stops working
Summary: Amphorae ACTIVE_STANDBY topology fail to recover when the amphora-agent stops...
Keywords:
Status: CLOSED DUPLICATE of bug 1577976
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-octavia
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z8
Target Release: 13.0 (Queens)
Assignee: Carlos Goncalves
QA Contact: Bruna Bonguardo
URL:
Whiteboard:
Depends On:
Blocks: 1698576
Reported: 2018-05-09 12:57 UTC by Nir Magnezi
Modified: 2019-10-01 12:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-01 12:42:36 UTC
Target Upstream Version:


Attachments
Failover attempt logs (40.00 KB, text/plain)
2018-05-09 12:57 UTC, Nir Magnezi
lb deletion fails (16.33 KB, text/plain)
2018-05-09 13:01 UTC, Nir Magnezi


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 574215 | 0 | 'None' | MERGED | When SG delete fails on vip deallocate, try harder | 2020-02-17 06:03:51 UTC

Description Nir Magnezi 2018-05-09 12:57:24 UTC
Created attachment 1433832 [details]
Failover attempt logs

Description of problem:
=======================
A load balancer using the amphora ACTIVE_STANDBY topology fails to recover when the amphora-agent stops working.
Tested with a single-controller topology.


Version-Release number of selected component (if applicable):
=============================================================
OSP13
openstack-octavia-common-2.0.1-4.el7ost.noarch
openstack-octavia-health-manager-2.0.1-4.el7ost.noarch
python-octavia-2.0.1-4.el7ost.noarch
openstack-octavia-api-2.0.1-4.el7ost.noarch
openstack-octavia-housekeeping-2.0.1-4.el7ost.noarch
openstack-octavia-worker-2.0.1-4.el7ost.noarch

Steps to Reproduce:
===================
1. Change amphora topology to ACTIVE_STANDBY
2. Restart Octavia services
3. Create a loadbalancer
4. Switch off the amphora-agent on the MASTER amphora
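
Step 1 can be sketched as follows. This is only an illustration, and it assumes the topology is controlled by the loadbalancer_topology option in the [controller_worker] section of octavia.conf. The helper name set_topology and the sample configuration text are hypothetical. The location of octavia.conf depends on the deployment and is not shown here.

```python
import configparser
import io


def set_topology(conf_text, topology="ACTIVE_STANDBY"):
    """Return conf_text with [controller_worker]/loadbalancer_topology set.

    Hypothetical helper for illustration. Note that configparser drops
    comments from the original text when it writes the result back out.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("controller_worker"):
        parser.add_section("controller_worker")
    parser.set("controller_worker", "loadbalancer_topology", topology)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Made-up sample input, not taken from the reporter's environment.
sample = "[controller_worker]\nloadbalancer_topology = SINGLE\n"
print(set_topology(sample))
```

The Octavia services must then be restarted (step 2) for the new value to take effect.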

Actual results:
===============
Loadbalancer ends up in an ERROR state

Expected results:
=================
The load balancer should fail over to the BACKUP amphora, and a new amphora should be spawned as the BACKUP.

Additional info:
================
Attaching logs.

Comment 1 Nir Magnezi 2018-05-09 12:59:16 UTC
The end result:
$ openstack loadbalancer show nir_ha | grep provisioning_status
| provisioning_status | ERROR

Comment 2 Nir Magnezi 2018-05-09 13:01:19 UTC
Created attachment 1433833 [details]
lb deletion fails

Also fails to delete the ERROR state loadbalancer

Comment 6 Carlos Goncalves 2019-10-01 12:19:54 UTC
(In reply to Nir Magnezi from comment #2)
> Created attachment 1433833 [details]
> lb deletion fails
> 
> Also fails to delete the ERROR state loadbalancer

Fixed in https://review.opendev.org/#/c/574215/

Comment 7 Carlos Goncalves 2019-10-01 12:42:36 UTC
Looking at the log in comment #0, I see this:

2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker Traceback (most recent call last):
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     result = task.execute(**arguments)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/controller/worker/tasks/amphora_driver_tasks.py", line 219, in execute
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     amphora, loadbalancer, amphorae_network_config)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 137, in post_vip_plug
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     net_info)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 388, in plug_vip
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     json=net_info)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 255, in request
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     _url = self._base_url(amp.lb_network_ip) + path
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 241, in _base_url
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     if utils.is_ipv6_lla(ip):
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/common/utils.py", line 64, in is_ipv6_lla
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     ip = netaddr.IPAddress(ip_address)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/netaddr/ip/__init__.py", line 306, in __init__
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     'address from %r' % addr)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker AddrFormatError: failed to detect a valid IP address from None

I believe this was fixed in BZ #1577976.
Closing as duplicate.
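
For illustration only, the failure mode in the traceback above can be sketched as below: the REST driver builds the amphora URL from lb_network_ip, and when that value is None the address parser raises. This is not the fix from BZ #1577976. It uses the standard-library ipaddress module as a stand-in for netaddr, and the names AmphoraAddressMissing and base_url, the port 9443, and the interface name are assumptions made for the example.

```python
import ipaddress


class AmphoraAddressMissing(Exception):
    """Hypothetical error: the amphora record has no lb_network_ip."""


def is_ipv6_lla(ip):
    """Return True if ip is an IPv6 link-local address."""
    addr = ipaddress.ip_address(ip)  # raises ValueError when ip is None
    return addr.version == 6 and addr.is_link_local


def base_url(lb_network_ip, port=9443, interface="o-hm0"):
    """Build the agent base URL, failing early on a missing address.

    port and interface are assumed values for this sketch.
    """
    if lb_network_ip is None:
        # Guard added for the sketch: report the real problem instead of
        # letting the address parser fail on None.
        raise AmphoraAddressMissing("amphora has no lb_network_ip")
    if is_ipv6_lla(lb_network_ip):
        host = "[{}%{}]".format(lb_network_ip, interface)
    elif ipaddress.ip_address(lb_network_ip).version == 6:
        host = "[{}]".format(lb_network_ip)
    else:
        host = lb_network_ip
    return "https://{}:{}/".format(host, port)


print(base_url("192.0.2.10"))
try:
    base_url(None)
except AmphoraAddressMissing as exc:
    print("refused:", exc)
```

Without the None guard, is_ipv6_lla(None) raises ValueError in this sketch, which corresponds to the AddrFormatError that netaddr raised in the log.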

*** This bug has been marked as a duplicate of bug 1577976 ***

