Created attachment 1433832 [details]
Failover attempt logs

Description of problem:
=======================
A loadbalancer in the ACTIVE_STANDBY amphora topology fails to recover when the amphora-agent stops working. Tested with a single-controller topology.

Version-Release number of selected component (if applicable):
=============================================================
OSP13
openstack-octavia-common-2.0.1-4.el7ost.noarch
openstack-octavia-health-manager-2.0.1-4.el7ost.noarch
python-octavia-2.0.1-4.el7ost.noarch
openstack-octavia-api-2.0.1-4.el7ost.noarch
openstack-octavia-housekeeping-2.0.1-4.el7ost.noarch
openstack-octavia-worker-2.0.1-4.el7ost.noarch

Steps to Reproduce:
===================
1. Change amphora topology to ACTIVE_STANDBY
2. Restart Octavia services
3. Create a loadbalancer
4. Switch off the amphora-agent on the MASTER amphora

Actual results:
===============
Loadbalancer ends up in an ERROR state

Expected results:
=================
Should fail over to the BACKUP amphora and spawn a new amphora as BACKUP.

Additional info:
================
Attaching logs.
The end result:

$ openstack loadbalancer show nir_ha | grep provisioning_status
| provisioning_status | ERROR
Created attachment 1433833 [details]
lb deletion fails

Also fails to delete the ERROR state loadbalancer
(In reply to Nir Magnezi from comment #2)
> Created attachment 1433833 [details]
> lb deletion fails
>
> Also fails to delete the ERROR state loadbalancer

Fixed in https://review.opendev.org/#/c/574215/
Looking at the log in comment #0, I see this:

2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker Traceback (most recent call last):
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     result = task.execute(**arguments)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/controller/worker/tasks/amphora_driver_tasks.py", line 219, in execute
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     amphora, loadbalancer, amphorae_network_config)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 137, in post_vip_plug
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     net_info)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 388, in plug_vip
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     json=net_info)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 255, in request
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     _url = self._base_url(amp.lb_network_ip) + path
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 241, in _base_url
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     if utils.is_ipv6_lla(ip):
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/common/utils.py", line 64, in is_ipv6_lla
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     ip = netaddr.IPAddress(ip_address)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/netaddr/ip/__init__.py", line 306, in __init__
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker     'address from %r' % addr)
2018-05-09 12:29:02.250 22 ERROR octavia.controller.worker.controller_worker AddrFormatError: failed to detect a valid IP address from None

I believe this was fixed in BZ #1577976. Closing as duplicate.

*** This bug has been marked as a duplicate of bug 1577976 ***
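
For readers following the traceback above: the failure happens because amp.lb_network_ip is None by the time the driver builds the agent URL, and the address parser rejects None. The snippet below is only a minimal sketch of that failure mode. It uses the standard-library ipaddress module in place of netaddr so it runs without extra packages, and checked_is_ipv6_lla is a hypothetical helper introduced for illustration. It is not Octavia's code and not necessarily what the fix in BZ #1577976 does.

```python
import ipaddress


def is_ipv6_lla(ip_address):
    """Return True if ip_address is an IPv6 link-local address.

    Stand-in for the check seen in the traceback; uses stdlib
    ipaddress instead of netaddr (an assumption for this sketch).
    Raises ValueError if ip_address cannot be parsed, including None.
    """
    ip = ipaddress.ip_address(ip_address)
    return ip.version == 6 and ip.is_link_local


def checked_is_ipv6_lla(ip_address):
    """Hypothetical guard: fail early with a message that names the cause."""
    if ip_address is None:
        raise ValueError("amphora has no management IP (lb_network_ip is None)")
    return is_ipv6_lla(ip_address)


if __name__ == "__main__":
    print(is_ipv6_lla("fe80::1"))    # link-local IPv6 address
    print(is_ipv6_lla("10.0.0.5"))   # IPv4 address
    try:
        is_ipv6_lla(None)            # mirrors the None seen in the log
    except ValueError as exc:
        print("parse failed:", exc)
```

A guard like this would not make the failover succeed; it would only replace the low-level address-format error with one that points at the missing management IP.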