Description of problem:
Octavia should recover once-active load balancers from the ERROR state and/or give users the ability to recover load balancers from the ERROR state.

Version-Release number of selected component (if applicable):
OSP 13 - python-octavia-2.0.1-6.d137eaagit.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a load balancer and wait for it to become ACTIVE.

$ openstack loadbalancer create --name lb1 --vip-subnet-id external
$ openstack loadbalancer list
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| c3e9485a-7136-45e6-ba3f-d43a0b322746 | lb1  | 8b10c466135449c48332f8d5d3168306 | 192.168.2.152 | ACTIVE              | octavia  |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+

2. Show the active amphora.

$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
| id                                   | loadbalancer_id                      | status    | role       | lb_network_ip | ha_ip         |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
| 03e6b16e-79d2-4a21-a8d1-bc1b406a7f1f | c3e9485a-7136-45e6-ba3f-d43a0b322746 | ALLOCATED | STANDALONE | 172.24.0.3    | 192.168.2.152 |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+

$ nova list --all | grep 03e6b16e-79d2-4a21-a8d1-bc1b406a7f1f
| 8e0d3faf-984d-46d6-be41-31be7d66f0f6 | amphora-03e6b16e-79d2-4a21-a8d1-bc1b406a7f1f | 75917ec7263a45b699dd610ba8491240 | ACTIVE | - | Running | lb-mgmt-net=172.24.0.3; external=192.168.2.163 |

3. Stop the amphora instance via nova.

$ nova stop 8e0d3faf-984d-46d6-be41-31be7d66f0f6
Request to stop server 8e0d3faf-984d-46d6-be41-31be7d66f0f6 has been accepted.

4. Show the failed load balancer.

$ openstack loadbalancer list
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| c3e9485a-7136-45e6-ba3f-d43a0b322746 | lb1  | 8b10c466135449c48332f8d5d3168306 | 192.168.2.152 | ERROR               | octavia  |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+

$ nova show 8e0d3faf-984d-46d6-be41-31be7d66f0f6
ERROR (CommandError): No server with a name or ID of '8e0d3faf-984d-46d6-be41-31be7d66f0f6' exists.

Actual results:
The load balancer stays in the ERROR state until it is deleted.

Expected results:
The load balancer is recovered.

Additional info:
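On the "provide users the ability to recover" part of the request: below is a rough sketch of what a manual recovery could look like, assuming the Octavia admin failover endpoint is available in this Octavia version. The endpoint path, port, and token handling are assumptions for illustration only and were not verified on this OSP 13 build.

# Hypothetical sketch: ask Octavia to fail over the ERROR'd load balancer via
# the admin failover endpoint. Endpoint URL, port and token handling below are
# assumptions; verify availability on this Octavia version before relying on it.
import requests

OCTAVIA_ENDPOINT = "http://controller:9876"        # assumed Octavia API URL
ADMIN_TOKEN = "<keystone-admin-token>"             # assumed pre-fetched token
LB_ID = "c3e9485a-7136-45e6-ba3f-d43a0b322746"     # load balancer from step 1

resp = requests.put(
    "{}/v2.0/lbaas/loadbalancers/{}/failover".format(OCTAVIA_ENDPOINT, LB_ID),
    headers={"X-Auth-Token": ADMIN_TOKEN})
# A 202 response means the failover flow was queued; if it succeeds, the load
# balancer should move through PENDING_UPDATE back to ACTIVE.
print(resp.status_code)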
Does the load balancer have to be recreated to recover?
Hi Matt,

We had a bunch of fixes backported in this area. Can you please tell me which RPM version you used?
Also, please add SOS reports.
(In reply to Nir Magnezi from comment #2)
> Hi Matt,
>
> we had a bunch of fixes backported in this area.
> Can you please tell me which RPM version did you use?

[root@overcloud-controller-0 ~]# docker exec -ti octavia_api rpm -qa | grep octavia
openstack-octavia-common-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-api-2.0.1-6.d137eaagit.el7ost.noarch
puppet-octavia-12.4.0-2.el7ost.noarch
python-octavia-2.0.1-6.d137eaagit.el7ost.noarch

(In reply to Nir Magnezi from comment #3)
> Also, please add SOS reports.

Attaching. I enabled debug and here are some time stamps during the testing.

$ date; date -u; openstack loadbalancer create --name lb1 --vip-subnet-id external
Thu Nov 1 11:13:03 EDT 2018
Thu Nov 1 15:13:03 UTC 2018
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2018-11-01T15:13:10                  |
| description         |                                      |
| flavor              |                                      |
| id                  | b146bc5e-132d-4ebf-b9cb-ba6231893fd2 |
| listeners           |                                      |
| name                | lb1                                  |
| operating_status    | OFFLINE                              |
| pools               |                                      |
| project_id          | 088df85965664cd081ddb740378bc3be     |
| provider            | octavia                              |
| provisioning_status | PENDING_CREATE                       |
| updated_at          | None                                 |
| vip_address         | 192.168.2.154                        |
| vip_network_id      | d2e8db94-f340-4075-8316-a4eea49cff0d |
| vip_port_id         | 418971e3-5494-4267-a58a-ac437a79ce07 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | 720f6e67-90f2-431a-b4c3-82024fcb62b9 |
+---------------------+--------------------------------------+

$ openstack loadbalancer list
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| b146bc5e-132d-4ebf-b9cb-ba6231893fd2 | lb1  | 088df85965664cd081ddb740378bc3be | 192.168.2.154 | ACTIVE              | octavia  |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+

$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
| id                                   | loadbalancer_id                      | status    | role       | lb_network_ip | ha_ip         |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
| 57ad2140-be7a-4780-bcb4-752fcbfb73f7 | b146bc5e-132d-4ebf-b9cb-ba6231893fd2 | ALLOCATED | STANDALONE | 172.24.0.16   | 192.168.2.154 |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+

$ nova list --all | grep 57ad2140-be7a-4780-bcb4-752fcbfb73f7
| f814da2f-193c-44f4-af70-a5d01810731d | amphora-57ad2140-be7a-4780-bcb4-752fcbfb73f7 | d85d624c57674dd48d8e85386bb37d32 | ACTIVE | - | Running | lb-mgmt-net=172.24.0.16; external=192.168.2.158 |

$ date; date -u; nova stop f814da2f-193c-44f4-af70-a5d01810731d
Thu Nov 1 11:26:20 EDT 2018
Thu Nov 1 15:26:20 UTC 2018
Request to stop server f814da2f-193c-44f4-af70-a5d01810731d has been accepted.
$ openstack loadbalancer list
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| b146bc5e-132d-4ebf-b9cb-ba6231893fd2 | lb1  | 088df85965664cd081ddb740378bc3be | 192.168.2.154 | ERROR               | octavia  |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
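To capture the ACTIVE -> ERROR transition time more precisely than the manual date calls above, a small polling loop can record when provisioning_status flips after the nova stop. A sketch using openstacksdk follows; the cloud name "overcloud" and the 10-second interval are assumptions.

# Sketch: poll the load balancer's provisioning_status and print a UTC
# timestamp whenever it changes, stopping once it reaches ERROR.
import datetime
import time

import openstack

conn = openstack.connect(cloud="overcloud")        # assumed clouds.yaml entry
LB_ID = "b146bc5e-132d-4ebf-b9cb-ba6231893fd2"     # load balancer under test

last_status = None
while True:
    lb = conn.load_balancer.get_load_balancer(LB_ID)
    if lb.provisioning_status != last_status:
        print("%s %s" % (datetime.datetime.utcnow().isoformat(),
                         lb.provisioning_status))
        last_status = lb.provisioning_status
    if lb.provisioning_status == "ERROR":
        break
    time.sleep(10)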
From the sosreports I can see that the Octavia health manager did in fact detect the missing amphora (Nova instance) and rightfully triggered an amphora failover, but the failover itself then failed. This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1577976, for which a patch has been submitted upstream (still under review). Closing this as a duplicate of bug 1577976. Feel free to reopen it if needed.

|__Flow 'octavia-failover-amphora-flow-octavia-get-amphora-for-lb-subflow'
   |__Atom 'octavia.controller.worker.tasks.database_tasks.GetAmphoraDetails' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': <octavia.common.data_models.Amphora object at 0x7f5afa86f850>}
   |__Atom 'octavia.controller.worker.tasks.database_tasks.MarkAmphoraDeletedInDB' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
   |__Atom 'octavia.controller.worker.tasks.database_tasks.DisableAmphoraHealthMonitoring' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
   |__Atom 'octavia.controller.worker.tasks.network_tasks.WaitForPortDetach' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
   |__Atom 'octavia.controller.worker.tasks.compute_tasks.ComputeDelete' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
   |__Atom 'octavia.controller.worker.tasks.database_tasks.MarkAmphoraHealthBusy' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
   |__Atom 'octavia.controller.worker.tasks.database_tasks.MarkAmphoraPendingDeleteInDB' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
   |__Atom 'octavia.controller.worker.tasks.lifecycle_tasks.AmphoraToErrorOnRevertTask' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
|__Flow 'octavia-failover-amphora-flow': AddrFormatError: failed to detect a valid IP address from None

2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker Traceback (most recent call last):
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     result = task.execute(**arguments)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/controller/worker/tasks/amphora_driver_tasks.py", line 219, in execute
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     amphora, loadbalancer, amphorae_network_config)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 137, in post_vip_plug
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     net_info)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 388, in plug_vip
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     json=net_info)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 255, in request
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     _url = self._base_url(amp.lb_network_ip) + path
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 241, in _base_url
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     if utils.is_ipv6_lla(ip):
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/common/utils.py", line 64, in is_ipv6_lla
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     ip = netaddr.IPAddress(ip_address)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/netaddr/ip/__init__.py", line 306, in __init__
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     'address from %r' % addr)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker AddrFormatError: failed to detect a valid IP address from None

*** This bug has been marked as a duplicate of bug 1577976 ***