Bug 1641827 - Octavia - recover once-active load balancer from ERROR state
Summary: Octavia - recover once-active load balancer from ERROR state
Keywords:
Status: CLOSED DUPLICATE of bug 1577976
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-octavia
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Assaf Muller
QA Contact: Alexander Stafeyev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-22 21:58 UTC by Matt Flusche
Modified: 2019-09-10 14:09 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-07 15:51:30 UTC
Target Upstream Version:
Embargoed:



Description Matt Flusche 2018-10-22 21:58:03 UTC
Description of problem:
Octavia should recover once-active load balancers from the ERROR state, and/or give users the ability to recover load balancers from the ERROR state themselves.

Version-Release number of selected component (if applicable):
OSP 13 - python-octavia-2.0.1-6.d137eaagit.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. create load balancer & wait for active

$ openstack loadbalancer create --name lb1 --vip-subnet-id external

$ openstack loadbalancer list
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| c3e9485a-7136-45e6-ba3f-d43a0b322746 | lb1  | 8b10c466135449c48332f8d5d3168306 | 192.168.2.152 | ACTIVE              | octavia  |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+

2. show active amphora.

$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
| id                                   | loadbalancer_id                      | status    | role       | lb_network_ip | ha_ip         |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
| 03e6b16e-79d2-4a21-a8d1-bc1b406a7f1f | c3e9485a-7136-45e6-ba3f-d43a0b322746 | ALLOCATED | STANDALONE | 172.24.0.3    | 192.168.2.152 |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+

$ nova list --all |grep 03e6b16e-79d2-4a21-a8d1-bc1b406a7f1f
| 8e0d3faf-984d-46d6-be41-31be7d66f0f6 | amphora-03e6b16e-79d2-4a21-a8d1-bc1b406a7f1f | 75917ec7263a45b699dd610ba8491240 | ACTIVE | -          | Running     | lb-mgmt-net=172.24.0.3; external=192.168.2.163 |


3. Stop amphora instance via nova.

$ nova stop 8e0d3faf-984d-46d6-be41-31be7d66f0f6
Request to stop server 8e0d3faf-984d-46d6-be41-31be7d66f0f6 has been accepted.


4. Show failed load balancer.

$ openstack loadbalancer list
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| c3e9485a-7136-45e6-ba3f-d43a0b322746 | lb1  | 8b10c466135449c48332f8d5d3168306 | 192.168.2.152 | ERROR               | octavia  |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+


$ nova show 8e0d3faf-984d-46d6-be41-31be7d66f0f6
ERROR (CommandError): No server with a name or ID of '8e0d3faf-984d-46d6-be41-31be7d66f0f6' exists.



Actual results:
The load balancer stays in the ERROR state until it is deleted.

Expected results:
The load balancer is recovered automatically, or the user is given a way to recover it.


Additional info:
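For reference, the CLI steps above can be automated. The following is a minimal sketch (not a supported tool) using openstacksdk; the clouds.yaml entry name "overcloud" is an assumption, the subnet name "external" comes from the session above, and the amphora lookup assumes a single amphora exists, as in this test.

# Minimal sketch: automate the reproduction steps above with openstacksdk.
# Assumes a clouds.yaml entry named "overcloud" (illustrative) and a subnet
# called "external", as in the CLI session above.
import time

import openstack

conn = openstack.connect(cloud="overcloud")

# 1. Create the load balancer and wait until it leaves PENDING_CREATE.
subnet = conn.network.find_subnet("external")
lb = conn.load_balancer.create_load_balancer(name="lb1",
                                             vip_subnet_id=subnet.id)
while True:
    lb = conn.load_balancer.get_load_balancer(lb.id)
    if lb.provisioning_status in ("ACTIVE", "ERROR"):
        break
    time.sleep(5)
print("after create:", lb.provisioning_status)

# 2./3. Find the amphora's Nova instance by its "amphora-<uuid>" name and
# stop it. This assumes only one amphora exists, as in the test above.
amphora_vm = next(s for s in conn.compute.servers(all_projects=True)
                  if s.name.startswith("amphora-"))
conn.compute.stop_server(amphora_vm)

# 4. Watch the load balancer drop to ERROR once the health manager notices.
for _ in range(60):
    lb = conn.load_balancer.get_load_balancer(lb.id)
    print(lb.provisioning_status)
    if lb.provisioning_status == "ERROR":
        break
    time.sleep(10)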

Comment 1 Matt Flusche 2018-10-23 14:20:32 UTC
Does the load balancer have to be recreated to recover?

Comment 2 Nir Magnezi 2018-10-31 14:50:49 UTC
Hi Matt,

We had a number of fixes backported in this area.
Can you please tell me which RPM version you used?

Comment 3 Nir Magnezi 2018-10-31 14:51:41 UTC
Also, please add SOS reports.

Comment 4 Matt Flusche 2018-11-01 15:35:49 UTC
(In reply to Nir Magnezi from comment #2)
> Hi Matt,
> 
> We had a number of fixes backported in this area.
> Can you please tell me which RPM version you used?

[root@overcloud-controller-0 ~]# docker exec -ti octavia_api rpm -qa |grep octavia
openstack-octavia-common-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-api-2.0.1-6.d137eaagit.el7ost.noarch
puppet-octavia-12.4.0-2.el7ost.noarch
python-octavia-2.0.1-6.d137eaagit.el7ost.noarch

(In reply to Nir Magnezi from comment #3)
> Also, please add SOS reports.

Attaching.

I enabled debug logging; here are some timestamps from the test run.

$ date; date -u; openstack loadbalancer create --name lb1 --vip-subnet-id external
Thu Nov  1 11:13:03 EDT 2018
Thu Nov  1 15:13:03 UTC 2018
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2018-11-01T15:13:10                  |
| description         |                                      |
| flavor              |                                      |
| id                  | b146bc5e-132d-4ebf-b9cb-ba6231893fd2 |
| listeners           |                                      |
| name                | lb1                                  |
| operating_status    | OFFLINE                              |
| pools               |                                      |
| project_id          | 088df85965664cd081ddb740378bc3be     |
| provider            | octavia                              |
| provisioning_status | PENDING_CREATE                       |
| updated_at          | None                                 |
| vip_address         | 192.168.2.154                        |
| vip_network_id      | d2e8db94-f340-4075-8316-a4eea49cff0d |
| vip_port_id         | 418971e3-5494-4267-a58a-ac437a79ce07 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | 720f6e67-90f2-431a-b4c3-82024fcb62b9 |
+---------------------+--------------------------------------+

$ openstack loadbalancer list
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| b146bc5e-132d-4ebf-b9cb-ba6231893fd2 | lb1  | 088df85965664cd081ddb740378bc3be | 192.168.2.154 | ACTIVE              | octavia  |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+

$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
| id                                   | loadbalancer_id                      | status    | role       | lb_network_ip | ha_ip         |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+
| 57ad2140-be7a-4780-bcb4-752fcbfb73f7 | b146bc5e-132d-4ebf-b9cb-ba6231893fd2 | ALLOCATED | STANDALONE | 172.24.0.16   | 192.168.2.154 |
+--------------------------------------+--------------------------------------+-----------+------------+---------------+---------------+

$ nova list --all |grep 57ad2140-be7a-4780-bcb4-752fcbfb73f7
| f814da2f-193c-44f4-af70-a5d01810731d | amphora-57ad2140-be7a-4780-bcb4-752fcbfb73f7 | d85d624c57674dd48d8e85386bb37d32 | ACTIVE  | -          | Running     | lb-mgmt-net=172.24.0.16; external=192.168.2.158 |


$ date; date -u; nova stop f814da2f-193c-44f4-af70-a5d01810731d
Thu Nov  1 11:26:20 EDT 2018
Thu Nov  1 15:26:20 UTC 2018
Request to stop server f814da2f-193c-44f4-af70-a5d01810731d has been accepted.

$  openstack loadbalancer list
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| id                                   | name | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+
| b146bc5e-132d-4ebf-b9cb-ba6231893fd2 | lb1  | 088df85965664cd081ddb740378bc3be | 192.168.2.154 | ERROR               | octavia  |
+--------------------------------------+------+----------------------------------+---------------+---------------------+----------+

Comment 8 Carlos Goncalves 2018-11-07 15:51:30 UTC
From the sosreports I can see that the Octavia health manager did detect the missing amphora (Nova instance) and correctly triggered an amphora failover, but the failover then failed. This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1577976, which has a patch submitted upstream (still under review).

Closing this as a duplicate of bug 1577976. Feel free to reopen it if needed.


                                |__Flow 'octavia-failover-amphora-flow-octavia-get-amphora-for-lb-subflow'
                                   |__Atom 'octavia.controller.worker.tasks.database_tasks.GetAmphoraDetails' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': <octavia.common.data_models.Amphora object at 0x7f5afa86f850>}
                                      |__Atom 'octavia.controller.worker.tasks.database_tasks.MarkAmphoraDeletedInDB' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
                                         |__Atom 'octavia.controller.worker.tasks.database_tasks.DisableAmphoraHealthMonitoring' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
                                            |__Atom 'octavia.controller.worker.tasks.network_tasks.WaitForPortDetach' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
                                               |__Atom 'octavia.controller.worker.tasks.compute_tasks.ComputeDelete' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
                                                  |__Atom 'octavia.controller.worker.tasks.database_tasks.MarkAmphoraHealthBusy' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
                                                     |__Atom 'octavia.controller.worker.tasks.database_tasks.MarkAmphoraPendingDeleteInDB' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
                                                        |__Atom 'octavia.controller.worker.tasks.lifecycle_tasks.AmphoraToErrorOnRevertTask' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f5b010abcd0>}, 'provides': None}
                                                           |__Flow 'octavia-failover-amphora-flow': AddrFormatError: failed to detect a valid IP address from None
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker Traceback (most recent call last):
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     result = task.execute(**arguments)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/controller/worker/tasks/amphora_driver_tasks.py", line 219, in execute
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     amphora, loadbalancer, amphorae_network_config)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 137, in post_vip_plug
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     net_info)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 388, in plug_vip
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     json=net_info)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 255, in request
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     _url = self._base_url(amp.lb_network_ip) + path
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 241, in _base_url
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     if utils.is_ipv6_lla(ip):
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/octavia/common/utils.py", line 64, in is_ipv6_lla
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     ip = netaddr.IPAddress(ip_address)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker   File "/usr/lib/python2.7/site-packages/netaddr/ip/__init__.py", line 306, in __init__
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker     'address from %r' % addr)
2018-11-01 10:28:08.758 22 ERROR octavia.controller.worker.controller_worker AddrFormatError: failed to detect a valid IP address from None
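To make the traceback easier to follow: the replacement amphora record has no lb_network_ip at the point post_vip_plug() runs, so netaddr.IPAddress(None) raises AddrFormatError inside is_ipv6_lla(), which aborts the failover flow. Below is a minimal standalone sketch, not Octavia code and not the upstream patch from bug 1577976; the helper safe_is_ipv6_lla and its guard are purely illustrative of the failure mode.

# Standalone reproduction of the failure mode in the traceback above.
# NOT Octavia code; it only demonstrates that netaddr raises AddrFormatError
# when handed None, and what a defensive guard could look like.
import netaddr
from netaddr.core import AddrFormatError


def is_ipv6_lla(ip_address):
    # Mirrors the shape of octavia.common.utils.is_ipv6_lla as seen in the
    # traceback: parse the address, then check for an IPv6 link-local address.
    ip = netaddr.IPAddress(ip_address)  # raises AddrFormatError for None
    return ip.version == 6 and ip.is_link_local()


def safe_is_ipv6_lla(ip_address):
    # Hypothetical guard: treat a missing lb_network_ip as "not usable"
    # instead of letting the whole failover flow blow up.
    if ip_address is None:
        return False
    try:
        return is_ipv6_lla(ip_address)
    except AddrFormatError:
        return False


if __name__ == "__main__":
    print(safe_is_ipv6_lla("fe80::1"))     # True  - IPv6 link-local
    print(safe_is_ipv6_lla("172.24.0.3"))  # False - IPv4 management address
    try:
        is_ipv6_lla(None)
    except AddrFormatError as exc:
        # Same error text as in the worker log:
        # "failed to detect a valid IP address from None"
        print("reproduced:", exc)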

*** This bug has been marked as a duplicate of bug 1577976 ***

