Description of problem:
Octavia goes into an error state when testing failover.

How reproducible:
Always

Steps to Reproduce:
1) Create an HA load balancer:

openstack loadbalancer create --name lb1 --vip-subnet-id public-subnet
openstack loadbalancer listener create --name listener1 --protocol TCP --protocol-port 22 lb1
openstack loadbalancer pool create --protocol TCP --listener listener1 --lb-algorithm ROUND_ROBIN --session-persistence type=SOURCE_IP

2) Run "openstack loadbalancer amphora failover <ID of one of the amphorae>"
3) Another amphora instance is built and the instance it is replacing gets deleted; the primary/secondary load balancer then goes into an error state.

Actual results:
Amphora agent returned unexpected result code 400 with response:

ERROR octavia.amphorae.drivers.haproxy.exceptions ... Amphora agent returned unexpected result code 400 with response {'message': 'Invalid request', 'details': "[ALERT] 166/075021 (4837) : Proxy ...': unable to find local peer '...' in peers section '..._peers'.\n[WARNING] 166/075021 (4837) : Removing incomplete section 'peers ..._peers' (no peer named '...').\n[ALERT] 166/075021 (4837) : Fatal errors found in configuration.\n"}
ERROR octavia.controller.worker.v1.tasks.amphora_driver_tasks... Failed to update listeners on amphora .... Skipping this amphora as it is failing to update due to: Invalid request: octavia.amphorae.drivers.haproxy.exceptions.InvalidRequest: Invalid request

Expected results:
Amphora failover works.

Additional info:
The patch was cherry-picked to branch stable/victoria as commit a865c03c3b4573ce5855c1d90aad43b5950ea9c5; the change has also been successfully merged by Zuul: https://review.opendev.org/q/Ia46a05ab9fdc97ed9be699e5b2ae90daca3ab9a2

With the latest patch, it looks like the amphora_driver_tasks.py file went from:

class AmpListenersUpdate(BaseAmphoraTask):
    """Task to update the listeners on one amphora."""

    def execute(self, loadbalancer, amphora, timeout_dict=None):
        # Note, we don't want this to cause a revert as it may be used
        # in a failover flow with both amps failing. Skip it and let
        # health manager fix it.
        try:
            self.amphora_driver.update_amphora_listeners(
                loadbalancer, amphora, timeout_dict)
        except Exception as e:
            LOG.error('Failed to update listeners on amphora %s. Skipping '
                      'this amphora as it is failing to update due to: %s',
                      amphora.id, str(e))
            self.amphora_repo.update(db_apis.get_session(), amphora.id,
                                     status=constants.ERROR)

to:

class AmpListenersUpdate(BaseAmphoraTask):
    """Task to update the listeners on one amphora."""

    def execute(self, loadbalancer, amphora, timeout_dict=None):
        # Note, we don't want this to cause a revert as it may be used
        # in a failover flow with both amps failing. Skip it and let
        # health manager fix it.
        try:
            # Make sure we have a fresh load balancer object
            loadbalancer = self.loadbalancer_repo.get(db_apis.get_session(),
                                                      id=loadbalancer.id)
            self.amphora_driver.update_amphora_listeners(
                loadbalancer, amphora, timeout_dict)
        except Exception as e:
            LOG.error('Failed to update listeners on amphora %s. Skipping '
                      'this amphora as it is failing to update due to: %s',
                      amphora.id, str(e))
            self.amphora_repo.update(db_apis.get_session(), amphora.id,
                                     status=constants.ERROR)
Attached is a cat of that file from the octavia_worker container; it is missing the added line:

loadbalancer = self.loadbalancer_repo.get(db_apis.get_session(), id=loadbalancer.id)
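As a quick, illustrative check (not part of Octavia; assumes the worker's Python environment is available inside the container), something like the following could confirm whether the deployed AmpListenersUpdate task contains the refresh:

import inspect

from octavia.controller.worker.v1.tasks import amphora_driver_tasks

# Print whether the deployed AmpListenersUpdate.execute source already
# refreshes the load balancer object from the database.
source = inspect.getsource(amphora_driver_tasks.AmpListenersUpdate.execute)
print('refresh present' if 'loadbalancer_repo.get' in source else 'refresh missing')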
I checked the logs; another amphora_driver_tasks task is failing:

2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker Traceback (most recent call last):
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker   File "/usr/lib/python3.6/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker     result = task.execute(**arguments)
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker   File "/usr/lib/python3.6/site-packages/octavia/controller/worker/v1/tasks/amphora_driver_tasks.py", line 133, in execute
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker     loadbalancer, amphorae[amphora_index], timeout_dict)
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker   File "/usr/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 296, in reload
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker     self._apply('reload_listener', loadbalancer, amphora, timeout_dict)
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker   File "/usr/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 293, in _apply
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker     amp, loadbalancer.id, *args)
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker   File "/usr/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py", line 899, in _action
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker     return exc.check_exception(r)
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker   File "/usr/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/exceptions.py", line 44, in check_exception
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker     raise responses[status_code]()
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker octavia.amphorae.drivers.haproxy.exceptions.NotFound: Not Found
2021-06-16 11:50:24.562 129 ERROR octavia.controller.worker.v1.controller_worker
2021-06-16 11:50:24.579 129 DEBUG octavia.controller.worker.v1.controller_worker [-] Task '1-amphora-reload-listener' (ed7df91a-6153-4a09-be17-c4f8c3a3d48f) transitioned into state 'REVERTING' from state 'FAILURE' _task_receiver /usr/lib/python3.6/site-packages/taskflow/listeners/logging.py:194

This corresponds to the AmphoraIndexListenersReload class:
https://opendev.org/openstack/octavia/src/branch/stable/train/octavia/controller/worker/v1/tasks/amphora_driver_tasks.py#L128-L144

We have to discuss whether we need the same loadbalancer_repo.get() call there to refresh the load balancer object. We can also check whether other classes have the same issue.
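If we decide the same refresh is needed there, a minimal sketch of the discussed change could look like the following (class name and arguments are taken from the traceback and the stable/train link above; this is only an illustration, not the upstream patch):

class AmphoraIndexListenersReload(BaseAmphoraTask):
    """Task to reload all listeners on an amphora."""

    def execute(self, loadbalancer, amphora_index, amphorae,
                timeout_dict=None):
        # Sketch only: refresh the load balancer object from the database
        # before the reload, mirroring the fix already applied in
        # AmpListenersUpdate.
        loadbalancer = self.loadbalancer_repo.get(db_apis.get_session(),
                                                  id=loadbalancer.id)
        self.amphora_driver.reload(
            loadbalancer, amphorae[amphora_index], timeout_dict)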
I found the issue: there is a missing commit in 16.1. It is on the upstream stable/train branch but has not been backported yet: https://review.opendev.org/c/openstack/octavia/+/761805
Bug cannot be verified until 16.1 z7 puddle is available.
#Verified in version:
[stack@undercloud-0 ~]$ cat /var/lib/rhos-release/latest-installed
16.1 -p RHOS-16.1-RHEL-8-20210804.n.0

#Creating HA load balancer:
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer create --name lb1 --vip-subnet-id external_subnet
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2021-08-23T09:11:00                  |
| description         |                                      |
| flavor_id           | None                                 |
| id                  | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 |
| listeners           |                                      |
| name                | lb1                                  |
| operating_status    | OFFLINE                              |
| pools               |                                      |
| project_id          | 75228e22f16c4c2087a2ab2324427aa8     |
| provider            | amphora                              |
| provisioning_status | PENDING_CREATE                       |
| updated_at          | None                                 |
| vip_address         | 10.0.0.218                           |
| vip_network_id      | b3650b06-20fb-4bf3-90e2-aa806ee13920 |
| vip_port_id         | ff460bec-5283-403d-a644-5e6d9605c79f |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | 5f5aea73-0bc1-452c-aa17-64616b99d953 |
+---------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer show lb1
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2021-08-23T09:11:00                  |
| description         |                                      |
| flavor_id           | None                                 |
| id                  | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 |
| listeners           |                                      |
| name                | lb1                                  |
| operating_status    | ONLINE                               |
| pools               |                                      |
| project_id          | 75228e22f16c4c2087a2ab2324427aa8     |
| provider            | amphora                              |
| provisioning_status | ACTIVE                               |
| updated_at          | 2021-08-23T09:13:05                  |
| vip_address         | 10.0.0.218                           |
| vip_network_id      | b3650b06-20fb-4bf3-90e2-aa806ee13920 |
| vip_port_id         | ff460bec-5283-403d-a644-5e6d9605c79f |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | 5f5aea73-0bc1-452c-aa17-64616b99d953 |
+---------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer listener create --name listener1 --protocol TCP --protocol-port 22 lb1
+-----------------------------+--------------------------------------+
| Field                       | Value                                |
+-----------------------------+--------------------------------------+
| admin_state_up              | True                                 |
| connection_limit            | -1                                   |
| created_at                  | 2021-08-23T09:15:49                  |
| default_pool_id             | None                                 |
| default_tls_container_ref   | None                                 |
| description                 |                                      |
| id                          | 4ef152f1-018f-4113-8707-aadc81192702 |
| insert_headers              | None                                 |
| l7policies                  |                                      |
| loadbalancers               | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 |
| name                        | listener1                            |
| operating_status            | OFFLINE                              |
| project_id                  | 75228e22f16c4c2087a2ab2324427aa8     |
| protocol                    | TCP                                  |
| protocol_port               | 22                                   |
| provisioning_status         | PENDING_CREATE                       |
| sni_container_refs          | []                                   |
| timeout_client_data         | 50000                                |
| timeout_member_connect      | 5000                                 |
| timeout_member_data         | 50000                                |
| timeout_tcp_inspect         | 0                                    |
| updated_at                  | None                                 |
| client_ca_tls_container_ref | None                                 |
| client_authentication       | NONE                                 |
| client_crl_container_ref    | None                                 |
| allowed_cidrs               | None                                 |
+-----------------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer pool create --protocol TCP --listener listener1 --lb-algorithm ROUND_ROBIN --session-persistence type=SOURCE_IP
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| admin_state_up       | True                                 |
| created_at           | 2021-08-23T09:39:44                  |
| description          |                                      |
| healthmonitor_id     |                                      |
| id                   | 10aa023e-509d-4ee8-8d2b-98e3e461bbcc |
| lb_algorithm         | ROUND_ROBIN                          |
| listeners            | 4ef152f1-018f-4113-8707-aadc81192702 |
| loadbalancers        | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 |
| members              |                                      |
| name                 |                                      |
| operating_status     | OFFLINE                              |
| project_id           | 75228e22f16c4c2087a2ab2324427aa8     |
| protocol             | TCP                                  |
| provisioning_status  | PENDING_CREATE                       |
| session_persistence  | type=SOURCE_IP                       |
|                      | cookie_name=None                     |
|                      | persistence_timeout=None             |
|                      | persistence_granularity=None         |
| updated_at           | None                                 |
| tls_container_ref    | None                                 |
| ca_tls_container_ref | None                                 |
| crl_container_ref    | None                                 |
| tls_enabled          | False                                |
+----------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer status show lb1
{
    "loadbalancer": {
        "id": "61e24d04-0e21-4cc5-9ba4-94469e1e7028",
        "name": "lb1",
        "operating_status": "ONLINE",
        "provisioning_status": "ACTIVE",
        "listeners": [
            {
                "id": "4ef152f1-018f-4113-8707-aadc81192702",
                "name": "listener1",
                "operating_status": "ONLINE",
                "provisioning_status": "ACTIVE",
                "pools": [
                    {
                        "id": "10aa023e-509d-4ee8-8d2b-98e3e461bbcc",
                        "name": "",
                        "provisioning_status": "ACTIVE",
                        "operating_status": "ONLINE",
                        "members": []
                    }
                ]
            }
        ]
    }
}

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status    | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| 7bf6634e-4f56-428d-b1ff-48e45d6099d0 | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 | ALLOCATED | BACKUP | 172.24.2.221  | 10.0.0.218 |
| bf4eecfb-b741-40b2-819c-d22695b1e4a2 | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 | ALLOCATED | MASTER | 172.24.1.193  | 10.0.0.218 |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+

#Run "openstack loadbalancer amphora failover <ID of one of the amphorae>"
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora failover bf4eecfb-b741-40b2-819c-d22695b1e4a2
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+----------------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status         | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+----------------+--------+---------------+------------+
| 7bf6634e-4f56-428d-b1ff-48e45d6099d0 | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 | ALLOCATED      | BACKUP | 172.24.2.221  | 10.0.0.218 |
| bf4eecfb-b741-40b2-819c-d22695b1e4a2 | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 | PENDING_DELETE | MASTER | 172.24.1.193  | 10.0.0.218 |
| 9ee56b75-ecd8-4efd-9f83-c3b96c0b8be9 | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 | PENDING_CREATE | None   | None          | None       |
+--------------------------------------+--------------------------------------+----------------+--------+---------------+------------+

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+----------------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status         | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+----------------+--------+---------------+------------+
| 7bf6634e-4f56-428d-b1ff-48e45d6099d0 | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 | ALLOCATED      | BACKUP | 172.24.2.221  | 10.0.0.218 |
| bf4eecfb-b741-40b2-819c-d22695b1e4a2 | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 | PENDING_DELETE | MASTER | 172.24.1.193  | 10.0.0.218 |
| 9ee56b75-ecd8-4efd-9f83-c3b96c0b8be9 | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 | BOOTING        | None   | None          | None       |
+--------------------------------------+--------------------------------------+----------------+--------+---------------+------------+

(...)

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer amphora list
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| id                                   | loadbalancer_id                      | status    | role   | lb_network_ip | ha_ip      |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+
| 7bf6634e-4f56-428d-b1ff-48e45d6099d0 | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 | ALLOCATED | BACKUP | 172.24.2.221  | 10.0.0.218 |
| 9ee56b75-ecd8-4efd-9f83-c3b96c0b8be9 | 61e24d04-0e21-4cc5-9ba4-94469e1e7028 | ALLOCATED | MASTER | 172.24.1.240  | 10.0.0.218 |
+--------------------------------------+--------------------------------------+-----------+--------+---------------+------------+

(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer status show lb1
{
    "loadbalancer": {
        "id": "61e24d04-0e21-4cc5-9ba4-94469e1e7028",
        "name": "lb1",
        "operating_status": "ONLINE",
        "provisioning_status": "ACTIVE",
        "listeners": [
            {
                "id": "4ef152f1-018f-4113-8707-aadc81192702",
                "name": "listener1",
                "operating_status": "ONLINE",
                "provisioning_status": "ACTIVE",
                "pools": [
                    {
                        "id": "10aa023e-509d-4ee8-8d2b-98e3e461bbcc",
                        "name": "",
                        "provisioning_status": "ACTIVE",
                        "operating_status": "ONLINE",
                        "members": []
                    }
                ]
            }
        ]
    }
}

No ERROR messages were found in the logs. Moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3762