Description of problem:

RHOS17 Octavia jobs are failing on tempest tests with "show_loadbalancer provisioning_status failed to update to ACTIVE within the required time". The following error is observed in the Octavia worker logs:

~~~
2021-08-30 21:06:42.068 30 ERROR octavia.amphorae.drivers.haproxy.exceptions [-] Amphora agent returned unexpected result code 409 with response {'message': 'Interface already exists'}
~~~

Version-Release number of selected component (if applicable):
RHOS17

How reproducible:
Hit in the last 3 runs of the RHOSP17 integration pipeline.

Actual results:
Load balancers get stuck in the PENDING_CREATE and PENDING_UPDATE states.

Expected results:
Load balancers should not get stuck in PENDING_CREATE or PENDING_UPDATE.

Additional info:

The following errors are observed in tempest:

~~~
{1} setUpClass (octavia_tempest_plugin.tests.scenario.v2.test_pool.PoolScenarioTest) [0.000000s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/test.py", line 182, in setUpClass
    six.reraise(etype, value, trace)
  File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/tempest/test.py", line 175, in setUpClass
    cls.resource_setup()
  File "/usr/lib/python3.6/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_pool.py", line 54, in resource_setup
    CONF.load_balancer.lb_build_timeout)
  File "/usr/lib/python3.6/site-packages/octavia_tempest_plugin/tests/waiters.py", line 96, in wait_for_status
    raise exceptions.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (PoolScenarioTest:setUpClass) show_loadbalancer provisioning_status failed to update to ACTIVE within the required time 900.
Current status of show_loadbalancer: PENDING_CREATE
~~~

~~~
Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_pool.py", line 79, in test_HTTPS_LC_pool_with_listener_CRUD
    algorithm=const.LB_ALGORITHM_LEAST_CONNECTIONS)
  File "/usr/lib/python3.6/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_pool.py", line 366, in _test_pool_CRUD
    CONF.load_balancer.build_timeout)
  File "/usr/lib/python3.6/site-packages/octavia_tempest_plugin/tests/waiters.py", line 96, in wait_for_status
    raise exceptions.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (PoolScenarioTest:test_HTTPS_LC_pool_with_listener_CRUD) show_loadbalancer provisioning_status failed to update to ACTIVE within the required time 300.
Current status of show_loadbalancer: PENDING_UPDATE
~~~
In the Octavia worker logs, we got:

~~~
2021-08-30 21:03:23.679 30 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url https://172.24.3.95:9443/1.0/plug/vip/10.1.1.159 request /usr/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:678
2021-08-30 21:04:24.066 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=60.0)
2021-08-30 21:04:44.124 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:05:04.166 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:05:24.217 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:05:44.241 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:06:04.266 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:06:24.297 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:06:42.066 30 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] Connected to amphora. Response: <Response [409]> request /usr/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:699
2021-08-30 21:06:42.068 30 ERROR octavia.amphorae.drivers.haproxy.exceptions [-] Amphora agent returned unexpected result code 409 with response {'message': 'Interface already exists'}
~~~

The "Interface already exists" error itself should not affect load balancer creation: after this message, the VIP interface (whose creation was requested at 21:03:23.679) should be up and functional. The real issue is the duration between the request from the octavia-worker and the response from the amphora-agent (21:03:23.679 -> 21:06:42.068): it took more than 3 minutes to set up an interface in the amphora, which should be quick since it basically runs 'ifup eth1' in the VM.

We can also check the duration of the flow that created the load balancer in Octavia. In the worker logs:

~~~
2021-08-30 20:50:06.315 30 INFO octavia.controller.queue.v1.endpoints [-] Creating load balancer '16495fcc-3f88-41f9-b99d-4fd1d8fe4649'...
~~~
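For reference, the "more than 3 minutes" figure for the plug-VIP round trip can be computed directly from the two worker-log timestamps quoted above:

```python
from datetime import datetime

# Timestamps copied from the octavia-worker log lines above.
FMT = "%Y-%m-%d %H:%M:%S.%f"
plug_request = datetime.strptime("2021-08-30 21:03:23.679", FMT)  # plug/vip request sent
agent_reply = datetime.strptime("2021-08-30 21:06:42.068", FMT)   # 409 reply received

gap = (agent_reply - plug_request).total_seconds()
print(f"plug/vip round trip: {gap:.3f} s")  # 198.389 s, i.e. ~3 min 18 s
```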
~~~
2021-08-30 20:50:08.097 30 DEBUG octavia.controller.worker.v1.controller_worker [-] Flow 'octavia-create-loadbalancer-flow' (6f5de613-1820-48c3-8ec1-2c23ba536c72) transitioned into state 'RUNNING' from state 'PENDING' _flow_receiver /usr/lib/python3.6/site-packages/taskflow/listeners/logging.py:143
2021-08-30 21:06:42.150 30 INFO octavia.controller.worker.v1.tasks.database_tasks [-] Mark ACTIVE in DB for load balancer id: 16495fcc-3f88-41f9-b99d-4fd1d8fe4649
2021-08-30 21:06:42.193 30 DEBUG octavia.controller.worker.v1.controller_worker [-] Flow 'octavia-create-loadbalancer-flow' (6f5de613-1820-48c3-8ec1-2c23ba536c72) transitioned into state 'SUCCESS' from state 'RUNNING' _flow_receiver /usr/lib/python3.6/site-packages/taskflow/listeners/logging.py:143
~~~

It took about 16 minutes to create the load balancer, which is longer than any of the timeouts defined in octavia-tempest-plugin; creating a load balancer in OSP should take only about 1 min 30 s (2 min in a busy environment).

In the nova logs, we have the following configuration value:

~~~
2021-08-30 20:38:23.510 7 DEBUG oslo_service.service [req-4fcdfece-e5c4-4fd3-9b31-3cb27ccb953e - - - - -] libvirt.virt_type = qemu log_opt_values /usr/lib/python3.6/site-packages/oslo_config/cfg.py:2593
~~~

Here the CI uses QEMU emulation, which doesn't work well with Octavia because Octavia uses RHEL/CentOS-based VMs that are slow in those environments. Do you have a way to enable KVM nested virtualization for those jobs?
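One possible direction, assuming the CI hypervisor hosts expose VMX/SVM to their guests: enable nested KVM on the hypervisor and switch nova's virt_type to kvm. The file paths below are the conventional ones, but how they are actually set depends on the deployment tooling (e.g. TripleO/director parameters), so treat this as a sketch rather than the exact change for these jobs:

```
# /etc/modprobe.d/kvm.conf on the hypervisor host
# (use "options kvm_amd nested=1" on AMD hardware)
options kvm_intel nested=1

# Verify after reloading the module:
#   $ cat /sys/module/kvm_intel/parameters/nested
#   Y

# nova.conf on the compute nodes:
[libvirt]
virt_type = kvm
```

With nested KVM available in the compute VMs, the amphorae run hardware-accelerated instead of fully emulated, which should bring load balancer creation back into the expected 1-2 minute range.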