Bug 2000070
Summary: | RHOS17 Octavia jobs failing on tempest tests with "show_loadbalancer provisioning_status failed to update to ACTIVE within the required time" | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | Sandeep Yadav <sandyada>
Component: | openstack-octavia | Assignee: | Nate Johnston <njohnston>
Status: | CLOSED NOTABUG | QA Contact: | Bruna Bonguardo <bbonguar>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 17.0 (Wallaby) | CC: | gthiemon, ihrachys, lpeer, majopela, scohen
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-09-07 15:24:53 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Sandeep Yadav 2021-09-01 10:18:25 UTC
In the Octavia worker logs, we got:

2021-08-30 21:03:23.679 30 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url https://172.24.3.95:9443/1.0/plug/vip/10.1.1.159 request /usr/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:678
2021-08-30 21:04:24.066 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=60.0)
2021-08-30 21:04:44.124 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:05:04.166 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:05:24.217 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:05:44.241 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:06:04.266 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:06:24.297 30 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='172.24.3.95', port=9443): Read timed out. (read timeout=10.0)
2021-08-30 21:06:42.066 30 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] Connected to amphora. Response: <Response [409]> request /usr/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:699
2021-08-30 21:06:42.068 30 ERROR octavia.amphorae.drivers.haproxy.exceptions [-] Amphora agent returned unexpected result code 409 with response {'message': 'Interface already exists'}

The "Interface already exists" error should not affect the load balancer creation; after this message, the VIP interface (created at 21:03:23.679) should be up and functional. The real issue is the duration between the request from the octavia-worker and the response from the amphora-agent (21:03:23.679 -> 21:06:42.068): it took more than 3 minutes to set up an interface in the amphora, although this should be quick (it basically runs 'ifup eth1' in the VM).

We can also check the duration of the flow that created the load balancer in Octavia, in the worker logs:

2021-08-30 20:50:06.315 30 INFO octavia.controller.queue.v1.endpoints [-] Creating load balancer '16495fcc-3f88-41f9-b99d-4fd1d8fe4649'...
2021-08-30 20:50:08.097 30 DEBUG octavia.controller.worker.v1.controller_worker [-] Flow 'octavia-create-loadbalancer-flow' (6f5de613-1820-48c3-8ec1-2c23ba536c72) transitioned into state 'RUNNING' from state 'PENDING' _flow_receiver /usr/lib/python3.6/site-packages/taskflow/listeners/logging.py:143
2021-08-30 21:06:42.150 30 INFO octavia.controller.worker.v1.tasks.database_tasks [-] Mark ACTIVE in DB for load balancer id: 16495fcc-3f88-41f9-b99d-4fd1d8fe4649
2021-08-30 21:06:42.193 30 DEBUG octavia.controller.worker.v1.controller_worker [-] Flow 'octavia-create-loadbalancer-flow' (6f5de613-1820-48c3-8ec1-2c23ba536c72) transitioned into state 'SUCCESS' from state 'RUNNING' _flow_receiver /usr/lib/python3.6/site-packages/taskflow/listeners/logging.py:143

It took 16 minutes to create the load balancer, which is longer than any of the timeouts defined in octavia-tempest-plugin; creating a load balancer in OSP should take only about 1 min 30 s (2 min in a busy environment).
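The limit that fails the test on the tempest side lives in the [load_balancer] section of the tempest.conf used by octavia-tempest-plugin. A minimal sketch of raising those limits is shown below; the option names and the tempest.conf path are assumptions based on the upstream plugin rather than values taken from this job, and raising them would only mask the slow amphora boot rather than fix it:

```
# Hypothetical sketch: bump the octavia-tempest-plugin build timeouts.
# Section/option names are assumed from the upstream plugin; the
# tempest.conf location depends on how the job deploys tempest.
crudini --set /etc/tempest/tempest.conf load_balancer lb_build_timeout 1800
crudini --set /etc/tempest/tempest.conf load_balancer build_timeout 600
```

Since the report argues that creation should normally finish well within the existing timeouts, the more relevant change is the virtualization setting discussed below.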
In the nova logs, we have the following configuration value:

2021-08-30 20:38:23.510 7 DEBUG oslo_service.service [req-4fcdfece-e5c4-4fd3-9b31-3cb27ccb953e - - - - -] libvirt.virt_type = qemu log_opt_values /usr/lib/python3.6/site-packages/oslo_config/cfg.py:2593

The CI uses qemu emulation, which does not work well with Octavia because Octavia boots RHEL/CentOS-based amphora VMs that are very slow in those environments. Do you have a way to enable KVM nested virtualization for those jobs?
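For illustration, a minimal sketch of how one could check whether nested virtualization is available and switch the compute nodes from qemu emulation to kvm is shown below. The sysfs paths and the TripleO parameter are standard for RHEL/libvirt deployments, but whether they apply to this job's topology is an assumption:

```
# Hypothetical sketch: verify nested virtualization and use kvm instead of qemu.

# 1. On the bare-metal hypervisor hosting the virtualized CI environment,
#    check that nested support is enabled in the kvm module
#    (Intel shown; on AMD use /sys/module/kvm_amd/parameters/nested).
cat /sys/module/kvm_intel/parameters/nested        # expect "Y" or "1"

# 2. On the virtualized compute node, check that the vmx/svm CPU flag is
#    exposed to the guest, i.e. that nested virt is actually usable there.
grep -c -E 'vmx|svm' /proc/cpuinfo                 # expect a value > 0

# 3. If both checks pass, deploy nova-compute with kvm instead of qemu,
#    e.g. via the TripleO parameter NovaComputeLibvirtType: kvm, so that
#    nova.conf ends up with [libvirt] virt_type = kvm rather than qemu.
```

With kvm acceleration the amphora should boot and plug its VIP interface quickly enough to keep load balancer creation within the 1-2 minutes expected above.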