Description of problem:

Some tests are failing in octavia-tempest-plugin. All the failing tests are from the octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops.IPv6TrafficOperationsScenarioTest test class and are named test_ipv6_<proto>_<dispatch_method>_listener_with_allowed_cidrs.

Traceback shows:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 294, in test_ipv6_http_LC_listener_with_allowed_cidrs
    self._test_listener_with_allowed_cidrs(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 476, in _test_listener_with_allowed_cidrs
    self.check_members_balanced(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 282, in check_members_balanced
    self._wait_for_lb_functional(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 423, in _wait_for_lb_functional
    raise Exception(message)
Exception: Server [fd47:e41c:f56e:1::a3] on port 90 did not begin passing traffic within the timeout period. Failing test.

tempest logs:

2023-02-02 17:10:43,209 165021 INFO [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::a3]:90 timed out. Retrying.
2023-02-02 17:10:43,210 165021 WARNING [octavia_tempest_plugin.tests.validators] Server is not passing initial traffic. Waiting.
2023-02-02 17:10:49,217 165021 INFO [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::a3]:90 timed out. Retrying.
2023-02-02 17:10:54,226 165021 INFO [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::a3]:90 timed out. Retrying.
2023-02-02 17:10:54,226 165021 WARNING [octavia_tempest_plugin.tests.validators] Server is not passing initial traffic. Waiting.
2023-02-02 17:10:55,228 165021 DEBUG [octavia_tempest_plugin.tests.validators] Loadbalancer wait for load balancer response totals: {}
2023-02-02 17:10:55,228 165021 ERROR [octavia_tempest_plugin.tests.validators] Server [fd47:e41c:f56e:1::a3] on port 90 did not begin passing traffic within the timeout period. Failing test.

Version-Release number of selected component (if applicable):
17.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP 17.1 with Octavia.
2. Run the IPv6 tests from test_ipv6_traffic_ops.

Actual results:
The test_ipv6_<proto>_<dispatch_method>_listener_with_allowed_cidrs tests fail with the traceback above.

Expected results:
The tests pass.

Additional info:
This is a bug in octavia-tempest-plugin, not in octavia; it was fixed upstream with:
810660: Fix incorrect subnet_id for ipv6 member servers | https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/810660
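For reference, here is a minimal sketch (not the plugin's actual code) of what the _wait_for_lb_functional step in the traceback evidently does, judging by the log lines above: poll the listener URL, retry while nothing answers, and raise the "did not begin passing traffic" exception once the timeout budget is spent.

import time

import requests


def wait_for_lb_functional(url, deadline=300, interval=5):
    # Poll the VIP until it starts answering, or give up.
    start = time.time()
    while time.time() - start < deadline:
        try:
            if requests.get(url, timeout=2).ok:
                return  # traffic is flowing
        except requests.exceptions.RequestException:
            pass  # logged as "Request ... timed out. Retrying."
        time.sleep(interval)
    raise Exception('Server did not begin passing traffic within the '
                    'timeout period. Failing test.')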
I am not so sure this should be moved to VERIFIED based only on plain CI recovery. I am working with the latest content, RHOS-17.1-RHEL-9-20230301.n.1, where python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost is present on the undercloud (UC):

[stack@undercloud-0 tempest-dir]$ sudo rpm -qa | grep python3-octavia-tests-tempest
python3-octavia-tests-tempest-golang-1.9.0-1.20230203110933.a3a95b1.el9ost.x86_64
python3-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost.noarch

Therefore the fix from https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/810660/1/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py#316 is included, yet I still get the following for octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops.IPv6TrafficOperationsScenarioTest.test_ipv6_http_LC_listener_with_allowed_cidrs:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 295, in test_ipv6_http_LC_listener_with_allowed_cidrs
    self._test_listener_with_allowed_cidrs(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 480, in _test_listener_with_allowed_cidrs
    self.check_members_balanced(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 282, in check_members_balanced
    self._wait_for_lb_functional(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 423, in _wait_for_lb_functional
    raise Exception(message)
Exception: Server [fdde:1a92:7523:70a0::2e3] on port 90 did not begin passing traffic within the timeout period. Failing test.

What I don't understand yet is why it passes in CI but not manually. TBD.
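One quick way to be sure which copy of the plugin a run imports (and therefore which file to diff by hand against the review linked above) is to ask Python directly; this only locates the installed module, it does not verify the patch content:

from octavia_tempest_plugin.tests.scenario.v2 import test_ipv6_traffic_ops

print(test_ipv6_traffic_ops.__file__)
# e.g. /usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py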
So I found a small difference between the failure I am getting currently:

2023-03-06 12:08:58,785 173649 INFO [octavia_tempest_plugin.tests.validators] Validate URL got exception: HTTPConnectionPool(host='fdde:1a92:7523:70a0::171', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9702affcd0>: Failed to establish a new connection: [Errno 101] Network is unreachable')). Retrying.

and the failures reported in older builds:

2023-02-08 23:20:33,259 176414 WARNING [octavia_tempest_plugin.tests.validators] Server is not passing initial traffic. Waiting.
2023-02-08 23:20:39,270 176414 INFO [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::183]:90 timed out. Retrying.

This might indicate a different root cause after all (why can it not reach the configured IPv6 network? Misconfiguration? Why does this not appear in CI?), but we are going to need someone from Octavia for a deeper dive on this to draw any conclusions.

(P.S. Ignore the different stack trace line numbering above; it's due to debug code added inline.)
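To make the distinction concrete, here is a small illustrative probe (not plugin code) separating the two exception classes quoted above: a timeout means the VIP address is routable but nothing answers, while errno 101 (ENETUNREACH) means the client has no route to the network at all.

import requests


def classify_failure(url):
    # Illustrative only; the real validators module logs and retries instead.
    try:
        requests.get(url, timeout=5)
        return 'ok'
    except requests.exceptions.Timeout:
        # Older builds: address routable, but no backend answered in time.
        return 'timed out'
    except requests.exceptions.ConnectionError as exc:
        if 'Errno 101' in str(exc):
            # Current failure: [Errno 101] Network is unreachable.
            return 'network unreachable'
        raise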
OK, thanks to Gregory I was able to manually (re)verify that python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1 on top of RHOS-17.1-RHEL-9-20230131.n.2 fixes the problem. Sorry for the fuss, but IMHO better safe than sorry ;)

The problem was found on a layer where I'd never have expected it: our CI reproducer ("custom") jobs, where such an issue should never occur, since they are designed exactly to prevent these parameter mismatches by only referencing existing params. Namely, the correct IR (post)config should be:

--tasks create_external_network,create_private_network,forward_overcloud_dashboard,network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys

whereas the custom job was using (for some unknown reason):

--tasks create_external_network,forward_overcloud_dashboard,network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys

From there it is obvious the private network can't be reached if it's not even deployed. Downstream CI investigation and debugging will be done internally; likely the problem is somewhere in the compact-job parameter passing/sharing area.
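A cheap pre-flight check would have caught this: ask Neutron whether the private network exists before running the tests. A hedged openstacksdk sketch follows (the cloud and network names are assumptions; adjust them to the deployment):

import openstack

conn = openstack.connect(cloud='overcloud')  # cloud name is an assumption
net = conn.network.find_network('private')   # network name is an assumption
print('private network present:', net is not None)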
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577