Bug 2166843
Summary: | octavia tempest plugin IPv6 tests fail with: Server ... on port ... did not begin passing traffic within the timeout period. | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Gregory Thiemonge <gthiemon> |
Component: | python-octavia-tests-tempest | Assignee: | Gregory Thiemonge <gthiemon> |
Status: | CLOSED ERRATA | QA Contact: | Bruna Bonguardo <bbonguar> |
Severity: | medium | Docs Contact: | Greg Rakauskas <gregraka> |
Priority: | medium | ||
Version: | 17.1 (Wallaby) | CC: | fhubik, gthiemon, tweining |
Target Milestone: | beta | Keywords: | AutomationBlocker, Regression, Triaged |
Target Release: | 17.1 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost | Doc Type: | No Doc Update |
Last Closed: | 2023-08-16 01:13:41 UTC | Type: | Bug |
Description
Gregory Thiemonge
2023-02-03 07:14:57 UTC
I am not so sure this should be moved to VERIFIED just because CI recovered. I am working on the latest content, RHOS-17.1-RHEL-9-20230301.n.1, where python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost is present on the undercloud (UC):

    [stack@undercloud-0 tempest-dir]$ sudo rpm -qa | grep python3-octavia-tests-tempest
    python3-octavia-tests-tempest-golang-1.9.0-1.20230203110933.a3a95b1.el9ost.x86_64
    python3-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost.noarch

The change https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/810660/1/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py#316 is therefore included, yet I still get the following for octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops.IPv6TrafficOperationsScenarioTest.test_ipv6_http_LC_listener_with_allowed_cidrs:

    Traceback (most recent call last):
      File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 295, in test_ipv6_http_LC_listener_with_allowed_cidrs
        self._test_listener_with_allowed_cidrs(
      File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 480, in _test_listener_with_allowed_cidrs
        self.check_members_balanced(
      File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 282, in check_members_balanced
        self._wait_for_lb_functional(
      File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 423, in _wait_for_lb_functional
        raise Exception(message)
    Exception: Server [fdde:1a92:7523:70a0::2e3] on port 90 did not begin passing traffic within the timeout period. Failing test.

What I don't understand yet is why it passes in CI but not when run manually. TBD.
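When a test fails with the generic "did not begin passing traffic" exception above, a probe run outside tempest can help tell a network-level problem from a slow or silent backend. The following is a minimal sketch, not code from octavia-tempest-plugin: the function names `classify_connect_error` and `probe` are hypothetical, and it only attempts a plain TCP connection using the Python standard library.

```python
import errno
import socket


def classify_connect_error(exc):
    """Map a connection exception to a coarse failure category.

    Hypothetical helper for illustration; not part of any OpenStack library.
    """
    # socket.timeout is an alias of TimeoutError on Python 3.10+,
    # and a separate OSError subclass on older versions.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, ConnectionRefusedError):
        return "refused"
    # ENETUNREACH is errno 101 on Linux ("Network is unreachable").
    if isinstance(exc, OSError) and exc.errno in (
        errno.ENETUNREACH,
        errno.EHOSTUNREACH,
    ):
        return "unreachable"
    return "other"


def probe(host, port, timeout=5.0):
    """Try one TCP connection and return 'ok' or a failure category."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_connect_error(exc)
```

Calling `probe("fdde:1a92:7523:70a0::2e3", 90)` from the host that runs tempest would report "unreachable" when there is no route to the VIP network at all, and "timeout" when the route exists but nothing answers in time.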
So I found a small difference between the failure I am currently getting:

    2023-03-06 12:08:58,785 173649 INFO [octavia_tempest_plugin.tests.validators] Validate URL got exception: HTTPConnectionPool(host='fdde:1a92:7523:70a0::171', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9702affcd0>: Failed to establish a new connection: [Errno 101] Network is unreachable')). Retrying.

and the failures reported in older builds:

    2023-02-08 23:20:33,259 176414 WARNING [octavia_tempest_plugin.tests.validators] Server is not passing initial traffic. Waiting.
    2023-02-08 23:20:39,270 176414 INFO [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::183]:90 timed out. Retrying.

This might indicate a different root cause after all (why can't it reach the configured IPv6 network? A misconfiguration? Why doesn't this appear in CI?), but I'm going to need someone from Octavia for a deeper dive before drawing any conclusions.

(P.S. Ignore the different stack trace line numbering above; it is due to debug code added inline.)

OK, thanks to Gregory I was able to manually (re)verify that python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1 on top of RHOS-17.1-RHEL-9-20230131.n.2 fixes the problem. Sorry for the fuss, but IMHO better safe than sorry ;)

The problem I found is on a different layer, where I would never have expected it: our CI reproducer ("custom") jobs. There should never be such an issue there, as they are designed precisely to prevent these parameter mismatches by only referencing existing params.
Namely, the right IR (post)config should be:

    --tasks create_external_network,create_private_network,forward_overcloud_dashboard,network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys

whereas the custom job uses (for some unknown reason):

    --tasks create_external_network,forward_overcloud_dashboard,network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys

From this it is obvious that the private network cannot be reached if it is not even deployed. Downstream CI investigation and debugging will be done internally. The problem is likely somewhere in the compact-job parameter passing/sharing area.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577
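The mismatch between the two `--tasks` lists quoted above is easy to miss by eye, so a job could check it mechanically before running tempest. This is only an illustrative sketch: `missing_tasks` is a hypothetical helper, not an InfraRed feature, and the two strings are copied from the comment above.

```python
def missing_tasks(expected, actual):
    """Return tasks present in `expected` but absent from `actual`.

    Both arguments are comma-separated task lists, as passed to --tasks.
    Hypothetical helper for illustration only.
    """
    actual_set = {task for task in actual.split(",") if task}
    return [task for task in expected.split(",") if task and task not in actual_set]


# Task lists copied from the bug comment.
EXPECTED = (
    "create_external_network,create_private_network,"
    "forward_overcloud_dashboard,network_time,"
    "tempest_deployer_input,add_extra_overcloud_ssh_keys"
)
CUSTOM = (
    "create_external_network,forward_overcloud_dashboard,"
    "network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys"
)

print(missing_tasks(EXPECTED, CUSTOM))  # prints ['create_private_network']
```

A non-empty result here would flag the missing private network before any IPv6 traffic test has a chance to fail with a misleading timeout.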