Bug 2166843 - octavia tempest plugin IPv6 tests fail with: Server ... on port ... did not begin passing traffic within the timeout period.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-octavia-tests-tempest
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 17.1
Assignee: Gregory Thiemonge
QA Contact: Bruna Bonguardo
Docs Contact: Greg Rakauskas
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-02-03 07:14 UTC by Gregory Thiemonge
Modified: 2023-08-16 01:14 UTC (History)

Fixed In Version: python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:13:41 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 810660 0 None MERGED Fix incorrect subnet_id for ipv6 member servers 2023-02-03 07:15:46 UTC
Red Hat Issue Tracker OSP-22022 0 None None None 2023-02-03 07:16:09 UTC
Red Hat Product Errata RHEA-2023:4577 0 None None None 2023-08-16 01:14:03 UTC

Description Gregory Thiemonge 2023-02-03 07:14:57 UTC
Description of problem:
Some tests are failing in octavia-tempest-plugin.
All of the failing tests belong to the
octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops.IPv6TrafficOperationsScenarioTest test class and are named test_ipv6_<proto>_<dispatch_method>_listener_with_allowed_cidrs.

Traceback shows:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 294, in test_ipv6_http_LC_listener_with_allowed_cidrs
    self._test_listener_with_allowed_cidrs(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 476, in _test_listener_with_allowed_cidrs
    self.check_members_balanced(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 282, in check_members_balanced
    self._wait_for_lb_functional(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 423, in _wait_for_lb_functional
    raise Exception(message)
Exception: Server [fd47:e41c:f56e:1::a3] on port 90 did not begin passing traffic within the timeout period. Failing test.


tempest logs:

2023-02-02 17:10:43,209 165021 INFO     [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::a3]:90 timed out. Retrying.
2023-02-02 17:10:43,210 165021 WARNING  [octavia_tempest_plugin.tests.validators] Server is not passing initial traffic. Waiting.
2023-02-02 17:10:49,217 165021 INFO     [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::a3]:90 timed out. Retrying.
2023-02-02 17:10:54,226 165021 INFO     [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::a3]:90 timed out. Retrying.
2023-02-02 17:10:54,226 165021 WARNING  [octavia_tempest_plugin.tests.validators] Server is not passing initial traffic. Waiting.
2023-02-02 17:10:55,228 165021 DEBUG    [octavia_tempest_plugin.tests.validators] Loadbalancer wait for load balancer response totals: {}
2023-02-02 17:10:55,228 165021 ERROR    [octavia_tempest_plugin.tests.validators] Server [fd47:e41c:f56e:1::a3] on port 90 did not begin passing traffic within the timeout period. Failing test.


Version-Release number of selected component (if applicable):
17.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP 17.1 with Octavia.
2. Run the IPv6 tests from test_ipv6_traffic_ops.

Actual results:


Expected results:


Additional info:
This is a bug in octavia-tempest-plugin, not in Octavia. It was fixed upstream with:
810660: Fix incorrect subnet_id for ipv6 member servers | https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/810660

Comment 7 Filip Hubík 2023-03-09 09:40:51 UTC
I am not sure this should be moved to VERIFIED based solely on CI recovering. I am working with the latest content, RHOS-17.1-RHEL-9-20230301.n.1, where python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost is present on the undercloud:

[stack@undercloud-0 tempest-dir]$ sudo rpm -qa | grep python3-octavia-tests-tempest
python3-octavia-tests-tempest-golang-1.9.0-1.20230203110933.a3a95b1.el9ost.x86_64
python3-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost.noarch

so the change in https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/810660/1/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py#316 is included, yet octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops.IPv6TrafficOperationsScenarioTest.test_ipv6_http_LC_listener_with_allowed_cidrs still fails with:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 295, in test_ipv6_http_LC_listener_with_allowed_cidrs
    self._test_listener_with_allowed_cidrs(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 480, in _test_listener_with_allowed_cidrs
    self.check_members_balanced(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 282, in check_members_balanced
    self._wait_for_lb_functional(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 423, in _wait_for_lb_functional
    raise Exception(message)
Exception: Server [fdde:1a92:7523:70a0::2e3] on port 90 did not begin passing traffic within the timeout period. Failing test.

What I don't understand yet is why it is passing in CI, but not manually. TBD.

Comment 8 Filip Hubík 2023-03-09 10:21:57 UTC
I found a small difference between the failure I am currently getting:

2023-03-06 12:08:58,785 173649 INFO     [octavia_tempest_plugin.tests.validators] Validate URL got exception: HTTPConnectionPool(host='fdde:1a92:7523:70a0::171', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9702affcd0>: Failed to establish a new connection: [Errno 101] Network is unreachable')). Retrying.

and the failures reported in older builds:

2023-02-08 23:20:33,259 176414 WARNING  [octavia_tempest_plugin.tests.validators] Server is not passing initial traffic. Waiting.
2023-02-08 23:20:39,270 176414 INFO     [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::183]:90 timed out. Retrying.

which might indicate a different root cause after all (why can it not reach the configured IPv6 network? Is it a misconfiguration? Why does this not appear in CI?). Someone from the Octavia team will need to take a deeper look before any conclusions can be drawn.

(P.S. Ignore the different stack trace line numbers above; they are due to debug code added inline.)
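
The two failure signatures quoted in this comment can be told apart mechanically. The following is a minimal sketch, not part of octavia-tempest-plugin: the function name, the labels, and the regular expressions are illustrative assumptions, and the sample lines are abbreviated copies of the log lines above.

```python
import re

# Hypothetical labels; the matched substrings come from the quoted log lines.
SIGNATURES = [
    ("unreachable", re.compile(r"\[Errno 101\] Network is unreachable")),
    ("timeout", re.compile(r"Request for \S+ timed out")),
]


def classify(line):
    """Return the label of the first signature found in a tempest log line."""
    for label, pattern in SIGNATURES:
        if pattern.search(line):
            return label
    return "other"


# Abbreviated copies of the log lines quoted in this comment.
current_failure = (
    "Validate URL got exception: HTTPConnectionPool("
    "host='fdde:1a92:7523:70a0::171', port=80): Max retries exceeded "
    "with url: / (Caused by NewConnectionError('...: Failed to establish "
    "a new connection: [Errno 101] Network is unreachable')). Retrying."
)
older_failure = (
    "Request for http://[fd47:e41c:f56e:1::183]:90 timed out. Retrying."
)

for sample in (current_failure, older_failure):
    print(classify(sample))
```

A run that only ever reports "unreachable" never got as far as the load balancer, which points at the test environment rather than at the plugin.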

Comment 9 Filip Hubík 2023-03-10 16:41:53 UTC
OK, thanks to Gregory I was able to manually (re)verify that python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1 on top of RHOS-17.1-RHEL-9-20230131.n.2 fixes the problem. Sorry for the fuss, but IMHO better safe than sorry ;)

The problem turned out to be on a different layer, where I would never have expected it: our CI reproducer ("custom") jobs. Such an issue should never occur there, as these jobs are designed precisely to prevent parameter mismatches by only referencing existing parameters. Namely, the correct IR (post)config should be:

--tasks create_external_network,create_private_network,forward_overcloud_dashboard,network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys

compared to the custom job, which for some unknown reason uses:

--tasks create_external_network,forward_overcloud_dashboard,network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys

From this it is obvious that the private network cannot be reached, since it is not even deployed.
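
The difference between the two --tasks arguments quoted in this comment can be confirmed with a short script. This is only an illustrative sketch: the function and variable names are made up for the example, and the two strings are copied from the arguments above.

```python
# Task lists copied from the two --tasks arguments quoted in this comment.
expected = (
    "create_external_network,create_private_network,"
    "forward_overcloud_dashboard,network_time,"
    "tempest_deployer_input,add_extra_overcloud_ssh_keys"
)
custom_job = (
    "create_external_network,forward_overcloud_dashboard,"
    "network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys"
)


def missing_tasks(expected_arg, actual_arg):
    """Return tasks listed in expected_arg but absent from actual_arg."""
    actual = set(actual_arg.split(","))
    # Iterate over the expected list so the original order is preserved.
    return [task for task in expected_arg.split(",") if task not in actual]


print(missing_tasks(expected, custom_job))  # ['create_private_network']
```

A check of this kind could run before a reproducer job starts, so that a dropped task is reported up front instead of surfacing later as a test failure.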

Downstream CI investigation and debugging will be done internally. Likely the problem is somewhere in compact-job parameter passing/sharing area.

Comment 18 errata-xmlrpc 2023-08-16 01:13:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

