Bug 2166843

Summary: octavia tempest plugin IPv6 tests fail with: Server ... on port ... did not begin passing traffic within the timeout period.
Product: Red Hat OpenStack
Reporter: Gregory Thiemonge <gthiemon>
Component: python-octavia-tests-tempest
Assignee: Gregory Thiemonge <gthiemon>
Status: CLOSED ERRATA
QA Contact: Bruna Bonguardo <bbonguar>
Severity: medium
Docs Contact: Greg Rakauskas <gregraka>
Priority: medium
Version: 17.1 (Wallaby)
CC: fhubik, gthiemon, tweining
Target Milestone: beta
Keywords: AutomationBlocker, Regression, Triaged
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost
Doc Type: No Doc Update
Last Closed: 2023-08-16 01:13:41 UTC
Type: Bug

Description Gregory Thiemonge 2023-02-03 07:14:57 UTC
Description of problem:
Some tests are failing in octavia-tempest-plugin.
All the failing tests are from the
octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops.IPv6TrafficOperationsScenarioTest test class and are named test_ipv6_<proto>_<dispatch_method>_listener_with_allowed_cidrs.

Traceback shows:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 294, in test_ipv6_http_LC_listener_with_allowed_cidrs
    self._test_listener_with_allowed_cidrs(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 476, in _test_listener_with_allowed_cidrs
    self.check_members_balanced(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 282, in check_members_balanced
    self._wait_for_lb_functional(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 423, in _wait_for_lb_functional
    raise Exception(message)
Exception: Server [fd47:e41c:f56e:1::a3] on port 90 did not begin passing traffic within the timeout period. Failing test.


tempest logs:

2023-02-02 17:10:43,209 165021 INFO     [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::a3]:90 timed out. Retrying.
2023-02-02 17:10:43,210 165021 WARNING  [octavia_tempest_plugin.tests.validators] Server is not passing initial traffic. Waiting.
2023-02-02 17:10:49,217 165021 INFO     [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::a3]:90 timed out. Retrying.
2023-02-02 17:10:54,226 165021 INFO     [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::a3]:90 timed out. Retrying.
2023-02-02 17:10:54,226 165021 WARNING  [octavia_tempest_plugin.tests.validators] Server is not passing initial traffic. Waiting.
2023-02-02 17:10:55,228 165021 DEBUG    [octavia_tempest_plugin.tests.validators] Loadbalancer wait for load balancer response totals: {}
2023-02-02 17:10:55,228 165021 ERROR    [octavia_tempest_plugin.tests.validators] Server [fd47:e41c:f56e:1::a3] on port 90 did not begin passing traffic within the timeout period. Failing test.
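
The log pattern above (repeated attempts, then a hard failure once the timeout expires) is what a generic poll-until-deadline loop produces. The following is a minimal sketch of such a loop, not the plugin's actual _wait_for_lb_functional code; the function name wait_until_passing and its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_until_passing(
    probe: Callable[[], bool],
    timeout: float,
    interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or `timeout` seconds elapse.

    Hypothetical helper for illustration only. `clock` and `sleep` are
    injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if probe():
            return True
        # Not passing traffic yet: wait before the next attempt.
        sleep(interval)
    # Deadline reached without a successful probe.
    return False
```

A caller would raise its own exception when the function returns False, which corresponds to the "did not begin passing traffic within the timeout period" error in the traceback.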


Version-Release number of selected component (if applicable):
17.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP 17.1 with Octavia.
2. Run the IPv6 tests from test_ipv6_traffic_ops.

Actual results:


Expected results:


Additional info:
This is a bug in octavia-tempest-plugin, not in Octavia. It was fixed upstream with:
810660: Fix incorrect subnet_id for ipv6 member servers | https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/810660

Comment 7 Filip Hubík 2023-03-09 09:40:51 UTC
I am not so sure this should be moved to VERIFIED based only on CI recovering. I am working on the latest content, RHOS-17.1-RHEL-9-20230301.n.1, where python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost is present on the undercloud (UC):

[stack@undercloud-0 tempest-dir]$ sudo rpm -qa | grep python3-octavia-tests-tempest
python3-octavia-tests-tempest-golang-1.9.0-1.20230203110933.a3a95b1.el9ost.x86_64
python3-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1.el9ost.noarch

and therefore the change at https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/810660/1/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py#316 is included, yet I still get the following failure for octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops.IPv6TrafficOperationsScenarioTest.test_ipv6_http_LC_listener_with_allowed_cidrs:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 295, in test_ipv6_http_LC_listener_with_allowed_cidrs
    self._test_listener_with_allowed_cidrs(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_ipv6_traffic_ops.py", line 480, in _test_listener_with_allowed_cidrs
    self.check_members_balanced(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 282, in check_members_balanced
    self._wait_for_lb_functional(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 423, in _wait_for_lb_functional
    raise Exception(message)
Exception: Server [fdde:1a92:7523:70a0::2e3] on port 90 did not begin passing traffic within the timeout period. Failing test.

What I don't understand yet is why it is passing in CI, but not manually. TBD.

Comment 8 Filip Hubík 2023-03-09 10:21:57 UTC
So I found a small difference between the failure I am currently getting:

2023-03-06 12:08:58,785 173649 INFO     [octavia_tempest_plugin.tests.validators] Validate URL got exception: HTTPConnectionPool(host='fdde:1a92:7523:70a0::171', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9702affcd0>: Failed to establish a new connection: [Errno 101] Network is unreachable')). Retrying.

and the failures reported in older builds:

2023-02-08 23:20:33,259 176414 WARNING  [octavia_tempest_plugin.tests.validators] Server is not passing initial traffic. Waiting.
2023-02-08 23:20:39,270 176414 INFO     [octavia_tempest_plugin.tests.validators] Request for http://[fd47:e41c:f56e:1::183]:90 timed out. Retrying.

which might indicate a different root cause after all (why can it not reach the configured IPv6 network? A misconfiguration? Why does this not appear in CI?), but I will need someone from Octavia to take a deeper look before drawing any conclusions.

(P.S. Ignore the different stack trace line numbers above; they are due to debug code added inline.)
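
To tell the two failure modes apart when scanning a tempest log, a small classifier can key on the substrings that appear in the messages quoted in this comment. This is only a sketch: the function name classify_failure and the labels it returns are hypothetical, and the matching relies solely on the text of the two log lines shown here.

```python
def classify_failure(line: str) -> str:
    """Label a validators log line by failure mode (hypothetical labels).

    "unreachable": the client had no route to the target, as in the
                   "[Errno 101] Network is unreachable" message.
    "timeout":     the request was sent but no answer arrived in time,
                   as in the "timed out. Retrying." message.
    "other":       anything else.
    """
    if "Network is unreachable" in line or "[Errno 101]" in line:
        return "unreachable"
    if "timed out" in line:
        return "timeout"
    return "other"


new_failure = (
    "Validate URL got exception: ... Failed to establish a new connection: "
    "[Errno 101] Network is unreachable')). Retrying."
)
old_failure = "Request for http://[fd47:e41c:f56e:1::183]:90 timed out. Retrying."

print(classify_failure(new_failure))  # unreachable
print(classify_failure(old_failure))  # timeout
```

An "unreachable" result points at routing or network setup on the test host, while a "timeout" result is consistent with the member-side problem originally reported in this bug.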

Comment 9 Filip Hubík 2023-03-10 16:41:53 UTC
OK, thanks to Gregory I was able to manually (re)verify that python-octavia-tests-tempest-1.9.0-1.20230203110933.a3a95b1 on top of RHOS-17.1-RHEL-9-20230131.n.2 fixes the problem. Sorry for the fuss, but IMHO it is better to be safe than sorry ;)

The problem I found is on a different layer, where I would never have expected it: our CI reproducer ("custom") jobs. Such an issue should never occur there, as these jobs are designed precisely to prevent parameter mismatches by referencing only existing params. Namely, the right IR (post)config should be:

--tasks create_external_network,create_private_network,forward_overcloud_dashboard,network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys

compared to the custom job, which uses (for some unknown reason):

--tasks create_external_network,forward_overcloud_dashboard,network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys

It is obvious from this that the private network cannot be reached if it is not even deployed.
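
The mismatch between the two --tasks values can be checked mechanically. Below is a minimal sketch that compares the two comma-separated lists quoted in this comment; the helper name missing_tasks is hypothetical and is not part of any CI tooling.

```python
def missing_tasks(expected: str, actual: str) -> list:
    """Return tasks present in `expected` but absent from `actual`.

    Both arguments are comma-separated task lists as passed to --tasks.
    The order of the expected list is preserved in the result.
    """
    actual_set = set(actual.split(","))
    return [task for task in expected.split(",") if task not in actual_set]


# Values copied from the two --tasks lines in this comment.
EXPECTED = (
    "create_external_network,create_private_network,"
    "forward_overcloud_dashboard,network_time,"
    "tempest_deployer_input,add_extra_overcloud_ssh_keys"
)
ACTUAL = (
    "create_external_network,forward_overcloud_dashboard,"
    "network_time,tempest_deployer_input,add_extra_overcloud_ssh_keys"
)

print(missing_tasks(EXPECTED, ACTUAL))  # ['create_private_network']
```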

Downstream CI investigation and debugging will be done internally. The problem is likely somewhere in the compact-job parameter passing/sharing area.

Comment 18 errata-xmlrpc 2023-08-16 01:13:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577