Bug 2054145 - Server on port X did not begin passing traffic within the timeout period
Summary: Server on port X did not begin passing traffic within the timeout period
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-octavia
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: Alpha
Target Release: 17.0
Assignee: Gregory Thiemonge
QA Contact: Bruna Bonguardo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-14 09:52 UTC by Omer Schwartz
Modified: 2022-09-21 12:19 UTC
CC List: 5 users

Fixed In Version: openstack-octavia-8.0.2-0.20220329110858.a16f516.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:18:59 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 828606 0 None MERGED Fix ipv6 interface configuration 2022-02-22 12:59:48 UTC
OpenStack gerrit 829805 0 None MERGED Fix unplugging member ports 2022-02-22 12:59:49 UTC
OpenStack gerrit 830413 0 None MERGED Fix unplugging member ports 2022-04-22 05:47:54 UTC
OpenStack gerrit 830419 0 None MERGED Fix ipv6 interface configuration 2022-04-22 05:47:56 UTC
Red Hat Issue Tracker OSP-12646 0 None None None 2022-02-14 09:55:28 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:19:19 UTC

Description Omer Schwartz 2022-02-14 09:52:56 UTC
Description of problem:
The following IPv6 tests from
octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops.IPv6TrafficOperationsScenarioTest:

- test_ipv6_http_LC_listener_with_allowed_cidrs
- test_ipv6_http_SI_listener_with_allowed_cidrs
- test_ipv6_tcp_SI_listener_with_allowed_cidrs

fail on CI when the job runs with the ACTIVE_STANDBY topology.


Links to the failed tests:

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/octavia/job/DFG-network-octavia-17.0_director-rhel-virthost-3cont_3comp-ipv4-geneve-actstby/13/testReport/octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops/IPv6TrafficOperationsScenarioTest/Finally_Steps___test_ipv6_http_LC_listener_with_allowed_cidrs_id_9bead31b_0760_4c8f_b70a_f758fc5edd6a_/

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/octavia/job/DFG-network-octavia-17.0_director-rhel-virthost-3cont_3comp-ipv4-geneve-actstby/13/testReport/octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops/IPv6TrafficOperationsScenarioTest/Finally_Steps___test_ipv6_http_SI_listener_with_allowed_cidrs_id_d1256195_3d85_4ffd_bda3_1c0ab78b8ce1_/

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/octavia/job/DFG-network-octavia-17.0_director-rhel-virthost-3cont_3comp-ipv4-geneve-actstby/13/testReport/octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops/IPv6TrafficOperationsScenarioTest/Finally_Steps___test_ipv6_tcp_SI_listener_with_allowed_cidrs_id_bf8504b6_b95a_4f8a_9032_ab432db46eec_/


Version-Release number of selected component (if applicable):
17.0 (Wallaby)

How reproducible:
100%

Steps to Reproduce:
1. Run the ACTIVE_STANDBY CI job for OSP 17.

Actual results:
The three IPv6 tests listed above fail.

Expected results:
All of the listed tests should pass.

Comment 1 Gregory Thiemonge 2022-02-14 10:10:13 UTC
It looks like all the requests were forwarded to the same member:

2022-02-11 14:03:06,693 318518 DEBUG    [octavia_tempest_plugin.tests.validators] Loadbalancer wait for load balancer response totals: {'1': 25636}


This still needs investigation, but the open review https://review.opendev.org/c/openstack/octavia/+/828606 fixes a similar issue with IPv6 members.
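
For context, a minimal sketch (not the octavia-tempest-plugin code) of the kind of check behind that log line: each backend test server replies with a distinct body ('1', '2', ...), the validator sends many requests to the VIP and tallies the bodies, so a totals dict with a single key means every request landed on the same member. The vip_url value below is a placeholder.

import collections
import urllib.request

def response_totals(vip_url, count=20):
    """Send `count` requests to the VIP and tally the responses per member body."""
    totals = collections.Counter()
    for _ in range(count):
        with urllib.request.urlopen(vip_url, timeout=10) as resp:
            totals[resp.read().decode().strip()] += 1
    return dict(totals)

# A balanced ROUND_ROBIN pool yields something like {'1': 10, '2': 10};
# the failing run reported {'1': 25636}, i.e. the second member never
# received any traffic.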

Comment 3 Gregory Thiemonge 2022-02-21 08:13:21 UTC
One potentially related issue that I reproduced in my environment:

The load balancer loses connectivity to its members when a new member is added (observed in the IPv6 scenario tests).

The IPv6 tests trigger a race condition in the management of the member ports in Octavia.

I. Reproduction steps:

Using the same load balancer (a user-side sketch of this sequence follows the steps):

test 1:

1. Create member on subnet A
   A port on subnet A is attached to the amphora
2. Create member on subnet B
   A port on subnet B is attached to the amphora
3. Delete both members
   The ports are not updated; they remain attached to the amphora

test 2:

4. Create member on subnet A
   The port on subnet A is already attached, but Octavia notices that the port on subnet B is not used, so it unplugs that port
5. Create member on subnet B
   Octavia gets the list of ports attached to the amphora; it should create a port on subnet B, but the port on subnet B is still in the list (the unplug has not completed yet), so it does not update the ports
6. The port on subnet B is then removed from the list of attached ports and is missing from the amphora; the load balancer cannot establish connections to subnet B
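
A user-side sequence that exercises this race could look like the following openstacksdk sketch. This is hypothetical: the load balancer, pool and subnet IDs and the member addresses are placeholders, and the load balancer is assumed to already be ACTIVE.

import time
import openstack  # openstacksdk

LB_ID = '<load-balancer-uuid>'   # placeholders
POOL_ID = '<pool-uuid>'
SUBNET_A = '<subnet-a-uuid>'
SUBNET_B = '<subnet-b-uuid>'

def wait_lb_active(conn, lb_id, timeout=300, interval=5):
    # Member operations are asynchronous; poll until the LB is ACTIVE again.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if conn.load_balancer.get_load_balancer(lb_id).provisioning_status == 'ACTIVE':
            return
        time.sleep(interval)
    raise TimeoutError('load balancer did not return to ACTIVE')

conn = openstack.connect(cloud='overcloud')  # assumed clouds.yaml entry

# test 1: create a member on each subnet, then delete both
# (the amphora keeps both ports plugged)
m1 = conn.load_balancer.create_member(POOL_ID, address='fd00:a::10',
                                      protocol_port=80, subnet_id=SUBNET_A)
wait_lb_active(conn, LB_ID)
m2 = conn.load_balancer.create_member(POOL_ID, address='fd00:b::10',
                                      protocol_port=80, subnet_id=SUBNET_B)
wait_lb_active(conn, LB_ID)
conn.load_balancer.delete_member(m1, POOL_ID)
wait_lb_active(conn, LB_ID)
conn.load_balancer.delete_member(m2, POOL_ID)
wait_lb_active(conn, LB_ID)

# test 2: recreate the members; step 4's unplug of the unused subnet-B
# port races with step 5's recomputation of the attached ports
conn.load_balancer.create_member(POOL_ID, address='fd00:a::10',
                                 protocol_port=80, subnet_id=SUBNET_A)
wait_lb_active(conn, LB_ID)
conn.load_balancer.create_member(POOL_ID, address='fd00:b::10',
                                 protocol_port=80, subnet_id=SUBNET_B)
wait_lb_active(conn, LB_ID)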

II. Details

Unplugging a port from a server can take several seconds (detaching the port from the server is fast, but on OSP it takes up to 8 seconds for the change to be committed in the DB), and Octavia does not wait for the removal to complete.
So if a new member is added between the unplug API call and the effective deletion from the DB, the amphora may be left with a bad network configuration.

Waiting for the DB update in Octavia would fix this issue, but it would also increase the duration of the member create flow.
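
As an illustration only (not the merged fix), that wait could mean polling Neutron until the unplugged port really disappears before the next flow recomputes the attached ports; wait_for_port_detach and unplugged_port_id below are hypothetical names:

import time
import openstack  # openstacksdk

def wait_for_port_detach(conn, port_id, timeout=30, interval=2):
    """Block until Neutron no longer reports the port as attached to a server."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        port = conn.network.find_port(port_id)
        if port is None or not port.device_id:
            return
        time.sleep(interval)
    raise TimeoutError('port %s is still attached after %ss' % (port_id, timeout))

# conn = openstack.connect(cloud='overcloud')   # assumed clouds.yaml entry
# wait_for_port_detach(conn, unplugged_port_id)

Every poll of this kind adds to the member create flow, which is exactly the duration trade-off mentioned above.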


Change proposed in https://review.opendev.org/c/openstack/octavia/+/829805

Comment 10 errata-xmlrpc 2022-09-21 12:18:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

