Bug 2054145

Summary: Server on port X did not begin passing traffic within the timeout period
Product: Red Hat OpenStack Reporter: Omer Schwartz <oschwart>
Component: openstack-octavia Assignee: Gregory Thiemonge <gthiemon>
Status: CLOSED ERRATA QA Contact: Bruna Bonguardo <bbonguar>
Severity: medium Docs Contact:
Priority: medium    
Version: 17.0 (Wallaby) CC: gthiemon, ihrachys, lpeer, majopela, scohen
Target Milestone: Alpha Keywords: Triaged
Target Release: 17.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-octavia-8.0.2-0.20220329110858.a16f516.el9ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-09-21 12:18:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Omer Schwartz 2022-02-14 09:52:56 UTC
Description of problem:
The following IPv6 tests in

octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops.IPv6TrafficOperationsScenarioTest

- test_ipv6_http_LC_listener_with_allowed_cidrs
- test_ipv6_http_SI_listener_with_allowed_cidrs
- test_ipv6_tcp_SI_listener_with_allowed_cidrs

fail on CI when running with the ACTIVE_STANDBY topology.


Links to the failed tests:

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/octavia/job/DFG-network-octavia-17.0_director-rhel-virthost-3cont_3comp-ipv4-geneve-actstby/13/testReport/octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops/IPv6TrafficOperationsScenarioTest/Finally_Steps___test_ipv6_http_LC_listener_with_allowed_cidrs_id_9bead31b_0760_4c8f_b70a_f758fc5edd6a_/

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/octavia/job/DFG-network-octavia-17.0_director-rhel-virthost-3cont_3comp-ipv4-geneve-actstby/13/testReport/octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops/IPv6TrafficOperationsScenarioTest/Finally_Steps___test_ipv6_http_SI_listener_with_allowed_cidrs_id_d1256195_3d85_4ffd_bda3_1c0ab78b8ce1_/

https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/octavia/job/DFG-network-octavia-17.0_director-rhel-virthost-3cont_3comp-ipv4-geneve-actstby/13/testReport/octavia_tempest_plugin.tests.scenario.v2.test_ipv6_traffic_ops/IPv6TrafficOperationsScenarioTest/Finally_Steps___test_ipv6_tcp_SI_listener_with_allowed_cidrs_id_bf8504b6_b95a_4f8a_9032_ab432db46eec_/


Version-Release number of selected component (if applicable):
17

How reproducible:
100%

Steps to Reproduce:
1. Run the active-standby job for OSP 17.

Actual results:
Some IPv6 tests fail

Expected results:
All should pass

Comment 1 Gregory Thiemonge 2022-02-14 10:10:13 UTC
It looks like all the requests were forwarded to the same member:

2022-02-11 14:03:06,693 318518 DEBUG    [octavia_tempest_plugin.tests.validators] Loadbalancer wait for load balancer response totals: {'1': 25636}


This needs investigation, but the open review https://review.opendev.org/c/openstack/octavia/+/828606 fixes a similar issue with IPv6 members.
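
The response totals in the log line above can be checked mechanically. This is a minimal sketch, not code from octavia_tempest_plugin: the helper name is hypothetical, and the second member ID '5' is an assumption made for illustration (only '1' appears in the log).

```python
def members_missing_traffic(totals, expected_members):
    """Return the expected member IDs that received no responses."""
    return sorted(m for m in expected_members if totals.get(m, 0) == 0)


# Totals copied from the log line; the expected member set is assumed.
totals = {'1': 25636}
missing = members_missing_traffic(totals, {'1', '5'})
print(missing)
```

A non-empty result means at least one member never answered, which matches the symptom of all requests being forwarded to the same member.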

Comment 3 Gregory Thiemonge 2022-02-21 08:13:21 UTC
One potentially related issue that I reproduced in my environment:

Load balancer loses connectivity to its members when adding a new member (observed in the IPv6 Scenario tests)

The IPv6 tests trigger a race condition in the management of the member ports in Octavia.

I. Reproduction steps:

Using the same load balancer:

test 1:

1. Create member on subnet A
   A port on subnet A is attached to the amphora
2. Create member on subnet B
   A port on subnet B is attached to the amphora
3. Delete members
   Ports are not updated

test 2:

4. Create member on subnet A
   The port is already attached, but Octavia notices that the port on subnet B is unused, so it unplugs that port
5. Create member on subnet B
   Octavia gets the list of ports attached to the amphora. It should create a port on subnet B, but the port being unplugged is still in the list, so Octavia doesn't update the ports
6. The unplugged port is then removed from the list of attached ports and is missing from the amphora, so the load balancer cannot create a connection to subnet B
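
The six steps above can be replayed with a toy model. This is a simulation written for illustration only: FakeNetwork and sync_ports are hypothetical names, not Octavia or Neutron code, and the only behaviour modelled is that an unplugged port stays in the port list until its deletion is committed.

```python
class FakeNetwork:
    """Hypothetical networking service whose port deletions commit late."""

    def __init__(self):
        self.listed = set()    # subnets the port list reports as attached
        self.attached = set()  # subnets actually plugged into the amphora
        self.pending = set()   # unplugged, deletion not yet committed

    def plug(self, subnet):
        self.listed.add(subnet)
        self.attached.add(subnet)

    def unplug(self, subnet):
        # The port leaves the amphora at once but stays listed for a while.
        self.attached.discard(subnet)
        self.pending.add(subnet)

    def commit_deletions(self):
        self.listed -= self.pending
        self.pending.clear()


def sync_ports(net, member_subnets):
    """Plug missing subnets and unplug unused ones, trusting the port list."""
    listed = set(net.listed)
    for subnet in sorted(member_subnets - listed):
        net.plug(subnet)
    for subnet in sorted(listed - member_subnets):
        net.unplug(subnet)


net = FakeNetwork()
sync_ports(net, {"A"})       # step 1: port on subnet A is plugged
sync_ports(net, {"A", "B"})  # step 2: port on subnet B is plugged
# step 3: members are deleted, ports are left untouched
sync_ports(net, {"A"})       # step 4: B is unused, its unplug starts
sync_ports(net, {"A", "B"})  # step 5: B is still listed, nothing is plugged
net.commit_deletions()       # step 6: the deletion is finally committed
print(sorted(net.attached))
```

At the end of the run only subnet A is attached, even though a member on subnet B exists.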

II. Details

Unplugging a port from a server can take several seconds: the deletion of the port from the server is fast, but on OSP it takes up to 8 seconds for the change to be committed in the DB, and Octavia doesn't wait for the removal to complete.
So if a new member is added between the unplug API call and the effective deletion from the DB, the amphora may be left with a bad network configuration.

Waiting for the DB update in Octavia would fix this issue, but it would also increase the duration of the member create flow.
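
Waiting for the removal could look roughly like the polling loop below. This is a sketch under stated assumptions, not the change proposed in the review: wait_for_port_removal and its parameters are hypothetical, and list_port_ids stands for any callable that returns the IDs of the ports currently listed as attached.

```python
import time


def wait_for_port_removal(list_port_ids, port_id, timeout=30.0, interval=1.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Block until port_id is no longer listed, or raise TimeoutError."""
    deadline = clock() + timeout
    while port_id in list_port_ids():
        if clock() >= deadline:
            raise TimeoutError(f"port {port_id} still listed after {timeout}s")
        sleep(interval)


# Usage with a fake lister that keeps reporting the port for three polls.
calls = []


def fake_list_port_ids():
    calls.append(1)
    return {"port-b"} if len(calls) < 4 else set()


wait_for_port_removal(fake_list_port_ids, "port-b", sleep=lambda s: None)
print(len(calls))
```

The cost described above shows up directly: every member create that triggers an unplug blocks for as long as the deletion takes to commit, up to the timeout.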


Change proposed in https://review.opendev.org/c/openstack/octavia/+/829805

Comment 10 errata-xmlrpc 2022-09-21 12:18:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543