Bug 2178062 - test_least_connections_tcp_traffic failed
Summary: test_least_connections_tcp_traffic failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-octavia-tests-tempest
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: beta
Target Release: 17.1
Assignee: Gregory Thiemonge
QA Contact: Eliad Cohen
Docs Contact: Greg Rakauskas
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-03-14 10:57 UTC by Attila Fazekas
Modified: 2023-08-16 01:14 UTC
CC List: 4 users

Fixed In Version: python-octavia-tests-tempest-1.9.0-1.20230407070913.a3a95b1.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:14:17 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack Storyboard 2010660 0 None None None 2023-03-24 08:09:33 UTC
OpenStack gerrit 879726 0 None MERGED Fix request interval in _wait_for_lb_functional 2023-04-06 19:24:19 UTC
Red Hat Issue Tracker OSP-23073 0 None None None 2023-03-14 11:01:00 UTC
Red Hat Product Errata RHEA-2023:4577 0 None None None 2023-08-16 01:14:34 UTC

Description Attila Fazekas 2023-03-14 10:57:20 UTC
Description of problem:

octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_least_connections_tcp_traffic failed with today's compose.


Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230313.n.2

How reproducible:
Unknown



Actual results:

    Response - Headers: {'date': 'Mon, 13 Mar 2023 22:07:34 GMT', 'server': 'Apache', 'content-length': '1690', 'x-openstack-request-id': 'req-47ed8796-e9a5-486a-a104-a4f1fae23491', 'content-type': 'application/json', 'connection': 'close', 'status': '200', 'content-location': 'http://10.0.0.107:9876/v2.0/lbaas/loadbalancers/87d50129-0e55-46bb-94e2-a9cf87268e1b'}
        Body: b'{"loadbalancer": {"id": "87d50129-0e55-46bb-94e2-a9cf87268e1b", "name": "tempest-lb_member_lb1_operations-1567318300", "description": "", "provisioning_status": "ACTIVE", "operating_status": "ONLINE", "admin_state_up": true, "project_id": "3f571953295c4ebd9b3a9f0cd02bda69", "created_at": "2023-03-13T21:45:21", "updated_at": "2023-03-13T22:07:31", "vip_address": "10.1.1.93", "vip_port_id": "ed594a85-bc1a-4403-a6b0-dbb0e9e6bd06", "vip_subnet_id": "a8ad9797-f9e1-43e2-b8ea-12f57c7ccf20", "vip_network_id": "88c2dc7c-bf15-4b03-a21a-04db5fa3eedc", "listeners": [{"id": "074c6035-06d6-4da6-a397-9176ee011794"}, {"id": "1f4977c0-9ea8-42ee-8423-1f0d87c40c14"}, {"id": "3b26cf45-aae0-4d3c-878b-3ff8e677c8f1"}, {"id": "528310c6-b036-4ee3-9dee-e7c453cad726"}, {"id": "7a4232be-082a-473d-98cd-1cd35be2b944"}, {"id": "ace4f134-2e4e-4660-a711-d5c687dae146"}, {"id": "bccf5f43-8f9c-4683-aa33-6753638bdeeb"}, {"id": "c876546d-587d-4e32-9d37-bda620c01c44"}, {"id": "d372d2eb-5d16-471c-9f13-982153cb9f49"}, {"id": "d707d35b-52da-46a5-9dc2-d50beed7689d"}], "pools": [{"id": "190ba5cd-4b4f-4fb1-b725-528f624a272a"}, {"id": "19f66808-d0e9-40fb-9b34-df5bff8a35ab"}, {"id": "3151a9eb-3c61-4ae1-9b73-4847dc7ae123"}, {"id": "71966cde-87b4-45c7-998d-7148fd0d4e4a"}, {"id": "76e5f53a-dd64-496c-8203-b2c69c87965d"}, {"id": "8630a9f9-db58-4923-8936-b1c2b4f13b4b"}, {"id": "ce07939d-a57e-403b-a39d-cd1c1970a064"}, {"id": "d8b957bf-a7cd-40cb-8574-fbbf4f89d88c"}, {"id": "db8f3ccd-75c4-42fa-9489-fa86b34b73bf"}, {"id": "e8d84325-731e-407c-b300-4f409a88eb46"}], "provider": "octavia", "flavor_id": null, "vip_qos_policy_id": null, "tags": [], "availability_zone": null, "tenant_id": "3f571953295c4ebd9b3a9f0cd02bda69"}}'
2023-03-13 22:07:34,418 164068 INFO     [octavia_tempest_plugin.tests.waiters] show_loadbalancer's status updated to ACTIVE.
2023-03-13 22:07:34,419 164068 INFO     [octavia_tempest_plugin.services.load_balancer.v2.base_client] Cleanup complete for member 5d93baa9-b1d3-4a0e-8913-8ec1548c8bec...
}}}

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_traffic_ops.py", line 851, in test_least_connections_tcp_traffic
    self._test_basic_traffic(const.TCP, 88, pool_id,
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_traffic_ops.py", line 201, in _test_basic_traffic
    self.check_members_balanced(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 321, in check_members_balanced
    self.assertEqual(1, len(set(response_counts.values())))
  File "/usr/lib/python3.9/site-packages/testtools/testcase.py", line 391, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/usr/lib/python3.9/site-packages/testtools/testcase.py", line 478, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: 1 != 2


Additional info:

Other tests passed; possibly octavia or the octavia_tempest_plugin has an issue.

Comment 2 Attila Fazekas 2023-03-14 13:26:18 UTC
Not 100% reproducible.

Comment 3 Gregory Thiemonge 2023-03-15 07:30:27 UTC
The interesting part in the logs is:

2023-03-13 22:07:03,821 164068 INFO     [octavia_tempest_plugin.tests.validators] Request for http://10.0.0.182:88 timed out. Retrying.
2023-03-13 22:07:05,080 164068 DEBUG    [octavia_tempest_plugin.tests.validators] Loadbalancer response totals: {'5': 44, '1': 1}
2023-03-13 22:07:10,235 164068 DEBUG    [octavia_tempest_plugin.tests.validators] Loadbalancer response totals: {'5': 12, '1': 8}

The test waits for at least one reply from each member (meaning that both are ready).

{'5': 44, '1': 1} means that one member replied 44 times and the other only once, but that is enough to consider it ready too.

Then the test expects a fair dispatch of the requests: it creates 20 requests and expects each member to reply 10 times.

But {'5': 12, '1': 8} indicates that one member replied 12 times and the other only 8.

This is not a bug in Octavia but random behavior in the test.
We need to investigate how to make it more stable.
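
For context, the failing assertion compares the number of distinct per-member response counts to 1, i.e. it requires every member to have served exactly the same number of requests. A minimal standalone sketch of that check (only the final comparison mirrors validators.py; the signature and error message here are simplified):

# Simplified sketch of the balance check that raised the MismatchError above.
def check_members_balanced(response_counts):
    """Fail unless every backend member served the same number of requests."""
    # {'5': 10, '1': 10} -> one distinct count  -> passes
    # {'5': 12, '1': 8}  -> two distinct counts -> fails with "1 != 2"
    distinct_counts = set(response_counts.values())
    assert len(distinct_counts) == 1, (
        "Unbalanced responses: %s" % response_counts)

check_members_balanced({'5': 10, '1': 10})  # passes
check_members_balanced({'5': 12, '1': 8})   # fails, as in this bug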

Comment 4 Gregory Thiemonge 2023-03-15 07:41:01 UTC
The Octavia log offloading feature is not enabled by default in OSP; it is enabled only in phase3 jobs, so we cannot get the logs from the haproxy service in the amphora.

Comment 5 Attila Fazekas 2023-03-15 08:51:13 UTC
Feel free to propose a change to enable the logging by default.

Typical tests take a black-box approach and should not depend on exact implementation details, but a perfect balance should not be expected either.

A frequent load-balancer tactic is to hash to a backend based on the 4-tuple, and the source port is randomized by the client.

Typically, anything with a lower chance than a UUID collision is acceptable, but at least lottery-winning odds should be beaten.

The test should probably try at least 128 requests, not expecting more than one on the other node.
The test code should probably also have a comment about the randomness trade-off.

The port randomization is likely not a cryptographic-grade random source...
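
To make the UUID-collision / lottery-odds idea concrete, the tolerance could be sized from the binomial tail probability of a fair but random 2-way split. A hypothetical sketch (none of this is existing plugin code; the function name and thresholds are made up for illustration):

from math import comb

def false_failure_probability(n_requests, tolerance):
    """Chance that a fair, random 2-way dispatch deviates from an even split
    by more than `tolerance` requests, i.e. the chance that a test with that
    tolerance fails even though the dispatching was fair."""
    tail = sum(comb(n_requests, k) for k in range(n_requests + 1)
               if abs(k - n_requests / 2) > tolerance)
    return tail / 2 ** n_requests

print(false_failure_probability(20, 0))    # ~0.82: a zero-tolerance check fails most fair random splits
print(false_failure_probability(128, 32))  # ~4e-9: comfortably rarer than winning the lottery

Note that for the least-connections algorithm under test here a near-exact split is still expected (see the next comment); the sketch only quantifies the tolerance question for the hash- or random-dispatch case described above.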

Comment 6 Gregory Thiemonge 2023-03-15 09:46:27 UTC
(In reply to Attila Fazekas from comment #5)
> Feel free to propose a change to enable the logging by default.
> 
> Typical tests take a black-box approach and should not depend on exact
> implementation details, but a perfect balance should not be expected either.
> 
> A frequent load-balancer tactic is to hash to a backend based on the 4-tuple,
> and the source port is randomized by the client.
> 
> Typically, anything with a lower chance than a UUID collision is acceptable,
> but at least lottery-winning odds should be beaten.
> 
> The test should probably try at least 128 requests, not expecting more than
> one on the other node.
> The test code should probably also have a comment about the randomness trade-off.
>
> The port randomization is likely not a cryptographic-grade random source...


In that particular test, we check that the "least connections" algorithm works properly. I agree that with some other dispatching algorithms we cannot determine the number of requests for each backend member, but in this case it should be straightforward.

I think that haproxy might not have been fully ready to forward the requests to the backend members.
Let's see if we can reproduce it in phase3, where log offloading is enabled; we could get more info from haproxy.

Comment 7 Attila Fazekas 2023-03-15 10:20:10 UTC
Which code part (or rule) should ensure that a connection remains active on one backend while a new request is sent, or at least statistically lead to haproxy not seeing equally (0) connected backends?

Comment 8 Attila Fazekas 2023-03-21 14:15:45 UTC
I have not been able to reproduce it so far.

Comment 9 Gregory Thiemonge 2023-03-24 08:09:34 UTC
I reported a similar issue upstream (this one is with round-robin dispatching): https://storyboard.openstack.org/#!/story/2010660

I don't know yet if it is an issue with haproxy.
We may have to add a workaround for it in octavia-tempest-plugin.
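
The workaround that was eventually merged is titled "Fix request interval in _wait_for_lb_functional" (Gerrit 879726, linked above). The following is only a rough illustration of that idea, not the actual patch: spacing out the warm-up probes so connections do not pile up on the amphora while haproxy is still settling. The helper name, signature, and defaults here are assumptions made for the sketch:

import time
import requests  # assumed HTTP client for this sketch

def wait_for_lb_functional(vip_url, member_ids, timeout=300, interval=1):
    """Poll the VIP until every backend member has answered at least once,
    pausing `interval` seconds between probes."""
    seen = set()
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            seen.add(requests.get(vip_url, timeout=2).text.strip())
        except requests.RequestException:
            pass  # amphora/listener not ready yet, keep polling
        if seen >= set(member_ids):
            return
        time.sleep(interval)  # the "request interval" the fix adjusts
    raise RuntimeError("members %s never replied via %s"
                       % (set(member_ids) - seen, vip_url))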

Comment 10 Attila Fazekas 2023-03-24 08:34:54 UTC
https://github.com/haproxy/haproxy/issues/1393

I guess HAProxy does things a little differently than the encyclopedia says, and it might have good reasons to do so.

I suggest connecting with the HAProxy QA/dev group about what really can and cannot be expected.

Comment 27 errata-xmlrpc 2023-08-16 01:14:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

