Description of problem:
octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_least_connections_tcp_traffic failed with today's compose.

Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230313.n.2

How reproducible:
Unknown

Actual results:
Response - Headers: {'date': 'Mon, 13 Mar 2023 22:07:34 GMT', 'server': 'Apache', 'content-length': '1690', 'x-openstack-request-id': 'req-47ed8796-e9a5-486a-a104-a4f1fae23491', 'content-type': 'application/json', 'connection': 'close', 'status': '200', 'content-location': 'http://10.0.0.107:9876/v2.0/lbaas/loadbalancers/87d50129-0e55-46bb-94e2-a9cf87268e1b'}
Body: b'{"loadbalancer": {"id": "87d50129-0e55-46bb-94e2-a9cf87268e1b", "name": "tempest-lb_member_lb1_operations-1567318300", "description": "", "provisioning_status": "ACTIVE", "operating_status": "ONLINE", "admin_state_up": true, "project_id": "3f571953295c4ebd9b3a9f0cd02bda69", "created_at": "2023-03-13T21:45:21", "updated_at": "2023-03-13T22:07:31", "vip_address": "10.1.1.93", "vip_port_id": "ed594a85-bc1a-4403-a6b0-dbb0e9e6bd06", "vip_subnet_id": "a8ad9797-f9e1-43e2-b8ea-12f57c7ccf20", "vip_network_id": "88c2dc7c-bf15-4b03-a21a-04db5fa3eedc", "listeners": [{"id": "074c6035-06d6-4da6-a397-9176ee011794"}, {"id": "1f4977c0-9ea8-42ee-8423-1f0d87c40c14"}, {"id": "3b26cf45-aae0-4d3c-878b-3ff8e677c8f1"}, {"id": "528310c6-b036-4ee3-9dee-e7c453cad726"}, {"id": "7a4232be-082a-473d-98cd-1cd35be2b944"}, {"id": "ace4f134-2e4e-4660-a711-d5c687dae146"}, {"id": "bccf5f43-8f9c-4683-aa33-6753638bdeeb"}, {"id": "c876546d-587d-4e32-9d37-bda620c01c44"}, {"id": "d372d2eb-5d16-471c-9f13-982153cb9f49"}, {"id": "d707d35b-52da-46a5-9dc2-d50beed7689d"}], "pools": [{"id": "190ba5cd-4b4f-4fb1-b725-528f624a272a"}, {"id": "19f66808-d0e9-40fb-9b34-df5bff8a35ab"}, {"id": "3151a9eb-3c61-4ae1-9b73-4847dc7ae123"}, {"id": "71966cde-87b4-45c7-998d-7148fd0d4e4a"}, {"id": "76e5f53a-dd64-496c-8203-b2c69c87965d"}, {"id": "8630a9f9-db58-4923-8936-b1c2b4f13b4b"}, {"id": "ce07939d-a57e-403b-a39d-cd1c1970a064"}, {"id": "d8b957bf-a7cd-40cb-8574-fbbf4f89d88c"}, {"id": "db8f3ccd-75c4-42fa-9489-fa86b34b73bf"}, {"id": "e8d84325-731e-407c-b300-4f409a88eb46"}], "provider": "octavia", "flavor_id": null, "vip_qos_policy_id": null, "tags": [], "availability_zone": null, "tenant_id": "3f571953295c4ebd9b3a9f0cd02bda69"}}'

2023-03-13 22:07:34,418 164068 INFO [octavia_tempest_plugin.tests.waiters] show_loadbalancer's status updated to ACTIVE.
2023-03-13 22:07:34,419 164068 INFO [octavia_tempest_plugin.services.load_balancer.v2.base_client] Cleanup complete for member 5d93baa9-b1d3-4a0e-8913-8ec1548c8bec...
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_traffic_ops.py", line 851, in test_least_connections_tcp_traffic
    self._test_basic_traffic(const.TCP, 88, pool_id,
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/scenario/v2/test_traffic_ops.py", line 201, in _test_basic_traffic
    self.check_members_balanced(
  File "/usr/lib/python3.9/site-packages/octavia_tempest_plugin/tests/validators.py", line 321, in check_members_balanced
    self.assertEqual(1, len(set(response_counts.values())))
  File "/usr/lib/python3.9/site-packages/testtools/testcase.py", line 391, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/usr/lib/python3.9/site-packages/testtools/testcase.py", line 478, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: 1 != 2

Additional info:
Other tests passed; the issue may be in octavia or in the octavia_tempest_plugin.
Not 100% reproducible.
The interesting part in the logs is:

2023-03-13 22:07:03,821 164068 INFO [octavia_tempest_plugin.tests.validators] Request for http://10.0.0.182:88 timed out. Retrying.
2023-03-13 22:07:05,080 164068 DEBUG [octavia_tempest_plugin.tests.validators] Loadbalancer response totals: {'5': 44, '1': 1}
2023-03-13 22:07:10,235 164068 DEBUG [octavia_tempest_plugin.tests.validators] Loadbalancer response totals: {'5': 12, '1': 8}

The test first waits for at least one reply from each member, which means both members are ready. {'5': 44, '1': 1} means that one member replied 44 times and the other only once, but one reply is enough to consider it ready as well.
Then the test expects a fair dispatch of the requests: it sends 20 requests and expects each member to reply 10 times. But {'5': 12, '1': 8} indicates that one member replied 12 times and the other only 8.
This is not a bug in Octavia but random behavior in the test. We need to investigate how to make it more stable.
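For reference, here is a minimal sketch of the kind of check the test performs. The real logic lives in check_members_balanced in octavia_tempest_plugin/tests/validators.py; the request loop, defaults and use of the requests library below are illustrative assumptions, not the plugin's actual code.

    import collections
    import requests

    def check_members_balanced(vip_url, total_requests=20, timeout=2):
        """Simplified sketch: send requests to the VIP and require a perfectly
        even split across backend members."""
        response_counts = collections.Counter()
        for _ in range(total_requests):
            # Each backend member is assumed to reply with a distinct body
            # (e.g. "1" or "5"), so the body identifies which member answered.
            body = requests.get(vip_url, timeout=timeout).text.strip()
            response_counts[body] += 1
        print('Loadbalancer response totals: %s' % dict(response_counts))
        # The assertion that failed in this bug: every member must have served
        # exactly the same number of requests (a single distinct count value).
        assert len(set(response_counts.values())) == 1, response_counts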
The Octavia log offloading feature is not enabled by default in OSP; it is enabled only in phase3 jobs, so we cannot get the logs from the haproxy service in the amphora.
Feel free to propose a change to make the logging enabled by default.

Typical tests take a black-box approach and should not expect exact implementation details, but a perfect balance should not be expected either.

A frequent load balancer tactic is hashing to a backend based on the 4-tuple, and the src port is randomized by the client.

Typically, anything with a lower chance than a UUID collision is OK, but at least lottery-ticket winning chances should be beaten.

The test should probably try at least 128 requests and not expect more than one extra request on the other node (see the sketch below). The test code should probably also have a comment about the randomness trade-off.

The port randomization is likely not a cryptographic-grade random source...
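A hedged sketch of what such a more tolerant check might look like; the function name, parameters and use of the requests library are hypothetical, not octavia-tempest-plugin code.

    import collections
    import requests

    def check_members_roughly_balanced(vip_url, num_requests=128, max_skew=1, timeout=2):
        """Hypothetical sketch: send many requests and tolerate a small
        imbalance instead of demanding a perfect split."""
        counts = collections.Counter()
        for _ in range(num_requests):
            # Each backend member is assumed to reply with a distinct body
            # that identifies it, as in the test's web servers.
            counts[requests.get(vip_url, timeout=timeout).text.strip()] += 1
        # Randomness trade-off: a perfect split is not guaranteed, so only fail
        # when the imbalance exceeds max_skew requests.
        assert max(counts.values()) - min(counts.values()) <= max_skew, counts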
(In reply to Attila Fazekas from comment #5)
> Feel free to propose a change to make the logging enabled by default.
>
> Typical tests take a black-box approach and should not expect exact
> implementation details, but a perfect balance should not be expected either.
>
> A frequent load balancer tactic is hashing to a backend based on the
> 4-tuple, and the src port is randomized by the client.
>
> Typically, anything with a lower chance than a UUID collision is OK, but at
> least lottery-ticket winning chances should be beaten.
>
> The test should probably try at least 128 requests and not expect more than
> one extra request on the other node. The test code should probably also have
> a comment about the randomness trade-off.
>
> The port randomization is likely not a cryptographic-grade random source...

In that particular test, we check that the "least connections" algorithm works properly. I agree that with some other dispatching algorithms we cannot determine the number of requests for each backend member, but in this case it should be straightforward.
I think that haproxy might not have been fully ready to forward the requests to the backend members.
Let's see if we can reproduce it in phase3; log offloading is enabled there, so we could get more info from haproxy.
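As a rough illustration of why an exactly even split seems plausible with least connections when requests are strictly sequential and short-lived, here is a toy model. This is not haproxy's actual implementation; in particular, the tie-breaking behavior is an assumption.

    import collections

    def least_connections_dispatch(num_requests, backends=('member1', 'member2')):
        """Toy model: pick the backend with the fewest active connections.
        Because each request closes before the next one starts, active counts
        stay at zero, and the tie-break here simply alternates the members."""
        active = dict.fromkeys(backends, 0)   # always 0 in this toy model
        served = collections.Counter()
        for _ in range(num_requests):
            # Assumed tie-break: fewest served so far; real haproxy may differ.
            target = min(backends, key=lambda b: (active[b], served[b]))
            served[target] += 1
            # The connection closes immediately, so it never counts as active.
        return served

    print(least_connections_dispatch(20))  # Counter({'member1': 10, 'member2': 10})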
Which code path (or rule) should ensure that a connection remains active on one backend while a new request is being sent, or at least statistically lead to haproxy not seeing both backends as equally (0) connected?
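For illustration, a hedged sketch of what a test would have to do for least connections to be observable: keep one TCP connection open so that one backend has a non-zero active count before the next connection is opened. The address and port below are placeholders taken from this bug's logs, and the assumption that haproxy in TCP mode counts the idle held connection as active is exactly the open question above.

    import socket

    VIP = ('10.1.1.93', 88)  # placeholder VIP address and TCP listener port

    held = socket.create_connection(VIP, timeout=5)   # held open: 1 "active" conn
    try:
        probe = socket.create_connection(VIP, timeout=5)
        probe.sendall(b'GET / HTTP/1.0\r\n\r\n')
        reply = probe.recv(4096)   # expected (if the assumption holds) from the idle member
        probe.close()
        print(reply)
    finally:
        held.close()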
I was not able to reproduce it so far.
I reported a similar issue upstream (this one is with round-robin dispatching): https://storyboard.openstack.org/#!/story/2010660
I don't know yet if it is an issue with haproxy. We may have to add a workaround for it in octavia-tempest-plugin.
https://github.com/haproxy/haproxy/issues/1393
I guess HAProxy does things a little differently than in the encyclopedia, and it might have good reasons to do so. I suggest connecting with the HAProxy QA/dev group about what really can and cannot be expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577