Bug 1732834

Summary: Amphora RST instead of FIN connection with server side
Product: Red Hat OpenStack
Reporter: Priscila <pveiga>
Component: openstack-octavia
Assignee: Michael Johnson <michjohn>
Status: CLOSED CURRENTRELEASE
QA Contact: Bruna Bonguardo <bbonguar>
Severity: high
Priority: medium
Version: 13.0 (Queens)
CC: amuller, astafeye, broose, cgoncalves, ihrachys, lpeer, majopela, marjones, michjohn, njohnston, scohen
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: All
OS: All
Last Closed: 2020-03-25 15:36:04 UTC
Type: Bug
Bug Depends On: 1709925, 1759254    
Bug Blocks:    

Description Priscila 2019-07-24 13:22:04 UTC
Description of problem:

Http Client ----- Amphora ----- Http Server

Sniffing on the controller where the Amphora VIP is attached, we see the connection being closed with a RST instead of a FIN. We also see a lot of fragmentation and window-size (WSS) negotiation, and sometimes we get a TCP Window Full.

The client is able to get the web page, but it is slower than expected.

The tests were made with 50, 100, and 500 clients, all with the same results.
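
A minimal sketch of how this can be checked, assuming the scapy library and a hypothetical capture file name, tallying FIN versus RST segments in the pcap:

    # Tally how TCP segments in a capture are flagged, to see whether
    # sessions are being closed with FIN or torn down with RST.
    # Assumes scapy is installed; "capture.pcap" is a placeholder name.
    from scapy.all import rdpcap, TCP

    packets = rdpcap("capture.pcap")
    fin = sum(1 for p in packets if TCP in p and p[TCP].flags.F)
    rst = sum(1 for p in packets if TCP in p and p[TCP].flags.R)
    print(f"FIN segments: {fin}, RST segments: {rst}")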


Version-Release number of selected component (if applicable):


How reproducible: Always


•	It would be really nice to have some performance numbers that you have been able to achieve for this to be termed carrier-grade.

•	We would also appreciate any input on performance tuning Octavia.

•	Are there any recommended flavor sizes for spinning up Amphorae? The default of 1 core, 2 GB disk, and 1 GB RAM does not seem to be enough.

•	Also, I noticed that when the Amphorae are spun up, only one master at a time talks to the backend servers, using a single IP. It should run out of ports after about 64,000 concurrent TCP sessions. Is there a way to add more IPs, or is this the limitation? (See the sketch after this list.)

•	If I needed some help with Octavia and some guidance around performance tuning, could you help me with that?
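
As a back-of-the-envelope sketch of the port-exhaustion concern above (the port range and member count below are assumptions, not measured values):

    # One amphora source IP can only open as many backend connections as
    # it has ephemeral source ports, but the limit applies per
    # (src IP, dst IP, dst port) tuple, so more members raise the budget.
    EPHEMERAL_LOW, EPHEMERAL_HIGH = 32768, 60999  # common Linux default
    ports_per_ip = EPHEMERAL_HIGH - EPHEMERAL_LOW + 1

    backend_members = 4  # hypothetical pool size
    print(f"ports per source IP: {ports_per_ip}")
    print(f"approx. concurrent backend sessions: {ports_per_ip * backend_members}")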

Comment 6 Michael Johnson 2019-07-26 19:11:19 UTC
Thank you for providing the pcap of the traffic you are concerned about.

I can see that this was a “benchmarking” activity, given the content of the flows and the 563 transfers in this 76-second capture.

I analyzed the HTTP GET /1024k.html flow in this pcap. It starts with packet 140 and ends with a RST/ACK in packet 1784.
I see that the flow has jumbo frames enabled and the amphora was communicating with the web server using 8960-byte segments (a 9000-byte MTU minus 40 bytes of protocol overhead). At that segment size, the 1,048,576-byte HTTP payload took 118 TCP segments to transfer. The total transfer time for this payload was 0.204659 seconds.
I was unable to find any packets with IP fragmentation in the pcap.
The TCP window size stayed fairly consistent through the beginning of the transfer (approximately the first 0.12 seconds), but did shift down towards the end of the transfer. I also see a delayed ACK at approximately that time. This flow did not experience a window full event, though I see that others in the capture did, especially flows later in the capture.
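
For reference, the segment count and effective throughput can be rechecked with simple arithmetic (all values taken from the pcap analysis above):

    import math

    payload = 1_048_576   # HTTP body bytes (1024k.html)
    mss = 8960            # jumbo-frame segment size seen in the capture
    elapsed = 0.204659    # transfer time from the pcap, in seconds

    segments = math.ceil(payload / mss)  # -> 118, matching the analysis
    throughput_mbit = payload / elapsed * 8 / 1e6
    print(segments, f"{throughput_mbit:.1f} Mbit/s")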

In analyzing this flow, I do not see anything wrong with how the Amphora handled the request.

The RST you see at the end of the flows is expected behavior and does not impact the HTTP payload transfer time. The initial HTTP transfer finished at packet 1233 with the final ACK for the transfer. The Amphora then held the connection to the back end server open for a short period to see if another request could be serviced over the same connection. This is a form of back end keep-alive; it reduces the latency between flows and the load on the back end servers. The benchmarking tool being used does not send follow-on requests, so the back end connection is eventually reset, with the RST flag, to close the TCP session. The tool is likely not using HTTP keep-alive or reusing the client-to-Amphora TCP connections.
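
As an illustration only (this is not the tool that was used, and the URL is a placeholder), a client that reuses connections via HTTP keep-alive avoids this per-request teardown, e.g. with Python requests:

    # requests.Session pools and reuses TCP connections, so repeated GETs
    # ride the same client-to-amphora connection instead of opening (and
    # later resetting) a new one for each request.
    import requests

    with requests.Session() as session:
        for _ in range(10):
            r = session.get("http://VIP/1024k.html", timeout=10)
            r.raise_for_status()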

The delayed ACKs and TCP window full events are likely caused by the client connecting to the Amphora not being able to handle the data it is receiving in a timely manner. The Amphora has to slow the rate of data from the server if the client cannot keep up. This is common with clients that do not have tuned kernel settings and are using benchmarking tools such as ApacheBench. To confirm this, you can capture a pcap on the client-to-Amphora side that aligns with the pcap from the Amphora to the back end server. You should see some indication that the client was not responding in a timely manner to the data packets from the Amphora.
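
A minimal sketch of that check, assuming scapy, a hypothetical client-side capture file, and a placeholder client address, scanning for zero-window ACKs (one common sign that the client cannot keep up):

    from scapy.all import rdpcap, IP, TCP

    CLIENT_IP = "192.0.2.10"  # placeholder client address

    for p in rdpcap("client_side.pcap"):  # placeholder file name
        if IP in p and TCP in p and p[IP].src == CLIENT_IP and p[TCP].window == 0:
            print(f"zero window advertised by client at t={p.time}")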

Comment 33 Red Hat Bugzilla 2023-09-18 00:16:52 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.