Bug 1732834 - Amphora RST instead of FIN connection with server side
Summary: Amphora RST instead of FIN connection with server side
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-octavia
Version: 13.0 (Queens)
Hardware: All
OS: All
Target Milestone: ---
Assignee: Michael Johnson
QA Contact: Bruna Bonguardo
Depends On: 1709925 1759254
Reported: 2019-07-24 13:22 UTC by Priscila
Modified: 2020-01-21 04:50 UTC (History)


System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4484761 None None None 2019-10-08 14:32:57 UTC

Description Priscila 2019-07-24 13:22:04 UTC
Description of problem:

Http Client ----- Amphora ----- Http Server

Sniffing on the controller where the Amphora VIP is attached, we see the connection to the server side being closed with RST instead of FIN. There is also a lot of fragmentation and window size (WSS) negotiation, and sometimes we get TCP Window Full.

The client is able to get the web page, but it is slower than expected.

The tests were run with 50, 100 and 500 clients, all with the same results.

Version-Release number of selected component (if applicable):

How reproducible: Always

• It would be really nice to have some performance numbers that you have been able to achieve for this to be termed carrier-grade.
• We would also appreciate any input on performance tuning Octavia.
• Are there recommended flavor sizes for spinning up Amphorae? The default size of 1 core, 2 GB disk and 1 GB RAM does not seem to be enough.
• I also noticed that when the Amphorae are spun up, only one master talks to the backend servers at a time, and it uses a single IP, so it has to run out of ports after 64000 concurrent TCP sessions. Is there a way to add more IPs, or is this a limitation?
• If I need help with Octavia and guidance on performance tuning, can you help me with that?

Comment 6 Michael Johnson 2019-07-26 19:11:19 UTC
Thank you for providing the pcap of the traffic you are concerned about.

I can see that this was a benchmarking activity, given the content of the flows and the 563 transfers in this 76-second capture.

I analyzed the HTTP GET /1024k.html flow in this pcap. It starts with packet 140 and ends with a RST/ACK in packet 1784.
I see that the flow has jumbo frames enabled and the Amphora was communicating with the web server using 8960-byte segments (40 bytes for the protocol overhead). At 8960 bytes per segment, the 1,048,576-byte HTTP payload took 118 TCP segments to transfer. The total transfer time for this payload was 0.204659 seconds.
I was unable to find any packets that had IP fragmentation in the pcap.
The TCP window size stayed fairly consistent through the beginning of the transfer (approximately 0.12 seconds), but shifted down towards the end of the transfer. I also see a delayed ACK at approximately that point. This flow did not experience a window full event, though I see that others in the capture did, especially flows later in the capture.
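
As a quick sanity check on the figures above, the segment count and the effective throughput of this one flow can be recomputed from the quoted numbers. This is only a sketch: the constants are copied from this comment, the function names are made up for illustration, and the throughput figure counts the HTTP payload only, ignoring headers and protocol overhead.

```python
import math

# Values quoted in the analysis of the GET /1024k.html flow.
PAYLOAD_BYTES = 1_048_576      # HTTP payload size
SEGMENT_BYTES = 8960           # TCP segment size with jumbo frames
TRANSFER_SECONDS = 0.204659    # total transfer time for the payload


def segments_needed(payload_bytes: int, segment_bytes: int) -> int:
    """Number of full-size TCP segments needed to carry the payload."""
    return math.ceil(payload_bytes / segment_bytes)


def throughput_mbit_per_s(payload_bytes: int, seconds: float) -> float:
    """Payload-only throughput in megabits per second."""
    return payload_bytes * 8 / seconds / 1_000_000


if __name__ == "__main__":
    print("segments:", segments_needed(PAYLOAD_BYTES, SEGMENT_BYTES))
    print("Mbit/s: %.1f" % throughput_mbit_per_s(PAYLOAD_BYTES, TRANSFER_SECONDS))
```

The last segment is only partly filled, which is why the count rounds up.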

In analyzing this flow, I do not see anything wrong with how the Amphora handled the request.

The RST you see at the end of the flows is expected behavior and does not impact the HTTP payload transfer time. The initial HTTP transfer finished at packet 1233 with the final ACK for the transfer. The Amphora then held the connection to the back end server open for a short period to see if another request could be serviced over the same connection. This is a form of back end keep-alive. It reduces the latency between flows and the load on the back end servers. The benchmarking tool being used does not send follow-on requests, so the back end connection is eventually reset, with the RST flag, to close the TCP session. The tool is likely not using HTTP keep-alive or reusing the client to Amphora TCP connections.
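
To illustrate the difference between the two ways of closing a TCP session, here is a minimal loopback sketch. It assumes Linux (the `struct linger` layout of two ints) and uses SO_LINGER with a zero timeout, which is one common way for an application to close with RST. Whether the Amphora uses this exact mechanism is not confirmed here, and `observe_close` is a hypothetical helper, not part of Octavia.

```python
import socket
import struct


def observe_close(abortive: bool) -> str:
    """Close the client end of a loopback TCP connection and report what
    the peer observes: 'FIN' for an orderly close, 'RST' for a reset."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    peer, _ = listener.accept()
    try:
        if abortive:
            # l_onoff=1, l_linger=0: close() drops the connection and sends
            # RST instead of FIN (assumes the Linux struct linger layout).
            client.setsockopt(
                socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
            )
        client.close()
        peer.settimeout(5.0)
        try:
            data = peer.recv(1024)
        except ConnectionResetError:
            return "RST"
        # An empty read means the peer received FIN (end of stream).
        return "FIN" if data == b"" else "DATA"
    finally:
        peer.close()
        listener.close()


if __name__ == "__main__":
    print("normal close seen as:", observe_close(abortive=False))
    print("abortive close seen as:", observe_close(abortive=True))
```

In both cases all application data has already been delivered before the close, which matches the observation that the RST does not affect the payload transfer time.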

The delayed ACKs and TCP window full events are likely being caused by the client connecting to the Amphora not being able to handle the data it is receiving in a timely manner. The Amphora will have to slow the rate of data from the server if the client is unable to handle it. This is common with clients that do not have tuned kernel settings and are using benchmarking tools such as ApacheBench. To confirm this, you can look at a pcap from the client to Amphora side that aligns with the pcap from the Amphora to the back end server. You should see some indication that the client was not responding in a timely manner to the data packets from the Amphora.
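
The back-pressure effect described above can be reproduced in miniature on loopback. The sketch below is an illustration only, not the Amphora's code: the helper names are hypothetical. A sender writes to a receiver that is not reading until the kernel refuses more data, then the receiver drains its buffer and the sender can write again.

```python
import select
import socket


def fill_until_blocked(sock: socket.socket, chunk: bytes = b"x" * 65536) -> int:
    """Send without blocking until the kernel buffers are full.
    Returns the number of bytes the kernel accepted."""
    sock.setblocking(False)
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            return sent


def demo_backpressure():
    """Return (bytes accepted before stalling, bytes accepted after drain)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()
    try:
        # The receiver is not reading, so its window fills and the sender stalls.
        stalled_at = fill_until_blocked(sender)

        # The slow receiver finally reads everything that was queued.
        receiver.settimeout(5.0)
        drained = 0
        while drained < stalled_at:
            data = receiver.recv(1 << 20)
            if not data:
                break
            drained += len(data)

        # Once the window reopens, the sender becomes writable again.
        _, writable, _ = select.select([], [sender], [], 5.0)
        resumed = sender.send(b"y" * 1024) if writable else 0
        return stalled_at, resumed
    finally:
        sender.close()
        receiver.close()
        listener.close()


if __name__ == "__main__":
    stalled, resumed = demo_backpressure()
    print("sender stalled after", stalled, "bytes; resumed with", resumed, "bytes")
```

A proxy sitting between a fast server and a slow client ends up in the sender's position here, so it has to stop reading from the server until the client catches up.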
