Back to bug 1315156

Who When What Removed Added
Red Hat Bugzilla Rules Engine 2016-03-07 05:44:16 UTC Target Release --- 8.0
Masaki Furuta ( RH ) 2016-03-07 06:01:44 UTC Comment 3 is private 1 0
CC mfuruta
Flags needinfo?(amuller)
Assaf Muller 2016-03-07 23:39:42 UTC Priority unspecified medium
Target Release 8.0 7.0
Link ID OpenStack gerrit 280595
Flags needinfo?(amuller)
RHEL Program Management 2016-03-07 23:54:58 UTC Keywords ZStream
Jaison Raju 2016-04-01 07:30:18 UTC CC jraju
Assaf Muller 2016-06-01 00:31:03 UTC Status NEW POST
Target Release 7.0 (Kilo) 8.0 (Liberty)
CC chrisw, srevivo
Component openstack-neutron-lbaas openstack-neutron
Chris Lincoln 2016-08-03 15:27:41 UTC CC clincoln
Flags needinfo?(amuller)
John Skeoch 2016-08-08 01:27:48 UTC CC clincoln dchia
Tim Quinlan 2016-11-30 17:08:41 UTC CC tquinlan
Assaf Muller 2016-12-09 21:21:39 UTC Status POST MODIFIED
Link ID OpenStack gerrit 280595 OpenStack gerrit 314319
Fixed In Version openstack-neutron-7.1.1-3.el7ost
Target Milestone --- async
Severity urgent medium
Assaf Muller 2016-12-09 21:22:03 UTC Flags needinfo?(amuller)
Assaf Muller 2016-12-19 15:26:28 UTC Doc Text Cause:
If a server operation takes long enough to trigger a timeout on an agent call to the server, the agent will just give up and issue a new call immediately.

Consequence:
First, if the server is busy and requests take longer than the timeout window to fulfill, the agent continually hammers the server with calls that are bound to fail until the server load drops enough to fulfill them. If the load itself comes from agent calls, this leads to a stampede in which the server cannot fulfill requests until an operator intervenes.

Second, the server builds a backlog of call requests, and the window of time left to process each message shrinks as the backlog grows. With enough clients making calls, the timeout threshold can be crossed before a call even starts to process. For example, if the server takes 6 seconds to process a given call and the clients are configured with a 60 second timeout, 30 agents making the call simultaneously will leave 20 of the agents without a response. The first 10 will get their calls fulfilled, and the last 20 will end up in a loop where the server spends its time replying to calls that have already expired by the time it processes them.
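
The arithmetic in the 6 second example can be checked with a minimal sketch. It assumes the server handles calls strictly one at a time in arrival order, that a call finishing exactly at the timeout still counts as answered, and that retries are ignored. The function name count_served and its parameters are illustrative and are not part of Neutron.

```python
def count_served(num_agents, service_time, timeout):
    """Return (served, timed_out) for calls that all arrive at time zero.

    Assumption: a single server processes calls serially, so the call in
    queue position N completes at N * service_time seconds.
    """
    served = 0
    for position in range(1, num_agents + 1):
        completion_time = position * service_time
        # The agent only gets a response if the server finishes in time.
        if completion_time <= timeout:
            served += 1
    return served, num_agents - served


served, timed_out = count_served(num_agents=30, service_time=6, timeout=60)
print(served, timed_out)  # 10 20
```

Under these assumptions, every call queued behind the tenth one has already expired before the server reaches it, which is the loop described above.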

Fix:
The fix adds an exponential backoff mechanism to the timeout values of any RPC calls in Neutron that don't explicitly request a timeout value.
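
The idea of backing off timeout values can be illustrated with a short sketch. This is not the Neutron implementation: the class BackoffTimeout, the helper call_with_backoff, the doubling factor, the 600 second cap, and the attempt limit are all assumptions chosen for illustration.

```python
class BackoffTimeout:
    """Tracks a timeout that grows each time a call times out (hypothetical)."""

    def __init__(self, base_timeout=60, max_timeout=600, factor=2):
        self.current = base_timeout
        self.max_timeout = max_timeout
        self.factor = factor

    def on_timeout(self):
        # Grow the timeout exponentially, but never beyond the cap.
        self.current = min(self.current * self.factor, self.max_timeout)


def call_with_backoff(call, timeouts, max_attempts=5):
    """Retry `call`, waiting longer for a reply after every timeout."""
    for _ in range(max_attempts):
        try:
            return call(timeout=timeouts.current)
        except TimeoutError:
            timeouts.on_timeout()
    raise TimeoutError("call did not complete after %d attempts" % max_attempts)


def slow_server_call(timeout):
    """Stand-in for an RPC call to a server that needs 200 seconds to answer."""
    if timeout < 200:
        raise TimeoutError
    return timeout


# Attempts use timeouts of 60, 120, then 240 seconds; the third succeeds.
result = call_with_backoff(slow_server_call, BackoffTimeout())
```

Because each retry waits longer than the last, a loaded server receives fewer repeated calls and has a chance to answer before the agent gives up.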

Result:
Clients no longer DDoS the server by giving up on requests and retrying them before they are fulfilled.
Jon Schlueter 2017-01-03 13:16:30 UTC Keywords TestOnly
Status MODIFIED ON_QA
Toni Freger 2017-01-04 09:02:38 UTC QA Contact tfreger ekuris
Eran Kuris 2017-01-04 09:56:26 UTC Flags needinfo?(amuller)
Assaf Muller 2017-01-04 12:44:39 UTC Link ID Launchpad 1554332
Flags needinfo?(amuller)
Eran Kuris 2017-01-09 09:00:44 UTC Status ON_QA VERIFIED
Jon Schlueter 2017-01-11 19:26:51 UTC Status VERIFIED CLOSED
Resolution --- CURRENTRELEASE
Last Closed 2017-01-11 14:26:51 UTC
Toni Freger 2017-12-19 10:13:17 UTC CC tfreger
