Red Hat Bugzilla – Bug 1302861
Utilize QoS to prevent excessive client side queueing of messages
Last modified: 2016-04-26 23:33:58 EDT
+++ This bug was initially created as a clone of Bug #1295896 +++
--- Additional comment from John Eckersberg on 2016-01-05 13:22:42 EST ---
This is problematic with the way we've got rabbitmq configured, particularly on older versions without AMQP heartbeat support. We set TCP_USER_TIMEOUT to 5s in order to quickly notice failed connections. What happens is roughly:
- There are a bunch of messages in a queue
- Because of no QoS, they all get flushed to the consumer(s)
- The consumer(s) can't process them fast enough, meaning they don't call recv() on the socket
- Messages buffer in the kernel on the consumer, using up the size of the recv buffer until it's full and the window drops to zero
- The server probes the zero window for 5 seconds, hits TCP_USER_TIMEOUT, and closes the connection
The TCP_USER_TIMEOUT setting is there primarily because of weird behavior during VIP failover, which we don't even use presently. See http://john.eckersberg.com/improving-ha-failures-with-tcp-timeouts.html for details. Maybe we should just turn this off for now...
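The failure sequence above comes down to how many delivered-but-unprocessed messages can pile up on the consumer side. A minimal sketch of that effect, assuming a toy in-memory model: `ToyBroker` and `peak_client_backlog` are hypothetical names invented for illustration and are not part of RabbitMQ or any client library. The only detail borrowed from AMQP is that a prefetch count of 0 in `basic.qos` means "no limit".

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker queue (not a real AMQP API)."""

    def __init__(self, messages, prefetch_count=0):
        self.ready = deque(messages)          # messages still held by the broker
        self.unacked = 0                      # delivered but not yet acknowledged
        self.prefetch_count = prefetch_count  # 0 means "no limit", as in basic.qos

    def deliver(self):
        """Push as many messages to the consumer as the prefetch limit allows."""
        pushed = []
        while self.ready and (
            self.prefetch_count == 0 or self.unacked < self.prefetch_count
        ):
            pushed.append(self.ready.popleft())
            self.unacked += 1
        return pushed

    def ack(self):
        """Consumer acknowledges one message, freeing one prefetch slot."""
        if self.unacked == 0:
            raise RuntimeError("nothing to acknowledge")
        self.unacked -= 1


def peak_client_backlog(total, prefetch_count):
    """Return the largest number of messages waiting on the consumer side."""
    broker = ToyBroker(range(total), prefetch_count)
    backlog = deque()  # stands in for the consumer's receive buffer
    peak = 0
    while broker.ready or backlog:
        backlog.extend(broker.deliver())
        peak = max(peak, len(backlog))
        backlog.popleft()  # the slow consumer handles one message...
        broker.ack()       # ...and acknowledges it
    return peak


if __name__ == "__main__":
    print("no QoS:     ", peak_client_backlog(1000, 0))
    print("prefetch=10:", peak_client_backlog(1000, 10))
```

Without a limit the whole queue lands on the consumer at once, which is the condition that fills the receive buffer and drives the window to zero. With a prefetch limit the backlog never exceeds the limit, so the consumer's socket keeps draining.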
--- Additional comment from John Eckersberg on 2016-01-07 13:31:21 EST ---
--- Additional comment from Perry Myers on 2016-01-12 10:01:11 EST ---
@eck: Do we need this on OSP5 and OSP6 as well? If so we need this cloned 3 more times for those releases (OSP5/RHEL6, OSP5/RHEL7, OSP6)
--- Additional comment from John Eckersberg on 2016-01-12 14:33:35 EST ---
(In reply to Perry Myers from comment #3)
> @eck: Do we need this on OSP5 and OSP6 as well? If so we need this cloned 3
> more times for those releases (OSP5/RHEL6, OSP5/RHEL7, OSP6)
Yeah, it'd be nice to get it everywhere. I guess hold for now until the upstream patch gets accepted, because it looks like it may be slightly more invasive than I originally thought, and the backport may not be so straightforward or feasible. We'll see. I'll keep the needinfo? to keep it on my radar for clones.
/me goes off to amend the review.
Closing this because the backport from upstream is more involved and probably not worth the effort for now.