This was found in the context of a customer. During the debug session, we noticed that whenever a timeout error was raised [0], it was propagated up [1] and the waiters were correctly removed [2], but the connection waiting for replies remained in use and was never put back into the connection pool [3][4]. In other words, there is just one connection waiting for replies, and it is permanently dedicated to that task, which is not ideal. As shown in [3], this issue exists on master as well, and it explains some of the behaviors we were seeing in the customer's environment.

Here's a short log of a conversation between John and myself about this issue:

2016-02-09 10:39:57 eck: flaper87: every time i read o.m code i learn something new and important
2016-02-09 10:39:57 eck: i just realized there's only one connection in the pool that is waiting for replies (and it's dedicated to doing so)
2016-02-09 10:46:15 flaper87: eck: is that OSP5?
2016-02-09 10:57:01 eck: flaper87: no, on master
2016-02-09 10:57:11 eck: flaper87: i'm guessing it's the same back in icehouse though?
2016-02-09 11:00:07 flaper87: eck: I don't think so, that's why I'm asking. I think it's newish stuff but I can't recall
2016-02-09 11:02:01 * eck looks
2016-02-09 11:04:23 eck: flaper87: seems to still be the case in icehouse if i'm reading correctly
2016-02-09 11:06:25 eck: basically this... https://github.com/openstack/oslo.messaging/blob/icehouse-eol/oslo/messaging/_drivers/amqpdriver.py#L350-L365
2016-02-09 11:07:05 eck: it just sets one _reply_q and one _reply_q_conn for the whole driver
2016-02-09 11:07:58 eck: and the ReplyWaiter just listens on that one connection
2016-02-09 11:08:42 eck: i think it could also explain some of the bottleneck too
2016-02-09 11:08:55 flaper87: eck: ah yeah, thought you were talking about something else
2016-02-09 11:09:09 eck: if you've got ~30 greenthreads context switching on rpc connections
2016-02-09 11:09:09 flaper87: yeah, that's also one of the causes for those leaks we were seeing
2016-02-09 11:09:20 flaper87: There's a card for it and I'm supposed to create a BZ today
2016-02-09 11:09:21 eck: and only one of them is able to read replies
2016-02-09 11:09:26 flaper87: with a more detailed explanation
2016-02-09 11:09:31 eck: then it's not going to get "scheduled" very often
2016-02-09 11:09:35 eck: flaper87: cool
2016-02-09 11:09:37 flaper87: right
2016-02-09 11:10:18 eck: and just to finish my thought... :)
2016-02-09 11:10:41 eck: you've got metadata workers with big pools submitting lots of messages
2016-02-09 11:11:22 eck: and over in conductor, you've only got two listening greenthreads
2016-02-09 11:11:32 eck: one is consuming from the conductor queue
2016-02-09 11:11:39 eck: and the other is consuming from the reply queue
2016-02-09 11:12:38 eck: the only semi-good news for conductor in that scenario is it's not waiting for a whole lot of replies itself
2016-02-09 11:12:46 eck: at least i don't think so

[0] https://github.com/openstack/oslo.messaging/blob/icehouse-eol/oslo/messaging/_drivers/amqpdriver.py#L217-L221
[1] https://github.com/openstack/oslo.messaging/blob/icehouse-eol/oslo/messaging/_drivers/amqpdriver.py#L410
[2] https://github.com/openstack/oslo.messaging/blob/icehouse-eol/oslo/messaging/_drivers/amqpdriver.py#L416
[3] Icehouse: https://github.com/openstack/oslo.messaging/blob/icehouse-eol/oslo/messaging/_drivers/amqpdriver.py#L350-L365
[3] Master: https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/_drivers/amqpdriver.py#L374-L389
[4] https://github.com/openstack/oslo.messaging/blob/icehouse-eol/oslo/messaging/_drivers/amqpdriver.py#L184
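To make the shape of the problem concrete, here is a minimal, self-contained sketch (plain Python threads, not the actual oslo.messaging code; the ConnectionPool and ReplyWaiter classes below are hypothetical stand-ins modeled on the names in amqpdriver.py). It shows the pattern described above: one connection is checked out of the pool, dedicated to reply consumption for the lifetime of the driver, and a per-call timeout removes the waiter but never touches that connection.

    import queue
    import threading
    import uuid


    class ConnectionPool:
        """Hypothetical pool standing in for the driver's connection pool."""

        def __init__(self, size):
            self._free = queue.Queue()
            for i in range(size):
                self._free.put("conn-%d" % i)

        def get(self):
            return self._free.get()

        def put(self, conn):
            self._free.put(conn)


    class ReplyWaiter:
        """One thread draining replies from ONE dedicated connection.

        Mirrors the shape of amqpdriver's ReplyWaiter: a single reply
        queue is consumed on a single connection for the whole driver.
        """

        def __init__(self, conn, incoming):
            self.conn = conn    # dedicated; never returned to the pool
            self._waiters = {}  # msg_id -> per-call queue of replies
            self._lock = threading.Lock()
            threading.Thread(target=self._poll, args=(incoming,),
                             daemon=True).start()

        def listen(self, msg_id):
            q = queue.Queue()
            with self._lock:
                self._waiters[msg_id] = q
            return q

        def unlisten(self, msg_id):
            # This is the cleanup that DOES happen on timeout: the waiter
            # is removed...
            with self._lock:
                self._waiters.pop(msg_id, None)
            # ...but there is no pool.put(self.conn) anywhere: the
            # connection stays dedicated to reply consumption forever.

        def _poll(self, incoming):
            # Route each incoming reply to the waiter that asked for it.
            while True:
                msg_id, reply = incoming.get()
                with self._lock:
                    waiter = self._waiters.get(msg_id)
                if waiter:
                    waiter.put(reply)


    pool = ConnectionPool(size=4)
    incoming = queue.Queue()                    # stands in for the AMQP reply queue
    waiter = ReplyWaiter(pool.get(), incoming)  # one connection, permanently out

    msg_id = uuid.uuid4().hex
    replies = waiter.listen(msg_id)
    try:
        replies.get(timeout=0.1)  # no reply ever arrives -> timeout
    except queue.Empty:
        waiter.unlisten(msg_id)   # waiter removed, connection still dedicated
    print("free connections left in pool:", pool._free.qsize())  # prints 3, not 4

Running this prints 3, not 4: the reply connection is checked out once and stays out. In the real driver that dedication is intentional, but it means each process has exactly one greenthread able to read replies, which matches the bottleneck described in the log above.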
I think this is still a valid bug on master. It still needs investigation. Updating version to Newton (10).
Retarget to RHOS 10.
This bugzilla has been removed from the release and needs to be reviewed and triaged for another target release.
I'm going to go ahead and close this. In theory, we might be able to get a *very* small performance improvement here if we can parallelize the reply waiters, but it's really not worth the effort and would risk introducing new bugs.
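For the record, "parallelizing the reply waiters" would roughly mean sharding waiters across N dedicated reply connections instead of one. A hypothetical sketch, reusing the ConnectionPool and ReplyWaiter classes from the earlier sketch (the ShardedReplyWaiter name and its interface are invented for illustration, not actual oslo.messaging code):

    class ShardedReplyWaiter:
        """Hypothetical: spread reply consumption over N connections.

        Each shard is an independent ReplyWaiter with its own dedicated
        connection, so N threads can read replies concurrently instead
        of just one.
        """

        def __init__(self, pool, incoming_queues):
            self._shards = [ReplyWaiter(pool.get(), q)
                            for q in incoming_queues]

        def _shard_for(self, msg_id):
            # hash() is stable within a process run, so listen() and
            # unlisten() for the same msg_id always hit the same shard.
            return self._shards[hash(msg_id) % len(self._shards)]

        def listen(self, msg_id):
            return self._shard_for(msg_id).listen(msg_id)

        def unlisten(self, msg_id):
            self._shard_for(msg_id).unlisten(msg_id)

Note that each shard still permanently dedicates a connection, so the trade-off above (a marginal gain against more moving parts and new failure modes) still stands.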