Bug 1278176
Summary: | Failure to reconnect when connection lost during message acknowlegement | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | John Eckersberg <jeckersb> |
Component: | python-oslo-messaging | Assignee: | John Eckersberg <jeckersb> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Ofer Blaut <oblaut> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 6.0 (Juno) | CC: | apevec, fpercoco, hguemar, jtaleric, karlthered, kgiusti, lhh, mwagner, sclewis, ushkalim, yeylon |
Target Milestone: | async | Keywords: | Triaged, ZStream |
Target Release: | 6.0 (Juno) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | python-oslo-messaging-1.4.1-8.el7ost | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | 1270353 | Environment: | |
Last Closed: | 2016-03-03 17:46:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1270353, 1278177, 1280016, 1280019, 1280020, 1288634, 1296680, 1300044 | ||
Bug Blocks: |
Comment 2
John Eckersberg
2015-11-04 21:25:42 UTC
Adding these 3 to the mix: https://code.engineering.redhat.com/gerrit/61124 fix memory leak for function _safe_log https://code.engineering.redhat.com/gerrit/61125 cleanup connection pool return https://code.engineering.redhat.com/gerrit/61126 Fix amqp connection pool leak in ConnectionContext This has all the required acks now and the patches have landed. John Eckersberg 2016-01-11 17:32:46 EST (In reply to Udi Shkalim from comment #8) > Hi John, > > Wanted to try and reproduce the same scenario mention above. > What will be the best way to do so in order to verify solution? > > Thanks, > Udi Since the timing is important here, I've written a patch that will simulate the failure by resetting the underlying connection 10% of the time in order to demonstrate the failure and the recovery. To install on the controller: # pushd /usr/lib/python2.7/site-packages; curl http://git.engineering.redhat.com/git/users/jeckersb/oslo.messaging/patch/?id=ackbreaker | patch -p1; popd Then restart a service (I usually just use openstack-nova-conductor). Before the fix, you will see... 2016-01-11 17:27:21.599 21973 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying. 2016-01-11 17:27:21.599 21973 TRACE root Traceback (most recent call last): 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 95, in inner_func 2016-01-11 17:27:21.599 21973 TRACE root return infunc(*args, **kwargs) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 98, in _executor_thread 2016-01-11 17:27:21.599 21973 TRACE root self._dispatch(incoming) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 86, in _dispatch 2016-01-11 17:27:21.599 21973 TRACE root spawn_with(ctxt=self.dispatcher(incoming), pool=self._greenpool) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 53, in spawn_with 2016-01-11 17:27:21.599 21973 TRACE root callback = ctxt.__enter__() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__ 2016-01-11 17:27:21.599 21973 TRACE root return self.gen.next() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 135, in __call__ 2016-01-11 17:27:21.599 21973 TRACE root incoming.acknowledge() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 79, in acknowledge 2016-01-11 17:27:21.599 21973 TRACE root self.acknowledge_callback() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 171, in acknowledge 2016-01-11 17:27:21.599 21973 TRACE root self._raw_message.ack() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2016-01-11 17:27:21.599 21973 TRACE root self.channel.basic_ack(self.delivery_tag) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack 2016-01-11 17:27:21.599 21973 TRACE root self._send_method((60, 80), args) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2016-01-11 17:27:21.599 21973 TRACE root self.channel_id, method_sig, args, content, 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2016-01-11 17:27:21.599 21973 TRACE root write_frame(1, channel, payload) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame 2016-01-11 17:27:21.599 21973 TRACE root frame_type, channel, size, payload, 0xce, 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 377, in sendall 2016-01-11 17:27:21.599 21973 TRACE root tail = self.send(data, flags) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send 2016-01-11 17:27:21.599 21973 TRACE root total_sent += fd.send(data[total_sent:], flags) 2016-01-11 17:27:21.599 21973 TRACE root error: [Errno 32] Broken pipe 2016-01-11 17:27:21.599 21973 TRACE root And there will be no attempt to reconnect. It will basically hang here forever. Then: - apply the updated oslo.messaging package - *repatch the file*! - restart nova-conductor And there will be a reconnect after the error: 2016-01-11 17:32:39.015 1070 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying. 2016-01-11 17:32:39.015 1070 TRACE root Traceback (most recent call last): 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 95, in inner_func 2016-01-11 17:32:39.015 1070 TRACE root return infunc(*args, **kwargs) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 98, in _executor_thread 2016-01-11 17:32:39.015 1070 TRACE root self._dispatch(incoming) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 86, in _dispatch 2016-01-11 17:32:39.015 1070 TRACE root spawn_with(ctxt=self.dispatcher(incoming), pool=self._greenpool) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 53, in spawn_with 2016-01-11 17:32:39.015 1070 TRACE root callback = ctxt.__enter__() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__ 2016-01-11 17:32:39.015 1070 TRACE root return self.gen.next() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 135, in __call__ 2016-01-11 17:32:39.015 1070 TRACE root incoming.acknowledge() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 79, in acknowledge 2016-01-11 17:32:39.015 1070 TRACE root self.acknowledge_callback() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 172, in acknowledge 2016-01-11 17:32:39.015 1070 TRACE root self._raw_message.ack() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2016-01-11 17:32:39.015 1070 TRACE root self.channel.basic_ack(self.delivery_tag) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack 2016-01-11 17:32:39.015 1070 TRACE root self._send_method((60, 80), args) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2016-01-11 17:32:39.015 1070 TRACE root self.channel_id, method_sig, args, content, 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2016-01-11 17:32:39.015 1070 TRACE root write_frame(1, channel, payload) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame 2016-01-11 17:32:39.015 1070 TRACE root frame_type, channel, size, payload, 0xce, 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 377, in sendall 2016-01-11 17:32:39.015 1070 TRACE root tail = self.send(data, flags) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send 2016-01-11 17:32:39.015 1070 TRACE root total_sent += fd.send(data[total_sent:], flags) 2016-01-11 17:32:39.015 1070 TRACE root error: [Errno 32] Broken pipe 2016-01-11 17:32:39.015 1070 TRACE root 2016-01-11 17:32:40.018 1070 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.0.2.8:5672 is unreachable: <AMQPError: unknown error>. Trying again in 1 seconds. 2016-01-11 17:32:41.038 1070 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.0.2.8:5672 |