Bug 1270353
Summary: | Failure to reconnect when connection lost during message acknowlegement | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | John Eckersberg <jeckersb> | |
Component: | python-oslo-messaging | Assignee: | John Eckersberg <jeckersb> | |
Status: | CLOSED ERRATA | QA Contact: | Udi Shkalim <ushkalim> | |
Severity: | high | Docs Contact: | ||
Priority: | urgent | |||
Version: | 7.0 (Kilo) | CC: | apevec, dsafford, fpercoco, hguemar, jeckersb, jtaleric, kgiusti, lhh, lnatapov, mfuruta, mwagner, oblaut, plemenko, sputhenp, vcojot, yeylon | |
Target Milestone: | async | Keywords: | Triaged, ZStream | |
Target Release: | 7.0 (Kilo) | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | python-oslo-messaging-1.8.3-3.el7ost | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1278176 1278177 (view as bug list) | Environment: | ||
Last Closed: | 2016-01-20 13:12:29 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1278176, 1278177 |
Description
John Eckersberg
2015-10-09 18:16:31 UTC
This seems to be because this is missing in OSP7: https://review.openstack.org/#/c/213780/ It was merged into stable/kilo after the 1.8.3 tag. @eck Thanks for the analysis. I'll backport the patch and let you know. Hi John, Wanted to try and reproduce the same scenario mention above. What will be the best way to do so in order to verify solution? Thanks, Udi (In reply to Udi Shkalim from comment #8) > Hi John, > > Wanted to try and reproduce the same scenario mention above. > What will be the best way to do so in order to verify solution? > > Thanks, > Udi Since the timing is important here, I've written a patch that will simulate the failure by resetting the underlying connection 10% of the time in order to demonstrate the failure and the recovery. To install on the controller: # pushd /usr/lib/python2.7/site-packages; curl http://git.engineering.redhat.com/git/users/jeckersb/oslo.messaging/patch/?id=ackbreaker | patch -p1; popd Then restart a service (I usually just use openstack-nova-conductor). Before the fix, you will see... 2016-01-11 17:27:21.599 21973 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying. 2016-01-11 17:27:21.599 21973 TRACE root Traceback (most recent call last): 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 95, in inner_func 2016-01-11 17:27:21.599 21973 TRACE root return infunc(*args, **kwargs) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 98, in _executor_thread 2016-01-11 17:27:21.599 21973 TRACE root self._dispatch(incoming) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 86, in _dispatch 2016-01-11 17:27:21.599 21973 TRACE root spawn_with(ctxt=self.dispatcher(incoming), pool=self._greenpool) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 53, in spawn_with 2016-01-11 17:27:21.599 21973 TRACE root callback = ctxt.__enter__() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__ 2016-01-11 17:27:21.599 21973 TRACE root return self.gen.next() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 135, in __call__ 2016-01-11 17:27:21.599 21973 TRACE root incoming.acknowledge() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 79, in acknowledge 2016-01-11 17:27:21.599 21973 TRACE root self.acknowledge_callback() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 171, in acknowledge 2016-01-11 17:27:21.599 21973 TRACE root self._raw_message.ack() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2016-01-11 17:27:21.599 21973 TRACE root self.channel.basic_ack(self.delivery_tag) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack 2016-01-11 17:27:21.599 21973 TRACE root self._send_method((60, 80), args) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2016-01-11 17:27:21.599 21973 TRACE root self.channel_id, method_sig, args, content, 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2016-01-11 17:27:21.599 21973 TRACE root write_frame(1, channel, payload) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame 2016-01-11 17:27:21.599 21973 TRACE root frame_type, channel, size, payload, 0xce, 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 377, in sendall 2016-01-11 17:27:21.599 21973 TRACE root tail = self.send(data, flags) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send 2016-01-11 17:27:21.599 21973 TRACE root total_sent += fd.send(data[total_sent:], flags) 2016-01-11 17:27:21.599 21973 TRACE root error: [Errno 32] Broken pipe 2016-01-11 17:27:21.599 21973 TRACE root And there will be no attempt to reconnect. It will basically hang here forever. Then: - apply the updated oslo.messaging package - *repatch the file*! - restart nova-conductor And there will be a reconnect after the error: 2016-01-11 17:32:39.015 1070 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying. 2016-01-11 17:32:39.015 1070 TRACE root Traceback (most recent call last): 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 95, in inner_func 2016-01-11 17:32:39.015 1070 TRACE root return infunc(*args, **kwargs) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 98, in _executor_thread 2016-01-11 17:32:39.015 1070 TRACE root self._dispatch(incoming) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 86, in _dispatch 2016-01-11 17:32:39.015 1070 TRACE root spawn_with(ctxt=self.dispatcher(incoming), pool=self._greenpool) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 53, in spawn_with 2016-01-11 17:32:39.015 1070 TRACE root callback = ctxt.__enter__() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__ 2016-01-11 17:32:39.015 1070 TRACE root return self.gen.next() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 135, in __call__ 2016-01-11 17:32:39.015 1070 TRACE root incoming.acknowledge() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 79, in acknowledge 2016-01-11 17:32:39.015 1070 TRACE root self.acknowledge_callback() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 172, in acknowledge 2016-01-11 17:32:39.015 1070 TRACE root self._raw_message.ack() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2016-01-11 17:32:39.015 1070 TRACE root self.channel.basic_ack(self.delivery_tag) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack 2016-01-11 17:32:39.015 1070 TRACE root self._send_method((60, 80), args) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2016-01-11 17:32:39.015 1070 TRACE root self.channel_id, method_sig, args, content, 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2016-01-11 17:32:39.015 1070 TRACE root write_frame(1, channel, payload) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame 2016-01-11 17:32:39.015 1070 TRACE root frame_type, channel, size, payload, 0xce, 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 377, in sendall 2016-01-11 17:32:39.015 1070 TRACE root tail = self.send(data, flags) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send 2016-01-11 17:32:39.015 1070 TRACE root total_sent += fd.send(data[total_sent:], flags) 2016-01-11 17:32:39.015 1070 TRACE root error: [Errno 32] Broken pipe 2016-01-11 17:32:39.015 1070 TRACE root 2016-01-11 17:32:40.018 1070 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.0.2.8:5672 is unreachable: <AMQPError: unknown error>. Trying again in 1 seconds. 2016-01-11 17:32:41.038 1070 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.0.2.8:5672 Thanks John - worked like a charm. Verified on python-oslo-messaging-1.8.3-3.el7ost From steps to reproduce above got the expected error and re-connection. 2016-01-14 09:59:02.248 19142 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying. 2016-01-14 09:59:02.248 19142 TRACE root Traceback (most recent call last): 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 95, in inner_func 2016-01-14 09:59:02.248 19142 TRACE root return infunc(*args, **kwargs) 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 98, in _executor_thread 2016-01-14 09:59:02.248 19142 TRACE root self._dispatch(incoming) 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 86, in _dispatch 2016-01-14 09:59:02.248 19142 TRACE root spawn_with(ctxt=self.dispatcher(incoming), pool=self._greenpool) 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 53, in spawn_with 2016-01-14 09:59:02.248 19142 TRACE root callback = ctxt.__enter__() 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__ 2016-01-14 09:59:02.248 19142 TRACE root return self.gen.next() 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 135, in __call__ 2016-01-14 09:59:02.248 19142 TRACE root incoming.acknowledge() 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 79, in acknowledge 2016-01-14 09:59:02.248 19142 TRACE root self.acknowledge_callback() 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 172, in acknowledge 2016-01-14 09:59:02.248 19142 TRACE root self._raw_message.ack() 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2016-01-14 09:59:02.248 19142 TRACE root self.channel.basic_ack(self.delivery_tag) 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack 2016-01-14 09:59:02.248 19142 TRACE root self._send_method((60, 80), args) 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2016-01-14 09:59:02.248 19142 TRACE root self.channel_id, method_sig, args, content, 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2016-01-14 09:59:02.248 19142 TRACE root write_frame(1, channel, payload) 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame 2016-01-14 09:59:02.248 19142 TRACE root frame_type, channel, size, payload, 0xce, 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 377, in sendall 2016-01-14 09:59:02.248 19142 TRACE root tail = self.send(data, flags) 2016-01-14 09:59:02.248 19142 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send 2016-01-14 09:59:02.248 19142 TRACE root total_sent += fd.send(data[total_sent:], flags) 2016-01-14 09:59:02.248 19142 TRACE root error: [Errno 32] Broken pipe 2016-01-14 09:59:02.248 19142 TRACE root 2016-01-14 09:59:03.251 19142 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.0.2.10:5672 is unreachable: <AMQPError: unknown error>. Trying again in 1 seconds. 2016-01-14 09:59:04.295 19142 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.0.2.10:5672 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:0047 |