Icehouse backport - https://review.openstack.org/#/c/195691/ (Abandoned due to EOL, but still valid)
https://code.engineering.redhat.com/gerrit/#/c/60809/
John Eckersberg 2016-01-11 17:32:46 EST (In reply to Udi Shkalim from comment #8) > Hi John, > > Wanted to try and reproduce the same scenario mention above. > What will be the best way to do so in order to verify solution? > > Thanks, > Udi Since the timing is important here, I've written a patch that will simulate the failure by resetting the underlying connection 10% of the time in order to demonstrate the failure and the recovery. To install on the controller: # pushd /usr/lib/python2.7/site-packages; curl http://git.engineering.redhat.com/git/users/jeckersb/oslo.messaging/patch/?id=ackbreaker | patch -p1; popd Then restart a service (I usually just use openstack-nova-conductor). Before the fix, you will see... 2016-01-11 17:27:21.599 21973 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying. 2016-01-11 17:27:21.599 21973 TRACE root Traceback (most recent call last): 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 95, in inner_func 2016-01-11 17:27:21.599 21973 TRACE root return infunc(*args, **kwargs) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 98, in _executor_thread 2016-01-11 17:27:21.599 21973 TRACE root self._dispatch(incoming) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 86, in _dispatch 2016-01-11 17:27:21.599 21973 TRACE root spawn_with(ctxt=self.dispatcher(incoming), pool=self._greenpool) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 53, in spawn_with 2016-01-11 17:27:21.599 21973 TRACE root callback = ctxt.__enter__() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__ 2016-01-11 17:27:21.599 21973 TRACE root return self.gen.next() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 135, in __call__ 2016-01-11 17:27:21.599 21973 TRACE root incoming.acknowledge() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 79, in acknowledge 2016-01-11 17:27:21.599 21973 TRACE root self.acknowledge_callback() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 171, in acknowledge 2016-01-11 17:27:21.599 21973 TRACE root self._raw_message.ack() 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2016-01-11 17:27:21.599 21973 TRACE root self.channel.basic_ack(self.delivery_tag) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack 2016-01-11 17:27:21.599 21973 TRACE root self._send_method((60, 80), args) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2016-01-11 17:27:21.599 21973 TRACE root self.channel_id, method_sig, args, content, 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2016-01-11 17:27:21.599 21973 TRACE root write_frame(1, channel, payload) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame 2016-01-11 17:27:21.599 21973 TRACE root frame_type, channel, size, payload, 0xce, 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 377, in sendall 2016-01-11 17:27:21.599 21973 TRACE root tail = self.send(data, flags) 2016-01-11 17:27:21.599 21973 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send 2016-01-11 17:27:21.599 21973 TRACE root total_sent += fd.send(data[total_sent:], flags) 2016-01-11 17:27:21.599 21973 TRACE root error: [Errno 32] Broken pipe 2016-01-11 17:27:21.599 21973 TRACE root And there will be no attempt to reconnect. It will basically hang here forever. Then: - apply the updated oslo.messaging package - *repatch the file*! - restart nova-conductor And there will be a reconnect after the error: 2016-01-11 17:32:39.015 1070 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying. 2016-01-11 17:32:39.015 1070 TRACE root Traceback (most recent call last): 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 95, in inner_func 2016-01-11 17:32:39.015 1070 TRACE root return infunc(*args, **kwargs) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 98, in _executor_thread 2016-01-11 17:32:39.015 1070 TRACE root self._dispatch(incoming) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 86, in _dispatch 2016-01-11 17:32:39.015 1070 TRACE root spawn_with(ctxt=self.dispatcher(incoming), pool=self._greenpool) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 53, in spawn_with 2016-01-11 17:32:39.015 1070 TRACE root callback = ctxt.__enter__() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__ 2016-01-11 17:32:39.015 1070 TRACE root return self.gen.next() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 135, in __call__ 2016-01-11 17:32:39.015 1070 TRACE root incoming.acknowledge() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 79, in acknowledge 2016-01-11 17:32:39.015 1070 TRACE root self.acknowledge_callback() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 172, in acknowledge 2016-01-11 17:32:39.015 1070 TRACE root self._raw_message.ack() 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2016-01-11 17:32:39.015 1070 TRACE root self.channel.basic_ack(self.delivery_tag) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack 2016-01-11 17:32:39.015 1070 TRACE root self._send_method((60, 80), args) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2016-01-11 17:32:39.015 1070 TRACE root self.channel_id, method_sig, args, content, 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2016-01-11 17:32:39.015 1070 TRACE root write_frame(1, channel, payload) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame 2016-01-11 17:32:39.015 1070 TRACE root frame_type, channel, size, payload, 0xce, 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 377, in sendall 2016-01-11 17:32:39.015 1070 TRACE root tail = self.send(data, flags) 2016-01-11 17:32:39.015 1070 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send 2016-01-11 17:32:39.015 1070 TRACE root total_sent += fd.send(data[total_sent:], flags) 2016-01-11 17:32:39.015 1070 TRACE root error: [Errno 32] Broken pipe 2016-01-11 17:32:39.015 1070 TRACE root 2016-01-11 17:32:40.018 1070 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.0.2.8:5672 is unreachable: <AMQPError: unknown error>. Trying again in 1 seconds. 2016-01-11 17:32:41.038 1070 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.0.2.8:5672
reproduction instructions in the original bug. https://bugzilla.redhat.com/show_bug.cgi?id=1270353#c9
Tested python-oslo-messaging-1.3.1-7.el7ost.noarch 2016-02-03 08:24:01.544 26403 TRACE root self._send_method((60, 80), args) 2016-02-03 08:24:01.544 26403 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2016-02-03 08:24:01.544 26403 TRACE root self.channel_id, method_sig, args, content, 2016-02-03 08:24:01.544 26403 TRACE root File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2016-02-03 08:24:01.544 26403 TRACE root write_frame(1, channel, payload) 2016-02-03 08:24:01.544 26403 TRACE root File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame 2016-02-03 08:24:01.544 26403 TRACE root frame_type, channel, size, payload, 0xce, 2016-02-03 08:24:01.544 26403 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 310, in sendall 2016-02-03 08:24:01.544 26403 TRACE root tail = self.send(data, flags) 2016-02-03 08:24:01.544 26403 TRACE root File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send 2016-02-03 08:24:01.544 26403 TRACE root total_sent += fd.send(data[total_sent:], flags) 2016-02-03 08:24:01.544 26403 TRACE root error: [Errno 32] Broken pipe 2016-02-03 08:24:01.544 26403 TRACE root 2016-02-03 08:24:02.570 26403 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: <AMQPError: unknown error> 2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last): 2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 649, in ensure 2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs) 2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 738, in _consume 2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit raise self.connection.recoverable_connection_errors[0] 2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit RecoverableConnectionError: <AMQPError: unknown error> 2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit 2016-02-03 08:24:02.571 26403 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 10.35.186.156:5672 2016-02-03 08:24:02.572 26403 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds... 2016-02-03 08:24:03.606 26403 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 10.35.186.156:5672
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0167.html