Bug 1278177 - Failure to reconnect when connection lost during message acknowlegement
Summary: Failure to reconnect when connection lost during message acknowlegement
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-oslo-messaging
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: async
: 5.0 (RHEL 7)
Assignee: John Eckersberg
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On: 1270353
Blocks: 1278176 1280016 1280019 1280020 1288634 1300044
TreeView+ depends on / blocked
 
Reported: 2015-11-04 20:45 UTC by John Eckersberg
Modified: 2023-02-22 23:02 UTC (History)
9 users (show)

Fixed In Version: python-oslo-messaging-1.3.1-5.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of: 1270353
: 1280016 1280019 1280020 (view as bug list)
Environment:
Last Closed: 2016-02-11 13:43:32 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:0167 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform Bug Fix Advisory 2016-02-11 18:41:37 UTC

Comment 2 John Eckersberg 2015-11-04 21:26:30 UTC
Icehouse backport - https://review.openstack.org/#/c/195691/

(Abandoned due to EOL, but still valid)

Comment 3 John Eckersberg 2015-11-04 21:43:01 UTC
https://code.engineering.redhat.com/gerrit/#/c/60809/

Comment 4 Udi Shkalim 2016-01-13 14:17:07 UTC
John Eckersberg 2016-01-11 17:32:46 EST
(In reply to Udi Shkalim from comment #8)
> Hi John,
> 
> Wanted to try and reproduce the same scenario mention above.
> What will be the best way to do so in order to verify solution?
> 
> Thanks,
> Udi

Since the timing is important here, I've written a patch that will simulate the failure by resetting the underlying connection 10% of the time in order to demonstrate the failure and the recovery.

To install on the controller:

# pushd /usr/lib/python2.7/site-packages; curl http://git.engineering.redhat.com/git/users/jeckersb/oslo.messaging/patch/?id=ackbreaker | patch -p1; popd

Then restart a service (I usually just use openstack-nova-conductor).

Before the fix, you will see...

2016-01-11 17:27:21.599 21973 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying.
2016-01-11 17:27:21.599 21973 TRACE root Traceback (most recent call last):
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 95, in inner_func
2016-01-11 17:27:21.599 21973 TRACE root     return infunc(*args, **kwargs)
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 98, in _executor_thread
2016-01-11 17:27:21.599 21973 TRACE root     self._dispatch(incoming)
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 86, in _dispatch
2016-01-11 17:27:21.599 21973 TRACE root     spawn_with(ctxt=self.dispatcher(incoming), pool=self._greenpool)
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 53, in spawn_with
2016-01-11 17:27:21.599 21973 TRACE root     callback = ctxt.__enter__()
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2016-01-11 17:27:21.599 21973 TRACE root     return self.gen.next()
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 135, in __call__
2016-01-11 17:27:21.599 21973 TRACE root     incoming.acknowledge()
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 79, in acknowledge
2016-01-11 17:27:21.599 21973 TRACE root     self.acknowledge_callback()
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 171, in acknowledge
2016-01-11 17:27:21.599 21973 TRACE root     self._raw_message.ack()
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
2016-01-11 17:27:21.599 21973 TRACE root     self.channel.basic_ack(self.delivery_tag)
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack
2016-01-11 17:27:21.599 21973 TRACE root     self._send_method((60, 80), args)
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
2016-01-11 17:27:21.599 21973 TRACE root     self.channel_id, method_sig, args, content,
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
2016-01-11 17:27:21.599 21973 TRACE root     write_frame(1, channel, payload)
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
2016-01-11 17:27:21.599 21973 TRACE root     frame_type, channel, size, payload, 0xce,
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 377, in sendall
2016-01-11 17:27:21.599 21973 TRACE root     tail = self.send(data, flags)
2016-01-11 17:27:21.599 21973 TRACE root   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send
2016-01-11 17:27:21.599 21973 TRACE root     total_sent += fd.send(data[total_sent:], flags)
2016-01-11 17:27:21.599 21973 TRACE root error: [Errno 32] Broken pipe
2016-01-11 17:27:21.599 21973 TRACE root 

And there will be no attempt to reconnect.  It will basically hang here forever.

Then:

- apply the updated oslo.messaging package
- *repatch the file*!
- restart nova-conductor

And there will be a reconnect after the error:

2016-01-11 17:32:39.015 1070 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying.
2016-01-11 17:32:39.015 1070 TRACE root Traceback (most recent call last):
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 95, in inner_func
2016-01-11 17:32:39.015 1070 TRACE root     return infunc(*args, **kwargs)
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 98, in _executor_thread
2016-01-11 17:32:39.015 1070 TRACE root     self._dispatch(incoming)
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 86, in _dispatch
2016-01-11 17:32:39.015 1070 TRACE root     spawn_with(ctxt=self.dispatcher(incoming), pool=self._greenpool)
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py", line 53, in spawn_with
2016-01-11 17:32:39.015 1070 TRACE root     callback = ctxt.__enter__()
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2016-01-11 17:32:39.015 1070 TRACE root     return self.gen.next()
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 135, in __call__
2016-01-11 17:32:39.015 1070 TRACE root     incoming.acknowledge()
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 79, in acknowledge
2016-01-11 17:32:39.015 1070 TRACE root     self.acknowledge_callback()
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 172, in acknowledge
2016-01-11 17:32:39.015 1070 TRACE root     self._raw_message.ack()
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
2016-01-11 17:32:39.015 1070 TRACE root     self.channel.basic_ack(self.delivery_tag)
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack
2016-01-11 17:32:39.015 1070 TRACE root     self._send_method((60, 80), args)
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
2016-01-11 17:32:39.015 1070 TRACE root     self.channel_id, method_sig, args, content,
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
2016-01-11 17:32:39.015 1070 TRACE root     write_frame(1, channel, payload)
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
2016-01-11 17:32:39.015 1070 TRACE root     frame_type, channel, size, payload, 0xce,
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 377, in sendall
2016-01-11 17:32:39.015 1070 TRACE root     tail = self.send(data, flags)
2016-01-11 17:32:39.015 1070 TRACE root   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send
2016-01-11 17:32:39.015 1070 TRACE root     total_sent += fd.send(data[total_sent:], flags)
2016-01-11 17:32:39.015 1070 TRACE root error: [Errno 32] Broken pipe
2016-01-11 17:32:39.015 1070 TRACE root 
2016-01-11 17:32:40.018 1070 ERROR oslo_messaging._drivers.impl_rabbit [-] AMQP server on 192.0.2.8:5672 is unreachable: <AMQPError: unknown error>. Trying again in 1 seconds.
2016-01-11 17:32:41.038 1070 INFO oslo_messaging._drivers.impl_rabbit [-] Reconnected to AMQP server on 192.0.2.8:5672

Comment 5 Ofer Blaut 2016-01-20 10:33:00 UTC
reproduction instructions in the original bug. 

https://bugzilla.redhat.com/show_bug.cgi?id=1270353#c9

Comment 9 Ofer Blaut 2016-02-03 13:34:45 UTC
Tested   python-oslo-messaging-1.3.1-7.el7ost.noarch

2016-02-03 08:24:01.544 26403 TRACE root     self._send_method((60, 80), args)
2016-02-03 08:24:01.544 26403 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
2016-02-03 08:24:01.544 26403 TRACE root     self.channel_id, method_sig, args, content,
2016-02-03 08:24:01.544 26403 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
2016-02-03 08:24:01.544 26403 TRACE root     write_frame(1, channel, payload)
2016-02-03 08:24:01.544 26403 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
2016-02-03 08:24:01.544 26403 TRACE root     frame_type, channel, size, payload, 0xce,
2016-02-03 08:24:01.544 26403 TRACE root   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 310, in sendall
2016-02-03 08:24:01.544 26403 TRACE root     tail = self.send(data, flags)
2016-02-03 08:24:01.544 26403 TRACE root   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send
2016-02-03 08:24:01.544 26403 TRACE root     total_sent += fd.send(data[total_sent:], flags)
2016-02-03 08:24:01.544 26403 TRACE root error: [Errno 32] Broken pipe
2016-02-03 08:24:01.544 26403 TRACE root
2016-02-03 08:24:02.570 26403 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: <AMQPError: unknown error>
2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 649, in ensure
2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit     return method(*args, **kwargs)
2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 738, in _consume
2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit     raise self.connection.recoverable_connection_errors[0]
2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit RecoverableConnectionError: <AMQPError: unknown error>
2016-02-03 08:24:02.570 26403 TRACE oslo.messaging._drivers.impl_rabbit
2016-02-03 08:24:02.571 26403 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 10.35.186.156:5672
2016-02-03 08:24:02.572 26403 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2016-02-03 08:24:03.606 26403 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 10.35.186.156:5672

Comment 11 errata-xmlrpc 2016-02-11 13:43:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0167.html


Note You need to log in before you can comment on or make changes to this bug.