Bug 1280016 - Failure to reconnect when connection lost during message acknowledgement
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: async
Target Release: 5.0 (RHEL 7)
Assigned To: Flavio Percoco
QA Contact: Itzik Brown
Keywords: GSSTriaged, Triaged, ZStream
Depends On: 1278177
Blocks: 1278176
Reported: 2015-11-10 13:08 EST by Flavio Percoco
Modified: 2016-04-26 20:16 EDT (History)
CC: 17 users

Fixed In Version: openstack-neutron-2014.1.5-8.el7ost
Doc Type: Bug Fix
Clone Of: 1278177
Clones: 1296680
Last Closed: 2016-02-01 10:12:31 EST
Type: Bug
Attachments
Reset connection 1 in 10 message ACKs (bz1280016) (1.58 KB, application/mbox)
2016-01-21 08:41 EST, Flavio Percoco

Comment 4 Itzik Brown 2016-01-19 08:51:41 EST
Hi,
Can you please write the steps required to verify?
Comment 5 Flavio Percoco 2016-01-19 13:42:48 EST
Itzik,

This one is actually quite hard to test in a live environment. You could try to replicate it by following the steps in the bug description:

(1) nova-compute is happily consuming messages from rabbitmq
(2) nova-compute receives some message M
(3) the connection to rabbitmq on which M arrived is reset
(4) nova-compute (and oslo.messaging and kombu...) try to acknowledge message M
(5) exception bubbles up for connection reset by peer
(6) connection is *not* re-established
(7) nova-compute no longer consumes messages from rabbitmq
(8) restarting nova-compute service causes messages to be consumed again

But again, replicating this issue in a live environment might not be trivial.
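The failure mode in the steps above can be sketched in miniature. The classes and names below are hypothetical stand-ins, not the actual oslo.messaging or kombu code; the sketch only shows the intended behavior: a failed ACK should be treated as a connection error and trigger a reconnect, rather than leaving the consumer dead as in steps (6) and (7).

```python
import socket


class FakeChannel:
    """Hypothetical stand-in for an AMQP channel whose TCP connection was reset."""
    def __init__(self, fail_ack=False):
        self.fail_ack = fail_ack
        self.acked = []

    def basic_ack(self, delivery_tag):
        if self.fail_ack:
            # Steps (4)/(5): writing the ACK hits the dead socket.
            raise socket.error(32, "Broken pipe")
        self.acked.append(delivery_tag)


class Consumer:
    """Consumer that reconnects when an ACK fails, instead of letting the
    exception kill the consume loop (the bug's steps (6) and (7))."""
    def __init__(self, channel_factory):
        self.channel_factory = channel_factory
        self.channel = channel_factory()
        self.reconnects = 0

    def handle(self, delivery_tag):
        try:
            self.channel.basic_ack(delivery_tag)
        except socket.error:
            # The fix: re-establish the connection so consumption resumes.
            self.channel = self.channel_factory()
            self.reconnects += 1


# First channel dies on ACK; the consumer reconnects and keeps going.
channels = iter([FakeChannel(fail_ack=True), FakeChannel()])
consumer = Consumer(lambda: next(channels))
consumer.handle(1)  # ACK fails -> reconnect
consumer.handle(2)  # ACK succeeds on the fresh channel
```

With the buggy behavior, the exception in `handle` would instead propagate and the consumer would never drain another message until restarted, which is what step (8) works around.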
Comment 6 John Eckersberg 2016-01-19 13:47:20 EST
I put makeshift reproduction instructions in the original bug. You'll probably have to adjust the patch to apply it to neutron's copy of oslo.rpc, but it's a start.

https://bugzilla.redhat.com/show_bug.cgi?id=1270353#c9
Comment 7 Flavio Percoco 2016-01-21 08:41 EST
Created attachment 1116927 [details]
Reset connection 1 in 10 message ACKs (bz1280016)

The attached patch should do for neutron's copy of oslo.rpc the same thing the patch John mentioned in c#6 does for oslo.messaging. I backported it to the best of my knowledge, but I haven't tested it.

I'm happy to assist with the testing of this BZ on Monday (or tomorrow) if needed. Please provide access to the test environment.
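The idea behind a "reset 1 in 10 ACKs" reproducer can be sketched as a small fault-injection wrapper. This is an illustrative approximation, not the attached patch itself (which modifies neutron's bundled oslo.rpc); all names here are hypothetical.

```python
import itertools
import socket


class AckFaultInjector:
    """Hypothetical fault injector: force a 'connection reset' on every
    Nth message ACK, to exercise the reconnect path deterministically."""
    def __init__(self, ack_fn, every=10):
        self.ack_fn = ack_fn
        self.counter = itertools.count(1)
        self.every = every

    def ack(self, delivery_tag):
        if next(self.counter) % self.every == 0:
            # Simulate the peer resetting the connection mid-ACK.
            raise socket.error(104, "Connection reset by peer")
        return self.ack_fn(delivery_tag)


# Drive 20 ACKs through the injector: the 10th and 20th should fail.
acked = []
injector = AckFaultInjector(acked.append, every=10)
failures = 0
for tag in range(1, 21):
    try:
        injector.ack(tag)
    except socket.error:
        failures += 1
```

With a wrapper like this around `message.ack()`, a fixed consumer should log a reconnect roughly every ten messages and keep consuming, while a broken one stops after the first injected reset.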
Comment 8 Itzik Brown 2016-01-21 13:45:27 EST
Checked with openstack-neutron-2014.1.5-9.el7ost.noarch.
With Flavio's patch, I got the following:

2016-01-21 13:35:31.210 1726 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: [Errno 32] Broken pipe
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 598, in ensure
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     return method(*args, **kwargs)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 685, in _consume
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     return self.connection.drain_events(timeout=timeout)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 275, in drain_events
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 91, in drain_events
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     return connection.drain_events(**kwargs)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 325, in drain_events
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     return amqp_method(channel, args, content)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1908, in _basic_deliver
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     fun(msg)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 207, in _callback
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     self._callback_handler(message, callback)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 183, in _callback_handler
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     message.ack()
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     self.channel.basic_ack(self.delivery_tag)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     self._send_method((60, 80), args)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     self.channel_id, method_sig, args, content,
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     write_frame(1, channel, payload)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     frame_type, channel, size, payload, 0xce,
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 310, in sendall
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     tail = self.send(data, flags)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     total_sent += fd.send(data[total_sent:], flags)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common error: [Errno 32] Broken pipe
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common
2016-01-21 13:35:31.211 1726 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on 10.35.186.78:5672
2016-01-21 13:35:31.227 1726 DEBUG neutron.context [req-b3d8ff6a-49b7-4149-96b1-087e2aaf9243 None] Arguments dropped when creating context: {'project_name': None, 'tenant': None} __init__ /usr/lib/python2.7/site-packages/neutron/context.py:84
2016-01-21 13:35:31.248 1726 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on 10.35.186.78:5672
2016-01-21 13:35:31.251 1726 DEBUG neutron.openstack.common.rpc.amqp [-] received {u'_context_roles': [u'admin'], u'_context_request_id': u'req-b3d8ff6a-49b7-4149-96b1-087e2aaf9243', u'_context_read_deleted': u'no', u'_context_user_name': None, u'_context_project_name': None, u'namespace': None, u'_context_tenant_id': None, u'args': {u'agent_state': {u'agent_state': {u'topic': u'N/A', u'binary': u'neutron-metadata-agent', u'host': u'controller.novalocal', u'agent_type': u'Metadata agent', u'configurations': {u'nova_metadata_port': 8775, u'nova_metadata_ip': u'10.35.186.78', u'metadata_proxy_socket': u'/var/lib/neutron/metadata_proxy'}}}, u'time': u'2016-01-21T18:35:31.203124'}, u'_context_tenant': None, u'_unique_id': u'74fa6a2dbd1847eb9c54101d6dfaddbb', u'_context_is_admin': True, u'version': u'1.0', u'_context_timestamp': u'2016-01-21 18:26:30.678305', u'_context_tenant_name': None, u'_context_user': None, u'_context_user_id': None, u'method': u'report_state', u'_context_project_id': None} _safe_log /usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/common.py:280
Comment 10 errata-xmlrpc 2016-02-01 10:12:31 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0093.html
