Bug 1280016
Summary: | Failure to reconnect when connection lost during message acknowlegement | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Flavio Percoco <fpercoco> | ||||
Component: | openstack-neutron | Assignee: | Flavio Percoco <fpercoco> | ||||
Status: | CLOSED ERRATA | QA Contact: | Itzik Brown <itbrown> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 5.0 (RHEL 7) | CC: | amuller, apevec, chrisw, dmaley, fpercoco, jeckersb, jlibosva, jtaleric, kgiusti, lhh, mwagner, nyechiel, oblaut, sclewis, tfreger, yeylon | ||||
Target Milestone: | async | Keywords: | GSSTriaged, Triaged, ZStream | ||||
Target Release: | 5.0 (RHEL 7) | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | openstack-neutron-2014.1.5-8.el7ost | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | 1278177 | ||||||
: | 1296680 (view as bug list) | Environment: | |||||
Last Closed: | 2016-02-01 15:12:31 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1278177 | ||||||
Bug Blocks: | 1278176 | ||||||
Attachments: |
|
Comment 4
Itzik Brown
2016-01-19 13:51:41 UTC
Itzik, This one is actually quite hard to test in a live environment. You could try to replicate it following the steps in the bug description: (1) nova-compute is happily consuming messages from rabbitmq (2) nova-compute receives some message M (3) the connection to rabbitmq on which M arrived is reset (4) nova-compute (and oslo.messaging and kombu...) try to acknowledge message M (5) exception bubbles up for connection reset by peer (6) connection is *not* re-established (7) nova-compute no longer consumes messages from rabbitmq (8) restarting nova-compute service causes messages to be consumed again But again, replicating this issue in a live environment might not be trivial. I put makeshift reproduction instructions in the original bug. You'll probably have to adjust the patch in order to apply to the neutron copy of oslo.rpc but it's a start. https://bugzilla.redhat.com/show_bug.cgi?id=1270353#c9 Created attachment 1116927 [details] Reset connection 1 in 10 message ACKs (bz1280016) The attached patch should do the same the one John mentioned in c#6 does for oslo.messaging. I tried to backport it to the best of my knowledge but I haven't tested it. I'm happy to assist the testing of this BZ on Monday (or tomorrow) if needed. Pls, provide access to the test environment. Checked with openstack-neutron-2014.1.5-9.el7ost.noarch With Flavio's patch got the following: 2016-01-21 13:35:31.210 1726 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: [Errno 32] Broken pipe 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last): 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 598, in ensure 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common return method(*args, **kwargs) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 685, in _consume 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common return self.connection.drain_events(timeout=timeout) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 275, in drain_events 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common return self.transport.drain_events(self.connection, **kwargs) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 91, in drain_events 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common return connection.drain_events(**kwargs) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 325, in drain_events 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common return amqp_method(channel, args, content) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1908, in _basic_deliver 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common fun(msg) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 207, in _callback 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common self._callback_handler(message, callback) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 183, in _callback_handler 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common message.ack() 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common self.channel.basic_ack(self.delivery_tag) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common self._send_method((60, 80), args) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common self.channel_id, method_sig, args, content, 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common write_frame(1, channel, payload) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common frame_type, channel, size, payload, 0xce, 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 310, in sendall 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common tail = self.send(data, flags) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common total_sent += fd.send(data[total_sent:], flags) 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common error: [Errno 32] Broken pipe 2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common 2016-01-21 13:35:31.211 1726 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on 10.35.186.78:5672 2016-01-21 13:35:31.227 1726 DEBUG neutron.context [req-b3d8ff6a-49b7-4149-96b1-087e2aaf9243 None] Arguments dropped when creating context: {'project_name': None, 'tenant': None} __init__ /usr/lib/python2.7/site-packages/neutron/context.py:84 2016-01-21 13:35:31.248 1726 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on 10.35.186.78:5672 2016-01-21 13:35:31.251 1726 DEBUG neutron.openstack.common.rpc.amqp [-] received {u'_context_roles': [u'admin'], u'_context_request_id': u'req-b3d8ff6a-49b7-4149-96b1-087e2aaf9243', u'_context_read_deleted': u'no', u'_context_user_name': None, u'_context_project_name': None, u'namespace': None, u'_context_tenant_id': None, u'args': {u'agent_state': {u'agent_state': {u'topic': u'N/A', u'binary': u'neutron-metadata-agent', u'host': u'controller.novalocal', u'agent_type': u'Metadata agent', u'configurations': {u'nova_metadata_port': 8775, u'nova_metadata_ip': u'10.35.186.78', u'metadata_proxy_socket': u'/var/lib/neutron/metadata_proxy'}}}, u'time': u'2016-01-21T18:35:31.203124'}, u'_context_tenant': None, u'_unique_id': u'74fa6a2dbd1847eb9c54101d6dfaddbb', u'_context_is_admin': True, u'version': u'1.0', u'_context_timestamp': u'2016-01-21 18:26:30.678305', u'_context_tenant_name': None, u'_context_user': None, u'_context_user_id': None, u'method': u'report_state', u'_context_project_id': None} _safe_log /usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/common.py:280 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0093.html |