Bug 1280016

Summary: Failure to reconnect when connection lost during message acknowlegement
Product: Red Hat OpenStack Reporter: Flavio Percoco <fpercoco>
Component: openstack-neutronAssignee: Flavio Percoco <fpercoco>
Status: CLOSED ERRATA QA Contact: Itzik Brown <itbrown>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.0 (RHEL 7)CC: amuller, apevec, chrisw, dmaley, fpercoco, jeckersb, jlibosva, jtaleric, kgiusti, lhh, mwagner, nyechiel, oblaut, sclewis, tfreger, yeylon
Target Milestone: asyncKeywords: GSSTriaged, Triaged, ZStream
Target Release: 5.0 (RHEL 7)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-neutron-2014.1.5-8.el7ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1278177
: 1296680 (view as bug list) Environment:
Last Closed: 2016-02-01 15:12:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1278177    
Bug Blocks: 1278176    
Attachments:
Description Flags
Reset connection 1 in 10 message ACKs (bz1280016) none

Comment 4 Itzik Brown 2016-01-19 13:51:41 UTC
Hi,
Can you please write the steps required to verify?

Comment 5 Flavio Percoco 2016-01-19 18:42:48 UTC
Itzik,

This one is actually quite hard to test in a live environment. You could try to replicate it following the steps in the bug description:

(1) nova-compute is happily consuming messages from rabbitmq
(2) nova-compute receives some message M
(3) the connection to rabbitmq on which M arrived is reset
(4) nova-compute (and oslo.messaging and kombu...) try to acknowledge message M
(5) exception bubbles up for connection reset by peer
(6) connection is *not* re-established
(7) nova-compute no longer consumes messages from rabbitmq
(8) restarting nova-compute service causes messages to be consumed again

But again, replicating this issue in a live environment might not be trivial.

Comment 6 John Eckersberg 2016-01-19 18:47:20 UTC
I put makeshift reproduction instructions in the original bug.  You'll probably have to adjust the patch in order to apply to the neutron copy of oslo.rpc but it's a start.

https://bugzilla.redhat.com/show_bug.cgi?id=1270353#c9

Comment 7 Flavio Percoco 2016-01-21 13:41:14 UTC
Created attachment 1116927 [details]
Reset connection 1 in 10 message ACKs (bz1280016)

The attached patch should do the same the one John mentioned in c#6 does for oslo.messaging. I tried to backport it to the best of my knowledge but I haven't tested it.

I'm happy to assist the testing of this BZ on Monday (or tomorrow) if needed. Pls, provide access to the test environment.

Comment 8 Itzik Brown 2016-01-21 18:45:27 UTC
Checked with openstack-neutron-2014.1.5-9.el7ost.noarch
With Flavio's patch got the following:

2016-01-21 13:35:31.210 1726 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: [Errno 32] Broken pipe
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 598, in ensure
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     return method(*args, **kwargs)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 685, in _consume
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     return self.connection.drain_events(timeout=timeout)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 275, in drain_events
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 91, in drain_events
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     return connection.drain_events(**kwargs)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 325, in drain_events
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     return amqp_method(channel, args, content)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1908, in _basic_deliver
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     fun(msg)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 207, in _callback
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     self._callback_handler(message, callback)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 183, in _callback_handler
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     message.ack()
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/message.py", line 88, in ack
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     self.channel.basic_ack(self.delivery_tag)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1583, in basic_ack
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     self._send_method((60, 80), args)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     self.channel_id, method_sig, args, content,
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     write_frame(1, channel, payload)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 182, in write_frame
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     frame_type, channel, size, payload, 0xce,
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 310, in sendall
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     tail = self.send(data, flags)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 295, in send
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common     total_sent += fd.send(data[total_sent:], flags)
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common error: [Errno 32] Broken pipe
2016-01-21 13:35:31.210 1726 TRACE neutron.openstack.common.rpc.common
2016-01-21 13:35:31.211 1726 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on 10.35.186.78:5672
2016-01-21 13:35:31.227 1726 DEBUG neutron.context [req-b3d8ff6a-49b7-4149-96b1-087e2aaf9243 None] Arguments dropped when creating context: {'project_name': None, 'tenant': None} __init__ /usr/lib/python2.7/site-packages/neutron/context.py:84
2016-01-21 13:35:31.248 1726 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on 10.35.186.78:5672
2016-01-21 13:35:31.251 1726 DEBUG neutron.openstack.common.rpc.amqp [-] received {u'_context_roles': [u'admin'], u'_context_request_id': u'req-b3d8ff6a-49b7-4149-96b1-087e2aaf9243', u'_context_read_deleted': u'no', u'_context_user_name': None, u'_context_project_name': None, u'namespace': None, u'_context_tenant_id': None, u'args': {u'agent_state': {u'agent_state': {u'topic': u'N/A', u'binary': u'neutron-metadata-agent', u'host': u'controller.novalocal', u'agent_type': u'Metadata agent', u'configurations': {u'nova_metadata_port': 8775, u'nova_metadata_ip': u'10.35.186.78', u'metadata_proxy_socket': u'/var/lib/neutron/metadata_proxy'}}}, u'time': u'2016-01-21T18:35:31.203124'}, u'_context_tenant': None, u'_unique_id': u'74fa6a2dbd1847eb9c54101d6dfaddbb', u'_context_is_admin': True, u'version': u'1.0', u'_context_timestamp': u'2016-01-21 18:26:30.678305', u'_context_tenant_name': None, u'_context_user': None, u'_context_user_id': None, u'method': u'report_state', u'_context_project_id': None} _safe_log /usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/common.py:280

Comment 10 errata-xmlrpc 2016-02-01 15:12:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0093.html