Bug 1401542 - INTERNAL ERROR Failed to declare consumer for topic
Summary: INTERNAL ERROR Failed to declare consumer for topic
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-ryu
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z6
Target Release: 10.0 (Newton)
Assignee: Daniel Alvarez Sanchez
QA Contact: Toni Freger
URL:
Whiteboard:
Duplicates: 1465782
Depends On:
Blocks:
 
Reported: 2016-12-05 14:24 UTC by Asaf Hirshberg
Modified: 2020-08-13 08:44 UTC
CC: 18 users

Fixed In Version: python-ryu-4.9-2.1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-10 18:08:42 UTC
Target Upstream Version:
Embargoed:


Attachments
rabbitmq log (488.23 KB, text/plain)
2016-12-05 14:24 UTC, Asaf Hirshberg
SOSreport Controller-1 openvswitch InternalError (15.39 MB, application/x-xz)
2017-01-19 21:14 UTC, Marian Krcmarik


Links
Red Hat Knowledge Base (Solution) 3128111, last updated 2017-07-26 14:47:48 UTC

Description Asaf Hirshberg 2016-12-05 14:24:58 UTC
Created attachment 1228073 [details]
rabbitmq log

Description of problem:
On system start-up, after rebooting all controllers (none of them gracefully), the neutron openvswitch agent hit an internal error:

2016-11-29 11:08:51.701 4971 INFO oslo.messaging._drivers.impl_rabbit [req-3adcf615-2282-47bb-9d19-164a158f2374 - - - - -] [352a6a1f-db50-40cd-8534-90b0b1e0a5cf] Reconnected to AMQP server on 10.35.169.11:5672 via [amqp] client with port 39286.
2016-11-29 11:08:51.722 4971 ERROR oslo.messaging._drivers.impl_rabbit [req-3adcf615-2282-47bb-9d19-164a158f2374 - - - - -] Failed to declare consumer for topic 'dhcp_agent.overcloud-controller-2.localdomain': (0, 0): (541) INTERNAL_ERROR
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service [req-3adcf615-2282-47bb-9d19-164a158f2374 - - - - -] Error starting thread.
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service Traceback (most recent call last):
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 708, in run_service
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     service.start()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/neutron/service.py", line 330, in start
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     super(Service, self).start()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 154, in wrapper
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return f(*args, **kwargs)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 268, in start
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.conn.consume_in_threads()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 294, in consume_in_threads
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     server.start()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 268, in wrapper
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     log_after, timeout_timer)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 188, in run_once
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     post_fn = fn()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 267, in <lambda>
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     states[state].run_once(lambda: fn(self, *args, **kwargs),
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 420, in start
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.listener = self._create_listener()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 121, in _create_listener
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return self.transport._listen(self._target, 1, None)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 112, in _listen
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     batch_timeout)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 481, in listen
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     callback=listener)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1132, in declare_topic_consumer
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.declare_consumer(consumer)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1040, in declare_consumer
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     error_callback=_connect_error)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 814, in ensure
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     ret, channel = autoretry_method()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 436, in _ensured
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return fun(*args, **kwargs)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 508, in __call__
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return fun(*args, channel=channels[0], **kwargs), channels[0]
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 790, in execute_method
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     method()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1028, in _declare_consumer
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     consumer.declare(self)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 302, in declare
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.queue.declare()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 522, in declare
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.queue_declare(nowait, passive=False)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 548, in queue_declare
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     nowait=nowait)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1258, in queue_declare
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     (50, 11),  # Channel.queue_declare_ok
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.channel_id, allowed_methods)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 273, in _wait_method
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.wait()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 69, in wait
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return self.dispatch_method(method_sig, args, content)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 87, in dispatch_method
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return amqp_method(self, args)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 529, in _close
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     (class_id, method_id), ConnectionError)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service InternalError: (0, 0): (541) INTERNAL_ERROR
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service
[root@overcloud-controller-2 ~]#
Version-Release number of selected component (if applicable):

(automation) [stack@puma42 sts]$ neutron agent-list
+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host                               | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| 0d08b50d-2ed6-48d1-8b91-70beeeb9a342 | Metadata agent     | overcloud-controller-1.localdomain |                   | :-)   | True           | neutron-metadata-agent    |
| 405ec568-556b-41e1-ac1f-ac0c29d85873 | Open vSwitch agent | overcloud-controller-1.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |
| 515dfcd8-7aa1-412c-952a-04c5e20ca716 | L3 agent           | overcloud-controller-2.localdomain | nova              | :-)   | True           | neutron-l3-agent          |
| 54dd365c-40f3-4e0d-a78f-c1ad980afe3a | L3 agent           | overcloud-controller-1.localdomain | nova              | :-)   | True           | neutron-l3-agent          |
| 5a42d3fa-f4b2-4877-9f7c-351e11b0d6b4 | DHCP agent         | overcloud-controller-2.localdomain | nova              | xxx   | True           | neutron-dhcp-agent        |
| 61b33004-e2e4-4d7d-80a9-1b68d945219d | Open vSwitch agent | overcloud-controller-2.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |
| 7c190db3-515c-427a-a380-eb114f4cb9e6 | L3 agent           | overcloud-controller-0.localdomain | nova              | :-)   | True           | neutron-l3-agent          |
| 97972c85-b5e8-4d55-b424-611364d1b076 | DHCP agent         | overcloud-controller-1.localdomain | nova              | xxx   | True           | neutron-dhcp-agent        |
| bc2c0a48-c917-43fb-85ca-7dcfeccfe5f6 | Metadata agent     | overcloud-controller-2.localdomain |                   | :-)   | True           | neutron-metadata-agent    |
| d6180df4-34f9-463e-8696-d2d131179607 | DHCP agent         | overcloud-controller-0.localdomain | nova              | :-)   | True           | neutron-dhcp-agent        |
| e329be8b-7f74-42c0-a852-f932b30b0d18 | Metadata agent     | overcloud-controller-0.localdomain |                   | :-)   | True           | neutron-metadata-agent    |
| ee28b426-01dd-4f56-a6d3-4ae8ec4766d8 | Open vSwitch agent | overcloud-compute-0.localdomain    |                   | :-)   | True           | neutron-openvswitch-agent |
| f950772b-85f5-4e21-aa6c-ef74eab24317 | Open vSwitch agent | overcloud-controller-0.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+

rabbitmq log is attached.

How reproducible:
every time

Steps to Reproduce:
Restart all 3 controllers and wait for them to come back up.

Additional info:
OpenStack-10.0-RHEL-7 Puddle: 2016-11-29.1
Bare-metal setup, 3 controllers, 2 computes

Comment 4 Assaf Muller 2017-01-02 12:07:39 UTC
Please provide an SOS report of a controller showing the issue.

Comment 6 Asaf Hirshberg 2017-01-08 11:40:08 UTC
Assaf Muller,

No. This setup no longer exists, and I couldn't manage to reproduce it on the current puddle.

Comment 7 Assaf Muller 2017-01-09 11:32:25 UTC
Please re-open if relevant.

Comment 8 Marian Krcmarik 2017-01-19 21:14:41 UTC
Created attachment 1242601 [details]
SOSreport Controller-1 openvswitch InternalError

Comment 9 Marian Krcmarik 2017-01-19 21:15:47 UTC
(In reply to Assaf Muller from comment #4)
> Please provide an SOS report of a controller showing the issue.

I attached sosreport from controller with failed openvswitch-agent

Comment 10 Assaf Muller 2017-01-26 21:18:57 UTC
(In reply to Marian Krcmarik from comment #9)
> (In reply to Assaf Muller from comment #4)
> > Please provide an SOS report of a controller showing the issue.
> 
> I attached sosreport from controller with failed openvswitch-agent

Can you please supply reproduction steps?

Comment 12 Marian Krcmarik 2017-01-27 11:51:38 UTC
(In reply to Assaf Muller from comment #10)
> (In reply to Marian Krcmarik from comment #9)
> > (In reply to Assaf Muller from comment #4)
> > > Please provide an SOS report of a controller showing the issue.
> > 
> > I attached sosreport from controller with failed openvswitch-agent
> 
> Can you please supply reproduction steps?

https://bugzilla.redhat.com/show_bug.cgi?id=1401542#c0

Comment 14 Marian Krcmarik 2017-05-25 08:48:27 UTC
The bug has started to appear in our automation very frequently; resetting all networker nodes at once is enough to trigger it.

Comment 20 PURANDHAR SAIRAM MANNIDI 2017-06-28 23:34:37 UTC
*** Bug 1465782 has been marked as a duplicate of this bug. ***

Comment 23 Daniel Alvarez Sanchez 2017-07-04 13:47:59 UTC
I've been looking into it and I think I found the root cause:

1. When the OVS agent starts, it tries to connect to the AMQP server.
2. Since all services are starting and rabbit is still not ready,
   it fails to connect.
3. An uncaught exception occurs at that point, which hits [0].
4. The agent dies.

Apparently, the installed ryu package (4.3) has a bug [1] which was fixed in 4.4 [2] by introducing a semaphore in the close method.

$ grep ryu  ./dnucs002-controller-2.localdomain/installed-rpms
python-ryu-4.3-2.el7ost.noarch                              Tue Feb 28 16:14:29 2017

The solution was to bump the required version to 4.4 [3].

[0] https://github.com/openstack/neutron/blob/stable/newton/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py#L46
[1] https://bugs.launchpad.net/neutron/+bug/1589746
[2] https://github.com/osrg/ryu/commit/b0ab4f16028c452374c5a0f22bd970038194f142
[3] https://review.openstack.org/#/c/336799/
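
For illustration only, here is a minimal Python sketch of the pattern that fix introduces (this is not the actual ryu code; the Connection class and the _sock/_close_sem names are made up): guard close() with a semaphore so that a close triggered from the error path cannot race with another close and raise the uncaught exception that kills the agent.

import threading

class Connection(object):
    def __init__(self, sock):
        self._sock = sock
        # Binary semaphore: only the first close() actually tears down the socket.
        self._close_sem = threading.Semaphore(1)

    def close(self):
        # Non-blocking acquire: if another thread already closed the
        # connection (e.g. the error path that fires while rabbit is still
        # starting), just return instead of failing on a double close.
        if not self._close_sem.acquire(False):
            return
        try:
            self._sock.close()
        finally:
            self._sock = None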

Comment 24 Daniel Alvarez Sanchez 2017-07-04 14:12:02 UTC
OSP10 now includes ryu 4.9 as of 2017-06-28 [1]. The solution is to upgrade to the latest OSP10 release.

[1] https://rhn.redhat.com/errata/RHBA-2017-1587.html
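
As a quick sanity check (a sketch, not from this bug; pkg_resources ships with setuptools), one can verify that the ryu importable on a controller is at or past 4.4, the first release with the close() semaphore fix:

import pkg_resources

installed = pkg_resources.get_distribution('ryu').version
print('ryu', installed)
# 4.4 carries the semaphore fix; OSP10 ships 4.9 as of the 2017-06-28 update.
assert pkg_resources.parse_version(installed) >= pkg_resources.parse_version('4.4')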

Comment 27 Lon Hohberger 2017-09-06 19:57:24 UTC
According to our records, this should be resolved by python-ryu-4.9-2.1.el7ost.  This build is available now.

