Created attachment 1228073 [details]
rabbitmq log

Description of problem:

On system start-up, after rebooting all controllers (none gracefully), the neutron Open vSwitch agent hit an internal error:

2016-11-29 11:08:51.701 4971 INFO oslo.messaging._drivers.impl_rabbit [req-3adcf615-2282-47bb-9d19-164a158f2374 - - - - -] [352a6a1f-db50-40cd-8534-90b0b1e0a5cf] Reconnected to AMQP server on 10.35.169.11:5672 via [amqp] client with port 39286.
2016-11-29 11:08:51.722 4971 ERROR oslo.messaging._drivers.impl_rabbit [req-3adcf615-2282-47bb-9d19-164a158f2374 - - - - -] Failed to declare consumer for topic 'dhcp_agent.overcloud-controller-2.localdomain': (0, 0): (541) INTERNAL_ERROR
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service [req-3adcf615-2282-47bb-9d19-164a158f2374 - - - - -] Error starting thread.
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service Traceback (most recent call last):
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 708, in run_service
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     service.start()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/neutron/service.py", line 330, in start
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     super(Service, self).start()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 154, in wrapper
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return f(*args, **kwargs)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 268, in start
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.conn.consume_in_threads()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 294, in consume_in_threads
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     server.start()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 268, in wrapper
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     log_after, timeout_timer)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 188, in run_once
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     post_fn = fn()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 267, in <lambda>
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     states[state].run_once(lambda: fn(self, *args, **kwargs),
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/server.py", line 420, in start
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.listener = self._create_listener()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 121, in _create_listener
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return self.transport._listen(self._target, 1, None)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 112, in _listen
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     batch_timeout)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 481, in listen
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     callback=listener)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1132, in declare_topic_consumer
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.declare_consumer(consumer)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1040, in declare_consumer
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     error_callback=_connect_error)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 814, in ensure
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     ret, channel = autoretry_method()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 436, in _ensured
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return fun(*args, **kwargs)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 508, in __call__
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return fun(*args, channel=channels[0], **kwargs), channels[0]
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 790, in execute_method
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     method()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1028, in _declare_consumer
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     consumer.declare(self)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 302, in declare
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.queue.declare()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 522, in declare
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.queue_declare(nowait, passive=False)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 548, in queue_declare
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     nowait=nowait)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1258, in queue_declare
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     (50, 11),  # Channel.queue_declare_ok
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.channel_id, allowed_methods)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 273, in _wait_method
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     self.wait()
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 69, in wait
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return self.dispatch_method(method_sig, args, content)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 87, in dispatch_method
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     return amqp_method(self, args)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 529, in _close
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service     (class_id, method_id), ConnectionError)
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service InternalError: (0, 0): (541) INTERNAL_ERROR
2016-11-29 11:08:51.723 4971 ERROR oslo_service.service

[root@overcloud-controller-2 ~]#

Version-Release number of selected component (if applicable):

(automation) [stack@puma42 sts]$ neutron agent-list
+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host                               | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| 0d08b50d-2ed6-48d1-8b91-70beeeb9a342 | Metadata agent     | overcloud-controller-1.localdomain |                   | :-)   | True           | neutron-metadata-agent    |
| 405ec568-556b-41e1-ac1f-ac0c29d85873 | Open vSwitch agent | overcloud-controller-1.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |
| 515dfcd8-7aa1-412c-952a-04c5e20ca716 | L3 agent           | overcloud-controller-2.localdomain | nova              | :-)   | True           | neutron-l3-agent          |
| 54dd365c-40f3-4e0d-a78f-c1ad980afe3a | L3 agent           | overcloud-controller-1.localdomain | nova              | :-)   | True           | neutron-l3-agent          |
| 5a42d3fa-f4b2-4877-9f7c-351e11b0d6b4 | DHCP agent         | overcloud-controller-2.localdomain | nova              | xxx   | True           | neutron-dhcp-agent        |
| 61b33004-e2e4-4d7d-80a9-1b68d945219d | Open vSwitch agent | overcloud-controller-2.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |
| 7c190db3-515c-427a-a380-eb114f4cb9e6 | L3 agent           | overcloud-controller-0.localdomain | nova              | :-)   | True           | neutron-l3-agent          |
| 97972c85-b5e8-4d55-b424-611364d1b076 | DHCP agent         | overcloud-controller-1.localdomain | nova              | xxx   | True           | neutron-dhcp-agent        |
| bc2c0a48-c917-43fb-85ca-7dcfeccfe5f6 | Metadata agent     | overcloud-controller-2.localdomain |                   | :-)   | True           | neutron-metadata-agent    |
| d6180df4-34f9-463e-8696-d2d131179607 | DHCP agent         | overcloud-controller-0.localdomain | nova              | :-)   | True           | neutron-dhcp-agent        |
| e329be8b-7f74-42c0-a852-f932b30b0d18 | Metadata agent     | overcloud-controller-0.localdomain |                   | :-)   | True           | neutron-metadata-agent    |
| ee28b426-01dd-4f56-a6d3-4ae8ec4766d8 | Open vSwitch agent | overcloud-compute-0.localdomain    |                   | :-)   | True           | neutron-openvswitch-agent |
| f950772b-85f5-4e21-aa6c-ef74eab24317 | Open vSwitch agent | overcloud-controller-0.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+

rabbitmq log is attached.

How reproducible:
Every time.

Steps to Reproduce:
1. Restart all 3 controllers non-gracefully (see the example transcript below).
2. Wait for them to start.

Additional info:
OpenStack-10.0-RHEL-7, Puddle: 2016-11-29.1
Bare-metal setup, 3 controllers, 2 computes.
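For reference, the non-graceful reset our automation performs amounts to a transcript like this (the heat-admin user and controller hostnames are the usual TripleO defaults, shown here purely for illustration):

$ for host in overcloud-controller-0 overcloud-controller-1 overcloud-controller-2; do ssh heat-admin@$host 'sudo reboot -f'; done
$ # once the nodes are back up, check agent liveness:
$ neutron agent-list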
Please provide an SOS report of a controller showing the issue.
Assaf Muller, no. This setup no longer exists and I couldn't manage to reproduce the issue on the current puddle.
Please re-open if relevant.
Created attachment 1242601 [details]
SOSreport Controller-1 openvswitch InternalError
(In reply to Assaf Muller from comment #4)
> Please provide an SOS report of a controller showing the issue.

I attached an sosreport from the controller with the failed openvswitch-agent.
(In reply to Marian Krcmarik from comment #9)
> (In reply to Assaf Muller from comment #4)
> > Please provide an SOS report of a controller showing the issue.
>
> I attached an sosreport from the controller with the failed openvswitch-agent.

Can you please supply reproduction steps?
(In reply to Assaf Muller from comment #10)
> (In reply to Marian Krcmarik from comment #9)
> > (In reply to Assaf Muller from comment #4)
> > > Please provide an SOS report of a controller showing the issue.
> >
> > I attached an sosreport from the controller with the failed openvswitch-agent.
>
> Can you please supply reproduction steps?

https://bugzilla.redhat.com/show_bug.cgi?id=1401542#c0
The bug has started to appear in our automation very frequently - resetting all networker nodes at once is enough to trigger it.
*** Bug 1465782 has been marked as a duplicate of this bug. ***
I've been looking into it and I think I found the root cause:

1. When the OVS agent starts, it tries to connect to the AMQP server.
2. Since all services are starting and rabbit is still not ready, the connection fails.
3. An uncaught exception occurs at that point, which hits the teardown code at [0].
4. The agent dies.

Apparently, the installed ryu package (4.3) has a bug [1] that was fixed in 4.4 [2] by introducing a semaphore in the close method (a minimal sketch follows after the links):

$ grep ryu ./dnucs002-controller-2.localdomain/installed-rpms
python-ryu-4.3-2.el7ost.noarch                Tue Feb 28 16:14:29 2017

The solution was to bump the required version to 4.4 [3].

[0] https://github.com/openstack/neutron/blob/stable/newton/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py#L46
[1] https://bugs.launchpad.net/neutron/+bug/1589746
[2] https://github.com/osrg/ryu/commit/b0ab4f16028c452374c5a0f22bd970038194f142
[3] https://review.openstack.org/#/c/336799/
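To make the race concrete, here is a minimal, hand-written sketch. This is NOT the shipped neutron/ryu code: FakeApp, _close_sem and the __main__ driver are mine, names are simplified, and ryu actually uses its eventlet-based hub rather than the stdlib threading used here. The real players are neutron's ovs_ryuapp.py [0], which calls ryu's AppManager.close() when the agent main thread dies, and ryu's app_manager, whose close() was not safe against concurrent/repeated calls until 4.4 [2].

import threading


class FakeApp(object):
    """Stand-in for a ryu application (illustrative only)."""
    def stop(self):
        pass


class AppManager(object):
    """Simplified sketch of ryu's AppManager close() path."""

    def __init__(self):
        self.applications = {'ofctl': FakeApp(), 'ovs': FakeApp()}
        self._close_sem = threading.Semaphore()  # what the 4.4 fix adds

    def uninstantiate(self, name):
        # Without the semaphore, two racing close() calls could both
        # reach this pop(); the loser raised, and that unhandled
        # exception is what took neutron-openvswitch-agent down.
        app = self.applications.pop(name)
        app.stop()

    def close(self):
        with self._close_sem:  # serialize concurrent closers
            for name in list(self.applications):
                self.uninstantiate(name)


if __name__ == '__main__':
    mgr = AppManager()
    # Simulate teardown racing with itself, as happens when the agent's
    # RPC thread dies while rabbit is still coming up:
    closers = [threading.Thread(target=mgr.close) for _ in range(2)]
    for t in closers:
        t.start()
    for t in closers:
        t.join()
    print('close() survived concurrent calls; apps left:', mgr.applications)

With the semaphore, the second closer simply finds an empty application map and does nothing, so repeated or concurrent teardown becomes a harmless no-op instead of a fatal secondary exception.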
OSP10 now includes ryu 4.9 as of the 2017-06-28 errata [1]. The solution is to upgrade to the latest OSP10 release.

[1] https://rhn.redhat.com/errata/RHBA-2017-1587.html
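If it helps verification, the installed version can be checked on a controller with standard RPM/Yum commands (illustrative transcript; the exact NVR depends on the puddle):

$ rpm -q python-ryu            # anything >= 4.4 carries the close() fix; current OSP10 ships 4.9
$ sudo yum update python-ryu   # pulls the fixed build from the OSP10 repos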
According to our records, this should be resolved by python-ryu-4.9-2.1.el7ost. This build is available now.