Bug 1300280

Summary: Network creation fails after openstack-service restart neutron
Product: Red Hat OpenStack Reporter: Itzik Brown <itbrown>
Component: openstack-neutronAssignee: Terry Wilson <twilson>
Status: CLOSED WORKSFORME QA Contact: Toni Freger <tfreger>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 8.0 (Liberty)CC: amuller, chrisw, fpercoco, itbrown, nyechiel, yeylon, zhunatuzi
Target Milestone: ---   
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-30 09:10:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Part of Neutron server log none

Description Itzik Brown 2016-01-20 12:07:08 UTC
Created attachment 1116630 [details]
Part of Neutron server log

Description of problem:
After using openstack-service start neutron (or restart) I can't create a network and see oslo messaging errors in the neutron log.

The following services are running on the controller:
# openstack-service status neutron
MainPID=72131 Id=neutron-dhcp-agent.service ActiveState=active
MainPID=72132 Id=neutron-l3-agent.service ActiveState=active
MainPID=72133 Id=neutron-lbaas-agent.service ActiveState=active
MainPID=72134 Id=neutron-metadata-agent.service ActiveState=active
MainPID=72135 Id=neutron-openvswitch-agent.service ActiveState=active
MainPID=72136 Id=neutron-server.service ActiveState=active


Version-Release number of selected component (if applicable):
RHEL7.2
Installed using packstack
openstack-neutron-7.0.1-6.el7ost.noarch
python-neutron-7.0.1-6.el7ost.noarch
python-oslo-messaging-2.5.0-1.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. On the controller run openstack-service stop neutron
2. Run openstack-service start neutron
3. Verify that running the command neutron net-create hangs

Actual results:


Expected results:


Additional info:

Comment 2 Flavio Percoco 2016-01-20 12:43:18 UTC
Looking at the logs, the most evident error is:

2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher [req-e40ae200-49c6-43c4-b801-d6608292dd8e - - - - -] Exception during message handling: Received 0x01 while expecting 0xce
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     executor_callback))
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 101, in reply
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     self._send_reply(conn, reply, failure, log_failure=log_failure)
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 73, in _send_reply
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     conn.direct_send(self.reply_q, rpc_common.serialize_msg(msg))
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1105, in direct_send
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     exchange, msg, routing_key=msg_id)
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 966, in _ensure_publishing
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     self.ensure(method, retry=retry, error_callback=_error_callback)
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 688, in ensure
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     ret, channel = autoretry_method()
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 436, in _ensured
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     return fun(*args, **kwargs)
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 508, in __call__
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     return fun(*args, channel=channels[0], **kwargs), channels[0]
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 664, in execute_method
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     method()
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1066, in _publish_and_retry_on_missing_exchange
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     timeout=timeout)
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 999, in _publish
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     producer.publish(msg, expiration=expiration)
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 172, in publish
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     routing_key, mandatory, immediate, exchange, declare)
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 188, in _publish
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     mandatory=mandatory, immediate=immediate,
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 2130, in basic_publish_confirm
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     self.wait([(60, 80)])
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 67, in wait
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     self.channel_id, allowed_methods)
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 240, in _wait_method
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     self.method_reader.read_method()
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher     raise m
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher UnexpectedFrame: Received 0x01 while expecting 0xce
2016-01-20 14:09:44.666 30639 ERROR oslo_messaging.rpc.dispatcher 
2016-01-20 14:09:44.670 30639 ERROR oslo_messaging._drivers.common [req-e40ae200-49c6-43c4-b801-d6608292dd8e - - - - -] Returning exception Received 0x01 while expecting 0xce to caller


The above is normally caused when the connection to the broker is shared by multiple threads and processes. In fact, looking at the logs (before this error) there are warnings printed by oslo.messaging suggesting the process had been forked:

2016-01-20 14:07:43.868 30625 WARNING oslo.messaging._drivers.impl_rabbit [req-e40ae200-49c6-43c4-b801-d6608292dd8e - - - - -] Process forked after connection established! This can result in unpredictable behavio
r. See: http://docs.openstack.org/developer/oslo_messaging/transport.html


This fork seems to happen several times. Any chance to get neutron folks help with figuring out where this is happening in the code?

Comment 3 Assaf Muller 2016-01-20 13:43:04 UTC
Assigned to Terry for further investigation.

Comment 4 Terry Wilson 2016-03-29 19:59:12 UTC
I haven't been able to reproduce this issue yet after many restarts (though with openstack-neutron-7.0.1-14 and phython-oslo-messaging-2.5.0-5). How reproducible is the issue on your side? Do you still have the issue with the latest build? If so, could you post your neutron configs and packstack answer file?

Comment 5 Itzik Brown 2016-03-30 09:10:32 UTC
Checked with python-neutron-7.0.1-14.el7ost.noarch and it doesn't happen.

Comment 6 Juno 2017-04-05 03:44:52 UTC
Can you tell me how you fix this bug? I find this error in openstack mitaka serveral times.

Comment 7 Itzik Brown 2017-04-05 06:03:28 UTC
(In reply to Juno from comment #6)
> Can you tell me how you fix this bug? I find this error in openstack mitaka
> serveral times.

What is the version of python-neutron? If it happens please open a bug.