Description of problem: The DHCP server starts after quantum-server in the startup sequence. This results in an error in the logs, and a nonfunctional DHCP setup. Booting any instances results in timeouts on the dhcp discover operation with errors in the dhcp agent's log. The root cause is DHCP agent expecting quantum-server to already be running and not re-attempting (or doing so incorrectly) to connect to the quantum server over amqp). We had a similar problem in heat long ago :) Version-Release number of selected component (if applicable): Puddle 2013.5.31.1 How reproducible: 100% Steps to Reproduce: 1. install RHEL6.4 in VM - follow quantum install directions 2. when the boot vm step is completed, open virtual machine manager 3. login to the cirros console as root pw cubswin:) 4. try to ifup/ifdown eth0 - it will not be able to obtain an IP address 5. service quantum-dhcp-agent stop 6. service quantum-dhcp-agent start 7. start new vm 8. open the new vm in the console 9. it will have an ip address Actual results: No DHCP address obtained by instance Expected results: DHCP address obtained by instance Additional info: The root cause of the problem is that the quantum-server starts in the init sequence after the dhcp agent - eg: lrwxrwxrwx. 1 root root 29 Jun 3 15:30 /etc/rc.d/rc3.d/S97quantum-ovs-cleanup -> ../init.d/quantum-ovs-cleanup lrwxrwxrwx. 1 root root 28 Jun 3 15:21 /etc/rc.d/rc3.d/S98quantum-dhcp-agent -> ../init.d/quantum-dhcp-agent lrwxrwxrwx. 1 root root 26 Jun 3 15:21 /etc/rc.d/rc3.d/S98quantum-l3-agent -> ../init.d/quantum-l3-agent lrwxrwxrwx. 1 root root 32 Jun 3 15:21 /etc/rc.d/rc3.d/S98quantum-metadata-agent -> ../init.d/quantum-metadata-agent lrwxrwxrwx. 1 root root 35 Jun 3 15:21 /etc/rc.d/rc3.d/S98quantum-openvswitch-agent -> ../init.d/quantum-openvswitch-agent lrwxrwxrwx. 1 root root 24 Jun 3 22:01 /etc/rc.d/rc3.d/S98quantum-server -> ../init.d/quantum-server init scripts are started in order as read with readdir() or friends. A very simple solution which I tested was to change the init level of quantum-server to 97 and quantum-ovs-cleanup to 96. This resolves the problem. Logs: 2013-06-03 22:04:16 ERROR [quantum.agent.dhcp_agent] Failed reporting state! Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/quantum/agent/dhcp_agent.py", line 700, in _report_state self.agent_state) File "/usr/lib/python2.6/site-packages/quantum/agent/rpc.py", line 66, in report_state topic=self.topic) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/proxy.py", line 80, in call return rpc.call(context, self._get_topic(topic), msg, timeout) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/__init__.py", line 140, in call return _get_impl().call(CONF, context, topic, msg, timeout) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 611, in call rpc_amqp.get_connection_pool(conf, Connection)) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 613, in call rv = list(rv) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 555, in __iter__ self.done() File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__ self.gen.next() File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 552, in __iter__ self._iterator.next() File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 436, in iterconsume yield self.ensure(_error_callback, _consume) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 380, in ensure error_callback(e) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 421, in _error_callback raise rpc_common.Timeout() Timeout: Timeout while waiting on RPC response. 2013-06-03 22:04:16 WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 56.074657 sec 2013-06-03 22:04:16 INFO [quantum.agent.dhcp_agent] Synchronizing state 2013-06-03 22:04:21 WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 0.879631 sec With the solution in place, no such backtrace occurs and dhcp operates as expected
diff --git a/quantum-ovs-cleanup.init b/quantum-ovs-cleanup.init index 88f47a5..c150afe 100644 --- a/quantum-ovs-cleanup.init +++ b/quantum-ovs-cleanup.init @@ -2,7 +2,7 @@ # # quantum-ovs-cleanup OpenStack Open vSwitch cleanup utility # -# chkconfig: - 97 02 +# chkconfig: - 96 02 # description: Purge Open vSwitch of the Quantum devices ### END INIT INFO diff --git a/quantum-server.init b/quantum-server.init index 6a33b94..4666bde 100644 --- a/quantum-server.init +++ b/quantum-server.init @@ -2,7 +2,7 @@ # # quantum OpenStack Software Defined Networking Service # -# chkconfig: - 98 02 +# chkconfig: - 97 02 # description: quantum provides an API to \ # * request and configure virtual networks ### END INIT INFO
openstack-quantum-2013.1.1-9.el6ost.noarch couldn't recreate the problem allinone setup cirros image 1. ifdown 2. ifup result: ip is obtained (~1min time)
Tested on grizzly running on rhel6.4 with openstack-quantum-2013.1.1-10.el6ost (puddle 2013-06-05.2), OpenStack Networking with OVS, it looks like the fix failed, probably because there is no quantum-ovs-clean in the machine. The env includes 2 machines, one of the is a compute node, the second is quantum-server+l3+dhcp+all other services, installed with packstack. The quantum server still runs before the quantum-dhcp-agent and the error still appears in the dhcp log file. # ll /etc/rc.d/rc`runlevel | awk '{print $NF}'`.d/ | grep quantum lrwxrwxrwx. 1 root root 24 Jun 7 19:38 S97quantum-server -> ../init.d/quantum-server lrwxrwxrwx. 1 root root 28 Jun 7 19:37 S98quantum-dhcp-agent -> ../init.d/quantum-dhcp-agent lrwxrwxrwx. 1 root root 26 Jun 7 19:37 S98quantum-l3-agent -> ../init.d/quantum-l3-agent lrwxrwxrwx. 1 root root 32 Jun 7 19:37 S98quantum-metadata-agent -> ../init.d/quantum-metadata-agent lrwxrwxrwx. 1 root root 35 Jun 7 19:38 S98quantum-openvswitch-agent -> ../init.d/quantum-openvswitch-agent 2013-06-07 19:37:41 INFO [quantum.common.config] Logging enabled! 2013-06-07 19:37:42 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.23:5672 2013-06-07 19:37:42 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.23:5672 2013-06-07 19:37:42 INFO [quantum.agent.dhcp_agent] DHCP agent started 2013-06-07 19:38:42 ERROR [quantum.agent.dhcp_agent] Failed reporting state! Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/quantum/agent/dhcp_agent.py", line 700, in _report_state self.agent_state) File "/usr/lib/python2.6/site-packages/quantum/agent/rpc.py", line 66, in report_state topic=self.topic) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/proxy.py", line 80, in call return rpc.call(context, self._get_topic(topic), msg, timeout) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/__init__.py", line 140, in call return _get_impl().call(CONF, context, topic, msg, timeout) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 611, in call rpc_amqp.get_connection_pool(conf, Connection)) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 613, in call rv = list(rv) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 555, in __iter__ self.done() File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__ self.gen.next() File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 552, in __iter__ self._iterator.next() File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 436, in iterconsume yield self.ensure(_error_callback, _consume) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 380, in ensure error_callback(e) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 421, in _error_callback raise rpc_common.Timeout() Timeout: Timeout while waiting on RPC response. 2013-06-07 19:38:42 WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 56.10093 sec 2013-06-07 19:38:42 INFO [quantum.agent.dhcp_agent] Synchronizing state 2013-06-07 19:40:31 INFO [quantum.openstack.common.service] Caught SIGTERM, exiting 2013-06-07 19:40:31 INFO [quantum.common.config] Logging enabled! 2013-06-07 19:40:31 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.23:5672 2013-06-07 19:40:31 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.23:5672 2013-06-07 19:40:31 INFO [quantum.agent.dhcp_agent] DHCP agent started 2013-06-07 19:40:31 INFO [quantum.agent.dhcp_agent] Synchronizing state 2013-06-07 19:41:39 WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 0.126473 sec 2013-06-07 20:31:48 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.23:5672
btw, the same happens in puddle 2013-06-07.1 where quantum-ovs-cleanup is installed, the same error message appears in /var/log/quantum/dhcp-agent.log. lrwxrwxrwx. 1 root root 29 Jun 8 15:51 S96quantum-ovs-cleanup -> ../init.d/quantum-ovs-cleanup lrwxrwxrwx. 1 root root 24 Jun 8 15:51 S97quantum-server -> ../init.d/quantum-server lrwxrwxrwx. 1 root root 28 Jun 8 15:50 S98quantum-dhcp-agent -> ../init.d/quantum-dhcp-agent lrwxrwxrwx. 1 root root 26 Jun 8 15:50 S98quantum-l3-agent -> ../init.d/quantum-l3-agent lrwxrwxrwx. 1 root root 32 Jun 8 15:50 S98quantum-metadata-agent -> ../init.d/quantum-metadata-agent lrwxrwxrwx. 1 root root 35 Jun 8 15:51 S98quantum-openvswitch-agent -> ../init.d/quantum-openvswitch-agent 2013-06-08 15:50:51 INFO [quantum.common.config] Logging enabled! 2013-06-08 15:50:51 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.23:5672 2013-06-08 15:50:51 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.23:5672 2013-06-08 15:50:51 INFO [quantum.agent.dhcp_agent] DHCP agent started 2013-06-08 15:51:51 ERROR [quantum.agent.dhcp_agent] Failed reporting state! Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/quantum/agent/dhcp_agent.py", line 700, in _report_state self.agent_state) File "/usr/lib/python2.6/site-packages/quantum/agent/rpc.py", line 66, in report_state topic=self.topic) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/proxy.py", line 80, in call return rpc.call(context, self._get_topic(topic), msg, timeout) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/__init__.py", line 140, in call return _get_impl().call(CONF, context, topic, msg, timeout) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 611, in call rpc_amqp.get_connection_pool(conf, Connection)) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 613, in call rv = list(rv) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 555, in __iter__ self.done() File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__ self.gen.next() File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/amqp.py", line 552, in __iter__ self._iterator.next() File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 436, in iterconsume yield self.ensure(_error_callback, _consume) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 380, in ensure error_callback(e) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 421, in _error_callback raise rpc_common.Timeout() Timeout: Timeout while waiting on RPC response. 2013-06-08 15:51:51 WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 56.099052 sec 2013-06-08 15:51:51 INFO [quantum.agent.dhcp_agent] Synchronizing state 2013-06-08 15:55:40 INFO [quantum.openstack.common.service] Caught SIGTERM, exiting 2013-06-08 15:55:40 INFO [quantum.common.config] Logging enabled! 2013-06-08 15:55:40 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.23:5672 2013-06-08 15:55:40 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.23:5672 2013-06-08 15:55:40 INFO [quantum.agent.dhcp_agent] DHCP agent started 2013-06-08 15:55:40 INFO [quantum.agent.dhcp_agent] Synchronizing state 2013-06-08 15:56:51 WARNING [quantum.openstack.common.loopingcall] task run outlasted interval by 2.732581 sec
Rami, The main problem is not the error printing, it was an unability to obtain a DHCP address in all in one mode. Since your testing multiple nodes, can you check that you can (or cannot) obtain a DHCP address?
(In reply to Steven Dake from comment #6) > Rami, > > The main problem is not the error printing, it was an unability to obtain a is this traceback is part of this bug? maybe we should seperate into two bugs? > DHCP address in all in one mode. Since your testing multiple nodes, can you > check that you can (or cannot) obtain a DHCP address? In multiple nodes I do obtain a dhcp address.
The trace indicates that the DHCP agent is unable to communicate with the Quantum plugin. This may happen when the plugin is restarted or there is a network error. The internals of the qpid messaging layer have retries until there is no answer and an exception is thrown. ".... File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 380, in ensure error_callback(e) File "/usr/lib/python2.6/site-packages/quantum/openstack/common/rpc/impl_qpid.py", line 421, in _error_callback raise rpc_common.Timeout() Timeout: Timeout while waiting on RPC response." This is not related to the issue reported.
Moving this bug to verified per comments 3 & 8.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0933.html