Description of problem:
A new bug introduced by BZ 889786.

On a compute node, we should make sure that after a reboot, init starts nova compute only after it starts the OVS cleanup utility. If init instead starts the cleanup utility after nova compute, the utility deletes the port that nova compute has plugged a tap interface or a veth into, and the network traffic of the affected instance never reaches br-int. If init starts nova compute after the cleanup utility, nova compute replugs the interface or the veth into the port that the cleanup utility deleted.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Create an instance on a compute node.
2. Stop the quantum OVS agent.
3. Start the OVS cleanup utility (pretending that init starts the utility after it starts nova compute).
4. Start the quantum OVS agent.

Actual results:
Pinging the instance from the DHCP network namespace fails:
# sudo ip netns exec qdhcp-49e40fb9-c3d3-419f-a275-32ee4c47071e ping 10.0.0.6

Expected results:
The ping succeeds.

Additional info:
So, the cleanup agent is already set at priority 97, which should run before nova-compute's priority of 98. Perhaps we should have the process block until the cleanup is finished? I'll look into it.
@apevec: you were in the original bug's discussion. Is there a reason we shouldn't make the init script block, since the process isn't actually a daemon?
Yes, the cleanup init script should not exit before the cleanup is done; otherwise you run into race conditions.
So, we should make it block? The wording in your response was a bit unclear. Currently, the process is run with the `daemon` tool, which runs it in the background without blocking. I'm thinking we just need to remove the `daemon` part from the init script, so that

> daemon --user neutron $exec --log-file /var/log/$proj/ovs-cleanup.log --config-file /etc/$proj/$proj.conf --config-file $config &>/dev/null

becomes

> runuser -s /bin/bash $user -c "neutron $exec --log-file /var/log/$proj/ovs-cleanup.log --config-file /etc/$proj/$proj.conf --config-file $config &>/dev/null"
Sorry if that was unclear; I left the implementation details to you :)

Looking back at the original BZ, it looks like Garry switched to using "daemon" (the first version of the script didn't have it) only for the --user option. So replacing it with "runuser" is correct; it's just that in your comment 5 it should be

> runuser -s /bin/bash neutron -c "$exec ..."

Also note that a recent change in the Havana Neutron packages added support for dist.conf:
http://pkgs.fedoraproject.org/cgit/openstack-neutron.git/commit/?h=el6-havana&id=afcc20396ac906b29dcd1ee2cb32138e4eab59ce

There, -c "$exec..." from above would break due to the quoting produced by ${configs[@]/#/--config-file }:

+ runuser -s /bin/bash neutron -c '/usr/bin/neutron-ovs-cleanup --log-file /var/log/neutron/ovs-cleanup.log --config-file /usr/share/neutron/neutron-dist.conf' '--config-file /etc/neutron/neutron.conf' '--config-file /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini &>/dev/null'
runuser: unrecognized option '--config-file /etc/neutron/neutron.conf'

BTW, this should be fixed for RHOS 3.0 too, so I'm adding the 3.0z flag.
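The quoting failure above can be demonstrated in plain bash, independent of runuser. This is a minimal sketch: the `stub` function is a hypothetical stand-in for runuser that just counts the arguments it receives, and the config paths are the ones from the log.

```shell
#!/bin/bash
# Sketch of the quoting pitfall with ${configs[@]/#/--config-file }.
# "stub" is a hypothetical stand-in for runuser: it counts its arguments.
stub() { echo "$#"; }

configs=(/etc/neutron/neutron.conf
         /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini)

# Broken: the quoted array expansion produces one extra argument per config
# file, so runuser would see them as its own (unrecognized) options after -c.
nargs_broken=$(stub -c "/usr/bin/neutron-ovs-cleanup --config-file /usr/share/neutron/neutron-dist.conf" "${configs[@]/#/--config-file }")

# Fixed: fold every --config-file flag into one command string first,
# then pass that single string as the lone argument to -c.
cmd="/usr/bin/neutron-ovs-cleanup"
for cfg in "${configs[@]}"; do
    cmd+=" --config-file $cfg"
done
nargs_fixed=$(stub -c "$cmd")

echo "broken: $nargs_broken args"   # 4: -c, the command, and 2 strays
echo "fixed:  $nargs_fixed args"    # 2: -c and the full command string
```

This matches the error in the log: runuser accepts exactly one command string after -c, so the stray per-config arguments surface as "unrecognized option".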
@oblaut: just FYI: the script priority here is correct; the issue is that the cleanup tool is run as a daemon, so the init script exits before the cleanup is finished, which means nova can launch while the actual cleanup is still running.
@oblaut: I think the bug may actually be a race condition. It depends on whether the daemonized version of the cleanup utility finishes before the nova daemon starts, so reproducing it may be difficult, unfortunately.
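The race between a backgrounded cleanup and the next init step can be sketched in plain bash. This is not the real init script: `cleanup` and the marker file are hypothetical stand-ins for the cleanup utility and its effect.

```shell
#!/bin/bash
# Sketch of the race: a backgrounded ("daemonized") cleanup lets the init
# sequence move on before cleanup is done; a blocking call does not.
marker=$(mktemp -u)                  # hypothetical stand-in for cleanup's effect
cleanup() { sleep 0.2; touch "$marker"; }

# Daemonized: the next step runs immediately, while cleanup is still working.
cleanup &
if [ -f "$marker" ]; then daemonized=yes; else daemonized=no; fi
wait                                 # reap the background job

# Blocking: the next step only runs once cleanup has completed.
rm -f "$marker"
cleanup
if [ -f "$marker" ]; then blocking=yes; else blocking=no; fi
rm -f "$marker"

echo "cleanup finished before next step (daemonized): $daemonized"
echo "cleanup finished before next step (blocking):   $blocking"
```

In the daemonized case the marker check races with the background job (here the sleep makes it lose deterministically), which is why the bug only bites when nova-compute wins the race against the still-running cleanup.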
I have followed the steps, but the VM has no traffic.

OVS cleanup logs:

2013-10-31 09:11:17 INFO [quantum.common.config] Logging enabled!
2013-10-31 09:11:17 INFO [quantum.agent.ovs_cleanup_util] Cleaning br-int
2013-10-31 09:11:18 INFO [quantum.agent.ovs_cleanup_util] OVS cleanup completed successfully
2013-10-31 12:40:42 INFO [quantum.common.config] Logging enabled!
2013-10-31 12:40:43 INFO [quantum.agent.ovs_cleanup_util] Cleaning br-int
2013-10-31 12:40:45 INFO [quantum.agent.ovs_cleanup_util] Delete qvo2ed438a8-a5
2013-10-31 12:40:46 INFO [quantum.agent.ovs_cleanup_util] Delete qvo4f81529e-a9
2013-10-31 12:40:47 INFO [quantum.agent.ovs_cleanup_util] Delete qvoc2cdc2cc-2c
2013-10-31 12:40:47 INFO [quantum.agent.ovs_cleanup_util] OVS cleanup completed successfully
2013-10-31 12:52:36 INFO [quantum.common.config] Logging enabled!
2013-10-31 12:52:37 INFO [quantum.agent.ovs_cleanup_util] Cleaning br-int
2013-10-31 12:52:37 INFO [quantum.agent.ovs_cleanup_util] OVS cleanup completed successfully

OVS agent logs:

2013-10-31 11:03:14 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.17:5672
2013-10-31 11:03:15 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port c2cdc2cc-2c42-43f5-9f3f-05702625746d added
2013-10-31 11:03:15 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port c2cdc2cc-2c42-43f5-9f3f-05702625746d updated. Details: {u'admin_state_up': True, u'network_id': u'b7f821d3-b437-400e-b294-e43aa1330184', u'segmentation_id': 201, u'physical_network': u'inter-vlan', u'device': u'c2cdc2cc-2c42-43f5-9f3f-05702625746d', u'port_id': u'c2cdc2cc-2c42-43f5-9f3f-05702625746d', u'network_type': u'vlan'}
2013-10-31 11:03:15 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Assigning 1 as local vlan for net-id=b7f821d3-b437-400e-b294-e43aa1330184
2013-10-31 11:03:16 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port 2ed438a8-a585-40d8-8df2-f89ccd610858 added
2013-10-31 11:03:16 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port 2ed438a8-a585-40d8-8df2-f89ccd610858 updated. Details: {u'admin_state_up': True, u'network_id': u'a5fb3fe2-a1bb-439b-95c2-a69b844cc185', u'segmentation_id': 202, u'physical_network': u'inter-vlan', u'device': u'2ed438a8-a585-40d8-8df2-f89ccd610858', u'port_id': u'2ed438a8-a585-40d8-8df2-f89ccd610858', u'network_type': u'vlan'}
2013-10-31 11:03:16 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Assigning 2 as local vlan for net-id=a5fb3fe2-a1bb-439b-95c2-a69b844cc185
2013-10-31 11:03:16 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port 4f81529e-a9a9-467a-b6a3-c18b8d86a7f8 added
2013-10-31 11:03:17 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port 4f81529e-a9a9-467a-b6a3-c18b8d86a7f8 updated. Details: {u'admin_state_up': True, u'network_id': u'b7f821d3-b437-400e-b294-e43aa1330184', u'segmentation_id': 201, u'physical_network': u'inter-vlan', u'device': u'4f81529e-a9a9-467a-b6a3-c18b8d86a7f8', u'port_id': u'4f81529e-a9a9-467a-b6a3-c18b8d86a7f8', u'network_type': u'vlan'}
2013-10-31 11:40:18 INFO [quantum.agent.securitygroups_rpc] Security group member updated [u'9e9f2c59-b3bf-4d1e-be0b-135f9f944e4e']
2013-10-31 11:40:18 INFO [quantum.agent.securitygroups_rpc] Refresh firewall rules
2013-10-31 11:40:18 INFO [quantum.agent.securitygroups_rpc] Provider rule updated
2013-10-31 11:40:18 INFO [quantum.agent.securitygroups_rpc] Refresh firewall rules
2013-10-31 11:52:00 INFO [quantum.agent.securitygroups_rpc] Security group member updated [u'9e9f2c59-b3bf-4d1e-be0b-135f9f944e4e']
2013-10-31 11:52:00 INFO [quantum.agent.securitygroups_rpc] Refresh firewall rules
2013-10-31 11:52:00 INFO [quantum.agent.securitygroups_rpc] Security group member updated [u'9e9f2c59-b3bf-4d1e-be0b-135f9f944e4e']
2013-10-31 11:52:00 INFO [quantum.agent.securitygroups_rpc] Refresh firewall rules
2013-10-31 12:31:35 INFO [quantum.agent.securitygroups_rpc] Security group rule updated [u'9e9f2c59-b3bf-4d1e-be0b-135f9f944e4e']
2013-10-31 12:31:35 INFO [quantum.agent.securitygroups_rpc] Refresh firewall rules
2013-10-31 12:41:07 INFO [quantum.common.config] Logging enabled!
2013-10-31 12:41:08 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Mapping physical network inter-vlan to bridge br-eth3
2013-10-31 12:41:09 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.17:5672
2013-10-31 12:41:10 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Agent initialized successfully, now running...
2013-10-31 12:41:10 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Agent out of sync with plugin!
2013-10-31 12:41:10 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.17:5672
2013-10-31 12:54:30 INFO [quantum.common.config] Logging enabled!
2013-10-31 12:54:30 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Mapping physical network inter-vlan to bridge br-eth3
2013-10-31 12:54:32 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.17:5672
2013-10-31 12:54:32 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Agent initialized successfully, now running...
2013-10-31 12:54:32 INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Agent out of sync with plugin!
2013-10-31 12:54:32 INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.17:5672
Wrong bug; I meant to update https://bugzilla.redhat.com/show_bug.cgi?id=1022578
Tested with openstack-neutron-2013.2-5.el6ost.noarch.

The reproduction steps are:
1. Stop the OVS agent.
2. Stop nova-compute.
3. Run the cleanup.
4. Start the OVS agent.
5. Start nova-compute.

Only after restarting nova-compute are the interfaces back.

Before the cleanup:

[root@puma34 ~]# ifconfig | grep a67
qbra67f01b4-4d Link encap:Ethernet HWaddr BA:79:83:BB:F0:52
qvba67f01b4-4d Link encap:Ethernet HWaddr BA:79:83:BB:F0:52
qvoa67f01b4-4d Link encap:Ethernet HWaddr A2:C6:ED:E0:E3:EC
tapa67f01b4-4d Link encap:Ethernet HWaddr FE:16:3E:41:81:8E

After the cleanup:

[root@puma34 ~]# ifconfig | grep a67
qbra67f01b4-4d Link encap:Ethernet HWaddr FE:16:3E:41:81:8E
tapa67f01b4-4d Link encap:Ethernet HWaddr FE:16:3E:41:81:8E
[root@puma34 ~]# service neutron-openvswitch-agent start
Starting neutron-openvswitch-agent: [ OK ]
[root@puma34 ~]# ifconfig | grep a67
qbra67f01b4-4d Link encap:Ethernet HWaddr FE:16:3E:41:81:8E
tapa67f01b4-4d Link encap:Ethernet HWaddr FE:16:3E:41:81:8E
[root@puma34 ~]# service openstack-nova-compute start
Starting openstack-nova-compute: [ OK ]
[root@puma34 ~]# ifconfig | grep a67
qbra67f01b4-4d Link encap:Ethernet HWaddr 2A:94:86:48:2A:28
qvba67f01b4-4d Link encap:Ethernet HWaddr 2A:94:86:48:2A:28
qvoa67f01b4-4d Link encap:Ethernet HWaddr E6:24:4F:AE:B9:0A
tapa67f01b4-4d Link encap:Ethernet HWaddr FE:16:3E:41:81:8E
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2013-1859.html