Bug 1010941 - neutron-ovs-cleanup deletes the port that nova compute has plugged into a tap interface
Summary: neutron-ovs-cleanup deletes the port that nova compute has plugged into a tap...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: beta
Target Release: 4.0
Assignee: Solly Ross
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks: 1022578
 
Reported: 2013-09-23 10:38 UTC by Jian Wen
Modified: 2019-09-09 17:15 UTC
CC List: 14 users

Fixed In Version: openstack-neutron-2013.2-3.el6
Doc Type: Bug Fix
Doc Text:
The asynchronous operation of service startup resulted in neutron-ovs-cleanup finishing its run after nova-compute startup. Consequently, devices necessary for proper function were deleted. With this fix, neutron-ovs-cleanup now blocks while it runs. This ensures there is no interference with nova-compute port creation.
Clone Of:
: 1022578
Environment:
Last Closed: 2013-12-20 00:24:30 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2013:1859 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory 2013-12-21 00:01:48 UTC

Description Jian Wen 2013-09-23 10:38:44 UTC
Description of problem:
A new bug introduced by the fix for BZ 889786.

On a compute node, we should also make sure that after a reboot,
init starts nova-compute only after it has started the cleanup
utility. If init starts the cleanup utility after nova-compute,
the cleanup utility deletes the port that nova-compute has plugged
a tap interface or a veth into. As a result, the network traffic
of the related instance never reaches br-int.

We should make init start nova-compute after it starts the OVS
cleanup utility, so that nova-compute replugs the interface or
the veth into the port that the cleanup utility deleted.


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create an instance on a compute node.
2. Stop the quantum OVS agent.
3. Start the OVS cleanup utility.
   (This simulates init starting the utility after it starts nova-compute.)
4. Start the quantum OVS agent.

Actual results:
Pinging the instance in the DHCP network namespace would fail.
# sudo ip netns exec qdhcp-49e40fb9-c3d3-419f-a275-32ee4c47071e ping 10.0.0.6

Expected results:
The ping succeeds.

Additional info:

Comment 2 Solly Ross 2013-10-16 17:58:41 UTC
So, the cleanup agent is already set at priority 97, which should run before nova-compute's priority of 98. Perhaps we should have the process block until the cleanup is finished? I'll look into it.
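For reference, the priorities mentioned here come from the init scripts' chkconfig headers; lower start priority runs earlier at boot. A quick sketch of what that ordering means (the header lines mirror the values in this comment; the stop priority and file paths are illustrative, not taken from the actual packages):

```shell
# Hypothetical chkconfig header lines, as they would appear near the top
# of /etc/init.d/neutron-ovs-cleanup and /etc/init.d/openstack-nova-compute:
cleanup_header="# chkconfig: - 97 02"     # neutron-ovs-cleanup
compute_header="# chkconfig: - 98 02"     # openstack-nova-compute

# The fourth field is the start priority.
cleanup_prio=$(echo "$cleanup_header" | awk '{print $4}')
compute_prio=$(echo "$compute_header" | awk '{print $4}')

# Lower start priority starts earlier, so cleanup (97) is launched before
# nova-compute (98) -- the start ordering alone is not the problem here.
[ "$cleanup_prio" -lt "$compute_prio" ] && echo "cleanup starts first"
```

The catch, as the rest of the thread shows, is that starting earlier is not the same as finishing earlier when the script backgrounds its work.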

Comment 3 Solly Ross 2013-10-16 18:06:06 UTC
@apevec: you were on the original bug's discussion.  Is there a reason that we shouldn't make the init script block, since the process isn't actually a daemon?

Comment 4 Alan Pevec 2013-10-16 18:51:59 UTC
Yes, cleanup init script should not exit before cleanup is done, otherwise you run into race conditions.

Comment 5 Solly Ross 2013-10-16 20:35:09 UTC
So, we should make it block? The wording in your response was a bit unclear. Currently, the process is run with the `daemon` tool, which makes it run in the background in a non-blocking way. I'm thinking we just need to remove the `daemon` part from the init script:

> daemon --user neutron $exec --log-file /var/log/$proj/ovs-cleanup.log --config-file /etc/$proj/$proj.conf --config-file $config &>/dev/null

becomes

> runuser -s /bin/bash $user -c "neutron $exec --log-file /var/log/$proj/ovs-cleanup.log --config-file /etc/$proj/$proj.conf --config-file $config &>/dev/null"

Comment 6 Alan Pevec 2013-10-17 10:18:33 UTC
Sorry if I was unclear; I left the implementation details to you :)
Looking back at the original BZ, it looks like Garry switched to using "daemon" (the first version of the script didn't have it) only for the --user option.
So replacing it with "runuser" is correct, except that in your comment 5 it should be runuser -s /bin/bash neutron -c "$exec ..."
Also note that a recent change in the Havana Neutron packages added support for dist.conf:
http://pkgs.fedoraproject.org/cgit/openstack-neutron.git/commit/?h=el6-havana&id=afcc20396ac906b29dcd1ee2cb32138e4eab59ce
where the -c "$exec..." form above would break due to the quoting of ${configs[@]/#/--config-file }:

+ runuser -s /bin/bash neutron -c '/usr/bin/neutron-ovs-cleanup --log-file /var/log/neutron/ovs-cleanup.log --config-file /usr/share/neutron/neutron-dist.conf' '--config-file /etc/neutron/neutron.conf' '--config-file /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini &>/dev/null'
runuser: unrecognized option '--config-file /etc/neutron/neutron.conf'
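The quoting failure in that trace can be reproduced in isolation. A minimal sketch (the `configs` array contents and helper function are illustrative):

```shell
# Why -c "$exec ${configs[@]/#/--config-file }" breaks: inside double
# quotes, the [@] expansion still produces one word per array element,
# so everything after the first element lands outside the -c argument
# and runuser sees it as an unrecognized option.
configs=(/etc/neutron/neutron.conf /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini)

count_args() { echo $#; }

# Broken: the -c payload splits into several words.
wrong_argc=$(count_args -c "exec ${configs[@]/#/--config-file }")

# Fixed: [*] joins the elements into a single word for -c.
right_argc=$(count_args -c "exec ${configs[*]/#/--config-file }")

echo "broken argc=$wrong_argc, fixed argc=$right_argc"
```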

BTW, this should be fixed for RHOS 3.0 too, so I'm adding the 3.0z flag.

Comment 12 Solly Ross 2013-10-22 19:05:33 UTC
@oblaut: just FYI: the script priority here is correct; the issue is that the cleanup tool is being run as a daemon, so the init script exits before the cleanup is finished, which means nova-compute can launch while the actual cleanup is still running.
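The race described here can be sketched with a toy script, where `sleep` stands in for the cleanup's run time and the echoes stand in for the services (all names illustrative):

```shell
# Backgrounded "cleanup" (as with the daemon helper) lets the next init
# step run before cleanup finishes; a blocking call does not.
tmp=$(mktemp)

# Daemonized: init moves on immediately, so "compute" is recorded first.
( sleep 0.2; echo cleanup >>"$tmp" ) &
echo compute >>"$tmp"
wait
daemonized=$(xargs <"$tmp")    # "compute cleanup" -- wrong order

# Blocking (as with plain runuser): cleanup completes before the next step.
: >"$tmp"
( sleep 0.2; echo cleanup >>"$tmp" )
echo compute >>"$tmp"
blocking=$(xargs <"$tmp")      # "cleanup compute" -- correct order

rm -f "$tmp"
echo "daemonized: $daemonized / blocking: $blocking"
```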

Comment 16 Solly Ross 2013-10-25 21:56:06 UTC
@oblaut: I think the bug may actually be a race condition. It depends on whether the daemonized version of the cleanup utility finishes before the nova daemon starts, so reproducing it may be difficult, unfortunately.

Comment 18 Ofer Blaut 2013-10-31 14:07:04 UTC
I have followed the steps, but the VM has no traffic.

Cleanup logs:
2013-10-31 09:11:17     INFO [quantum.common.config] Logging enabled!
2013-10-31 09:11:17     INFO [quantum.agent.ovs_cleanup_util] Cleaning br-int
2013-10-31 09:11:18     INFO [quantum.agent.ovs_cleanup_util] OVS cleanup completed successfully
2013-10-31 12:40:42     INFO [quantum.common.config] Logging enabled!
2013-10-31 12:40:43     INFO [quantum.agent.ovs_cleanup_util] Cleaning br-int
2013-10-31 12:40:45     INFO [quantum.agent.ovs_cleanup_util] Delete qvo2ed438a8-a5
2013-10-31 12:40:46     INFO [quantum.agent.ovs_cleanup_util] Delete qvo4f81529e-a9
2013-10-31 12:40:47     INFO [quantum.agent.ovs_cleanup_util] Delete qvoc2cdc2cc-2c
2013-10-31 12:40:47     INFO [quantum.agent.ovs_cleanup_util] OVS cleanup completed successfully
2013-10-31 12:52:36     INFO [quantum.common.config] Logging enabled!
2013-10-31 12:52:37     INFO [quantum.agent.ovs_cleanup_util] Cleaning br-int
2013-10-31 12:52:37     INFO [quantum.agent.ovs_cleanup_util] OVS cleanup completed successfully
OVS-logs

2013-10-31 11:03:14     INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.17:5672
2013-10-31 11:03:15     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port c2cdc2cc-2c42-43f5-9f3f-05702625746d added
2013-10-31 11:03:15     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port c2cdc2cc-2c42-43f5-9f3f-05702625746d updated. Details: {u'admin_state_up': True, u'network_id': u'b7f821d3-b437-400e-b294-e43aa1330184', u'segmentation_id': 201, u'physical_network': u'inter-vlan', u'device': u'c2cdc2cc-2c42-43f5-9f3f-05702625746d', u'port_id': u'c2cdc2cc-2c42-43f5-9f3f-05702625746d', u'network_type': u'vlan'}
2013-10-31 11:03:15     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Assigning 1 as local vlan for net-id=b7f821d3-b437-400e-b294-e43aa1330184
2013-10-31 11:03:16     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port 2ed438a8-a585-40d8-8df2-f89ccd610858 added
2013-10-31 11:03:16     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port 2ed438a8-a585-40d8-8df2-f89ccd610858 updated. Details: {u'admin_state_up': True, u'network_id': u'a5fb3fe2-a1bb-439b-95c2-a69b844cc185', u'segmentation_id': 202, u'physical_network': u'inter-vlan', u'device': u'2ed438a8-a585-40d8-8df2-f89ccd610858', u'port_id': u'2ed438a8-a585-40d8-8df2-f89ccd610858', u'network_type': u'vlan'}
2013-10-31 11:03:16     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Assigning 2 as local vlan for net-id=a5fb3fe2-a1bb-439b-95c2-a69b844cc185
2013-10-31 11:03:16     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port 4f81529e-a9a9-467a-b6a3-c18b8d86a7f8 added
2013-10-31 11:03:17     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Port 4f81529e-a9a9-467a-b6a3-c18b8d86a7f8 updated. Details: {u'admin_state_up': True, u'network_id': u'b7f821d3-b437-400e-b294-e43aa1330184', u'segmentation_id': 201, u'physical_network': u'inter-vlan', u'device': u'4f81529e-a9a9-467a-b6a3-c18b8d86a7f8', u'port_id': u'4f81529e-a9a9-467a-b6a3-c18b8d86a7f8', u'network_type': u'vlan'}
2013-10-31 11:40:18     INFO [quantum.agent.securitygroups_rpc] Security group member updated [u'9e9f2c59-b3bf-4d1e-be0b-135f9f944e4e']
2013-10-31 11:40:18     INFO [quantum.agent.securitygroups_rpc] Refresh firewall rules
2013-10-31 11:40:18     INFO [quantum.agent.securitygroups_rpc] Provider rule updated
2013-10-31 11:40:18     INFO [quantum.agent.securitygroups_rpc] Refresh firewall rules
2013-10-31 11:52:00     INFO [quantum.agent.securitygroups_rpc] Security group member updated [u'9e9f2c59-b3bf-4d1e-be0b-135f9f944e4e']
2013-10-31 11:52:00     INFO [quantum.agent.securitygroups_rpc] Refresh firewall rules
2013-10-31 11:52:00     INFO [quantum.agent.securitygroups_rpc] Security group member updated [u'9e9f2c59-b3bf-4d1e-be0b-135f9f944e4e']
2013-10-31 11:52:00     INFO [quantum.agent.securitygroups_rpc] Refresh firewall rules
2013-10-31 12:31:35     INFO [quantum.agent.securitygroups_rpc] Security group rule updated [u'9e9f2c59-b3bf-4d1e-be0b-135f9f944e4e']
2013-10-31 12:31:35     INFO [quantum.agent.securitygroups_rpc] Refresh firewall rules
2013-10-31 12:41:07     INFO [quantum.common.config] Logging enabled!
2013-10-31 12:41:08     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Mapping physical network inter-vlan to bridge br-eth3
2013-10-31 12:41:09     INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.17:5672
2013-10-31 12:41:10     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Agent initialized successfully, now running...
2013-10-31 12:41:10     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Agent out of sync with plugin!
2013-10-31 12:41:10     INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.17:5672
2013-10-31 12:54:30     INFO [quantum.common.config] Logging enabled!
2013-10-31 12:54:30     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Mapping physical network inter-vlan to bridge br-eth3
2013-10-31 12:54:32     INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.17:5672
2013-10-31 12:54:32     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Agent initialized successfully, now running...
2013-10-31 12:54:32     INFO [quantum.plugins.openvswitch.agent.ovs_quantum_agent] Agent out of sync with plugin!
2013-10-31 12:54:32     INFO [quantum.openstack.common.rpc.impl_qpid] Connected to AMQP server on 10.35.160.17:5672

Comment 19 Ofer Blaut 2013-10-31 14:08:44 UTC
Wrong bug; I meant to update https://bugzilla.redhat.com/show_bug.cgi?id=1022578

Comment 20 Ofer Blaut 2013-11-19 19:18:56 UTC
Tested - openstack-neutron-2013.2-5.el6ost.noarch

The reproduction steps are

1. stop ovs-agent
2. stop nova-compute
3. run cleanup
4. start ovs-agent
5. start nova-compute

Only after restarting nova-compute do the interfaces come back.


Before the cleanup:
[root@puma34 ~]# ifconfig | grep a67
qbra67f01b4-4d Link encap:Ethernet  HWaddr BA:79:83:BB:F0:52  
qvba67f01b4-4d Link encap:Ethernet  HWaddr BA:79:83:BB:F0:52  
qvoa67f01b4-4d Link encap:Ethernet  HWaddr A2:C6:ED:E0:E3:EC  
tapa67f01b4-4d Link encap:Ethernet  HWaddr FE:16:3E:41:81:8E  

After the cleanup:
[root@puma34 ~]# ifconfig | grep a67
qbra67f01b4-4d Link encap:Ethernet  HWaddr FE:16:3E:41:81:8E  
tapa67f01b4-4d Link encap:Ethernet  HWaddr FE:16:3E:41:81:8E 

[root@puma34 ~]# service neutron-openvswitch-agent start
Starting neutron-openvswitch-agent:                        [  OK  ]
[root@puma34 ~]# ifconfig | grep a67
qbra67f01b4-4d Link encap:Ethernet  HWaddr FE:16:3E:41:81:8E  
tapa67f01b4-4d Link encap:Ethernet  HWaddr FE:16:3E:41:81:8E  
[root@puma34 ~]# service openstack-nova-compute start
Starting openstack-nova-compute:                           [  OK  ]
[root@puma34 ~]# ifconfig | grep a67
qbra67f01b4-4d Link encap:Ethernet  HWaddr 2A:94:86:48:2A:28  
qvba67f01b4-4d Link encap:Ethernet  HWaddr 2A:94:86:48:2A:28  
qvoa67f01b4-4d Link encap:Ethernet  HWaddr E6:24:4F:AE:B9:0A  
tapa67f01b4-4d Link encap:Ethernet  HWaddr FE:16:3E:41:81:8E

Comment 23 errata-xmlrpc 2013-12-20 00:24:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html

