Bug 1120719 - Restarting openvswitch service destroys connectivity with l2pop enabled
Summary: Restarting openvswitch service destroys connectivity with l2pop enabled
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Jakub Libosvar
QA Contact: Toni Freger
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-07-17 14:44 UTC by Assaf Muller
Modified: 2016-04-27 05:28 UTC (History)
CC List: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-13 21:56:20 UTC


Attachments


Links
OpenStack gerrit 107409
Launchpad 1332450

Description Assaf Muller 2014-07-17 14:44:15 UTC
Description of problem:
Link to upstream discussion:
http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg29606.html

A user reports that restarting the OVS agent or the openvswitch service (it is unclear whether the issue occurs when restarting one, the other, or both) causes tunnels not to be formed correctly.

This happens only with l2pop enabled.

It is unclear whether the issue manifests in RHOS 5 or whether this is a recent regression on master.
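
For reference, a minimal sketch of how to observe the symptom on a compute node, assuming the usual br-tun tunnel bridge and the RHEL 7 systemd unit names (on RHEL 6, substitute 'service openvswitch restart'):

  # Record tunnel ports and flows before the restart
  ovs-vsctl list-ports br-tun
  ovs-ofctl dump-flows br-tun

  # Restart Open vSwitch itself (not the neutron agent)
  systemctl restart openvswitch

  # The tunnel ports and the l2pop flows on br-tun are now gone
  ovs-vsctl list-ports br-tun
  ovs-ofctl dump-flows br-tun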

Comment 1 Nir Yechiel 2014-07-17 15:19:08 UTC
Nir, can you please try to reproduce this issue against OSP 5? This might be a blocker as l2pop is our default configuration on Packstack & Staypuft.

Thanks,
Nir

Comment 5 Nir Magnezi 2014-07-20 09:06:47 UTC
amuller and I tested this.

Assaf will address the first issue the user mentioned.

The second issue, quoting the user: "If the openvswitch restarted, all flows will be lost, including all l2pop flows, the agent is unable to fetch or recreate the l2pop flows"

We have found that restarting the openvswitch service kills the tunnel devices.
Restarting neutron-openvswitch-agent afterwards restored those tunnel devices.
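
For anyone hitting this, the workaround as observed here is simply to bounce the agent (systemd names assumed; use the 'service' command on RHEL 6):

  # Restarting the agent recreates the tunnel ports on br-tun
  systemctl restart neutron-openvswitch-agent

  # Confirm the tunnel ports are back
  ovs-vsctl list-ports br-tun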

Terry, perhaps this issue can be resolved via systemd?

Comment 8 Assaf Muller 2014-07-21 11:25:13 UTC
Changing title, priority, severity to fit the following bug:

'service openvswitch restart' kills the tunnel devices, which is fixed by 'service neutron-openvswitch-agent restart'. If this can be fixed with SysV init/systemd changes, great.
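
One possible systemd-only approach, purely as a sketch (the drop-in path and the PartOf coupling are my assumption, not a shipped fix), is to tie the agent's lifecycle to openvswitch.service so that restarting OVS also restarts the agent:

  # Hypothetical drop-in file: /etc/systemd/system/neutron-openvswitch-agent.service.d/ovs-restart.conf
  # PartOf= propagates stop/restart of openvswitch.service to the agent unit
  mkdir -p /etc/systemd/system/neutron-openvswitch-agent.service.d
  printf '[Unit]\nPartOf=openvswitch.service\nAfter=openvswitch.service\n' \
    > /etc/systemd/system/neutron-openvswitch-agent.service.d/ovs-restart.conf
  systemctl daemon-reload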

Comment 11 Ihar Hrachyshka 2014-09-15 11:39:18 UTC
L2 agent should probably detect OVS restart and handle it correctly as of: https://review.openstack.org/#/c/101447/

Comment 12 Jakub Libosvar 2014-09-15 13:16:28 UTC
I tested on an all-in-one machine that flows are re-created after OVS is restarted, as Ihar pointed out. I'm moving this bug to ON_QA so QE can test it properly.

Comment 14 Nir Magnezi 2014-09-23 12:46:41 UTC
Tested with: openstack-neutron-openvswitch-2014.1.2-4.el6ost.noarch

I see the same behavior I mentioned in Comment #5:
After the openvswitch service restart, the tunnel devices are not restored, and therefore connectivity is broken.

Regarding the flows, this is how they look after the restart:

NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=15.827s, table=0, n_packets=4, n_bytes=288, idle_age=6, priority=0 actions=drop
 cookie=0x0, duration=15.884s, table=0, n_packets=0, n_bytes=0, idle_age=15, priority=1,in_port=1 actions=resubmit(,1)
 cookie=0x0, duration=15.717s, table=1, n_packets=0, n_bytes=0, idle_age=15, priority=1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,21)
 cookie=0x0, duration=15.773s, table=1, n_packets=0, n_bytes=0, idle_age=15, priority=1,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0x0, duration=15.663s, table=2, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=drop
 cookie=0x0, duration=15.607s, table=3, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=drop
 cookie=0x0, duration=14.062s, table=3, n_packets=0, n_bytes=0, idle_age=14, priority=1,tun_id=0xa actions=mod_vlan_vid:1,resubmit(,10)
 cookie=0x0, duration=15.549s, table=10, n_packets=0, n_bytes=0, idle_age=15, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
 cookie=0x0, duration=15.495s, table=20, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=resubmit(,21)
 cookie=0x0, duration=14.238s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:c7:56:c2 actions=strip_vlan,set_tunnel:0xa,output:2
 cookie=0x0, duration=14.237s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:97:e8:97 actions=strip_vlan,set_tunnel:0xa,output:3
 cookie=0x0, duration=14.237s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:41:d6:68 actions=strip_vlan,set_tunnel:0xa,output:3
 cookie=0x0, duration=14.238s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:e5:c4:0f actions=strip_vlan,set_tunnel:0xa,output:2
 cookie=0x0, duration=15.438s, table=21, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=drop
 cookie=0x0, duration=14.522s, table=21, n_packets=0, n_bytes=0, idle_age=14, dl_vlan=1 actions=strip_vlan,set_tunnel:0xa,output:2,output:3

Comment 16 Jakub Libosvar 2014-10-06 11:22:34 UTC
FDB entries are not re-created after restarting openvswitch, as this information is held in the database and the agent doesn't ask neutron-server for it.
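
A rough way to see the missing l2pop state from the node is to look for the per-MAC unicast flows in table 20 of br-tun (the set_tunnel entries shown in comment 14); a hedged sketch, assuming br-tun and the table numbering from that dump:

  # Count the l2pop unicast-to-tunnel flows; after an openvswitch restart this
  # stays at zero until the agent is handed the fdb entries again
  ovs-ofctl dump-flows br-tun table=20 | grep -c set_tunnel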

Comment 22 Ihar Hrachyshka 2015-03-19 10:43:58 UTC
The upstream fix is now a different one (still in progress): https://review.openstack.org/#/c/159775/

Comment 23 Jon Schlueter 2015-08-13 20:28:54 UTC
Upstream reviews 159775, 101581, and 107409 have been abandoned. The Launchpad bug is scheduled for the liberty-3 milestone.

Comment 25 Nir Yechiel 2015-08-13 21:56:20 UTC
I don't think there is a special reason to track this bug anymore. Closing for now, as the issue is clear and a workaround exists.

