Bug 1120719

Summary: Restarting openvswitch service destroys connectivity with l2pop enabled
Product: Red Hat OpenStack Reporter: Assaf Muller <amuller>
Component: openstack-neutron Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED UPSTREAM QA Contact: Toni Freger <tfreger>
Severity: low Docs Contact:
Priority: medium    
Version: 5.0 (RHEL 7) CC: amuller, chrisw, ihrachys, jlibosva, jschluet, lpeer, nyechiel, oblaut, rlandman, sclewis, sgordon, tfreger, twilson, yeylon
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-08-13 21:56:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Assaf Muller 2014-07-17 14:44:15 UTC
Description of problem:
Link to upstream discussion:
http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg29606.html

A user reports that restarting the OVS agent or the openvswitch service (it is unclear which of the two triggers the issue) causes tunnels not to be formed correctly.

This happens only with l2pop enabled.

It is unclear whether the issue manifests in RHOS 5 or is a recent regression on master.

Comment 1 Nir Yechiel 2014-07-17 15:19:08 UTC
Nir, can you please try to reproduce this issue against OSP 5? This might be a blocker as l2pop is our default configuration on Packstack & Staypuft.

Thanks,
Nir

Comment 5 Nir Magnezi 2014-07-20 09:06:47 UTC
amuller and I tested this.

Assaf will address the first issue the user mentioned.

The second issue, quote: "If the openvswitch restarted, all flows will be lost, including all l2pop flows, the agent is unable to fetch or recreate the l2pop flows"

We found that restarting the openvswitch service kills the tunnel devices.
However, restarting neutron-openvswitch-agent restores those tunnel devices.

Terry, perhaps this issue can be resolved via systemd?

Comment 8 Assaf Muller 2014-07-21 11:25:13 UTC
Changing title, priority, severity to fit the following bug:

'service openvswitch restart' kills tunnel devices, which is fixed by 'service neutron-openvswitch-agent restart'. If this can be fixed by modifications to systemV/d, great.
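
The workaround described above amounts to restarting the agent whenever openvswitch is restarted. A minimal Python sketch, assuming the SysV-style `service` commands quoted in this comment; the function names `restart_commands` and `restart_ovs_with_agent` are hypothetical and not part of Neutron or OVS:

```python
import subprocess


def restart_commands():
    """Return the two commands quoted in this bug, in the order they must run."""
    return [
        # Restarting openvswitch is what removes the tunnel devices.
        ["service", "openvswitch", "restart"],
        # Restarting the agent afterwards is what brings them back.
        ["service", "neutron-openvswitch-agent", "restart"],
    ]


def restart_ovs_with_agent(run=subprocess.check_call):
    """Run each command in order.

    `run` is injectable so the sequence can be dry-run (for example with
    `print`) without touching any service.
    """
    for cmd in restart_commands():
        run(cmd)


if __name__ == "__main__":
    # Dry run: only print the commands instead of executing them.
    restart_ovs_with_agent(run=print)
```

Passing `run=print` keeps the example safe to execute on a machine where the services are not installed.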

Comment 11 Ihar Hrachyshka 2014-09-15 11:39:18 UTC
The L2 agent should probably detect an OVS restart and handle it correctly as of: https://review.openstack.org/#/c/101447/

Comment 12 Jakub Libosvar 2014-09-15 13:16:28 UTC
I tested on an all-in-one (AIO) machine and confirmed that flows are re-created after OVS is restarted, as Ihar pointed out. I'm moving this bug to ON_QA for proper testing by QE.

Comment 14 Nir Magnezi 2014-09-23 12:46:41 UTC
Tested with: openstack-neutron-openvswitch-2014.1.2-4.el6ost.noarch

I see the same behavior as I mentioned in Comment #5:
After the openvswitch service is restarted, the tunnel devices are not restored, and therefore connectivity is broken.

Regarding the flows, this is how they look after the restart:

NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=15.827s, table=0, n_packets=4, n_bytes=288, idle_age=6, priority=0 actions=drop
 cookie=0x0, duration=15.884s, table=0, n_packets=0, n_bytes=0, idle_age=15, priority=1,in_port=1 actions=resubmit(,1)
 cookie=0x0, duration=15.717s, table=1, n_packets=0, n_bytes=0, idle_age=15, priority=1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,21)
 cookie=0x0, duration=15.773s, table=1, n_packets=0, n_bytes=0, idle_age=15, priority=1,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0x0, duration=15.663s, table=2, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=drop
 cookie=0x0, duration=15.607s, table=3, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=drop
 cookie=0x0, duration=14.062s, table=3, n_packets=0, n_bytes=0, idle_age=14, priority=1,tun_id=0xa actions=mod_vlan_vid:1,resubmit(,10)
 cookie=0x0, duration=15.549s, table=10, n_packets=0, n_bytes=0, idle_age=15, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
 cookie=0x0, duration=15.495s, table=20, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=resubmit(,21)
 cookie=0x0, duration=14.238s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:c7:56:c2 actions=strip_vlan,set_tunnel:0xa,output:2
 cookie=0x0, duration=14.237s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:97:e8:97 actions=strip_vlan,set_tunnel:0xa,output:3
 cookie=0x0, duration=14.237s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:41:d6:68 actions=strip_vlan,set_tunnel:0xa,output:3
 cookie=0x0, duration=14.238s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:e5:c4:0f actions=strip_vlan,set_tunnel:0xa,output:2
 cookie=0x0, duration=15.438s, table=21, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=drop
 cookie=0x0, duration=14.522s, table=21, n_packets=0, n_bytes=0, idle_age=14, dl_vlan=1 actions=strip_vlan,set_tunnel:0xa,output:2,output:3
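
One way to cross-check a dump like the one above against the bridge is to collect the port numbers that the flows output to; if those ports no longer exist on the bridge, the flows are present but have nowhere to send traffic. A minimal Python sketch follows. `ports_by_table` and `SAMPLE` are hypothetical names used for illustration and are not part of Neutron or OVS; the sample lines are copied from the dump above.

```python
import re
from collections import defaultdict

# Two flow lines copied from the dump in this comment, plus its header line.
SAMPLE = """NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=14.238s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:c7:56:c2 actions=strip_vlan,set_tunnel:0xa,output:2
 cookie=0x0, duration=14.522s, table=21, n_packets=0, n_bytes=0, idle_age=14, dl_vlan=1 actions=strip_vlan,set_tunnel:0xa,output:2,output:3
"""


def ports_by_table(dump_text):
    """Map each flow table number to the set of numeric ports its flows output to."""
    ports = defaultdict(set)
    for line in dump_text.splitlines():
        table = re.search(r"table=(\d+)", line)
        if table is None or " actions=" not in line:
            continue  # skip the reply header and blank lines
        # Only look at the actions part, so match fields are never misread.
        actions = line.split(" actions=", 1)[1]
        for port in re.findall(r"output:(\d+)", actions):
            ports[int(table.group(1))].add(int(port))
    return dict(ports)


if __name__ == "__main__":
    for table, ports in sorted(ports_by_table(SAMPLE).items()):
        print(table, sorted(ports))
```

The resulting port numbers can then be compared by hand with the ports that actually exist on the tunnel bridge after the restart.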

Comment 16 Jakub Libosvar 2014-10-06 11:22:34 UTC
FDB entries are not re-created after restarting openvswitch because this information is held in the DB and the agent doesn't ask neutron-server for it.

Comment 22 Ihar Hrachyshka 2015-03-19 10:43:58 UTC
The upstream fix is now a different one (still in progress): https://review.openstack.org/#/c/159775/

Comment 23 Jon Schlueter 2015-08-13 20:28:54 UTC
Upstream reviews 159775, 101581 and 107409 have been abandoned. The Launchpad bug is scheduled for the liberty-3 milestone.

Comment 25 Nir Yechiel 2015-08-13 21:56:20 UTC
I don't think there is a special reason to track this bug anymore. Closing for now, as the issue is clear and a workaround exists.