Description of problem: Link to upstream discussion: http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg29606.html A user reports that restarting the OVS agent or the openvswitch service (it is unclear which of the two triggers the issue) causes tunnels not to be formed correctly. This happens only with l2pop enabled. It is unclear whether the issue manifests in RHOS 5 or is a recent master regression.
Nir, can you please try to reproduce this issue against OSP 5? This might be a blocker as l2pop is our default configuration on Packstack & Staypuft. Thanks, Nir
amuller and I tested this. Assaf will address the first issue the user mentioned. The second issue, quoted: "If the openvswitch restarted, all flows will be lost, including all l2pop flows, the agent is unable to fetch or recreate the l2pop flows" We found that restarting the openvswitch service kills the tunnel devices, while restarting neutron-openvswitch-agent restores them. Terry, perhaps this issue can be resolved via systemd?
Changing title, priority, severity to fit the following bug: 'service openvswitch restart' kills tunnel devices, which is fixed by 'service neutron-openvswitch-agent restart'. If this can be fixed by modifications to the SysV init script or the systemd unit, great.
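For reference, the workaround order described above can be scripted. This is only a sketch: the two service commands are the ones quoted in this comment, while the function name and the injectable `runner` parameter are hypothetical and exist just so the sequence can be checked without touching real services.

```python
import subprocess

# The two commands quoted in this bug, in the order that restores the tunnels:
# restart openvswitch first, then the neutron OVS agent.
WORKAROUND_COMMANDS = [
    ["service", "openvswitch", "restart"],
    ["service", "neutron-openvswitch-agent", "restart"],
]


def restart_ovs_with_workaround(runner=subprocess.check_call):
    """Run the restart sequence; `runner` is a hypothetical hook for dry runs."""
    for command in WORKAROUND_COMMANDS:
        # check_call raises CalledProcessError if a command exits non-zero,
        # so the agent is not restarted when the openvswitch restart fails.
        runner(command)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    restart_ovs_with_workaround(runner=lambda cmd: print(" ".join(cmd)))
```

Run as root with the default `runner` to actually apply the workaround.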
L2 agent should probably detect OVS restart and handle it correctly as of: https://review.openstack.org/#/c/101447/
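To illustrate what "detect OVS restart" could mean in practice, here is a minimal sketch of one possible approach. It relies only on the observation quoted earlier in this bug that all flows are lost when openvswitch restarts: the agent installs a marker ("canary") flow and treats its disappearance as a restart. The class, the `dump_flows`/`add_flow` callables and the table number are hypothetical names for illustration, and this is not necessarily how the linked review implements it.

```python
CANARY_TABLE = 23  # hypothetical table number reserved for the canary flow


class RestartDetector:
    """Detect an OVS restart by checking whether a canary flow still exists."""

    def __init__(self, dump_flows, add_flow):
        # dump_flows(table) -> list of flow strings currently in that table
        # add_flow(table, priority, actions) -> installs a flow
        # Both callables are assumptions standing in for real bridge access.
        self._dump_flows = dump_flows
        self._add_flow = add_flow

    def install_canary(self):
        self._add_flow(CANARY_TABLE, 0, "drop")

    def restarted(self):
        # A restart wipes every flow, so an empty canary table means the
        # switch was restarted (or the canary was never installed).
        return not self._dump_flows(CANARY_TABLE)

    def check_and_recover(self, resync):
        """Call `resync` (rebuild tunnels and flows) if a restart is seen."""
        if self.restarted():
            self.install_canary()
            resync()
            return True
        return False
```

An agent loop would call `check_and_recover()` on every polling iteration, passing in whatever routine rebuilds the tunnel ports and flows.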
I tested on an all-in-one (aio) machine that flows are re-created after OVS is restarted, as Ihar pointed out. I'm moving this bug to ON_QA so that QE can test it properly.
Tested with: openstack-neutron-openvswitch-2014.1.2-4.el6ost.noarch

I see the same behavior as I mentioned in Comment #5: after the openvswitch service restart, the tunnel devices are not restored, and therefore connectivity is broken.

As for the flows, this is how they look after the restart:

NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=15.827s, table=0, n_packets=4, n_bytes=288, idle_age=6, priority=0 actions=drop
 cookie=0x0, duration=15.884s, table=0, n_packets=0, n_bytes=0, idle_age=15, priority=1,in_port=1 actions=resubmit(,1)
 cookie=0x0, duration=15.717s, table=1, n_packets=0, n_bytes=0, idle_age=15, priority=1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,21)
 cookie=0x0, duration=15.773s, table=1, n_packets=0, n_bytes=0, idle_age=15, priority=1,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0x0, duration=15.663s, table=2, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=drop
 cookie=0x0, duration=15.607s, table=3, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=drop
 cookie=0x0, duration=14.062s, table=3, n_packets=0, n_bytes=0, idle_age=14, priority=1,tun_id=0xa actions=mod_vlan_vid:1,resubmit(,10)
 cookie=0x0, duration=15.549s, table=10, n_packets=0, n_bytes=0, idle_age=15, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
 cookie=0x0, duration=15.495s, table=20, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=resubmit(,21)
 cookie=0x0, duration=14.238s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:c7:56:c2 actions=strip_vlan,set_tunnel:0xa,output:2
 cookie=0x0, duration=14.237s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:97:e8:97 actions=strip_vlan,set_tunnel:0xa,output:3
 cookie=0x0, duration=14.237s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:41:d6:68 actions=strip_vlan,set_tunnel:0xa,output:3
 cookie=0x0, duration=14.238s, table=20, n_packets=0, n_bytes=0, idle_age=14, priority=2,dl_vlan=1,dl_dst=fa:16:3e:e5:c4:0f actions=strip_vlan,set_tunnel:0xa,output:2
 cookie=0x0, duration=15.438s, table=21, n_packets=0, n_bytes=0, idle_age=15, priority=0 actions=drop
 cookie=0x0, duration=14.522s, table=21, n_packets=0, n_bytes=0, idle_age=14, dl_vlan=1 actions=strip_vlan,set_tunnel:0xa,output:2,output:3
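When comparing dumps like the one above before and after a restart, it can help to summarise them mechanically. The following is a rough sketch, not a full parser for the flow syntax: it assumes the text format shown in this comment (match fields, then ` actions=`), counts flows per table, and lists the table 20 entries that match on `dl_dst` together with their numeric output ports. The function names are made up for this example.

```python
import re
from collections import Counter


def parse_flow(line):
    """Return (fields, actions) for one flow line, or None for other lines."""
    line = line.strip()
    if not line or line.startswith("NXST_FLOW"):
        return None
    # Everything before " actions=" is the comma-separated match/stat part.
    head, _, actions = line.partition(" actions=")
    fields = {}
    for token in re.split(r",\s*", head):
        key, sep, value = token.partition("=")
        fields[key.strip()] = value if sep else True
    return fields, actions


def flows_per_table(text):
    """Count flows in each table number found in the dump."""
    counts = Counter()
    for line in text.splitlines():
        parsed = parse_flow(line)
        if parsed and "table" in parsed[0]:
            counts[int(parsed[0]["table"])] += 1
    return dict(counts)


def unicast_tunnel_flows(text, table=20):
    """List (dl_dst, [output ports]) for flows in `table` that match dl_dst."""
    result = []
    for line in text.splitlines():
        parsed = parse_flow(line)
        if parsed is None:
            continue
        fields, actions = parsed
        if fields.get("table") == str(table) and "dl_dst" in fields:
            ports = [int(p) for p in re.findall(r"output:(\d+)", actions)]
            result.append((fields["dl_dst"], ports))
    return result
```

Running both functions on a dump taken before the restart and on one taken after it gives two small structures that can be diffed directly.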
The fdb entries are not re-created after restarting openvswitch because this information is held in the DB and the agent doesn't ask neutron-server for it.
The upstream fix is now a different one (still in progress): https://review.openstack.org/#/c/159775/
Upstream reviews 159775, 101581 and 107409 have been abandoned. The Launchpad bug is scheduled for the liberty-3 milestone.
I don't think there is a special reason to track this bug anymore. Closing for now, as the issue is clear and a workaround exists.