Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1569062

Summary: ovs-fw does not reinstate GRE conntrack entry
Product: Red Hat OpenStack
Component: openstack-neutron
Version: 12.0 (Pike)
Target Release: 12.0 (Pike)
Target Milestone: z3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: Triaged, ZStream
Reporter: Jakub Libosvar <jlibosva>
Assignee: Jakub Libosvar <jlibosva>
QA Contact: Roee Agiman <ragiman>
CC: amuller, bhaley, chrisw, jlibosva, nyechiel, ragiman, srevivo
Fixed In Version: openstack-neutron-11.0.2-7.el7ost
Last Closed: 2018-08-20 12:51:30 UTC
Type: Bug

Description Jakub Libosvar 2018-04-18 14:44:08 UTC
*High-level description:*

We have VMs running GRE tunnels between them, with the OVS firewall driver (ovs-fw) enforcing security groups and the GRE conntrack helper loaded on the hypervisor. GRE works as expected, but the tunnel breaks whenever a neutron OVS agent event raises an exception such as the AMQP timeouts or OVSFWPortNotFound errors shown below:

AMQP timeout:

2017-04-07 19:07:03.001 5275 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID 4035644808d24ce9aae65a6ee567021c
2017-04-07 19:07:03.001 5275 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
2017-04-07 19:07:03.003 5275 WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent.OVSNeutronAgent._report_state' run outlasted interval by 120.01 sec
2017-04-07 19:07:03.041 5275 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Agent has just been revived. Doing a full sync.
2017-04-07 19:07:06.747 5275 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] rpc_loop doing a full sync.
2017-04-07 19:07:06.841 5275 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] Agent out of sync with plugin!

OVSFWPortNotFound:

2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self.firewall.prepare_port_filter(device)
2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/openstack/venvs/neutron-14.0.5/lib/python2.7/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py", line 272, in prepare_port_filter
2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent of_port = self.get_or_create_ofport(port)
2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/openstack/venvs/neutron-14.0.5/lib/python2.7/site-packages/neutron/agent/linux/openvswitch_firewall/firewall.py", line 246, in get_or_create_ofport
2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent raise OVSFWPortNotFound(port_id=port_id)
2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent OVSFWPortNotFound: Port 01f7c714-1828-4768-9810-a0ec25dd2b92 is not managed by this agent.
2017-03-30 18:31:05.048 5160 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
2017-03-30 18:31:05.072 5160 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-db74f32b-5370-4a5f-86bf-935eba1490d0 - - - - -] Agent out of sync with plugin!

The agent then logs "out of sync" messages and starts initializing the neutron ports again, along with fresh SG rules.

2017-04-07 19:07:07.110 5275 INFO neutron.agent.securitygroups_rpc [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] Preparing filters for devices set([u'4b14619f-3b9e-4103-b9d7-9c7e52c797d8'])
2017-04-07 19:07:07.215 5275 ERROR neutron.agent.linux.openvswitch_firewall.firewall [req-521c07b4-f53d-4665-b728-fc5f00191294 - - - - -] Initializing port 4b14619f-3b9e-4103-b9d7-9c7e52c797d8 that was already initialized.

During this process, while preparing new filters for all ports, it marks the conntrack entry for a particular GRE connection (a high-traffic one) as invalid; note mark=1 on the first entry below, versus mark=0 on the healthy one:

root@server:/var/log# conntrack -L -o extended -p gre -f ipv4
ipv4 2 gre 47 178 src=1.1.1.203 dst=2.2.2.66 srckey=0x0 dstkey=0x0 src=2.2.2.66 dst=1.1.1.203 srckey=0x0 dstkey=0x0 [ASSURED] mark=1 zone=5 use=1
ipv4 2 gre 47 179 src=5.5.5.104 dst=4.4.4.187 srckey=0x0 dstkey=0x0 src=4.4.4.187 dst=5.5.5.104 srckey=0x0 dstkey=0x0 [ASSURED] mark=0 zone=5 use=1

And that connection state remains invalid until someone reboots the VM or flushes the connection, either directly in conntrack or through OVS, roughly as sketched below.
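
The manual workaround looks roughly like this; a sketch, not an exact transcript (IPs taken from the sample output above, conntrack-tools must be installed, and the OVS flush requires a build that supports flush-conntrack):

# list only the GRE entries ovs-fw has marked invalid (ct mark 1)
conntrack -L -p gre --mark 1
# delete the affected GRE entry by its source/destination addresses
conntrack -D -p gre -s 1.1.1.203 -d 2.2.2.66
# or flush the datapath conntrack table through OVS
ovs-dpctl flush-conntrack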

We have a blanket allow-any-protocol/any-port/any-IP SG rule in place for this scenario; we even tried adding specific rules to allow IP protocol 47 (GRE), sketched below, but nothing fixed the problem.
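
Roughly what we ran to add the explicit GRE rules (Newton-era neutron CLI; the security group name is a placeholder):

# allow IP protocol 47 (GRE) in both directions on the instances' security group
neutron security-group-rule-create --protocol 47 --direction ingress <sg-name>
neutron security-group-rule-create --protocol 47 --direction egress <sg-name>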

While checking for ovs-conntrack-helper-specific bugs, we came across https://patchwork.ozlabs.org/patch/755615/ - is that bug being triggered in the above scenario? Is this a bug in the ovs-fw code, or something in the ovs-conntrack implementation?

OpenStack version: Newton
Hypervisor OS: Ubuntu 16.04.2
Kernel version: 4.4.0-70-generic
OVS version: 2.6.1
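
For completeness, the GRE conntrack helper mentioned above is loaded on the hypervisor roughly like this (a sketch assuming the stock Ubuntu 16.04 / 4.4 kernel module names):

# enable GRE connection tracking (and GRE NAT) support in the kernel
modprobe nf_conntrack_proto_gre
modprobe nf_nat_proto_gre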

Comment 9 errata-xmlrpc 2018-08-20 12:51:30 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2514

Comment 10 Slawek Kaplonski 2018-09-24 06:54:51 UTC
*** Bug 1568155 has been marked as a duplicate of this bug. ***