Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1867769

Summary: OSP 16.1 openvswitch firewall driver blocks egress vrrp multicast [OVS/ML2]
Product: Red Hat OpenStack
Reporter: Matt Flusche <mflusche>
Component: openstack-neutron
Assignee: Rodolfo Alonso <ralonsoh>
Status: CLOSED WONTFIX
QA Contact: Eran Kuris <ekuris>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 16.1 (Train)
CC: amuller, bcafarel, chrisw, cmuresan, ctrautma, ddelcian, dhill, echaudro, fbaudin, hakhande, jiqiu, laparici, ldenny, ralonsoh, rcunha, scohen, skaplons, vchundur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-01-05 09:28:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1876459, 1876560
Bug Blocks:

Description Matt Flusche 2020-08-10 17:55:16 UTC
Description of problem:
Note: I'm opening this bz initially as a neutron bug to get their input; perhaps this is an openvswitch bug.

It seems openvswitch in OSP 16 incorrectly blocks egress vrrp multicast instance traffic when using the openvswitch firewall driver.


OSP 16.1
OVS/ML2 neutron plugin
openvswitch firewall driver

- From tcpdump we see traffic being blocked from the vrrp multicast sender port (egress traffic).  By default all egress traffic should pass.

- Other udp multicast traffic does not get blocked.  
- With port security disabled this traffic is successful.
- Looking at the OVS flow rules, it seems this vrrp multicast traffic is hitting a flow rule that matches invalid (ct_state=+inv) packets and then drops them. (Note: there is no other real traffic in this environment, as it is a new deployment being tested.)

- The following packet match counter increments while reproducing the vrrp multicast failure/block:
# for i in 1 2 3; do ovs-ofctl dump-flows br-int |grep '+inv+trk'   ; echo "-------- sleep 10 sec ---------"; sleep 10; done
 cookie=0x2daf79ecdc8d8054, duration=13193.379s, table=72, n_packets=655236, n_bytes=35383440, idle_age=0, priority=50,ct_state=+inv+trk actions=resubmit(,93)
 cookie=0x2daf79ecdc8d8054, duration=13193.379s, table=82, n_packets=0, n_bytes=0, idle_age=65534, priority=50,ct_state=+inv+trk actions=resubmit(,93)
-------- sleep 10 sec ---------
 cookie=0x2daf79ecdc8d8054, duration=13203.396s, table=72, n_packets=655246, n_bytes=35383980, idle_age=0, priority=50,ct_state=+inv+trk actions=resubmit(,93)
 cookie=0x2daf79ecdc8d8054, duration=13203.396s, table=82, n_packets=0, n_bytes=0, idle_age=65534, priority=50,ct_state=+inv+trk actions=resubmit(,93)
-------- sleep 10 sec ---------
 cookie=0x2daf79ecdc8d8054, duration=13213.416s, table=72, n_packets=655256, n_bytes=35384520, idle_age=0, priority=50,ct_state=+inv+trk actions=resubmit(,93)
 cookie=0x2daf79ecdc8d8054, duration=13213.416s, table=82, n_packets=0, n_bytes=0, idle_age=65534, priority=50,ct_state=+inv+trk actions=resubmit(,93)

Table 93:

 cookie=0x2daf79ecdc8d8054, duration=503301.633s, table=93, n_packets=624638, n_bytes=33731572, idle_age=65534, hard_age=65534, priority=0 actions=drop

I'll provide additional details in private comments.

Version-Release number of selected component (if applicable):
OSP 16.1

$ egrep 'openvswitch|neutron' installed-rpms 
network-scripts-openvswitch2.11-2.11.0-56.20200327gita4efc59.el8fdp.x86_64 Mon Aug  3 17:12:29 2020
network-scripts-openvswitch2.13-2.13.0-39.el8fdp.x86_64     Mon Aug  3 17:09:38 2020
openvswitch2.13-2.13.0-39.el8fdp.x86_64                     Mon Aug  3 17:10:31 2020
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch       Wed May 13 09:41:09 2020
puppet-neutron-15.5.1-0.20200514103419.0a45ec7.el8ost.noarch Mon Aug  3 17:09:21 2020
python3-neutronclient-6.14.0-0.20200310192910.115f60f.el8ost.noarch Mon Aug  3 17:10:27 2020
rhosp-openvswitch-2.13-8.el8ost.noarch                      Mon Aug  3 17:12:27 2020


How reproducible:
Unknown

Steps to Reproduce:
1. OSP 16.1 deployment with ML2/OVS and openvswitch firewall driver
2. deploy instances that run vrrp (keepalived)
3. note vrrp communication issues
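
For reference, a minimal keepalived configuration of the kind that triggers this (a sketch only; the interface name, virtual_router_id, and VIP below are placeholders, not taken from this environment):

vrrp_instance VI_1 {
    state MASTER
    interface eth0             # instance NIC (placeholder)
    virtual_router_id 51       # placeholder VRID
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.0.100/24       # placeholder VIP
    }
}

The master sends VRRP advertisements to multicast 224.0.0.18 using IP protocol 112; that is the egress traffic being dropped here.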

Actual results:
VRRP traffic is blocked on the egress OVS ports.

Expected results:
All egress traffic should pass by default.

Additional info:

Comment 20 Eelco Chaudron 2020-08-31 13:11:39 UTC
Hi Matt,

Can you get me the following info to continue the investigation:

- Get a capture of the actual VRRP packet not making it through. This way you can determine the src/dst MACs and IPs.
- Save this packet to a pcap file so I can take a look at the details as well.
- From this capture, convert the specific packet to hex for use with ofproto/trace by running "ovs-pcap out.pcap".
- Now use this packet to do the ofproto trace, for example (make sure to use your own port and packet data):

 ovs-appctl ofproto/trace ovs_pvp_br0 in_port=<your port> <your hex packet output from ovs-pcap>

- Get me an openvswitch-only SOS report right after the above so I can compare the data and do a replication.
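
Put together, the capture-to-trace workflow above might look like this (a sketch; the tap device, bridge, and in_port values are placeholders to be replaced with your own):

 # capture one VRRP advertisement (IP protocol 112) on the instance's tap device
 tcpdump -i tap0 -c 1 -w out.pcap ip proto 112
 # convert the captured packet to a hex string
 ovs-pcap out.pcap
 # feed that hex packet through the OpenFlow pipeline to see which flows it hits
 ovs-appctl ofproto/trace br-int in_port=9 <hex packet from ovs-pcap>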

Thanks,

Eelco

Comment 21 Eelco Chaudron 2020-09-02 12:41:29 UTC
Any update on #20? Asking as this is marked as an urgent BZ and escalated as PM high, but it has had no input for 2 days.

Comment 26 Eelco Chaudron 2020-09-02 13:53:45 UTC
Thanks for the info, I'll try to replicate the behavior on my dev setup.

Comment 27 Eelco Chaudron 2020-09-03 11:29:30 UTC
So the issue is clear: connection tracking for the userspace datapath only supports TCP/UDP/ICMPv4/ICMPv6; all other protocols are NOT supported. For any other protocol, such as VRRP (IP protocol 112), connection tracking sets the state to +inv.

I'm not sure how the OSP rules are built, but looking at the current OVS ruleset in the SOS report, it can be solved by adding the following rules:

ovs-ofctl add-flow br-int "table=72 priority=77,ct_state=+inv,ip,reg5=0x9,nw_proto=112 actions=resubmit(,73)"
ovs-ofctl add-flow br-int "table=73 priority=90,ct_state=+inv,ip,reg5=0x9 actions=resubmit(,91)"
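
A quick way to confirm the added rules are taking effect (a sketch, using the bridge and table numbers from this environment): with keepalived advertisements flowing, the new nw_proto=112 flow in table 72 should show increasing n_packets, while the table=93 drop counter seen earlier should stop incrementing.

 # the +inv VRRP flow should now be accumulating packets
 ovs-ofctl dump-flows br-int "table=72,ip,nw_proto=112"
 # the catch-all drop in table 93 should hold steady
 ovs-ofctl dump-flows br-int table=93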

At least this worked in my replication setup:

ovs-ofctl del-flows ovs_pvp_br0
ovs-ofctl add-flow ovs_pvp_br0 "in_port=dpdk0,actions=goto_table:60"

ovs-ofctl add-flow ovs_pvp_br0 "table=60,priority=100,in_port=dpdk0,actions=set_field:0x9->reg5,set_field:0x2->reg6,resubmit(,71)"

ovs-ofctl add-flow ovs_pvp_br0 "table=71,priority=65,ip,reg5=0x9,in_port=dpdk0,dl_src=00:00:5e:00:01:01,nw_src=192.168.0.30,actions=ct(table=72,zone=NXM_NX_REG6[0..15])"
ovs-ofctl add-flow ovs_pvp_br0 "table=71 priority=0 actions=drop"

ovs-ofctl add-flow ovs_pvp_br0 "table=72 priority=77,ct_state=+est-rel-rpl,ip,reg5=0x9,nw_dst=192.168.0.0/24,nw_proto=112 actions=resubmit(,73)"
ovs-ofctl add-flow ovs_pvp_br0 "table=72 priority=77,ct_state=+new-est,ip,reg5=0x9,nw_dst=192.168.0.0/24,nw_proto=112 actions=resubmit(,73)"
ovs-ofctl add-flow ovs_pvp_br0 "table=72 priority=77,ct_state=+est-rel-rpl,ip,reg5=0x9,nw_proto=112 actions=resubmit(,73)"
ovs-ofctl add-flow ovs_pvp_br0 "table=72 priority=77,ct_state=+new-est,ip,reg5=0x9,nw_proto=112 actions=resubmit(,73)"
ovs-ofctl add-flow ovs_pvp_br0 "table=72 priority=77,ct_state=+inv,ip,reg5=0x9,nw_proto=112 actions=resubmit(,73)"
ovs-ofctl add-flow ovs_pvp_br0 "table=72 priority=0 actions=drop"

ovs-ofctl add-flow ovs_pvp_br0 "table=73 priority=90,ct_state=+inv,ip,reg5=0x9 actions=resubmit(,91)"
ovs-ofctl add-flow ovs_pvp_br0 "table=73 priority=90,ct_state=+new-est,ip,reg5=0x9 actions=ct(commit,zone=NXM_NX_REG6[0..15]),resubmit(,91)"
ovs-ofctl add-flow ovs_pvp_br0 "table=73 priority=80,reg5=0x9 actions=resubmit(,91)"

ovs-ofctl add-flow ovs_pvp_br0 "table=91 actions=dpdk1"

Comment 28 Rodolfo Alonso 2020-09-03 13:52:03 UTC
Hi Matt:

According to Eelco's reply (thanks for the analysis) and further conversations outside this BZ, this is not going to be solved in the coming months. In the meantime the customer could decide to:
1) Use kernel OVS. VRRP packets are correctly tracked and forwarded, and the OVS firewall can be used.
2) For those systems transmitting/receiving VRRP traffic, disable port security. That will disable security groups on those ports but will allow VRRP traffic to be transmitted.
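
For option 2, disabling port security on an individual Neutron port could look like this (a sketch; <PORT_ID> is a placeholder for the affected port):

 # security groups must be cleared before port security can be disabled
 openstack port set --no-security-group --disable-port-security <PORT_ID>
 # to revert, re-enable port security and reattach the security group
 openstack port set --enable-port-security --security-group <SG_ID> <PORT_ID>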

Regards.

Comment 29 Eelco Chaudron 2020-09-03 13:56:37 UTC
(In reply to Rodolfo Alonso from comment #28)
> Hi Matt:
> 
> According to Eelco's reply (thanks for the analysis) and further
> conversations out of this BZ, this is not going to be solved in the next
> months. 

I'm referring here to the OVS-DPDK connection tracking implementation. However, I do think the workaround above, adding (or replacing) the CT rules, could fix this; that would require a Neutron change.

> Then the customer could decide to:
> 1) Use kernel OVS. VRRP protocol packets are correctly tracked and forwarded
> and OVS firewall can be used.
> 2) For those systems transmitting/receiving VRRP traffic, disable the OVS
> firewall. That will disable security groups in this system but will allow
> VRRP traffic to be transmitted.

These can be short term workarounds.

Also, I guess this BZ should go back to the neutron team to investigate the neutron change.

Comment 31 Rodolfo Alonso 2020-09-03 13:58:41 UTC
Hi Eelco:

It is highly unlikely that we will implement this kind of workaround just for a specific protocol and a specific backend.

Regards.

Comment 33 Eelco Chaudron 2020-09-03 14:49:50 UTC
(In reply to Rodolfo Alonso from comment #31)
> Hi Eelco:
> 
> This is highly unlikely that we implement this kind of workaround just for a
> specific protocol and a specific backend.

Ok, I'll leave this to the Neutron team to decide and will assign this BZ back to them for now.

I'll create an RFE BZ for OVS to see if this can ever be added in the future ;)

Comment 34 Eelco Chaudron 2020-09-03 15:02:11 UTC
> (In reply to Rodolfo Alonso from comment #31)
> This is highly unlikely that we implement this kind of workaround just for a
> specific protocol and a specific backend.

I think you could say that for this specific datapath you need it for any protocol other than UDP/TCP/ICMPv4/ICMPv6.