+++ This bug was initially created as a clone of Bug #1775525 +++

Description of problem:
After a guest sends a malformed ARP, it can't ping other hosts through a logical router.

Version-Release number of selected component (if applicable):
[root@dell-per730-57 multicast]# rpm -qa|grep openvs
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
openvswitch2.11-2.11.0-26.el7fdp.x86_64
[root@dell-per730-57 multicast]# rpm -qa|grep ovn
ovn2.11-2.11.1-20.el7fdp.x86_64
ovn2.11-central-2.11.1-20.el7fdp.x86_64
ovn2.11-host-2.11.1-20.el7fdp.x86_64

How reproducible:
every time

Steps to Reproduce:
topo:
        hv1_vm1
           |
hv1_vm0---S2----r1-----public(ls)
                 |
hv0_vm0---S3---vm2

1. set up an env of the topo above (or run the case ovn_test_nat to get the env)
[root@dell-per730-19 multicast]# ovn-nbctl show
switch b726549c-4249-4e34-8300-c62f1cbc6ca1 (s2)
    port hv1_vm01_vnet1
        addresses: ["00:de:ad:01:01:01 172.16.102.12"]
    port hv1_vm00_vnet1
        addresses: ["00:de:ad:01:00:01 172.16.102.11"]
    port s2_r1
        type: router
        addresses: ["00:de:ad:ff:01:02 172.16.102.1"]
        router-port: r1_s2
switch 67056c80-7d36-4730-9036-fd4fb5d29310 (public)
    port ln_p1
        type: localnet
        addresses: ["unknown"]
    port public_r1
        type: router
        router-port: r1_public
switch c8f319ef-4db1-4964-aefd-b5288ad1b652 (s3)
    port hv0_vm00_vnet1
        addresses: ["00:de:ad:00:00:01 172.16.103.11"]
    port vm2
        addresses: ["00:00:00:00:00:02"]
    port s3_r1
        type: router
        addresses: ["00:de:ad:ff:01:03 172.16.103.1"]
        router-port: r1_s3
    port hv0_vm01_vnet1
        addresses: ["00:de:ad:00:01:01 172.16.103.12"]
router 964bb90a-f52c-4fae-ba32-520da109b83b (r1)
    port r1_public
        mac: "40:44:00:00:00:03"
        networks: ["172.16.104.1/24"]
    port r1_s2
        mac: "00:de:ad:ff:01:02"
        networks: ["172.16.102.1/24"]
    port r1_s3
        mac: "00:de:ad:ff:01:03"
        networks: ["172.16.103.1/24"]
    nat 2f3e76c8-ab37-4345-b84c-bcea8f3ec231
        external ip: "172.16.104.200"
        logical ip: "172.16.102.11"
        type: "dnat_and_snat"
    nat dbc3612c-bc65-4221-b52b-d29aa7ed7a4b
        external ip: "172.16.104.201"
        logical ip: "172.16.103.11"
        type: "dnat_and_snat"

2. after this step, pings from vm2 to hv0_vm0/hv1_vm0/hv1_vm1 all pass.
ip netns exec vm2 ping 172.16.102.11 -c 3
PING 172.16.102.11 (172.16.102.11) 56(84) bytes of data.
64 bytes from 172.16.102.11: icmp_seq=1 ttl=63 time=1.24 ms
64 bytes from 172.16.102.11: icmp_seq=2 ttl=63 time=0.294 ms
64 bytes from 172.16.102.11: icmp_seq=3 ttl=63 time=0.230 ms

--- 172.16.102.11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms

3. send a malformed ARP packet from vm2
from scapy.all import *
sendp(Ether(src="00:de:ad:01:00:01", dst="ff:ff:ff:ff:ff:ff")/ARP(op=1,hwsrc='00:de:ad:01:00:01',hwdst='00:00:00:00:00:00',psrc='172.16.103.13',pdst='0.0.0.0'),iface="vm2")

4. after that, of the pings from vm2 to hv0_vm0/hv1_vm0/hv1_vm1, only the one to hv0_vm0 passes. vm2 can't communicate with hosts through the router; pinging the IP of the router also fails.
ip netns exec vm2 ping 172.16.102.11 -c 3
PING 172.16.102.11 (172.16.102.11) 56(84) bytes of data.

--- 172.16.102.11 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

ip netns exec vm2 ping 172.16.103.1 -c 3
PING 172.16.103.1 (172.16.103.1) 56(84) bytes of data.

--- 172.16.103.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

Actual results:
ping failed

Expected results:
ping pass

Additional info:

--- Additional comment from ying xu on 2019-11-22 07:58:55 UTC ---
I wouldn't say the packet is malformed. The packet is still valid, but it's spoofing the MAC of an existing host on the other switch.

Anyway, I was unable to reproduce the bug in an environment with Fedora and the upstream version of ovs/ovn.

Is there a way to access the lab with this environment and launch the test case you're mentioning?

Thanks.
(In reply to Gabriele Cerami from comment #1)
> I wouldn't say the packet is malformed. The packet is still valid, but it's
> spoofing the MAC of an existing host on the other switch.
>
> Anyway, I was unable to reproduce the bug in an environment with Fedora and
> the upstream version of ovs/ovn.
>
> Is there a way to access the lab with this environment and launch the test
> case you're mentioning?
>
> Thanks.

If you want the env, you can ping me on IRC (id: yinxu) and I will set it up for you.
Thanks Ying for providing the environment. I was able to modify the test to get a step-by-step analysis before and after sending the malformed packet.

As a high-level overview, what happens when vm2 sends ICMP requests to the other hosts is that the other hosts receive the packets and reply correctly, but the replies get swallowed.

A trace on the return path (the ICMP replies from the other hosts) shows the problem:

ovn-trace --friendly-names s2 'inport == "hv1_vm00_vnet1" && icmp4.type == 0 && eth.src == 00:de:ad:01:00:01 && ip4.src == 172.16.102.11 && eth.dst == 00:de:ad:ff:01:02 && ip4.dst == 172.16.103.13 && ip.ttl == 64'
# icmp,reg14=0x2,vlan_tci=0x0000,dl_src=00:de:ad:01:00:01,dl_dst=00:de:ad:ff:01:02,nw_src=172.16.102.11,nw_dst=172.16.103.13,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
[...]
ingress(dp="r1", inport="r1_s2")
--------------------------------
 0. lr_in_admission (ovn-northd.c:7932): eth.dst == 00:de:ad:ff:01:02 && inport == "r1_s2", priority 50, uuid 34c199c5
    next;
 1. lr_in_lookup_neighbor (ovn-northd.c:7981): 1, priority 0, uuid e94a4498
    reg9[3] = 1;
    next;
 2. lr_in_learn_neighbor (ovn-northd.c:7987): reg9[3] == 1 || reg9[2] == 1, priority 100, uuid a3439552
    next;
 9. lr_in_ip_routing (ovn-northd.c:7556): ip4.dst == 172.16.103.0/24, priority 49, uuid 31af4d7c
    ip.ttl--;
    reg8[0..15] = 0;
    reg0 = ip4.dst;
    reg1 = 172.16.103.1;
    eth.src = 00:de:ad:ff:01:03;
    outport = "r1_s3";
    flags.loopback = 1;
    next;
10. lr_in_ip_routing_ecmp (ovn-northd.c:9530): reg8[0..15] == 0, priority 150, uuid 5fb207d4
    next;
12. lr_in_arp_resolve (ovn-northd.c:10010): ip4, priority 0, uuid cbfdafed
    get_arp(outport, reg0);
    /* MAC binding to 00:de:ad:01:00:01. */
    next;

The malformed packet succeeds in its original intention: it poisons the ARP table of the router. The get_arp function returns a MAC that is not the original MAC of vm2, but the MAC sent in the malformed packet. The packet then gets dropped because the next table has no match for its contents.

Looking at the database in /var/lib/ovn/ovnsb_db.db, I can see the port was originally added as

{"_date":1594675403735,"MAC_Binding":{"3311fdca-910e-4d43-8db9-f4692e57b607":{"ip":"172.16.103.13","logical_port":"r1_s3","mac":"00:00:00:00:00:02","datapath":["uuid","6987a142-b710-4e26-a350-ce943626af44"]}},"_comment":"ovn-controller: registering chassis 'hv0'"}

and I can see the update to the db with the new MAC binding caused by the malformed packet:

{"_date":1594679859567,"MAC_Binding":{"3311fdca-910e-4d43-8db9-f4692e57b607":{"mac":"00:de:ad:01:00:01"}},"_comment":"ovn-controller: registering chassis 'hv0'"}

Sending an ARP from vm2 with the correct format and MAC updates the binding again and fixes the situation.

If we want to avoid this, we need to trust only the initial binding information for that port and add rules that block ARP packets from unknown MACs for that specific port.
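In case it helps with updating the test: the "correct" ARP I sent from vm2 to restore the binding was roughly the following scapy snippet. This is just a sketch; the 00:00:00:00:00:02 MAC comes from vm2's addresses column and 172.16.103.13 from the MAC_Binding entry above, so adjust if your env differs.

from scapy.all import *

# Gratuitous ARP request from vm2 with its real MAC/IP, so the router's
# MAC_Binding entry for 172.16.103.13 is re-learned with the correct MAC.
sendp(Ether(src="00:00:00:00:00:02", dst="ff:ff:ff:ff:ff:ff") /
      ARP(op=1, hwsrc="00:00:00:00:00:02", hwdst="00:00:00:00:00:00",
          psrc="172.16.103.13", pdst="172.16.103.13"),
      iface="vm2")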
We already support blocking ARP packets from unknown MACs. For that you need to set the port_security column of the logical switch port. You can set it as:

ovn-nbctl lsp-set-port-security <port_name> "MAC IP1 .."

In many deployments, the port_security column will be the same as the "addresses" column.

In your testing, is port_security set? If not, please set it and try again.
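For your topology that would presumably be something like the commands below, going by the "addresses" column shown in the bug description (adjust if the ports or addresses differ):

# Restrict each switch port to the MAC/IP pairs from its addresses column,
# so spoofed ARPs like the one in step 3 are dropped at the port's ingress.
ovn-nbctl lsp-set-port-security vm2 "00:00:00:00:00:02"
ovn-nbctl lsp-set-port-security hv0_vm00_vnet1 "00:de:ad:00:00:01 172.16.103.11"
ovn-nbctl lsp-set-port-security hv0_vm01_vnet1 "00:de:ad:00:01:01 172.16.103.12"
ovn-nbctl lsp-set-port-security hv1_vm00_vnet1 "00:de:ad:01:00:01 172.16.102.11"
ovn-nbctl lsp-set-port-security hv1_vm01_vnet1 "00:de:ad:01:01:01 172.16.102.12"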
Thanks Numan! The port security was not set. I launched that command and now the ARP is ignored, and ICMP continues to flow regularly even after sending the malformed packet. So the test case probably needs an update. Thanks all.
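For reference, my check after setting port_security was roughly this (re-running the scapy command from step 3 and then):

# Confirm the router's MAC_Binding entry for 172.16.103.13 still shows vm2's
# real MAC (00:00:00:00:00:02) rather than the spoofed 00:de:ad:01:00:01.
ovn-sbctl list MAC_Binding
# Confirm traffic through the router still works.
ip netns exec vm2 ping 172.16.102.11 -c 3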
Just to make things clearer for everyone: this is not a bug in OVN; without port security active this is the correct behaviour. If we want to protect the router from an ARP poisoning attack, the best mitigation is to activate port security. So the test needs updating.
Closing the bug as per comment #7.