+++ This bug was initially created as a clone of Bug #1775525 +++

Description of problem:
After a guest sends a malformed ARP, it can't ping other hosts through a logical router.

Version-Release number of selected component (if applicable):
[root@dell-per730-57 multicast]# rpm -qa|grep openvs
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
openvswitch2.11-2.11.0-26.el7fdp.x86_64
[root@dell-per730-57 multicast]# rpm -qa|grep ovn
ovn2.11-2.11.1-20.el7fdp.x86_64
ovn2.11-central-2.11.1-20.el7fdp.x86_64
ovn2.11-host-2.11.1-20.el7fdp.x86_64

How reproducible:
every time

Steps to Reproduce:
topo:
        hv1_vm1
           |
hv1_vm0---S2----r1-----public(ls)
                 |
hv0_vm0---S3---vm2

1. set up an env of the topo above (or run the case ovn_test_nat to get the env)
[root@dell-per730-19 multicast]# ovn-nbctl show
switch b726549c-4249-4e34-8300-c62f1cbc6ca1 (s2)
    port hv1_vm01_vnet1
        addresses: ["00:de:ad:01:01:01 172.16.102.12"]
    port hv1_vm00_vnet1
        addresses: ["00:de:ad:01:00:01 172.16.102.11"]
    port s2_r1
        type: router
        addresses: ["00:de:ad:ff:01:02 172.16.102.1"]
        router-port: r1_s2
switch 67056c80-7d36-4730-9036-fd4fb5d29310 (public)
    port ln_p1
        type: localnet
        addresses: ["unknown"]
    port public_r1
        type: router
        router-port: r1_public
switch c8f319ef-4db1-4964-aefd-b5288ad1b652 (s3)
    port hv0_vm00_vnet1
        addresses: ["00:de:ad:00:00:01 172.16.103.11"]
    port vm2
        addresses: ["00:00:00:00:00:02"]
    port s3_r1
        type: router
        addresses: ["00:de:ad:ff:01:03 172.16.103.1"]
        router-port: r1_s3
    port hv0_vm01_vnet1
        addresses: ["00:de:ad:00:01:01 172.16.103.12"]
router 964bb90a-f52c-4fae-ba32-520da109b83b (r1)
    port r1_public
        mac: "40:44:00:00:00:03"
        networks: ["172.16.104.1/24"]
    port r1_s2
        mac: "00:de:ad:ff:01:02"
        networks: ["172.16.102.1/24"]
    port r1_s3
        mac: "00:de:ad:ff:01:03"
        networks: ["172.16.103.1/24"]
    nat 2f3e76c8-ab37-4345-b84c-bcea8f3ec231
        external ip: "172.16.104.200"
        logical ip: "172.16.102.11"
        type: "dnat_and_snat"
    nat dbc3612c-bc65-4221-b52b-d29aa7ed7a4b
        external ip: "172.16.104.201"
        logical ip: "172.16.103.11"
        type: "dnat_and_snat"

2. after this step, pings from vm2 to hv0_vm0/hv1_vm0/hv1_vm1 all pass.
ip netns exec vm2 ping 172.16.102.11 -c 3
PING 172.16.102.11 (172.16.102.11) 56(84) bytes of data.
64 bytes from 172.16.102.11: icmp_seq=1 ttl=63 time=1.24 ms
64 bytes from 172.16.102.11: icmp_seq=2 ttl=63 time=0.294 ms
64 bytes from 172.16.102.11: icmp_seq=3 ttl=63 time=0.230 ms

--- 172.16.102.11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms

3. send a malformed ARP packet from vm2
from scapy.all import *
sendp(Ether(src="00:de:ad:01:00:01", dst="ff:ff:ff:ff:ff:ff")/ARP(op=1,hwsrc='00:de:ad:01:00:01',hwdst='00:00:00:00:00:00',psrc='172.16.103.13',pdst='0.0.0.0'),iface="vm2")

4. after that, of the pings from vm2 to hv0_vm0/hv1_vm0/hv1_vm1, only the one to hv0_vm0 passes. vm2 can't communicate with hosts through the router; pinging the IP of the router also fails.
ip netns exec vm2 ping 172.16.102.11 -c 3
PING 172.16.102.11 (172.16.102.11) 56(84) bytes of data.

--- 172.16.102.11 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

ip netns exec vm2 ping 172.16.103.1 -c 3
PING 172.16.103.1 (172.16.103.1) 56(84) bytes of data.

--- 172.16.103.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms

Actual results:
ping failed

Expected results:
ping pass

Additional info:

--- Additional comment from ying xu on 2019-11-22 07:58:55 UTC ---
I wouldn't say the packet is malformed. The packet is still valid, but it's spoofing the MAC of an existing host on the other switch.

Anyway, I was unable to reproduce the bug in an environment with Fedora and the upstream version of ovs/ovn.

Is there a way to access the lab with this environment and launch the test case you're mentioning?

Thanks.
(In reply to Gabriele Cerami from comment #1)
> I wouldn't say the packet is malformed. The packet is still valid, but it's
> spoofing the MAC of an existing host on the other switch.
>
> Anyway, I was unable to reproduce the bug in an environment with Fedora and
> the upstream version of ovs/ovn.
>
> Is there a way to access the lab with this environment and launch the test
> case you're mentioning?
>
> Thanks.

If you want the env, you can ping me on IRC (id: yinxu) and I will set it up for you.
Thanks Ying for providing the environment. I was able to modify the test to get a step-by-step analysis before and after sending the malformed packet.

As a high-level overview, what happens when vm2 sends ICMP requests to the other hosts is that the other hosts receive the packets and reply correctly, but the replies get swallowed.

A trace on the return path (the ICMP replies from the other hosts) shows the problem:

ovn-trace --friendly-names s2 'inport == "hv1_vm00_vnet1" && icmp4.type == 0 && eth.src == 00:de:ad:01:00:01 && ip4.src == 172.16.102.11 && eth.dst == 00:de:ad:ff:01:02 && ip4.dst == 172.16.103.13 && ip.ttl == 64'
# icmp,reg14=0x2,vlan_tci=0x0000,dl_src=00:de:ad:01:00:01,dl_dst=00:de:ad:ff:01:02,nw_src=172.16.102.11,nw_dst=172.16.103.13,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
[...]
ingress(dp="r1", inport="r1_s2")
--------------------------------
 0. lr_in_admission (ovn-northd.c:7932): eth.dst == 00:de:ad:ff:01:02 && inport == "r1_s2", priority 50, uuid 34c199c5
    next;
 1. lr_in_lookup_neighbor (ovn-northd.c:7981): 1, priority 0, uuid e94a4498
    reg9[3] = 1;
    next;
 2. lr_in_learn_neighbor (ovn-northd.c:7987): reg9[3] == 1 || reg9[2] == 1, priority 100, uuid a3439552
    next;
 9. lr_in_ip_routing (ovn-northd.c:7556): ip4.dst == 172.16.103.0/24, priority 49, uuid 31af4d7c
    ip.ttl--;
    reg8[0..15] = 0;
    reg0 = ip4.dst;
    reg1 = 172.16.103.1;
    eth.src = 00:de:ad:ff:01:03;
    outport = "r1_s3";
    flags.loopback = 1;
    next;
10. lr_in_ip_routing_ecmp (ovn-northd.c:9530): reg8[0..15] == 0, priority 150, uuid 5fb207d4
    next;
12. lr_in_arp_resolve (ovn-northd.c:10010): ip4, priority 0, uuid cbfdafed
    get_arp(outport, reg0);
    /* MAC binding to 00:de:ad:01:00:01. */
    next;

The malformed packet succeeds in its original intention: it poisons the ARP table of the router. The get_arp function returns a MAC that is not the original MAC of vm2, but the MAC sent in the malformed packet. The packet then gets dropped because the next table has no match for its contents.

Looking at the database in /var/lib/ovn/ovnsb_db.db, I can see the port was originally added as

{"_date":1594675403735,"MAC_Binding":{"3311fdca-910e-4d43-8db9-f4692e57b607":{"ip":"172.16.103.13","logical_port":"r1_s3","mac":"00:00:00:00:00:02","datapath":["uuid","6987a142-b710-4e26-a350-ce943626af44"]}},"_comment":"ovn-controller: registering chassis 'hv0'"}

and I can see the update to the db with the new MAC binding caused by the malformed packet:

{"_date":1594679859567,"MAC_Binding":{"3311fdca-910e-4d43-8db9-f4692e57b607":{"mac":"00:de:ad:01:00:01"}},"_comment":"ovn-controller: registering chassis 'hv0'"}

Sending an ARP from vm2 with the correct format and MAC updates the binding again and fixes the situation.

If we want to avoid this, we need to trust only the initial binding information for that port and add rules that block ARP packets from unknown MACs for that specific port.
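In case it helps with updating the test: the "correct" ARP I sent from vm2 to restore the binding was roughly the following scapy snippet. This is just a sketch; the 00:00:00:00:00:02 MAC comes from vm2's addresses column and 172.16.103.13 from the MAC_Binding entry above, so adjust if your env differs.

from scapy.all import *

# Gratuitous ARP request from vm2 with its real MAC/IP, so the router's
# MAC_Binding entry for 172.16.103.13 is re-learned with the correct MAC.
sendp(Ether(src="00:00:00:00:00:02", dst="ff:ff:ff:ff:ff:ff") /
      ARP(op=1, hwsrc="00:00:00:00:00:02", hwdst="00:00:00:00:00:00",
          psrc="172.16.103.13", pdst="172.16.103.13"),
      iface="vm2")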
We already support blocking ARP packets from unknown MACs. For that you need to set the port_security column of the logical switch port. You can set it as:

ovn-nbctl lsp-set-port-security <port_name> "MAC IP1 .."

In many deployments, the port_security column will be the same as the "addresses" column.

In your testing, is port_security set? If not, please set it and try again.
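For your topology that would presumably be something like the commands below, going by the "addresses" column shown in the bug description (adjust if the ports or addresses differ):

# Restrict each switch port to the MAC/IP pairs from its addresses column,
# so spoofed ARPs like the one in step 3 are dropped at the port's ingress.
ovn-nbctl lsp-set-port-security vm2 "00:00:00:00:00:02"
ovn-nbctl lsp-set-port-security hv0_vm00_vnet1 "00:de:ad:00:00:01 172.16.103.11"
ovn-nbctl lsp-set-port-security hv0_vm01_vnet1 "00:de:ad:00:01:01 172.16.103.12"
ovn-nbctl lsp-set-port-security hv1_vm00_vnet1 "00:de:ad:01:00:01 172.16.102.11"
ovn-nbctl lsp-set-port-security hv1_vm01_vnet1 "00:de:ad:01:01:01 172.16.102.12"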
Thanks Numan! The port security was not set. I launched that command and now the ARP is ignored, and ICMP continues to flow regularly even after sending the malformed packet. So the test case probably needs an update. Thanks all.
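For reference, my check after setting port_security was roughly this (re-running the scapy command from step 3 and then):

# Confirm the router's MAC_Binding entry for 172.16.103.13 still shows vm2's
# real MAC (00:00:00:00:00:02) rather than the spoofed 00:de:ad:01:00:01.
ovn-sbctl list MAC_Binding
# Confirm traffic through the router still works.
ip netns exec vm2 ping 172.16.102.11 -c 3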
Just to make things clearer for everyone: this is not a bug in OVN; without port security active this is the correct behaviour. If we want to protect the router from an ARP poisoning attack, the best mitigation is to activate port security. So the test needs updating.
Closing the bug as per comment #7.