Created attachment 1771848 [details]
python script multicast.py used for the test

Description of problem:
A VM on an external network subscribed to a multicast group receives 2 multicast packets when only one is sent by the sender VM. This is a regression; it did not happen in previous puddles. The problem does not happen with VMs connected to an internal network.

Note: same behavior on OSP16.1 and OSP16.2. The issue started to occur recently on both versions.

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20210413.n.0
python3-networking-ovn-7.3.1-1.20210409093428.4e24f4c.el8ost.noarch
ovn2.13-20.12.0-24.el8fdp.x86_64
openvswitch2.13-2.13.0-79.6.el8fdp.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a keypair and a security group allowing ssh, icmp, igmp and all udp.
2. Spawn 2 VMs on the external network (in my case the VMs were running on different compute nodes).
3. Install the attached multicast.py script on both VMs.
4. On both VMs start tcpdump as follows:
   sudo tcpdump -i any -vvneA -s0 -l igmp or port 5001
5. On one VM (receiver) run the multicast.py script as follows:
   python3 multicast.py -r -g 225.0.0.100 -p 5001
6. On the second VM run the script as follows:
   python3 multicast.py -s -g 225.0.0.100 -p 5001 -m qweqwe -c 1

Actual results:
2 copies of the sent multicast message reach the receiver.

Expected results:
1 copy of the sent multicast message reaches the receiver.

Additional info:

Capture on sender:

[cloud-user@vm1 ~]$ sudo tcpdump -i any -vvneA -s0 -l igmp or port 5001
05:43:18.802558 Out fa:16:3e:ad:7e:cf ethertype IPv4 (0x0800), length 50: (tos 0x0, ttl 32, id 51412, offset 0, flags [DF], proto UDP (17), length 34)
    10.0.0.241.53673 > 225.0.0.100.commplex-link: [bad udp cksum 0xec74 -> 0xdffb!] UDP, length 6
E.."..@. ...
......d.......qweqwe

Capture on receiver:

[cloud-user@vm2 ~]$ sudo tcpdump -i any -vvneA -s0 -l igmp or port 5001
05:43:18.781359 M fa:16:3e:ad:7e:cf ethertype IPv4 (0x0800), length 50: (tos 0x0, ttl 32, id 51412, offset 0, flags [DF], proto UDP (17), length 34)
    10.0.0.241.53673 > 225.0.0.100.commplex-link: [udp sum ok] UDP, length 6
E.."..@. ...
......d........qweqwe
05:43:18.781420 M fa:16:3e:ad:7e:cf ethertype IPv4 (0x0800), length 50: (tos 0x0, ttl 32, id 51412, offset 0, flags [DF], proto UDP (17), length 34)
    10.0.0.241.53673 > 225.0.0.100.commplex-link: [udp sum ok] UDP, length 6
E.."..@. ...
......d........qweqwe

When running the same scenario with VMs connected to the internal_A network everything works properly (only one multicast packet reaches the receiver).

Some OVN configs:

[heat-admin@controller-0 ~]$ ovn-nbctl list logical_switch
_uuid               : 219b8041-c1a9-4884-ab9e-ec3b39e29e2b
acls                : []
dns_records         : [28dfcc84-bf94-4f22-9e27-d897ceddfa2f]
external_ids        : {"neutron:mtu"="1500", "neutron:network_name"=nova, "neutron:revision_number"="3"}
forwarding_groups   : []
load_balancer       : []
name                : neutron-d1d14a67-6b57-415e-8503-d82f2b2960c4
other_config        : {mcast_flood_unregistered="false", mcast_snoop="true", vlan-passthru="false"}
ports               : [0e8a9359-29a8-498a-9c9f-7742a7c844eb, 237208ec-f516-4169-ab32-f126fce1b413, 37b15a0b-ead6-4a44-9374-2ffc6509020b, ad93b562-46de-4397-a443-c617ad4b2962, b0b3f6ae-35e2-4c73-a8b0-551ed38dcea0, ca9901bf-1afd-45e8-b54e-9ba033d07d70]
qos_rules           : []

_uuid               : 99f3af51-0f64-4ac9-b00f-e0287183aec8
acls                : []
dns_records         : [74cd5d2c-a1cf-44b8-8a60-e95926ae6aef]
external_ids        : {"neutron:mtu"="1442", "neutron:network_name"=internal_A, "neutron:revision_number"="2"}
forwarding_groups   : []
load_balancer       : []
name                : neutron-b91c4af4-5aa9-4adb-9209-24d1e6383fab
other_config        : {mcast_flood_unregistered="false", mcast_snoop="true", vlan-passthru="false"}
ports               : [172c73ff-9a01-44fe-b3b3-6a9b70907538, 24b7cd0b-ba7b-4475-84c9-20c407462e8a,
7f78ce3c-6dac-4ab8-811f-17d79234937b, b9a27e37-4826-42bf-98c0-0c4c72d23073, c175d285-79fc-4162-a859-f71f3d25f275]
qos_rules           : []

[heat-admin@controller-0 ~]$ ovn-nbctl show
switch 219b8041-c1a9-4884-ab9e-ec3b39e29e2b (neutron-d1d14a67-6b57-415e-8503-d82f2b2960c4) (aka nova)
    port b9576922-fce1-419f-bcf1-6b42809add3c
        addresses: ["fa:16:3e:d2:dd:68 10.0.0.226 2620:52:0:13b8::1000:96"]
    port e05a52ca-af40-48b9-9d37-b062b2acc389
        type: localport
        addresses: ["fa:16:3e:4f:5b:4f 10.0.0.151"]
    port provnet-fc147a22-9144-4988-a7de-c7d8a09c8269
        type: localnet
        addresses: ["unknown"]
    port 24dba14f-7bf6-41d3-acbc-5ddc4e0cb3b9
        addresses: ["fa:16:3e:9c:3a:66 10.0.0.160 2620:52:0:13b8::1000:2f"]
    port c232ab61-cb41-4dfe-8277-891c44980172
        addresses: ["fa:16:3e:ad:7e:cf 10.0.0.241 2620:52:0:13b8::1000:16"]
    port f8184ba3-0fb1-45f4-ac66-f038f954df21
        type: router
        router-port: lrp-f8184ba3-0fb1-45f4-ac66-f038f954df21
switch 99f3af51-0f64-4ac9-b00f-e0287183aec8 (neutron-b91c4af4-5aa9-4adb-9209-24d1e6383fab) (aka internal_A)
    port 7c77b212-56ad-457c-a78a-b5d2f587c868
        type: router
        router-port: lrp-7c77b212-56ad-457c-a78a-b5d2f587c868
    port 2853f87a-bad8-418b-ab3f-53eddcb255a5
        type: localport
        addresses: ["fa:16:3e:79:8f:32 192.168.1.2"]
    port 8586719a-2e17-4cfd-a947-3f26dc4acb0a
        addresses: ["fa:16:3e:c6:5a:ca 192.168.1.227"]
    port ff3f16f2-4c6e-41e9-81d3-0c2677bdd219
        addresses: ["fa:16:3e:7c:76:9b 192.168.1.118"]
    port 0eb12224-0c1a-4fe6-a6f7-93651b532b9d
        addresses: ["fa:16:3e:c3:0c:04 192.168.1.65"]
switch 415c772f-e8ff-4007-9e0b-02e04fda871d (neutron-7dd68d59-5042-4efc-815a-a6a7b73a7fdb) (aka heat_tempestconf_network)
    port 98b4694f-e265-4f16-84e3-d659405708e3
        type: localport
        addresses: ["fa:16:3e:c4:4f:74 192.168.199.2"]
router 475793f4-53f8-45c7-8b34-4105d74730b8 (neutron-a748c7d2-f546-43a7-8828-e20dd51ab3fb) (aka routerA)
    port lrp-7c77b212-56ad-457c-a78a-b5d2f587c868
        mac: "fa:16:3e:34:d4:62"
        networks: ["192.168.1.1/24"]
    port lrp-f8184ba3-0fb1-45f4-ac66-f038f954df21
        mac: "fa:16:3e:34:1d:ab"
        networks: ["10.0.0.200/24", "2620:52:0:13b8::1000:3d/64"]
        gateway chassis: [cb69ec80-d6d9-4697-9f5c-b89cce5b7510 a8537419-c77a-4bb3-9045-fdfa9dd34698 855269d0-8f37-43bc-9952-40ccbf3afcf4]
    nat 01d27a77-9d11-4988-ae56-cad845d3abfe
        external ip: "10.0.0.197"
        logical ip: "192.168.1.65"
        type: "dnat_and_snat"
    nat 2196c8b4-0c15-4617-997d-dc46ddc9100d
        external ip: "10.0.0.246"
        logical ip: "192.168.1.227"
        type: "dnat_and_snat"
    nat 61cc3a13-0a43-44ac-a251-e251329706c3
        external ip: "10.0.0.200"
        logical ip: "192.168.1.0/24"
        type: "snat"
    nat f28e1371-b7f5-46b5-b3ff-9f558a80a25e
        external ip: "10.0.0.214"
        logical ip: "192.168.1.118"
        type: "dnat_and_snat"
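Since the attached multicast.py is not inline in this report, here is a minimal sketch of what such a sender/receiver script might look like. It is a hypothetical reimplementation, not the attachment itself; only the flag names (-r/-s, -g group, -p port, -m message, -c count) from the reproduction commands and the TTL of 32 seen in the captures are taken from this report:

```python
#!/usr/bin/env python3
"""Hypothetical minimal reimplementation of a multicast.py-style
sender/receiver, matching the flags used in the reproduction steps."""
import argparse
import socket
import struct


def make_receiver(group: str, port: int) -> socket.socket:
    """Bind a UDP socket and join the given multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # IP_ADD_MEMBERSHIP takes the group address plus the local
    # interface address (0.0.0.0 = let the kernel choose).
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def send(group: str, port: int, message: str, count: int) -> None:
    """Send `count` copies of `message` to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL 32, matching the captures above.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 32)
    for _ in range(count):
        sock.sendto(message.encode(), (group, port))
    sock.close()


def parse_args(argv=None):
    p = argparse.ArgumentParser()
    mode = p.add_mutually_exclusive_group(required=True)
    mode.add_argument("-r", action="store_true", help="receive")
    mode.add_argument("-s", action="store_true", help="send")
    p.add_argument("-g", required=True, help="multicast group address")
    p.add_argument("-p", type=int, required=True, help="UDP port")
    p.add_argument("-m", default="", help="message to send")
    p.add_argument("-c", type=int, default=1, help="number of sends")
    return p.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    if args.s:
        send(args.g, args.p, args.m, args.c)
    else:
        sock = make_receiver(args.g, args.p)
        while True:
            data, addr = sock.recvfrom(4096)
            print(f"received {data!r} from {addr}")
```

With this sketch, the receiver invocation from the steps would be `python3 multicast.py -r -g 225.0.0.100 -p 5001` and the sender `python3 multicast.py -s -g 225.0.0.100 -p 5001 -m qweqwe -c 1`; a receiver counting `recvfrom` calls per sent message is enough to show the duplication.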
Note: I found that when the sender and receiver VMs are running on the same compute node there are no duplicated packets.
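To dig into the cross-chassis angle, the port-to-chassis bindings and the learned multicast state can be inspected from a controller. This is a hypothetical diagnostic sketch (not something run in this report), assuming ovn-sbctl is available and using the sender port ID from the ovn-nbctl output above; it is meant against a live deployment, so no standalone output is shown:

```shell
# Which chassis is the sender VM port bound to? (logical_port in the
# southbound DB is the Neutron port ID shown by "ovn-nbctl show".)
ovn-sbctl --columns=chassis find Port_Binding \
    logical_port=c232ab61-cb41-4dfe-8277-891c44980172

# With mcast_snoop="true" on the logical switch, learned IGMP
# memberships land in the southbound IGMP_Group table; 225.0.0.100
# should be listed for the chassis hosting the subscribed receiver.
ovn-sbctl list IGMP_Group
```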
This needs a PM ack and blocker request to be included in 16.2.
(In reply to spower from comment #20)
> This needs a PM ack and blocker request to be included in 16.2.

Hi Sarah,

Sorry for the confusion, we have a different BZ for 16.2 [0] which already has the blocker+ flag. This bug is for 16.1 and the code fixing it is already merged, so I think we are good here. I am cleaning up the flags.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1949460
Verified on RHOS-16.1-RHEL-8-20210727.n.1 with python3-networking-ovn-7.3.1-1.20210714143305.4e24f4c.el8ost.noarch. Verified that there are no duplicated multicast packets on the external network.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3762