Description of problem:

Compared with multicast on openshift-sdn, pods in an OVN cluster always lose one multicast packet in tests that send 5, 20, and 100 multicast packets. Below are test results from SDN and OVN clusters using the latest 4.6 image.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-15-063156

How reproducible:
Always

Steps to Reproduce (OVN cluster):
1. oc new-project multicast-test
2. oc annotate namespace multicast-test k8s.ovn.org/multicast-enabled="true"
3. oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/Pod/multicast-pod.yaml

Actual results:

## OVN cluster: two pods on the same node

[weliang@weliang verification-tests]$ oc get pod -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
mcast-rc-bqs2j   1/1     Running   0          31s   10.131.0.6    ip-10-0-184-202.us-east-2.compute.internal   <none>           <none>
mcast-rc-g2pmj   1/1     Running   0          31s   10.129.2.18   ip-10-0-215-115.us-east-2.compute.internal   <none>           <none>
mcast-rc-hhjlq   1/1     Running   0          31s   10.128.2.11   ip-10-0-135-183.us-east-2.compute.internal   <none>           <none>
mcast-rc-j6l96   1/1     Running   0          31s   10.131.0.7    ip-10-0-184-202.us-east-2.compute.internal   <none>           <none>
mcast-rc-wh9hh   1/1     Running   0          31s   10.129.2.17   ip-10-0-215-115.us-east-2.compute.internal   <none>           <none>
mcast-rc-xkptm   1/1     Running   0          31s   10.128.2.10   ip-10-0-135-183.us-east-2.compute.internal   <none>           <none>

[weliang@weliang verification-tests]$ oc rsh mcast-rc-g2pmj
$ omping -m 239.255.254.24 -c 5 10.129.2.18 10.129.2.17
-- snip --
10.129.2.17 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.083/0.099/0.129/0.019
10.129.2.17 : multicast, xmt/rcv/%loss = 5/4/19% (seq>=2 0%), min/avg/max/std-dev = 0.122/0.226/0.477/0.168

$ omping -m 239.255.254.24 -c 20 10.129.2.18 10.129.2.17
-- snip --
10.129.2.17 : unicast, xmt/rcv/%loss = 20/20/0%, min/avg/max/std-dev = 0.065/0.230/2.624/0.564
10.129.2.17 : multicast, xmt/rcv/%loss = 20/20/0%, min/avg/max/std-dev = 0.117/0.292/2.675/0.568

$ omping -m 239.255.254.24 -c 100 10.129.2.18 10.129.2.17
-- snip --
10.129.2.17 : unicast, xmt/rcv/%loss = 100/100/0%, min/avg/max/std-dev = 0.076/0.121/1.038/0.114
10.129.2.17 : multicast, xmt/rcv/%loss = 100/99/1% (seq>=2 0%), min/avg/max/std-dev = 0.103/0.169/1.070/0.117

[weliang@weliang verification-tests]$ oc rsh mcast-rc-wh9hh
$ omping -m 239.255.254.24 -c 5 10.129.2.18 10.129.2.17
-- snip --
10.129.2.18 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.030/0.107/0.139/0.044
10.129.2.18 : multicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.160/0.271/0.692/0.236

$ omping -m 239.255.254.24 -c 20 10.129.2.18 10.129.2.17
-- snip --
10.129.2.18 : unicast, xmt/rcv/%loss = 20/20/0%, min/avg/max/std-dev = 0.062/0.090/0.165/0.024
10.129.2.18 : multicast, xmt/rcv/%loss = 20/19/5% (seq>=2 0%), min/avg/max/std-dev = 0.100/0.149/0.510/0.092

$ omping -m 239.255.254.24 -c 100 10.129.2.18 10.129.2.17
-- snip --
10.129.2.18 : unicast, xmt/rcv/%loss = 100/100/0%, min/avg/max/std-dev = 0.033/0.122/1.535/0.144
10.129.2.18 : multicast, xmt/rcv/%loss = 100/100/0%, min/avg/max/std-dev = 0.101/0.181/2.123/0.244

## SDN cluster: two pods on the same node

[weliang@weliang verification-tests]$ oc get pod -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
mcast-rc-4djv8   1/1     Running   0          14s   10.128.2.18   ip-10-0-138-7.us-east-2.compute.internal     <none>           <none>
mcast-rc-7fg2j   1/1     Running   0          14s   10.129.2.10   ip-10-0-175-27.us-east-2.compute.internal    <none>           <none>
mcast-rc-bcq64   1/1     Running   0          14s   10.131.0.19   ip-10-0-216-133.us-east-2.compute.internal   <none>           <none>
mcast-rc-h8n7h   1/1     Running   0          14s   10.128.2.17   ip-10-0-138-7.us-east-2.compute.internal     <none>           <none>
mcast-rc-hftfz   1/1     Running   0          14s   10.129.2.9    ip-10-0-175-27.us-east-2.compute.internal    <none>           <none>
mcast-rc-xlwrm   1/1     Running   0          14s   10.131.0.18   ip-10-0-216-133.us-east-2.compute.internal   <none>           <none>

[weliang@weliang verification-tests]$ oc rsh mcast-rc-4djv8
$ omping -m 238.255.254.24 -c 100 10.128.2.18 10.128.2.17
-- snip --
10.128.2.17 : unicast, xmt/rcv/%loss = 100/100/0%, min/avg/max/std-dev = 0.025/0.083/0.249/0.024
10.128.2.17 : multicast, xmt/rcv/%loss = 100/100/0%, min/avg/max/std-dev = 0.183/0.294/1.624/0.143

[weliang@weliang verification-tests]$ oc rsh mcast-rc-h8n7h
$ omping -m 238.255.254.24 -c 100 10.128.2.18 10.128.2.17
-- snip --
10.128.2.18 : unicast, xmt/rcv/%loss = 100/100/0%, min/avg/max/std-dev = 0.027/0.445/32.084/3.211
10.128.2.18 : multicast, xmt/rcv/%loss = 100/100/0%, min/avg/max/std-dev = 0.202/0.802/32.291/3.364
$

Expected results:
No multicast packets should be lost in the OVN cluster.

Additional info:
The issue of a pod losing one multicast packet does not happen in 4.5.0-rc.1 or 4.5.0-0.nightly-2020-06-03-105031, where https://bugzilla.redhat.com/show_bug.cgi?id=1843695 was reported.
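For reference, the omping loss figures above boil down to counting how many numbered multicast datagrams reach a receiver that has joined the group. Below is a minimal Python sketch of such a probe; it is not the tooling used in this report, and the UDP port, packet count, and send interval are arbitrary assumptions (only the multicast group matches the OVN tests above).

#!/usr/bin/env python3
# Minimal multicast loss probe (illustrative sketch only, not the tooling used
# in this report). Run "probe.py recv" in one pod first, then "probe.py send"
# in another pod of the same multicast-enabled namespace.
import socket
import struct
import sys
import time

GROUP = "239.255.254.24"   # same group as the omping runs above
PORT = 5001                # arbitrary UDP port (assumption)
COUNT = 100                # number of numbered datagrams to send

def send():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    for seq in range(COUNT):
        s.sendto(str(seq).encode(), (GROUP, PORT))
        time.sleep(0.01)

def recv():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", PORT))
    # Joining the group makes the kernel send an IGMP membership report, which
    # OVN must process before it forwards datagrams for this group to the pod.
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    s.settimeout(5)
    seen = set()
    try:
        while True:
            data, _ = s.recvfrom(1500)
            seen.add(int(data.decode()))
    except socket.timeout:
        pass
    missing = sorted(set(range(COUNT)) - seen)
    print(f"received {len(seen)}/{COUNT}, missing seq numbers: {missing}")

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "send":
        send()
    else:
        recv()

Run the receiver in one pod of the multicast-enabled namespace first, then the sender in another pod; any missing sequence numbers correspond to the xmt/rcv gap that omping reports.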
==== 4.5.0-rc.1

[weliang@weliang FILE]$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-rc.1   True        False         9m22s   Cluster version is 4.5.0-rc.1
[weliang@weliang FILE]$ oc exec ovnkube-master-8bjkq -- rpm -qa | grep ovn
Defaulting container name to northd.
Use 'oc describe pod/ovnkube-master-8bjkq -n openshift-ovn-kubernetes' to see all of the containers in this pod.
ovn2.13-2.13.0-32.el7fdp.x86_64
ovn2.13-host-2.13.0-32.el7fdp.x86_64
ovn2.13-vtep-2.13.0-32.el7fdp.x86_64
ovn2.13-central-2.13.0-32.el7fdp.x86_64
[weliang@weliang FILE]$

==== 4.6.0-0.nightly-2020-09-22-073212

[weliang@weliang FILE]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-22-073212   True        False         134m    Cluster version is 4.6.0-0.nightly-2020-09-22-073212
[weliang@weliang FILE]$ oc exec ovnkube-master-jd9q7 -- rpm -qa | grep ovn
Defaulting container name to northd.
Use 'oc describe pod/ovnkube-master-jd9q7 -n openshift-ovn-kubernetes' to see all of the containers in this pod.
ovn2.13-20.06.2-3.el8fdp.x86_64
ovn2.13-host-20.06.2-3.el8fdp.x86_64
ovn2.13-central-20.06.2-3.el8fdp.x86_64
ovn2.13-vtep-20.06.2-3.el8fdp.x86_64
[weliang@weliang FILE]$
Changing priority to high because this bug blocks QE OVN multicast automation testing.
I'm not sure this is a bug. It seems to me that omping on P1 immediately starts sending multicast traffic after the remote P2 has sent the IGMP Join report. It can happen that the IP multicast packet from P1 reaches OVN before the report from P2 has been processed, in which case the packet will be dropped because the IGMP group record doesn't include P2 yet.

On a 4.8.0-0.ci-2021-04-02-081203 cluster, switching to iperf instead of omping, start an IP multicast listener for 224.3.3.3 on pod P1 and a multicast sender on pod P2:

1000650000@mcast-rc-cwpgz:/$ iperf -s -B 224.3.3.3 -u -T 2 -t 10 -i 5
------------------------------------------------------------
Server listening on UDP port 5001
Binding to local address 224.3.3.3
Joining multicast group 224.3.3.3
Receiving 1470 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 224.3.3.3 port 5001 connected with 10.128.6.8 port 59161
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[ 3]  0.0- 5.0 sec   640 KBytes  1.05 Mbits/sec   0.020 ms    0/  446 (0%)
[ 3]  0.0- 5.0 sec   642 KBytes  1.05 Mbits/sec   0.018 ms    0/  447 (0%)

1000650000@mcast-rc-gkn4h:/$ iperf -c 224.3.3.3 -u -T 2 -t 5 -i 5
------------------------------------------------------------
Client connecting to 224.3.3.3, UDP port 5001
Sending 1470 byte datagrams, IPG target: 11215.21 us (kalman adjust)
Setting multicast TTL to 2
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 10.128.6.8 port 59161 connected with 224.3.3.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 3]  0.0- 5.0 sec   642 KBytes  1.05 Mbits/sec
[ 3]  0.0- 5.0 sec   642 KBytes  1.05 Mbits/sec
[ 3] Sent 447 datagrams

The number of sent packets equals the number of received packets.

On the other hand, on openshift-sdn, AFAIU IP multicast traffic is forwarded like any broadcast traffic to all pods in the namespace, so that would explain why there are no drops with omping on openshift-sdn.

What do you think?

Thanks,
Dumitru
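To make the suspected race concrete, here is a minimal Python sketch (not part of the original report; the port, payload, and delay are assumptions, and the two halves would normally run in separate pods) of the ordering that matters: the receiver's IGMP join has to be processed by OVN before the sender's first datagram arrives, which is what starting the iperf server ahead of the client ensures.

import socket
import struct
import time

GROUP, PORT = "224.3.3.3", 5001   # group from the iperf example; port is an assumption

# Receiver side (listener pod): join the group first. The kernel emits an IGMP
# membership report, and OVN adds this receiver to the group record once the
# report has been processed.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
              struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY))

# Sender side (sender pod): give OVN time to process the join before the first
# datagram goes out. If the first packet races ahead of the join (which is what
# omping's immediate send can do), it is dropped because the IGMP group record
# does not include the receiver yet, and it shows up as the single lost packet.
time.sleep(1)
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
tx.sendto(b"seq-1", (GROUP, PORT))

print(rx.recvfrom(1500))          # delivered, because the join was processed first

With the delay removed and the send issued immediately after the join, the first datagram can be dropped while all later ones get through, which matches the 5/4 and 100/99 multicast results above that report 0% loss for seq>=2.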
Tested in 4.8.0-0.nightly-2021-04-03-092337 using iperf: starting the multicast listener first and then the multicast source, no multicast traffic is lost. Could this bug be closed, given that the issue is seen when using omping but not iperf?