Description of problem:

I've followed the OpenShift document [0] explaining how to configure and verify multicast between pods, but it fails on the verification step.

[0] https://docs.openshift.com/container-platform/4.6/networking/ovn_kubernetes_network_provider/enabling-multicast.html

Version-Release number of selected component (if applicable):

Client Version: 4.7.0-0.nightly-2020-12-09-112139
Server Version: 4.7.0-0.nightly-2020-12-09-112139
Kubernetes Version: v1.19.2+ad738ba

How reproducible: always

Steps to Reproduce:

1. Configure multicast within the project:
```
oc annotate namespace default k8s.ovn.org/multicast-enabled=true
```

2. Create the multicast listener pod:
```
cat <<EOF| oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: mlistener
  labels:
    app: multicast-verify
spec:
  containers:
    - name: mlistener
      image: registry.access.redhat.com/ubi8
      command: ["/bin/sh", "-c"]
      args:
        ["dnf -y install socat hostname && sleep inf"]
      ports:
        - containerPort: 30102
          name: mlistener
          protocol: UDP
EOF
```

3. Create the multicast client pod:
```
cat <<EOF| oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: msender
  labels:
    app: multicast-verify
spec:
  containers:
    - name: msender
      image: registry.access.redhat.com/ubi8
      command: ["/bin/sh", "-c"]
      args:
        ["dnf -y install socat && sleep inf"]
EOF
```

4. Start the multicast listener:
```
POD_IP=$(oc get pods mlistener -o jsonpath='{.status.podIP}')
oc exec mlistener -i -t -- \
    socat UDP4-RECVFROM:30102,ip-add-membership=224.1.0.1:$POD_IP,fork EXEC:hostname
```

5. Send a multicast message:
```
CIDR=$(oc get Network.config.openshift.io cluster \
    -o jsonpath='{.status.clusterNetwork[0].cidr}')
oc exec msender -i -t -- \
    /bin/bash -c "echo | socat STDIO UDP4-DATAGRAM:224.1.0.1:30102,range=$CIDR,ip-multicast-ttl=64"
```

Actual results:

The multicast client does not receive a reply:

[root@zeus15 ~]# oc exec msender -i -t -- /bin/bash -c "echo | socat STDIO UDP4-DATAGRAM:224.1.0.1:30102,range=$CIDR,ip-multicast-ttl=255"
[root@zeus15 ~]#

Additional info:

The cluster does indeed have multicast enabled:

$ oc get project default -ojsonpath={.metadata.annotations} | python -m json.tool
{
    "k8s.ovn.org/multicast-enabled": "true",
    ...
}

Unicast traffic between the pods works:

# listener
$ oc exec mlistener -i -t -- socat UDP4-RECVFROM:30102,fork EXEC:hostname

# client
$ oc exec msender -i -t -- /bin/bash -c "echo | socat STDIO UDP4-DATAGRAM:10.128.2.247:30102"
mlistener
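For completeness, a quick sanity check before the verification step (a sketch; it only reuses the commands and the app=multicast-verify label defined above): confirm the annotation actually landed and note whether the two pods were scheduled on the same node or on different nodes.

```
# Sketch: verify the namespace annotation and the pod placement.
oc get namespace default -o jsonpath='{.metadata.annotations.k8s\.ovn\.org/multicast-enabled}'
oc get pods -o wide -l app=multicast-verify
```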
Using the above steps, I cannot replicate the problem. Which cloud platform did you use, AWS/GCP/Azure?

[weliang@weliang verification-tests]$ oc exec msender -i -t -- \
> /bin/bash -c "echo | socat STDIO UDP4-DATAGRAM:224.1.0.1:30102,range=$CIDR,ip-multicast-ttl=64"
mlistener

[weliang@weliang verification-tests]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-14-165231   True        False         31m     Cluster version is 4.7.0-0.nightly-2020-12-14-165231

[weliang@weliang verification-tests]$ oc get pod -o wide
NAME        READY   STATUS    RESTARTS   AGE     IP            NODE                                       NOMINATED NODE   READINESS GATES
mlistener   1/1     Running   0          2m19s   10.128.2.44   ip-10-0-64-25.us-east-2.compute.internal   <none>           <none>
msender     1/1     Running   0          2m13s   10.128.2.45   ip-10-0-64-25.us-east-2.compute.internal   <none>           <none>

[weliang@weliang verification-tests]$ oc project
Using project "default" on server "https://api.weliang-175.qe.devcluster.openshift.com:6443".
[weliang@weliang verification-tests]$
(In reply to Weibin Liang from comment #1)
> Using above steps, can not replicate the problem, which cloud platform did
> you use AWS/GCP/Azure ?
>
> [weliang@weliang verification-tests]$ oc exec msender -i -t -- \
> > /bin/bash -c "echo | socat STDIO UDP4-DATAGRAM:224.1.0.1:30102,range=$CIDR,ip-multicast-ttl=64"
> mlistener
>
> [weliang@weliang verification-tests]$ oc get clusterversion
> NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
> version   4.7.0-0.nightly-2020-12-14-165231   True        False         31m     Cluster version is 4.7.0-0.nightly-2020-12-14-165231
> [weliang@weliang verification-tests]$ oc get pod -o wide
> NAME        READY   STATUS    RESTARTS   AGE     IP            NODE                                       NOMINATED NODE   READINESS GATES
> mlistener   1/1     Running   0          2m19s   10.128.2.44   ip-10-0-64-25.us-east-2.compute.internal   <none>           <none>
> msender     1/1     Running   0          2m13s   10.128.2.45   ip-10-0-64-25.us-east-2.compute.internal   <none>           <none>
> [weliang@weliang verification-tests]$ oc project
> Using project "default" on server
> "https://api.weliang-175.qe.devcluster.openshift.com:6443".
> [weliang@weliang verification-tests]$

We're using Openshift on baremetal, installed via https://github.com/openshift-metal3/dev-scripts .

I'll (try to) reproduce this again, and upload a traffic capture.
Note that we have a multicast CI test in origin/test/extended/networking/multicast.go, but it only runs under openshift-sdn because we never fixed it up for the slightly-different ovn-kubernetes multicast API. So probably we should fix that as part of fixing this.
@mduarted were you able to reproduce this on latest 4.7? This may be related to some of the fixes that were made for reject ACLs. In particular, https://github.com/ovn-org/ovn-kubernetes/pull/1705/files may fix the issue.
(In reply to Aniket Bhat from comment #4)
> @mduarted were you able to reproduce this on latest 4.7? This may
> be related to some of the fixes that were made for reject ACLs. In
> particular, https://github.com/ovn-org/ovn-kubernetes/pull/1705/files may
> fix the issue.

Interesting. I'll give this another try, and update ASAP.
Miguel: Any update? Should we close this as a duplicate? Thanks
(In reply to Miguel Duarte Barroso from comment #5)
> (In reply to Aniket Bhat from comment #4)
> > @mduarted were you able to reproduce this on latest 4.7? This may
> > be related to some of the fixes that were made for reject ACLs. In
> > particular, https://github.com/ovn-org/ovn-kubernetes/pull/1705/files may
> > fix the issue.
>
> Interesting. I'll give this another try, and update ASAP.

Which version should I check? I can confirm this behavior still persists on:
```
[root@zeus15 ~]# oc version
Client Version: 4.7.0-0.nightly-2020-12-18-031435
Server Version: 4.7.0-0.nightly-2020-12-18-031435
Kubernetes Version: v1.20.0+87544c5
```

Sorry for taking so long to reply; I'll upload the capture ASAP.
Same steps still work fine for me in 4.7.0-0.nightly-2021-01-13-054018:

[weliang@weliang ~]$ CIDR=$(oc get Network.config.openshift.io cluster -o jsonpath='{.status.clusterNetwork[0].cidr}')
[weliang@weliang ~]$ oc exec msender -i -t -- \
> /bin/bash -c "echo | socat STDIO UDP4-DATAGRAM:224.1.0.1:30102,range=$CIDR,ip-multicast-ttl=64"
mlistener
Created attachment 1747372 [details]
mcast sender traffic capture

Retried just now w/
```
$ oc version
Client Version: 4.7.0-202101092121.p0-eeb9d6d
Server Version: 4.7.0-fc.1
Kubernetes Version: v1.20.0+87544c5
```
and the issue still reproduces. Attached a traffic capture from the mcast sender (the mcast receiver does *not* receive any data).
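For reference, a minimal sketch of how such a capture can be taken directly from the sender pod. This assumes tcpdump is available in the pod (it is not part of the ubi8 image by default, so it would have to be installed the same way socat was) and that the pod interface is eth0:

```
# Sketch only: capture multicast/IGMP traffic from inside the sender pod,
# then copy the pcap out for attaching to the bug.
oc exec msender -- tcpdump -i eth0 -c 100 -w /tmp/msender.pcap 'udp port 30102 or igmp'
oc cp msender:/tmp/msender.pcap ./msender.pcap
```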
Miguel, could you try another way to test multicast in your cluster when you have time? Here are the steps:

1. oc new-project multicast-test
2. oc annotate namespace multicast-test k8s.ovn.org/multicast-enabled="true"
3. oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/Pod/multicast-pod.yaml

[weliang@weliang ~]$ oc get pod -o wide
NAME             READY   STATUS    RESTARTS   AGE     IP            NODE                         NOMINATED NODE   READINESS GATES
mcast-rc-6wjm9   1/1     Running   0          3m37s   10.129.2.26   weliang141-m5h7f-compute-1   <none>           <none>
mcast-rc-858rt   1/1     Running   0          3m37s   10.129.2.27   weliang141-m5h7f-compute-1   <none>           <none>
mcast-rc-8phmb   1/1     Running   0          3m37s   10.129.2.31   weliang141-m5h7f-compute-1   <none>           <none>
mcast-rc-g28zn   1/1     Running   0          3m37s   10.129.2.29   weliang141-m5h7f-compute-1   <none>           <none>
mcast-rc-jwhjj   1/1     Running   0          3m37s   10.129.2.30   weliang141-m5h7f-compute-1   <none>           <none>
mcast-rc-k6nhn   1/1     Running   0          3m37s   10.129.2.28   weliang141-m5h7f-compute-1   <none>           <none>
[weliang@weliang ~]$

4. Run "omping -m 239.255.254.24 -c 5 10.129.2.26 10.129.2.27" on two pods at the same time:

[weliang@weliang ~]$ oc rsh mcast-rc-6wjm9
$ omping -m 239.255.254.24 -c 5 10.129.2.26 10.129.2.27
10.129.2.27 : waiting for response msg
10.129.2.27 : waiting for response msg
10.129.2.27 : joined (S,G) = (*, 239.255.254.24), pinging
10.129.2.27 : unicast, seq=1, size=69 bytes, dist=0, time=0.025ms
10.129.2.27 : multicast, seq=1, size=69 bytes, dist=0, time=0.591ms
10.129.2.27 : unicast, seq=2, size=69 bytes, dist=0, time=0.169ms
10.129.2.27 : multicast, seq=2, size=69 bytes, dist=0, time=0.222ms
10.129.2.27 : unicast, seq=3, size=69 bytes, dist=0, time=0.145ms
10.129.2.27 : multicast, seq=3, size=69 bytes, dist=0, time=0.166ms
10.129.2.27 : unicast, seq=4, size=69 bytes, dist=0, time=0.196ms
10.129.2.27 : multicast, seq=4, size=69 bytes, dist=0, time=0.224ms
10.129.2.27 : unicast, seq=5, size=69 bytes, dist=0, time=0.120ms
10.129.2.27 : multicast, seq=5, size=69 bytes, dist=0, time=0.141ms
10.129.2.27 : given amount of query messages was sent

10.129.2.27 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.025/0.131/0.196/0.066
10.129.2.27 : multicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.141/0.269/0.591/0.184
$

[weliang@weliang ~]$ oc rsh mcast-rc-858rt
$ omping -m 239.255.254.24 -c 5 10.129.2.26 10.129.2.27
10.129.2.26 : waiting for response msg
10.129.2.26 : joined (S,G) = (*, 239.255.254.24), pinging
10.129.2.26 : unicast, seq=1, size=69 bytes, dist=0, time=0.163ms
10.129.2.26 : unicast, seq=2, size=69 bytes, dist=0, time=0.219ms
10.129.2.26 : multicast, seq=2, size=69 bytes, dist=0, time=0.599ms
10.129.2.26 : unicast, seq=3, size=69 bytes, dist=0, time=0.172ms
10.129.2.26 : multicast, seq=3, size=69 bytes, dist=0, time=0.203ms
10.129.2.26 : unicast, seq=4, size=69 bytes, dist=0, time=0.247ms
10.129.2.26 : multicast, seq=4, size=69 bytes, dist=0, time=0.280ms
10.129.2.26 : unicast, seq=5, size=69 bytes, dist=0, time=0.200ms
10.129.2.26 : multicast, seq=5, size=69 bytes, dist=0, time=0.248ms
10.129.2.26 : given amount of query messages was sent

10.129.2.26 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.163/0.200/0.247/0.034
10.129.2.26 : multicast, xmt/rcv/%loss = 5/4/19% (seq>=2 0%), min/avg/max/std-dev = 0.203/0.333/0.599/0.180
$

Test results:
The first pod sent and received 5 multicast packets.
The second pod sent 5 multicast packets and received 4, due to https://bugzilla.redhat.com/show_bug.cgi?id=1879241.
(In reply to Weibin Liang from comment #10)
> Miguel, Could you try another way to test multicast in your cluster when you
> have time? Here is the steps:

Sure.

> 1. oc new-project multicast-test
> 2. oc annotate namespace multicast-test k8s.ovn.org/multicast-enabled="true"
> 3. oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/Pod/multicast-pod.yaml
>
> [weliang@weliang ~]$ oc get pod -o wide
> NAME             READY   STATUS    RESTARTS   AGE     IP            NODE                         NOMINATED NODE   READINESS GATES
> mcast-rc-6wjm9   1/1     Running   0          3m37s   10.129.2.26   weliang141-m5h7f-compute-1   <none>           <none>
> mcast-rc-858rt   1/1     Running   0          3m37s   10.129.2.27   weliang141-m5h7f-compute-1   <none>           <none>
> mcast-rc-8phmb   1/1     Running   0          3m37s   10.129.2.31   weliang141-m5h7f-compute-1   <none>           <none>
> mcast-rc-g28zn   1/1     Running   0          3m37s   10.129.2.29   weliang141-m5h7f-compute-1   <none>           <none>
> mcast-rc-jwhjj   1/1     Running   0          3m37s   10.129.2.30   weliang141-m5h7f-compute-1   <none>           <none>
> mcast-rc-k6nhn   1/1     Running   0          3m37s   10.129.2.28   weliang141-m5h7f-compute-1   <none>           <none>
> [weliang@weliang ~]$
>
> 4.Run "omping -m 239.255.254.24 -c 5 10.129.2.26 10.129.2.27" on two pod at
> same time
> [weliang@weliang ~]$ oc rsh mcast-rc-6wjm9
> $ omping -m 239.255.254.24 -c 5 10.129.2.26 10.129.2.27
> 10.129.2.27 : waiting for response msg
> 10.129.2.27 : waiting for response msg
> 10.129.2.27 : joined (S,G) = (*, 239.255.254.24), pinging
> 10.129.2.27 : unicast, seq=1, size=69 bytes, dist=0, time=0.025ms
> 10.129.2.27 : multicast, seq=1, size=69 bytes, dist=0, time=0.591ms
> 10.129.2.27 : unicast, seq=2, size=69 bytes, dist=0, time=0.169ms
> 10.129.2.27 : multicast, seq=2, size=69 bytes, dist=0, time=0.222ms
> 10.129.2.27 : unicast, seq=3, size=69 bytes, dist=0, time=0.145ms
> 10.129.2.27 : multicast, seq=3, size=69 bytes, dist=0, time=0.166ms
> 10.129.2.27 : unicast, seq=4, size=69 bytes, dist=0, time=0.196ms
> 10.129.2.27 : multicast, seq=4, size=69 bytes, dist=0, time=0.224ms
> 10.129.2.27 : unicast, seq=5, size=69 bytes, dist=0, time=0.120ms
> 10.129.2.27 : multicast, seq=5, size=69 bytes, dist=0, time=0.141ms
> 10.129.2.27 : given amount of query messages was sent
>
> 10.129.2.27 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.025/0.131/0.196/0.066
> 10.129.2.27 : multicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.141/0.269/0.591/0.184
> $
>
> $ [weliang@weliang ~]$ oc rsh mcast-rc-858rt
> $ omping -m 239.255.254.24 -c 5 10.129.2.26 10.129.2.27
> 10.129.2.26 : waiting for response msg
> 10.129.2.26 : joined (S,G) = (*, 239.255.254.24), pinging
> 10.129.2.26 : unicast, seq=1, size=69 bytes, dist=0, time=0.163ms
> 10.129.2.26 : unicast, seq=2, size=69 bytes, dist=0, time=0.219ms
> 10.129.2.26 : multicast, seq=2, size=69 bytes, dist=0, time=0.599ms
> 10.129.2.26 : unicast, seq=3, size=69 bytes, dist=0, time=0.172ms
> 10.129.2.26 : multicast, seq=3, size=69 bytes, dist=0, time=0.203ms
> 10.129.2.26 : unicast, seq=4, size=69 bytes, dist=0, time=0.247ms
> 10.129.2.26 : multicast, seq=4, size=69 bytes, dist=0, time=0.280ms
> 10.129.2.26 : unicast, seq=5, size=69 bytes, dist=0, time=0.200ms
> 10.129.2.26 : multicast, seq=5, size=69 bytes, dist=0, time=0.248ms
> 10.129.2.26 : given amount of query messages was sent
>
> 10.129.2.26 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.163/0.200/0.247/0.034
> 10.129.2.26 : multicast, xmt/rcv/%loss = 5/4/19% (seq>=2 0%), min/avg/max/std-dev = 0.203/0.333/0.599/0.180
> $
>
> Test results:
> First pod send and receive 5 multicast packets
> Second pod send 5 multicast packets and receive 4 multicast packets due to
> https://bugzilla.redhat.com/show_bug.cgi?id=1879241

Does not work, check my results:

[root@zeus15 ~]# oc new-project multicast-test
...
[root@zeus15 ~]# oc annotate namespace multicast-test k8s.ovn.org/multicast-enabled="true"
namespace/multicast-test annotated
[root@zeus15 ~]# oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/Pod/multicast-pod.yaml
replicationcontroller/mcast-rc created
[root@zeus15 ~]# oc get pod -o wide -w
NAME             READY   STATUS              RESTARTS   AGE   IP             NODE                                             NOMINATED NODE   READINESS GATES
mcast-rc-52dgt   0/1     ContainerCreating   0          11s   <none>         worker-1.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-7cl85   0/1     ContainerCreating   0          10s   <none>         worker-2.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-99jc7   0/1     ContainerCreating   0          10s   <none>         worker-0.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-bb7nd   0/1     ContainerCreating   0          10s   <none>         worker-2.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-fb2p8   0/1     ContainerCreating   0          10s   <none>         worker-1.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-pkl59   0/1     ContainerCreating   0          10s   <none>         worker-3.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-99jc7   1/1     Running             0          17s   10.131.2.195   worker-0.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-pkl59   1/1     Running             0          17s   10.130.3.22    worker-3.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-7cl85   1/1     Running             0          17s   10.129.2.9     worker-2.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-bb7nd   1/1     Running             0          17s   10.129.2.10    worker-2.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-fb2p8   1/1     Running             0          18s   10.128.2.22    worker-1.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>
mcast-rc-52dgt   1/1     Running             0          19s   10.128.2.21    worker-1.ostest.zeus15.lab.eng.tlv2.redhat.com   <none>           <none>

# 1st pod
[root@zeus15 ~]# oc rsh mcast-rc-99jc7
$ omping -m 239.255.254.24 -c 5 10.131.2.195 10.130.3.22
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : waiting for response msg
10.130.3.22 : joined (S,G) = (*, 239.255.254.24), pinging
10.130.3.22 : unicast, seq=1, size=69 bytes, dist=1, time=0.553ms
10.130.3.22 : unicast, seq=2, size=69 bytes, dist=1, time=0.927ms
10.130.3.22 : unicast, seq=3, size=69 bytes, dist=1, time=0.572ms
10.130.3.22 : unicast, seq=4, size=69 bytes, dist=1, time=0.552ms
10.130.3.22 : unicast, seq=5, size=69 bytes, dist=1, time=0.793ms
10.130.3.22 : given amount of query messages was sent

10.130.3.22 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.552/0.679/0.927/0.172
10.130.3.22 : multicast, xmt/rcv/%loss = 5/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000

# 2nd pod
[root@zeus15 ~]# oc rsh mcast-rc-pkl59
$ omping -m 239.255.254.24 -c 5 10.131.2.195 10.130.3.22
10.131.2.195 : waiting for response msg
10.131.2.195 : joined (S,G) = (*, 239.255.254.24), pinging
10.131.2.195 : unicast, seq=1, size=69 bytes, dist=1, time=0.506ms
10.131.2.195 : unicast, seq=2, size=69 bytes, dist=1, time=0.617ms
10.131.2.195 : unicast, seq=3, size=69 bytes, dist=1, time=0.715ms
10.131.2.195 : unicast, seq=4, size=69 bytes, dist=1, time=0.804ms
10.131.2.195 : unicast, seq=5, size=69 bytes, dist=1, time=0.549ms
10.131.2.195 : given amount of query messages was sent

10.131.2.195 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.506/0.638/0.804/0.122
10.131.2.195 : multicast, xmt/rcv/%loss = 5/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
Miguel, QA can not reproduce this issue in QA cluster, could you help to verify this bug in your cluster? Thanks!
(In reply to Weibin Liang from comment #13)
> Miguel, QA can not reproduce this issue in QA cluster, could you help to
> verify this bug in your cluster? Thanks!

Good news, it is working on:

Client Version: 4.7.0-0.nightly-2021-01-18-053817
Server Version: 4.7.0-0.nightly-2021-01-18-053817
Kubernetes Version: v1.20.0+d9c52cc

I followed the steps on https://docs.openshift.com/container-platform/4.4/networking/ovn_kubernetes_network_provider/enabling-multicast.html, and all works as expected.

I'll set the bug to verified.
(In reply to Miguel Duarte Barroso from comment #14)
> Good news, it is working on
>
> Client Version: 4.7.0-0.nightly-2021-01-18-053817
> Server Version: 4.7.0-0.nightly-2021-01-18-053817
> Kubernetes Version: v1.20.0+d9c52cc
>
> I followed the steps on
> https://docs.openshift.com/container-platform/4.4/networking/ovn_kubernetes_network_provider/enabling-multicast.html,
> and all works as expected.
>
> I'll set the bug to verified.

Thanks, Miguel!
I followed the steps in the manual (https://docs.openshift.com/container-platform/4.6/networking/ovn_kubernetes_network_provider/enabling-multicast.html) and also the steps in comment #10. Neither works.

Client Version: 4.7.0-0.nightly-2021-02-04-031352
Server Version: 4.7.0-0.nightly-2021-02-13-071408
Kubernetes Version: v1.20.0+bd9e442

Side note:
On a 4.6 cluster the manual steps work for me if the pods are on the same node. If the pods are on different nodes, it doesn't work.

Client Version: 4.7.0-0.nightly-2021-02-04-031352
Server Version: 4.7.0-0.nightly-2021-02-13-071408
Kubernetes Version: v1.20.0+bd9e442
(In reply to Alona Kaplan from comment #16)
> I followed the steps in the manual
> (https://docs.openshift.com/container-platform/4.6/networking/ovn_kubernetes_network_provider/enabling-multicast.html)
> and also the steps in comment #10.
> Both don't work.
>
> Client Version: 4.7.0-0.nightly-2021-02-04-031352
> Server Version: 4.7.0-0.nightly-2021-02-13-071408
> Kubernetes Version: v1.20.0+bd9e442
>
> Side note:
> On 4.6 cluster the manual steps are working for me if the pods are on the
> same node.
> If the pods are on different nodes, it doesn't work.
>
> Client Version: 4.7.0-0.nightly-2021-02-04-031352
> Server Version: 4.7.0-0.nightly-2021-02-13-071408
> Kubernetes Version: v1.20.0+bd9e442

Bug 1843695 - [OVN] Pods can not receive multicast from other pods which are in the same namespace but different node.
This bug is tested and verified in 4.6.0-0.nightly-2020-09-12-230035.
So, I had a typo in my previous comment.

In version:
Client Version: 4.6.16
Server Version: 4.6.0-0.nightly-2021-02-13-034601
Kubernetes Version: v1.19.0+f173eb4

multicast between pods in the same namespace works on the same node but doesn't work across different nodes.

But in version:
Client Version: 4.7.0-0.nightly-2021-02-04-031352
Server Version: 4.7.0-0.nightly-2021-02-13-071408
Kubernetes Version: v1.20.0+bd9e442

multicast between pods in the same namespace is not working at all.

Should I re-open this bug?
Just tested in an IPI-on-AWS cluster and got the same results as comment #10.

[weliang@weliang bin]$ oc version
Client Version: 4.7.0-0.nightly-2021-02-13-071408
Server Version: 4.7.0-0.nightly-2021-02-13-071408
Kubernetes Version: v1.20.0+bd9e442

Not sure which cloud provider you tested on; you can reopen this bug if you can reproduce the problem easily.
Re-opening the bug since multicast between pods in the same namespace doesn't work on a dual-stack cluster.
Alona mentioned that multicast passed on a baremetal dual-stack v4.6 OVN cluster, and I got the same result on a baremetal dual-stack v4.6 OVN cluster. It's a regression bug.
I have taken a look at this on a baremetal setup running 4.7, with the following versions:

[root@ocp-edge50 ~]# oc version
Client Version: 4.7.0-0.nightly-2021-02-22-210958
Server Version: 4.7.0-0.nightly-2021-02-22-210958
Kubernetes Version: v1.20.0+5fbfd19

This also fails on my local KIND setup running on master, with a special KIND version for dual-stack support:

[root@nfvsdn-14 go-controller]# kind --version
kind version 0.10.0-alpha+cf7e3f36627031

I started the KIND setup like this:

KIND_IPV4_SUPPORT=true KIND_IPV6_SUPPORT=true ./kind.sh -gm shared -me

Here is the link to the slack chat with lots of debug info:
https://coreos.slack.com/archives/C01G7T6SYSD/p1614271211057000

ovn-trace reveals that the packet is being dropped at the ovn_cluster_router:

<snippet>
ingress(dp="ovn_cluster_router", inport="rtos-ovn-worker")
----------------------------------------------------------
 0. lr_in_admission (ovn-northd.c:9244): eth.mcast && inport == "rtos-ovn-worker", priority 50, uuid 8a1caba2
    xreg0[0..47] = 0a:58:0a:f4:02:01;
    next;
 1. lr_in_lookup_neighbor (ovn-northd.c:9338): 1, priority 0, uuid 7ac32f9c
    reg9[2] = 1;
    next;
 2. lr_in_learn_neighbor (ovn-northd.c:9347): reg9[2] == 1, priority 100, uuid 2513db70
    next;
 3. lr_in_ip_input (ovn-northd.c:10445): ip4.mcast || ip6.mcast, priority 82, uuid 5acaff77
    next;
10. lr_in_ip_routing (ovn-northd.c:9707): ip4.mcast || ip6.mcast, priority 450, uuid 8240fbc7
    drop;

"ovn-sbctl list igmp_group" is empty. It appears the root problem is that we are missing the igmp_group entry in the SB DB.

The omping tool sends an IGMP report; here is a tcpdump snippet from the mc pod:

# tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
18:05:53.467434 IP mcast-rc-6hdtd.4321 > 10.244.2.4.4321: UDP, length 45
18:05:53.468452 ARP, Request who-has 10.244.2.1 tell mcast-rc-6hdtd, length 28
18:05:53.469072 ARP, Reply 10.244.2.1 is-at 0a:58:0a:f4:02:01 (oui Unknown), length 28
18:05:53.469074 IP 10.244.2.4 > mcast-rc-6hdtd: ICMP 10.244.2.4 udp port 4321 unreachable, length 81
18:05:53.469094 IP mcast-rc-6hdtd.54017 > kube-dns.kube-system.svc.cluster.local.53: 18909+ PTR? 4.2.244.10.in-addr.arpa. (41)
18:05:53.470738 IP mcast-rc-6hdtd > 224.0.0.22: igmp v3 report, 1 group record(s)

I can see the IGMP flow packet count incrementing when I run the omping test.

Before running the test (well, actually, I had run it several times; this time, looking at counters. Notice the pkt count is 40 here):

cookie=0x8e1b1028, duration=73318.492s, table=29, n_packets=40, n_bytes=2160, idle_age=297, hard_age=65534, priority=100,igmp,metadata=0x6 actions=controller(userdata=00.00.00.10.00.00.00.00)

After running the omping tool, the pkt count increases from 40 to 44:

cookie=0x8e1b1028, duration=73562.310s, table=29, n_packets=44, n_bytes=2376, idle_age=21, hard_age=65534, priority=100,igmp,metadata=0x6 actions=controller(userdata=00.00.00.10.00.00.00.00)

I do see these logs in ovn-controller on the worker node, related to IGMP; not sure of their meaning. They look like part of the startup sequence, before the mc pods are created:

2021-02-24T21:49:24.203Z|00017|pinctrl|WARN|IGMP Querier enabled without a valid IPv4 or IPv6 address
2021-02-24T21:49:24.203Z|00018|pinctrl|WARN|IGMP Querier enabled with invalid ETH src address
2021-02-24T21:49:24.253Z|00019|pinctrl|WARN|IGMP Querier enabled without a valid IPv4 or IPv6 address
2021-02-24T21:49:24.253Z|00020|pinctrl|WARN|IGMP Querier enabled with invalid ETH src address

It looks like this is a bug in OVN. From the slack chat, Dumitru indicated: the IGMP group in the SB DB is handled in the sw datapath. The IGMP packet gets punted to ovn-controller, which populates the SB IGMP_Group entry on the logical switch datapath, and the routers can then generate logical flows by looking at the SB IGMP entries for the attached logical switches.

Cloning this bug over to OVN for further analysis.
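For anyone else debugging this, a minimal sketch of how to check the SB IGMP_Group table from an ovnkube-master pod (the pod name is a placeholder; substitute one from "oc get pods -n openshift-ovn-kubernetes", and note that depending on the deployment ovn-sbctl may need an explicit --db argument or may have to be run from the sbdb container instead):

```
# Sketch: list the IGMP groups OVN has learned. In the failing case described
# above this table comes back empty; with multicast working there should be
# one row per joined group and port.
oc exec -n openshift-ovn-kubernetes pod/ovnkube-master-ck4wk -c northd -- \
    ovn-sbctl list IGMP_Group
```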
Note: There is one fix needed in ovn-kubernetes. When I first started debugging this issue, ovn-trace would not work; it complained like below:

[root@ocp-edge50 ~]# oc exec -n openshift-ovn-kubernetes pod/ovnkube-master-ck4wk -- ovn-trace worker-0-1.ocp-edge-cluster-0.qe.lab.redhat.com 'inport==mcast-rc-42w2s && eth.src==0a:58:0a:83:00:ad && ip.src==10.131.0.173 && eth.dst==01:00:5e:7f:fe:18 && ip.dst==239.255.254.24'
Defaulting container name to northd.
Use 'oc describe pod/ovnkube-master-ck4wk -n openshift-ovn-kubernetes' to see all of the containers in this pod.
2021-02-24T14:21:30Z|00001|ovntrace|WARN|reg0[7] == 1 && (outport == @a14390016336330567584 && (ip4.src == $a8720055785936351439 && ip4.mcast || ip6.src == $a8720057984959607861 && (ip6.dst[120..127] == 0xff && ip6.dst[116] == 1))): parsing expression failed (&& and || must be parenthesized when used together.)

I will push a patch to fix this in ovn-k8s shortly.
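To make the parser complaint concrete, here is an illustrative before/after of what the OVN expression grammar expects (this is not the exact ACL changed by the PR; $set_a/$set_b stand in for the generated address set names):

```
# Rejected: && and || mixed at the same level without parentheses
#   ip4.src == $set_a && ip4.mcast || ip6.src == $set_b && ip6.mcast
# Accepted: each && branch explicitly parenthesized
#   (ip4.src == $set_a && ip4.mcast) || (ip6.src == $set_b && ip6.mcast)
```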
as mentioned in comment 3, we have an e2e test for multicast, and we ought to fix it so it gets run under ovn-kubernetes
Pushed this PR to address the ACL syntax error when using ovn-trace: https://github.com/ovn-org/ovn-kubernetes/pull/2076
OK, I figured this out today. First, a small goof on my part: the omping tool needs to be run on both servers (mc pods), and I was just running it on one. That is why we see the ICMP "udp port 4321 unreachable" in the capture... my mistake. So, once I realized that, I ran omping on both mc pods. I can see that multicast traffic now works with the above PR, which fixes the ACL rules for the dual-stack setup.
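For reference, a minimal sketch of kicking off the test in both mc pods at the same time from a single shell; the pod names, namespace, and IPs are the ones from the run below and would differ per cluster:

```
# Sketch: run omping concurrently in both pods so each one joins the group
# and answers the other's queries.
for pod in mcast-rc-65m6t mcast-rc-btc4n; do
  kubectl exec -n multicast-test "$pod" -- omping -c 5 10.244.2.5 10.244.0.7 &
done
wait
```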
[root@nfvsdn-14 pod-specs]# kubectl exec -it mcast-rc-65m6t -n multicast-test sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
# omping -c 5 10.244.2.5 10.244.0.7
10.244.0.7 : waiting for response msg
10.244.0.7 : waiting for response msg
10.244.0.7 : waiting for response msg
10.244.0.7 : waiting for response msg
10.244.0.7 : joined (S,G) = (*, 232.43.211.234), pinging
10.244.0.7 : unicast, seq=1, size=69 bytes, dist=1, time=0.208ms
10.244.0.7 : multicast, seq=1, size=69 bytes, dist=1, time=2.088ms
10.244.0.7 : unicast, seq=2, size=69 bytes, dist=1, time=0.558ms
10.244.0.7 : multicast, seq=2, size=69 bytes, dist=1, time=0.702ms
10.244.0.7 : unicast, seq=3, size=69 bytes, dist=1, time=0.361ms
10.244.0.7 : multicast, seq=3, size=69 bytes, dist=1, time=0.482ms
10.244.0.7 : unicast, seq=4, size=69 bytes, dist=1, time=0.364ms
10.244.0.7 : multicast, seq=4, size=69 bytes, dist=1, time=0.525ms
10.244.0.7 : unicast, seq=5, size=69 bytes, dist=1, time=0.388ms
10.244.0.7 : multicast, seq=5, size=69 bytes, dist=1, time=0.530ms
10.244.0.7 : given amount of query messages was sent

10.244.0.7 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.208/0.376/0.558/0.124
10.244.0.7 : multicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.482/0.865/2.088/0.689
#

[root@nfvsdn-14 ~]# kubectl exec -it mcast-rc-btc4n -n multicast-test sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
# omping -c 5 10.244.2.5 10.244.0.7
10.244.2.5 : waiting for response msg
10.244.2.5 : joined (S,G) = (*, 232.43.211.234), pinging
10.244.2.5 : unicast, seq=1, size=69 bytes, dist=1, time=0.251ms
10.244.2.5 : unicast, seq=2, size=69 bytes, dist=1, time=0.461ms
10.244.2.5 : multicast, seq=2, size=69 bytes, dist=1, time=1.983ms
10.244.2.5 : unicast, seq=3, size=69 bytes, dist=1, time=0.487ms
10.244.2.5 : multicast, seq=3, size=69 bytes, dist=1, time=0.662ms
10.244.2.5 : unicast, seq=4, size=69 bytes, dist=1, time=0.497ms
10.244.2.5 : multicast, seq=4, size=69 bytes, dist=1, time=0.648ms
10.244.2.5 : unicast, seq=5, size=69 bytes, dist=1, time=0.543ms
10.244.2.5 : multicast, seq=5, size=69 bytes, dist=1, time=0.717ms
10.244.2.5 : given amount of query messages was sent

10.244.2.5 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.251/0.448/0.543/0.114
10.244.2.5 : multicast, xmt/rcv/%loss = 5/4/19% (seq>=2 0%), min/avg/max/std-dev = 0.648/1.002/1.983/0.654
#

Notice that 1 mcast packet was lost on the 2nd server. There is a separate BZ for that, will work on that next.
(In reply to Victor Pickard from comment #35)
> [root@nfvsdn-14 pod-specs]# kubectl exec -it mcast-rc-65m6t -n multicast-test sh
> kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future
> version. Use kubectl exec [POD] -- [COMMAND] instead.
> # omping -c 5 10.244.2.5 10.244.0.7
> 10.244.0.7 : waiting for response msg
> 10.244.0.7 : waiting for response msg
> 10.244.0.7 : waiting for response msg
> 10.244.0.7 : waiting for response msg
> 10.244.0.7 : joined (S,G) = (*, 232.43.211.234), pinging
> 10.244.0.7 : unicast, seq=1, size=69 bytes, dist=1, time=0.208ms
> 10.244.0.7 : multicast, seq=1, size=69 bytes, dist=1, time=2.088ms
> 10.244.0.7 : unicast, seq=2, size=69 bytes, dist=1, time=0.558ms
> 10.244.0.7 : multicast, seq=2, size=69 bytes, dist=1, time=0.702ms
> 10.244.0.7 : unicast, seq=3, size=69 bytes, dist=1, time=0.361ms
> 10.244.0.7 : multicast, seq=3, size=69 bytes, dist=1, time=0.482ms
> 10.244.0.7 : unicast, seq=4, size=69 bytes, dist=1, time=0.364ms
> 10.244.0.7 : multicast, seq=4, size=69 bytes, dist=1, time=0.525ms
> 10.244.0.7 : unicast, seq=5, size=69 bytes, dist=1, time=0.388ms
> 10.244.0.7 : multicast, seq=5, size=69 bytes, dist=1, time=0.530ms
> 10.244.0.7 : given amount of query messages was sent
>
> 10.244.0.7 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.208/0.376/0.558/0.124
> 10.244.0.7 : multicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.482/0.865/2.088/0.689
> #
>
> [root@nfvsdn-14 ~]# kubectl exec -it mcast-rc-btc4n -n multicast-test sh
> kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future
> version. Use kubectl exec [POD] -- [COMMAND] instead.
> # omping -c 5 10.244.2.5 10.244.0.7
> 10.244.2.5 : waiting for response msg
> 10.244.2.5 : joined (S,G) = (*, 232.43.211.234), pinging
> 10.244.2.5 : unicast, seq=1, size=69 bytes, dist=1, time=0.251ms
> 10.244.2.5 : unicast, seq=2, size=69 bytes, dist=1, time=0.461ms
> 10.244.2.5 : multicast, seq=2, size=69 bytes, dist=1, time=1.983ms
> 10.244.2.5 : unicast, seq=3, size=69 bytes, dist=1, time=0.487ms
> 10.244.2.5 : multicast, seq=3, size=69 bytes, dist=1, time=0.662ms
> 10.244.2.5 : unicast, seq=4, size=69 bytes, dist=1, time=0.497ms
> 10.244.2.5 : multicast, seq=4, size=69 bytes, dist=1, time=0.648ms
> 10.244.2.5 : unicast, seq=5, size=69 bytes, dist=1, time=0.543ms
> 10.244.2.5 : multicast, seq=5, size=69 bytes, dist=1, time=0.717ms
> 10.244.2.5 : given amount of query messages was sent
>
> 10.244.2.5 : unicast, xmt/rcv/%loss = 5/5/0%, min/avg/max/std-dev = 0.251/0.448/0.543/0.114
> 10.244.2.5 : multicast, xmt/rcv/%loss = 5/4/19% (seq>=2 0%), min/avg/max/std-dev = 0.648/1.002/1.983/0.654
> #
>
> Notice that 1 mcast packet was lost on the 2nd server. There is a separate
> BZ for that, will work on that next.

Victor, do you have any chance to fix the 1 mcast packet lost issue, https://bugzilla.redhat.com/show_bug.cgi?id=1879241? This bug blocks QE's OVN multicast automation testing.
Tested and verified in 4.8.0-0.nightly-2021-03-19-144958
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438