Bug 1550048

Summary: Tracing network traffic at the OVS level does not work with ovs-networkpolicy as the network plugin
Product: OpenShift Container Platform
Reporter: Sanket N <snalawad>
Component: Networking
Sub component: openshift-sdn
Assignee: Ben Bennett <bbennett>
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
CC: aos-bugs, bbennett, eparis, yannick.kint
Version: 3.7.0
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
Regression: ---
Last Closed: 2018-03-02 17:21:27 UTC

Description Sanket N 2018-02-28 12:04:05 UTC
Description of problem:

In Open vSwitch, pod-to-pod traffic can be traced with "ovs-appctl ofproto/trace".
With ovs-networkpolicy as the network plugin, the same command does not produce the expected trace output, even though the pods can communicate with each other.


How reproducible:

1. The cluster uses ovs-networkpolicy as the network plugin.
2. No NetworkPolicy objects are defined in any namespace.
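
Both preconditions can be confirmed with a quick check like the following (a sketch; it assumes the OpenShift 3.x clusternetwork resource and its pluginName field):

# plugin in use (expected: redhat/openshift-ovs-networkpolicy)
oc get clusternetwork default -o jsonpath='{.pluginName}'
# no NetworkPolicy objects defined in any namespace
oc get networkpolicy --all-namespaces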


Steps to Reproduce:

The steps below were carried out on a test cluster to trace the traffic flow, following the article https://access.redhat.com/articles/3343171:

#####################################################################
  PODS                               IP                     NODES
docker-registry-2-v82p5 ----> 10.129.0.26  ---->  vm252-23.gsslab.pnq2.redhat.com
apiserver-xt72c         ----> 10.128.0.171 ---->  vm253-114.gsslab.pnq2.redhat.com
######################################################################

[root@vm253-114 ~]# oc rsh apiserver-xt72c
sh-4.2$ ip a l | grep "eth0@if"
3: eth0@if224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP

 
[root@vm253-114 ~]# sudo ovs-ofctl -O OpenFlow13 show br0 | grep veth
19(veth5af26022): addr:12:e2:cb:d6:f2:79

[root@vm253-114 ~]# sudo  cat /sys/class/net/veth5af26022/ifindex
224
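
For reference, the interface lookup above can be scripted as a small helper; this is only a rough sketch run on the pod's node (the pod name is illustrative and a single eth0 interface in the pod is assumed):

POD=apiserver-xt72c
# peer ifindex of the pod's eth0 (the "@ifNNN" suffix seen inside the pod)
PEER_IDX=$(oc rsh "$POD" cat /sys/class/net/eth0/iflink | tr -d '\r')
# host-side veth whose ifindex matches that peer index
VETH=$(grep -lx "$PEER_IDX" /sys/class/net/veth*/ifindex | cut -d/ -f5)
# OVS port number of that veth on br0
sudo ovs-ofctl -O OpenFlow13 show br0 | grep "($VETH)"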

Actual results:

############################Result of the command#############################

[root@vm253-114 ~]# sudo ovs-appctl ofproto/trace br0 "in_port=19,ip,nw_src=10.128.0.171,nw_dst=10.129.0.26"
Flow: ip,in_port=19,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.128.0.171,nw_dst=10.129.0.26,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ct_state=-trk,ip, priority 300
    ct(table=0)
    drop

Final flow: unchanged
Megaflow: recirc_id=0,ct_state=-trk,ip,in_port=19,nw_frag=no
Datapath actions: ct,recirc(0x9a66)

###########################################################################




Expected results:

###########################################################################

[admin@infra-0 ~]$ sudo ovs-appctl ofproto/trace br0 "in_port=10,ip,nw_src=10.128.0.9,nw_dst=10.128.0.7"
Flow: ip,in_port=10,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.128.0.9,nw_dst=10.128.0.7,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0

bridge("br0")
-------------
 0. ip, priority 100
    goto_table:20
20. ip,in_port=10,nw_src=10.128.0.9, priority 100
    load:0->NXM_NX_REG0[]
    goto_table:21
21. priority 0
    goto_table:30
30. ip,nw_dst=10.128.0.0/23, priority 200
    goto_table:70
70. ip,nw_dst=10.128.0.7, priority 100
    load:0->NXM_NX_REG1[]
    load:0x8->NXM_NX_REG2[]
    goto_table:80
80. priority 200
    output:NXM_NX_REG2[]
     -> output port is 8

Final flow: ip,reg2=0x8,in_port=10,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=10.128.0.9,nw_dst=10.128.0.7,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
Megaflow: recirc_id=0,ip,in_port=10,nw_src=10.128.0.9,nw_dst=10.128.0.7,nw_frag=no
Datapath actions: 5
[admin@infra-0 ~]$ 

###########################################################################

Additional info:

These pods are on different nodes and can communicate with each other; this was verified by pinging one pod's IP address from inside the other pod.
The docker-registry pod is in the default project, which has NETID 0 and is therefore globally accessible.
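
For reference, the per-project VNIDs can be listed as follows (assuming OpenShift 3.x); the default project is expected to show NETID 0:

oc get netnamespaces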

Comment 2 Dan Winship 2018-03-02 17:21:27 UTC
You can get a little bit farther by including "ct_state=trk" in the argument to ofproto/trace, but it still may not make it all the way through the table, depending on exactly what source/destination you're tracing.

The problem is that the NetworkPolicy plugin makes use of OVS connection tracking, which causes the processing of some packets to be stopped and then restarted. For actual network traffic, this is all completely transparent, but ofproto/trace didn't know how to deal with this in older OVS releases. I think OVS 2.8 is the first release that will automatically restart tracing at the right point after a ct() action.
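
For example, the trace from the Description might be re-run with connection-tracking state included; this is a sketch, and the exact ct_state flags needed (e.g. +new or +est) depend on which flows the packet should match:

sudo ovs-appctl ofproto/trace br0 "in_port=19,ct_state=+trk,ip,nw_src=10.128.0.171,nw_dst=10.129.0.26"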